
Spark SQL syntax: reading JSON

Author: H5之家 | Source: H5之家 | 2017-05-03 11:06


-- Sample data
[hadoop@node1 resources]$ pwd
/home/hadoop/spark-1.5.2-bin-hadoop2.6/examples/src/main/resources
[hadoop@node1 resources]$ cat people.json
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
[hadoop@node1 resources]$ cat people.txt
Michael, 29
Andy, 30
Justin, 19
[hadoop@node1 resources]$ hadoop fs -put people* /test/input
-- Tip: pressing the Tab key lists all available commands

-- Test
[hadoop@node1 spark-1.5.2-bin-hadoop2.6]$ spark-shell

-- Read the JSON file
scala> val df = sqlContext.read.json("hdfs://node1:8020/test/input/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> df.printSchema()   -- like DESC <table> in SQL
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

scala> df.select("age").show()
+----+
| age|
+----+
|null|
|  30|
|  19|
+----+

-- The following three are equivalent
scala> df.select("name", "age").show()
scala> df.select($"name", $"age").show()
scala> df.select(df("name"), df("age")).show()   -- columns can also be wrapped in df(...), with the name in double quotes
+-------+----+
|   name| age|
+-------+----+
|Michael|null|
|   Andy|  30|
| Justin|  19|
+-------+----+

scala> df.selectExpr("name", "age as age_old", "abs(age) as age_abs").show
+-------+-------+-------+
|   name|age_old|age_abs|
+-------+-------+-------+
|Michael|   null|   null|
|   Andy|     30|     30|
| Justin|     19|     19|
+-------+-------+-------+

scala> df.count
res12: Long = 3

scala> df.filter(df("age") > 21).show   -- show prints the columns and rows as a table
+---+----+
|age|name|
+---+----+
| 30|Andy|
+---+----+

scala> df.filter(df("age") > 21).collect   -- collect returns the rows as an Array
res14: Array[org.apache.spark.sql.Row] = Array([30,Andy])

scala> df.groupBy("age").count().show()
+----+-----+
| age|count|
+----+-----+
|null|    1|
|  19|    1|
|  30|    1|
+----+-----+

scala> df.agg(max("age"), sum("age"), min("age"), avg("age")).show
+--------+--------+--------+--------+
|max(age)|sum(age)|min(age)|avg(age)|
+--------+--------+--------+--------+
|      30|      49|      19|    24.5|
+--------+--------+--------+--------+
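Note why `df.filter(df("age") > 21)` returns only Andy even though Michael's age is unknown: in Spark SQL, comparing null with 21 yields null, and filter keeps only rows where the predicate is definitely true. The following is a minimal plain-Scala sketch of that behavior (no Spark required); `Person` is a hypothetical type standing in for a DataFrame row, with `Option` modeling the nullable column.

```scala
// Plain-Scala model of the nullable "age" column from people.json.
case class Person(name: String, age: Option[Int])

val people = Seq(
  Person("Michael", None),   // "age" is absent in the JSON, so it reads as null
  Person("Andy", Some(30)),
  Person("Justin", Some(19))
)

// exists(_ > 21) is false for None, mirroring how Spark's filter
// drops rows where the comparison evaluates to null rather than true.
val over21 = people.filter(_.age.exists(_ > 21))

println(over21.map(_.name))   // List(Andy)
```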
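The aggregate results above also follow from null handling: `max`/`sum`/`min`/`avg` skip null values, which is why `avg(age)` is 24.5 (49 divided by the 2 non-null rows, not 3), while `groupBy("age").count()` counts the null group like any other. A small plain-Scala model of those two rules, again using `Option` for the nullable column (not Spark code):

```scala
// The three "age" values from people.json, with None standing in for null.
val ages: Seq[Option[Int]] = Seq(None, Some(30), Some(19))

// Aggregates operate on non-null values only, as Spark's max/sum/min/avg do.
val nonNull = ages.flatten                      // Seq(30, 19)
val sumAge  = nonNull.sum                       // 49
val maxAge  = nonNull.max                       // 30
val minAge  = nonNull.min                       // 19
val avgAge  = sumAge.toDouble / nonNull.size    // 24.5 = 49 / 2, not 49 / 3

// groupBy("age").count(): null forms its own group and is counted normally.
val counts = ages.groupBy(identity).map { case (k, g) => k -> g.size }

println((sumAge, maxAge, minAge, avgAge))   // (49,30,19,24.5)
```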

Tags: Spark, Hadoop

 
