https://stackoverflow.com/questions/42951905/spark-dataframe-filter
Creating a DataFrame test case
val df = sc.parallelize(Seq((1,"Emailab"), (2,"Phoneab"), (3, "Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")
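This works as-is in spark-shell, where the SparkContext and the implicits needed for toDF are already in scope. In a standalone application you would build them yourself; a minimal sketch (the app name and master are placeholder choices, not from the original post):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("filter-demo")   // hypothetical app name
  .master("local[*]")       // run locally for this sketch
  .getOrCreate()
val sc = spark.sparkContext
import spark.implicits._    // provides .toDF and the $"..." column syntax

After that, the same sc.parallelize(...).toDF("c1","c2") line above works unchanged.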
scala> df.show()
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+
scala> df.filter($"c2".like("HCP")).show()
+---+---+
| c1| c2|
+---+---+
|  8|HCP|
+---+---+
scala> df.filter($"c2".like("HC")).show()
+---+---+
| c1| c2|
+---+---+
+---+---+
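like uses SQL LIKE semantics: the pattern must match the entire value, which is why "HC" with no wildcard matches nothing above. Adding the % wildcard turns it into a prefix match, which should return both HCP rows:

scala> df.filter($"c2".like("HC%")).show()
+---+-----+
| c1|   c2|
+---+-----+
|  8|  HCP|
|  9|HCP12|
+---+-----+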
scala> df.filter($"c2".rlike("HC")).show()
+---+-----+
| c1|   c2|
+---+-----+
|  8|  HCP|
|  9|HCP12|
+---+-----+
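In contrast, rlike is a regular-expression search that matches anywhere in the value, which is why the unanchored "HC" also picks up HCP12. Regex anchors narrow it down; for example, a suffix match on "ab" should return only the three *ab rows:

scala> df.filter($"c2".rlike("ab$")).show()
+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
+---+-------+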
scala> df.filter(df("c2")==="HCP").show()
+---+---+
| c1| c2|
+---+---+
|  8|HCP|
+---+---+
scala> df.filter($"c2".contains("HCP")).show()
+---+-----+
| c1|   c2|
+---+-----+
|  8|  HCP|
|  9|HCP12|
+---+-----+
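All of these predicates can also be written as SQL expression strings, which filter and where accept directly; each line below should give the same result as its Column-based counterpart above:

scala> df.where("c2 = 'HCP'").show()
scala> df.where("c2 like 'HC%'").show()
scala> df.where("c2 rlike 'HC'").show()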
https://www.tutorialspoint.com/spark_sql/spark_sql_dataframes.htm
Put this in employee.json. Spark expects JSON Lines input, which differs from standard JSON: one complete JSON record per line, with no commas between records and no square brackets around the list of records.
{"id" : "1201", "name" : "satish", "age" : "25"} {"id" : "1202", "name" : "krishna", "age" : "28"} {"id" : "1203", "name" : "amith", "age" : "39"} {"id" : "1204", "name" : "javed", "age" : "23"} {"id" : "1205", "name" : "prudvi", "age" : "23"}
val dfs = spark.read.json("employee.json")
dfs.printSchema()
dfs.select("name").show()
dfs.filter(dfs("age") > 23).show()
dfs.groupBy("age").count().show()
scala> val dfs = spark.read.json("employee.json")
dfs: org.apache.spark.sql.DataFrame = [age: string, id: string ... 1 more field]
scala>
scala> dfs.printSchema()
root
 |-- age: string (nullable = true)
 |-- id: string (nullable = true)
 |-- name: string (nullable = true)
scala>
scala> dfs.select("name").show()
+-------+
|   name|
+-------+
| satish|
|krishna|
|  amith|
|  javed|
| prudvi|
+-------+
scala>
scala> dfs.filter(dfs("age") > 23).show()
+---+----+-------+
|age|  id|   name|
+---+----+-------+
| 25|1201| satish|
| 28|1202|krishna|
| 39|1203|  amith|
+---+----+-------+
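Note that age was inferred as string because the values are quoted in the JSON. The numeric comparison still works because Spark casts implicitly, but an explicit cast makes the schema honest; a minimal sketch:

import org.apache.spark.sql.types.IntegerType

val typed = dfs.withColumn("age", dfs("age").cast(IntegerType))
typed.printSchema()                     // age is now integer
typed.filter(typed("age") > 23).show()  // same three rows as above
typed.groupBy("age").count().show()     // counts per distinct age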