Thursday, August 31, 2017

Spark data frames/data sets

Spark 2 Datasets
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2201444230243967/3601578643761083/latest.html



https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-DataFrame.html

Spark SQL borrowed the concept of DataFrame from pandas' DataFrame and made it immutable, parallel (one machine, perhaps with many processors and cores) and distributed (many machines, perhaps with many processors and cores).
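As a rough illustration of the immutability point, the sketch below builds a small DataFrame and derives a second one from it with withColumn. The local master setting, the app name, the column names, and the sample rows are all assumptions made up for this example; none of them come from the linked pages. It is written in spark-shell style, where getOrCreate simply returns the session that already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session is good enough for trying this out.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("immutable-df-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with two columns.
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// withColumn does not modify df; it returns a new DataFrame.
val doubled = df.withColumn("id_times_two", col("id") * 2)

df.printSchema()      // the original still has only id and label
doubled.printSchema() // the derived DataFrame has the extra column
```

Every transformation returns a new DataFrame in this way, so the original value can be reused safely in other parts of a job.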

http://pandas.pydata.org/pandas-docs/stable/dsintro.html

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset

https://stackoverflow.com/questions/38137741/how-to-write-a-dataframe-schema-to-file-in-scala

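import java.io.PrintWriter

// df is an existing DataFrame; treeString is the same text that df.printSchema() prints
val filePath = "/tmp/schema_file"
val writer = new PrintWriter(filePath)
try writer.write(df.schema.treeString) finally writer.close()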
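The treeString output is meant for reading, not for loading back. If the schema needs to be restored later, one possible variation is to store it as JSON instead. This is only a sketch: the schema fields and the /tmp/schema_file.json path are hypothetical, and the schema is built by hand here so the example does not need an existing DataFrame (with a real one, df.schema would take its place).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

// Hypothetical schema standing in for df.schema.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("label", StringType, nullable = true)
))

// Assumed output location, next to the path used above.
val jsonPath = Paths.get("/tmp/schema_file.json")

// Write the schema as a JSON string.
Files.write(jsonPath, schema.json.getBytes(StandardCharsets.UTF_8))

// Read the JSON back and rebuild the StructType from it.
val json = new String(Files.readAllBytes(jsonPath), StandardCharsets.UTF_8)
val restored = DataType.fromJson(json).asInstanceOf[StructType]

println(restored.treeString)
```

A schema restored this way can then be passed to a reader, for example through spark.read.schema(restored), so that it does not have to be inferred again.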
https://docs.databricks.com/spark/latest/spark-sql/complex-types.html#transform-complex-data-types-scala

https://www.balabit.com/blog/spark-scala-dataset-tutorial/
