programming matrix: Spark data frames/data sets

Spark 2 Datasets
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/8599738367597028/2201444230243967/3601578643761083/latest.html

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-DataFrame.html

Spark SQL borrowed the concept of a DataFrame from pandas and made it immutable, parallel (one machine, possibly with many processors and cores), and distributed (many machines, possibly with many processors and cores).
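
A minimal sketch of the "immutable" and "parallel" points, assuming Spark 2.x or 3.x for Scala is on the classpath and the job runs locally. The object name, app name and sample data are hypothetical. Transformations such as withColumn never change the DataFrame they are called on; they return a new one, and the original keeps its columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ImmutableDataFrameSketch {
  // "local[*]" runs Spark inside this JVM with one worker thread per logical core:
  // the "parallel, one machine" case. A cluster master URL would make it distributed.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("immutable-dataframe-sketch")
    .master("local[*]")
    .getOrCreate()

  /** Builds a small DataFrame and derives a second one from it. */
  def derive(): (DataFrame, DataFrame) = {
    import spark.implicits._
    val people = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")
    // withColumn leaves `people` untouched and returns a new DataFrame
    val withNextAge = people.withColumn("age_next_year", col("age") + 1)
    (people, withNextAge)
  }

  def main(args: Array[String]): Unit = {
    val (original, derived) = derive()
    println(original.columns.mkString(",")) // name,age
    println(derived.columns.mkString(","))  // name,age,age_next_year
    spark.stop()
  }
}
```

Unlike a pandas DataFrame, which can be modified in place, every step here produces a new logical plan, which is what lets Spark split the work across cores or machines safely.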

http://pandas.pydata.org/pandas-docs/stable/dsintro.html

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
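
A small sketch contrasting the untyped Column API with the typed Dataset API, assuming Spark 2.x or 3.x for Scala running locally. The Person case class and the data are hypothetical. Both queries select adults; the Column version is checked when Spark analyzes the plan, the lambda version is checked by the Scala compiler.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type; spark.implicits._ derives an Encoder for case classes
case class Person(name: String, age: Int)

object TypedDatasetSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("typed-dataset-sketch")
    .master("local[*]")
    .getOrCreate()

  /** Returns the adults' names computed both ways, sorted for easy comparison. */
  def adults(): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val people: Dataset[Person] =
      Seq(Person("alice", 34), Person("bob", 17), Person("carol", 52)).toDS()

    // Untyped: a Column expression; a misspelled column name fails at analysis time
    val viaColumn = people.filter(col("age") >= 18).select(col("name")).as[String]

    // Typed: a Scala function over Person; a misspelled field fails at compile time
    val viaLambda = people.filter((p: Person) => p.age >= 18).map((p: Person) => p.name)

    (viaColumn.collect().toSeq.sorted, viaLambda.collect().toSeq.sorted)
  }

  def main(args: Array[String]): Unit = {
    val (a, b) = adults()
    println(a.mkString(",")) // alice,carol
    println(b.mkString(",")) // alice,carol
    spark.stop()
  }
}
```

The lambda parameters are annotated explicitly to avoid overload ambiguity between the Scala-function and Java-interface variants of Dataset.filter on some Scala versions.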

https://stackoverflow.com/questions/38137741/how-to-write-a-dataframe-schema-to-file-in-scala

import java.io.PrintWriter

// Write the schema of an existing DataFrame `df` to a file as an indented tree
val filePath = "/tmp/schema_file"
new PrintWriter(filePath) { write(df.schema.treeString); close() }
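
The tree string is easy to read but awkward to parse back. A sketch of an alternative, assuming Spark 2.x or 3.x for Scala: save the schema as JSON with `schema.json` and restore it with `DataType.fromJson`. The object name, sample data and file path are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StructType}

object SchemaFileSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("schema-file-sketch")
    .master("local[*]")
    .getOrCreate()

  /** Writes a DataFrame's schema to `path` as JSON and reads it back. */
  def roundTrip(path: Path): (StructType, StructType) = {
    import spark.implicits._
    val df = Seq(("alice", 34)).toDF("name", "age")

    // schema.json keeps names, types, nullability and metadata
    Files.write(path, df.schema.json.getBytes(StandardCharsets.UTF_8))

    val json = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    val restored = DataType.fromJson(json).asInstanceOf[StructType]
    (df.schema, restored)
  }

  def main(args: Array[String]): Unit = {
    val (original, restored) = roundTrip(Files.createTempFile("schema", ".json"))
    println(original == restored) // true
    spark.stop()
  }
}
```

A restored StructType can then be passed to `spark.read.schema(...)` so a file is read with a fixed schema instead of an inferred one.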

https://docs.databricks.com/spark/latest/spark-sql/complex-types.html#transform-complex-data-types-scala
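
A sketch of working with struct and array columns, assuming Spark 2.x or 3.x for Scala running locally; the object name and data are hypothetical. `struct` packs columns into one nested column, dot notation reads a field back out, and `explode` turns each array element into its own row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, struct}

object ComplexTypesSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("complex-types-sketch")
    .master("local[*]")
    .getOrCreate()

  /** Nests name/city into a struct, then flattens it with one row per phone. */
  def flatten(): Seq[(String, String, String)] = {
    import spark.implicits._
    // One row per person, with an array<string> column of phone numbers
    val df = Seq(
      ("alice", "Paris", Seq("111", "222")),
      ("bob", "Oslo", Seq("333"))
    ).toDF("name", "city", "phones")

    // struct keeps the input column names as the field names
    val nested = df.select(struct(col("name"), col("city")).as("person"), col("phones"))

    val flat = nested.select(
      col("person.name").as("name"),
      col("person.city").as("city"),
      explode(col("phones")).as("phone"))

    // as[(String, String, String)] maps tuple fields to columns by position
    flat.as[(String, String, String)].collect().toSeq.sortBy(_._3)
  }

  def main(args: Array[String]): Unit = {
    flatten().foreach(println)
    spark.stop()
  }
}
```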

https://www.balabit.com/blog/spark-scala-dataset-tutorial/

Thursday, August 31, 2017
