Thursday, August 31, 2017

Spark-shell: put everything on one line

Many blog examples start a Spark session with the builder code split across multiple lines, like this:

import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.warehouse.dir", "c:/Temp")
  .getOrCreate
If the code is being executed interactively in the spark-shell, put it all on one line:

val spark = SparkSession.builder.master("local[*]").appName("My Spark Application").config("spark.sql.warehouse.dir", "/tmp").getOrCreate
Otherwise, errors like the following will occur:

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession
scala> val spark = SparkSession.builder
spark: org.apache.spark.sql.SparkSession.Builder = org.apache.spark.sql.SparkSession$Builder@5a20f793
scala>   .master("local[*]")
:1: error: illegal start of definition
  .master("local[*]")
  ^
scala>   .appName("My Spark Application")
:1: error: illegal start of definition
  .appName("My Spark Application")
  ^
scala>   .config("spark.sql.warehouse.dir", "/tmp")
:1: error: illegal start of definition
  .config("spark.sql.warehouse.dir", "/tmp")
  ^
scala>   .getOrCreate
:1: error: illegal start of definition
  .getOrCreate
  ^
scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession
This is the correct execution of the session code in the spark-shell:
scala> val spark = SparkSession.builder.master("local[*]").appName("My Spark Application").config("spark.sql.warehouse.dir", "/tmp").getOrCreate
17/08/31 23:38:16 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@62b0792
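The errors happen because the spark-shell (the Scala REPL) parses each line as a complete statement, so a line that begins with `.` is not valid on its own. If one long line is too unwieldy, the REPL's standard `:paste` mode is an alternative: it reads a whole block and compiles it as a unit, so the multi-line builder style works unchanged. A sketch of the same session code in paste mode (same settings as above):

```scala
scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.SparkSession

// The whole block is compiled together, so the
// leading-dot method chain is parsed correctly.
val spark = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.warehouse.dir", "/tmp")
  .getOrCreate

// Press Ctrl-D here to exit paste mode and evaluate the block.
```
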
