https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/3741049972324885/3696710289009770/4413065072037724/latest.html
https://stackoverflow.com/questions/29844144/better-way-to-convert-a-string-field-into-timestamp-in-spark
Notes:
- Dates are validated: 2014/11/31 is an invalid date (November has only 30 days), so null is returned. (Note that newer Spark versions (3.x) use a stricter datetime parser and may raise an error instead of returning null unless spark.sql.legacy.timeParserPolicy is set to LEGACY.)
- Note the year, month, day order in the format pattern. It must match the order in the input date strings; otherwise null is returned.
- $"date" is the date column from the dataframe "df".
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val df = Seq(
  (1, "2014/01/01 23:00:01"),
  (1, "2016/12/29 09:54:00"),
  (1, "2016/05/09 10:12:43"),
  (1, "2014/11/30 10:12:43"),
  (1, "2014/11/31 12:40:32")).toDF("id", "date")

val ts = to_timestamp($"date", "yyyy/MM/dd HH:mm:ss")
df.withColumn("ts", ts).show(5, false)
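To see the pattern-mismatch behavior from the notes, here is a quick sketch (assuming the same df as above) using a pattern whose separators do not match the input:

```scala
// Hypothetical illustration: the format pattern must match the input strings.
// "yyyy-MM-dd HH:mm:ss" uses "-" separators, but the data uses "/",
// so parsing fails and every row comes back null.
// A wrong field order (e.g. "dd/MM/yyyy") can likewise fail, or with the
// lenient legacy parser produce surprising rolled-over dates, so always
// check the pattern against a sample of the data.
val wrong = to_timestamp($"date", "yyyy-MM-dd HH:mm:ss")
df.select($"date", wrong.alias("parsed")).show(false)
```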
Now let's replace the date column with the converted timestamp column.
The first attempt below is not quite right: the select works, but the new column keeps the full to_timestamp(...) expression as its name.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | scala> val res = df.select($"id").show() +---+ | id| +---+ | 1| | 1| | 1| | 1| | 1| +---+ scala> val res = df.select($"id", ts) res: org.apache.spark.sql.DataFrame = [id: int, to_timestamp(`date`, 'yyyy/MM/dd HH:mm:ss'): timestamp] scala> res.show() +---+-------------------------------------------+ | id|to_timestamp(`date`, 'yyyy/MM/dd HH:mm:ss')| +---+-------------------------------------------+ | 1| 2014-01-01 23:00:01| | 1| 2016-12-29 09:54:00| | 1| 2016-05-09 10:12:43| | 1| 2014-11-30 10:12:43| | 1| null| +---+-------------------------------------------+ |
.alias() is the key here: it renames the new column to "date".
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | scala> val ts = to_timestamp($"date", "yyyy/MM/dd HH:mm:ss").alias("date") ts: org.apache.spark.sql.Column = to_timestamp(`date`, 'yyyy/MM/dd HH:mm:ss') AS `date` scala> val res = df.select($"id", ts) res: org.apache.spark.sql.DataFrame = [id: int, date: timestamp] scala> res.show() +---+-------------------+ | id| date| +---+-------------------+ | 1|2014-01-01 23:00:01| | 1|2016-12-29 09:54:00| | 1|2016-05-09 10:12:43| | 1|2014-11-30 10:12:43| | 1| null| +---+-------------------+ |