How to convert a JSON string to a DataFrame in Spark

I want to convert the string variable below to a DataFrame in Spark.

val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""

I know how to create a DataFrame from a JSON file:

sqlContext.read.json("file.json")

but I don’t know how to create a DataFrame from a string variable.

How can I convert a JSON string variable to a DataFrame?

For Spark 2.2+:

import spark.implicits._
val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
val df = spark.read.json(Seq(jsonStr).toDS)
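
Outside the spark-shell there is no predefined `spark` value, so the snippet above needs a `SparkSession` before `import spark.implicits._` will compile. The following is a minimal, self-contained sketch of the same approach. The object name `JsonStringToDataFrame`, the helper `toDataFrame`, the app name, and the `local[*]` master are assumptions chosen for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object; the name is illustrative only.
object JsonStringToDataFrame {

  // Turns one JSON document held in a String into a DataFrame (Spark 2.2+).
  def toDataFrame(spark: SparkSession, jsonStr: String): DataFrame = {
    import spark.implicits._ // brings .toDS into scope for Seq[String]
    spark.read.json(Seq(jsonStr).toDS)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup; on a cluster the master is normally set by spark-submit.
    val spark = SparkSession.builder()
      .appName("json-string-to-dataframe")
      .master("local[*]")
      .getOrCreate()

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = toDataFrame(spark, jsonStr)

    df.printSchema()                                  // schema is inferred from the JSON
    df.select("metadata.key", "metadata.value").show() // nested fields via dot notation

    spark.stop()
  }
}
```

Each string in the `Seq` is treated as one JSON document, so passing several strings should yield one row per string.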

For Spark 2.1.x:

val events = sc.parallelize("""{"action":"create","timestamp":"2016-01-07T00:01:17Z"}""" :: Nil)
val df = sqlContext.read.json(events)

Hint: this uses the sqlContext.read.json(jsonRDD: RDD[String]) overload.
There is also sqlContext.read.json(path: String), which reads a JSON file directly.

For older versions:

val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
val rdd = sc.parallelize(Seq(jsonStr))
val df = sqlContext.read.json(rdd)
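
The RDD-based snippets rely on the `sc` and `sqlContext` values that the spark-shell provides. In a standalone application they have to be created first. Here is a minimal sketch, assuming Spark 1.4 or later (where `sqlContext.read` is available); the object name `JsonStringToDataFrameLegacy`, the helper `toDataFrame`, the app name, and the `local[*]` master are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical wrapper object; the name is illustrative only.
object JsonStringToDataFrameLegacy {

  // Wraps the JSON string in an RDD[String] and lets Spark infer the schema.
  def toDataFrame(sc: SparkContext, sqlContext: SQLContext, jsonStr: String): DataFrame = {
    val rdd = sc.parallelize(Seq(jsonStr))
    sqlContext.read.json(rdd) // the RDD[String] overload
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup for illustration.
    val conf = new SparkConf().setAppName("json-string-to-dataframe").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val jsonStr = """{ "metadata": { "key": 84896, "value": 54 }}"""
    val df = toDataFrame(sc, sqlContext, jsonStr)

    df.printSchema()
    df.select("metadata.key", "metadata.value").show()

    sc.stop()
  }
}
```

On Spark 2.x and later this still compiles, but `SQLContext` and the `RDD[String]` overload are marked deprecated in favor of `SparkSession` and the `Dataset[String]` overload.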