This post shows how to create a DataFrame from a list of tuples in PySpark. The examples use Python 2 and Spark 2.0.1.
Create a list of tuples
listOfTuples = [(101, "Satish", 2012, "Bangalore"),
                (102, "Ramya", 2013, "Bangalore"),
                (103, "Teja", 2014, "Bangalore"),
                (104, "Kumar", 2012, "Hyderabad")]
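Because the column types are inferred from the data, it can help to confirm that every tuple has the same number of fields and that each position holds one Python type before handing the list to Spark. The sketch below is plain Python; `check_rows` is a hypothetical helper written for this post, not part of PySpark.

```python
# Hypothetical helper (not a PySpark API): sanity-check rows before creating a DataFrame.
def check_rows(rows, columns):
    """Return a list of problems; an empty list means the rows look consistent."""
    problems = []
    # Every tuple should have one value per column name
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            problems.append("row %d has %d fields, expected %d"
                            % (i, len(row), len(columns)))
    # Among rows of the right length, each column should hold a single Python type
    good_rows = [row for row in rows if len(row) == len(columns)]
    for pos, name in enumerate(columns):
        types = set(type(row[pos]).__name__ for row in good_rows)
        if len(types) > 1:
            problems.append("column %r mixes types: %s" % (name, sorted(types)))
    return problems


listOfTuples = [(101, "Satish", 2012, "Bangalore"),
                (102, "Ramya", 2013, "Bangalore"),
                (103, "Teja", 2014, "Bangalore"),
                (104, "Kumar", 2012, "Hyderabad")]
columns = ["id", "name", "year", "city"]

# An empty list here means there is nothing to report for this data
print(check_rows(listOfTuples, columns))
```

The helper only reports problems and leaves the data untouched, so it can be dropped once the input is trusted.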
Create a DataFrame from listOfTuples
df = spark.createDataFrame(listOfTuples, ["id", "name", "year", "city"])
Check the schema
df.printSchema()
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- year: long (nullable = true)
|-- city: string (nullable = true)
Print data
df.show()
+---+------+----+---------+
| id|  name|year|     city|
+---+------+----+---------+
|101|Satish|2012|Bangalore|
|102| Ramya|2013|Bangalore|
|103|  Teja|2014|Bangalore|
|104| Kumar|2012|Hyderabad|
+---+------+----+---------+
Enjoy Spark!