PySpark dataframe show all Rows

How to Display DataFrame rows in PySpark with Examples

DataFrames play a vital role in PySpark for performing data manipulation and analysis. Displaying DataFrames in a clear and readable format is essential for understanding and debugging data transformations. In this guide, we'll walk through how to display a DataFrame in PySpark.

1. Sample Data

The examples in this guide use the following two-row sample data, which we load into a DataFrame in the next steps:

Movie Name	Review
Kalki 2898 AD	"Kalki 2898 AD" is a cinematic marvel that seamlessly blends mythology with modern storytelling, and Prabhas delivers a performance that is both powerful and captivating.
Robot	This is one of the best movies I've ever watched. After 2000, all of Shankar's movies have been either a blockbuster or a super hit.

2. Import Libraries

from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType

3. Create DataFrame

# Start (or reuse) a local Spark session
spark_session = SparkSession.builder.master("local").appName("print rows ").getOrCreate()

# Two sample rows: movie name and review text
data = [
    Row("Kalki 2898 AD", "\"Kalki\" is a cinematic marvel that seamlessly blends mythology with modern storytelling, and Prabhas delivers a performance that is both powerful and captivating"),
    Row("Robot", "This is one of the best movies I've ever watched. After 2000 all of Shankar's movies have been either a blockbuster or super hit."),
]

# Both columns are nullable strings
schema = StructType([
    StructField("movie", StringType(), True),
    StructField("review", StringType(), True),
])
df = spark_session.createDataFrame(data, schema)

4. Use show to print rows

By default, the show function prints the first 20 rows and truncates any column value longer than 20 characters.

df.show()

5. Use show to print n rows

The statement below prints at most the first 10 rows. Our sample DataFrame has only two rows, so both are printed.

df.show(10)

6. Use show with truncate argument

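If you pass False as the truncate argument, show prints each column value in full, even when it is too long to fit the default width. Note that Python's boolean is spelled False (capitalized); a lowercase false raises a NameError.

df.show(2, False)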


7. Complete Code

from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType

# Start (or reuse) a local Spark session
spark_session = SparkSession.builder.master("local").appName("print rows ").getOrCreate()

# Two sample rows: movie name and review text
data = [
    Row("Kalki 2898 AD", "\"Kalki\" is a cinematic marvel that seamlessly blends mythology with modern storytelling, and Prabhas delivers a performance that is both powerful and captivating"),
    Row("Robot", "This is one of the best movies I've ever watched. After 2000 all of Shankar's movies have been either a blockbuster or super hit."),
]

# Both columns are nullable strings
schema = StructType([
    StructField("movie", StringType(), True),
    StructField("review", StringType(), True),
])
df = spark_session.createDataFrame(data, schema)

# Print up to 2 rows without truncating long values
df.show(2, False)