Read a Text File Using PySpark

How to Read a Text File Using PySpark with Example

Reading a text file in PySpark can be done in two ways: the sparkContext.textFile method, which returns an RDD of raw lines, or spark.read.text, which returns a DataFrame. To work with the DataFrame API, use spark.read.text. It loads each line of the file into a single string column named "value", which makes it easier to apply structured operations such as filters, transformations, and Spark SQL. This approach simplifies handling and manipulating text data within Spark applications.

from pyspark.sql import SparkSession

# Create (or reuse) a local SparkSession.
spark_session = SparkSession.builder.master("local").appName("Read text file using pyspark with example").getOrCreate()

# Load the file into a DataFrame with a single string column named "value".
textfile_df = spark_session.read.text("/Users/apple/PycharmProjects/pyspark/data/text/data.txt")
textfile_df.show(truncate=False)

Output:

+-----+
|value|
+-----+
|Alps |
+-----+

The list below contains some of the most commonly used options when reading a text file.