SparkSession PySpark example
Creating a Spark session is a crucial step when working with PySpark for big data processing tasks. This guide will walk you through the process of setting up a Spark session in PySpark.
A SparkSession is the entry point for using Spark with the DataFrame and Dataset API. It provides a unified interface for interacting with Spark's various functionalities. Prior to Spark 2.0, SparkContext was the main entry point, but SparkSession now integrates the functionalities of SparkContext and provides additional features for easier data processing.
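For instance, the legacy SparkContext is still reachable through a session's sparkContext attribute. The snippet below is a quick illustration of that relationship (the variable and app names are placeholders):

from pyspark.sql import SparkSession

# Create (or reuse) a session; "demo" is just a placeholder app name
spark = SparkSession.builder.appName("demo").getOrCreate()

# The pre-2.0 entry point is exposed on the session itself
sc = spark.sparkContext
print(sc.appName)  # prints the application name, e.g. "demo"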
Now that we've set up PySpark on our local machine, it's time to write our very first program, which will create a SparkSession.
from pyspark.sql import SparkSession

# Build a local SparkSession named "testing" (or reuse one if it already exists)
spark_session = SparkSession.builder.master("local").appName("testing").getOrCreate()

# Run a trivial SQL query to confirm the session is working
spark_session.sql("select 1").show()
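If everything is set up correctly, the script prints a one-row, one-column DataFrame (the column is named after the literal 1):

+---+
|  1|
+---+
|  1|
+---+

Note that getOrCreate() returns the SparkSession already running in the process if there is one, and only builds a new session otherwise.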
Note: This SparkSession looks minimal because we have not passed any additional configuration. See the Spark configuration documentation for the full list of available properties.
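As a rough sketch of what a more heavily configured builder might look like (the specific keys and values below are illustrative choices, not requirements):

from pyspark.sql import SparkSession

# Example configuration values; adjust them for your own workload
spark = (
    SparkSession.builder
    .master("local[4]")  # use four local cores instead of one
    .appName("configured_example")
    .config("spark.sql.shuffle.partitions", "8")  # fewer shuffle partitions suit small local jobs
    .config("spark.sql.session.timeZone", "UTC")  # pin the session time zone
    .getOrCreate()
)

Any property from Spark's configuration list can be passed the same way through .config(key, value).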