If-Else Conditions in PySpark Using the `when` Function

In SQL, we often use `CASE WHEN` statements to handle conditional logic. PySpark offers the same capability through the `when` function, which can be chained to evaluate multiple conditions and paired with `otherwise` for a default value.

In this article, we will walk through an example of building a new column with conditional logic using `when` and `otherwise`.
For example, consider the sample data below:

Sample Data

| ID  | First Name | Age | Last Name | Gender |
|-----|------------|-----|-----------|--------|
| 101 | Ali        | 29  | Khan      | Male   |
| 102 | Priya      | 35  | Kumari    | Female |
| 103 | Chandan    | 23  | Kumar     | Male   |

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, lit, when

spark = SparkSession.builder \
    .appName("Case When in PySpark with Example") \
    .master("local") \
    .getOrCreate()

data = [
    (101, "Ali", 29, "Khan", "Male"),
    (102, "Priya", 35, "Kumari", "Female"),
    (103, "Chandan", 23, "Kumar", "Male")
]

columns = ["ID", "First Name", "Age", "Last Name", "Gender"]
test_df = spark.createDataFrame(data, columns)

# Chain when() clauses for each condition, with otherwise() as the default.
transformed_df = test_df.withColumn(
    "full_name",
    when(
        col("Gender") == "Male",
        concat_ws(" ", lit("Mr."), col("First Name"), col("Last Name"))
    ).when(
        col("Gender") == "Female",
        concat_ws(" ", lit("Ms."), col("First Name"), col("Last Name"))
    ).otherwise(
        concat_ws(" ", lit("Unknown"), col("First Name"), col("Last Name"))
    )
)

transformed_df.show()
spark.stop()


As the output shows, a new `full_name` column has been added, with each row's title chosen by the conditional logic on the `Gender` column.