If-Else Conditions in PySpark Using the `when` Function
In SQL, we often use `CASE WHEN` statements to handle conditional logic. PySpark offers similar functionality through the `when` function, which lets you chain multiple conditions and a fallback.
In this article, we will apply this to the sample data below:
ID | First Name | Age | Last Name | Gender |
---|---|---|---|---|
101 | Ali | 29 | Khan | Male |
102 | Priya | 35 | Kumari | Female |
103 | Chandan | 23 | Kumar | Male |
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, lit, when

spark = SparkSession.builder \
    .appName("Case When in PySpark with Example") \
    .master("local") \
    .getOrCreate()

data = [
    (101, "Ali", 29, "Khan", "Male"),
    (102, "Priya", 35, "Kumari", "Female"),
    (103, "Chandan", 23, "Kumar", "Male")
]
columns = ["ID", "First Name", "Age", "Last Name", "Gender"]
test_df = spark.createDataFrame(data, columns)

# Build a full_name column, prefixing a title based on the Gender column
transformed_df = test_df.withColumn(
    "full_name",
    when(
        col("Gender") == "Male",
        concat_ws(" ", lit("Mr."), col("First Name"), col("Last Name"))
    ).when(
        col("Gender") == "Female",
        concat_ws(" ", lit("Ms."), col("First Name"), col("Last Name"))
    ).otherwise(
        concat_ws(" ", lit("Unknown"), col("First Name"), col("Last Name"))
    )
)

transformed_df.show()
spark.stop()
```
As you can see in the output, a new `full_name` column has been added based on the conditional logic applied to the `Gender` column.