If-Else Condition in PySpark Using the `when` Function
In SQL, conditional logic is commonly handled with `CASE WHEN` statements. PySpark offers the same capability through the `when` function, which lets you chain multiple conditions and supply a default with `otherwise`.
In this article, we will build a derived column from the following sample data:
| ID | First Name | Age | Last Name | Gender | 
|---|---|---|---|---|
| 101 | Ali | 29 | Khan | Male | 
| 102 | Priya | 35 | Kumari | Female | 
| 103 | Chandan | 23 | Kumar | Male | 
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, lit, when

spark = SparkSession.builder \
    .appName("Case When in PySpark with Example") \
    .master("local") \
    .getOrCreate()

# Sample data matching the table above
data = [
    (101, "Ali", 29, "Khan", "Male"),
    (102, "Priya", 35, "Kumari", "Female"),
    (103, "Chandan", 23, "Kumar", "Male")
]
columns = ["ID", "First Name", "Age", "Last Name", "Gender"]
test_df = spark.createDataFrame(data, columns)

# Build full_name with a title chosen from the Gender column;
# each when() is one branch and otherwise() is the default case
transformed_df = test_df.withColumn(
    "full_name",
    when(
        col("Gender") == "Male",
        concat_ws(" ", lit("Mr."), col("First Name"), col("Last Name"))
    ).when(
        col("Gender") == "Female",
        concat_ws(" ", lit("Ms."), col("First Name"), col("Last Name"))
    ).otherwise(
        concat_ws(" ", lit("Unknown"), col("First Name"), col("Last Name"))
    )
)

transformed_df.show()
spark.stop()
```
As you can see in the output, an additional `full_name` column has been added based on the conditional logic applied.