Deleting a Column in PySpark
In PySpark, you remove a column from a DataFrame with the drop method, which returns a new DataFrame without that column. This tutorial walks through step-by-step examples, using the sample data below.
Roll | First Name | Age | Last Name |
---|---|---|---|
1 | Ali | 30 | Khan |
2 | Sanjay | 20 | Kumar |
3 | Rahul | 67 | Kumar |
You can delete a column from a PySpark DataFrame using the drop method. Here's an example:
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("Delete Column Example").getOrCreate()

# Sample DataFrame
data = [("Ali", "Khan", 30), ("Sanjay", "Kumar", 20), ("Rahul", "Kumar", 67)]
columns = ["FirstName", "LastName", "Age"]
df = spark.createDataFrame(data, schema=columns)

# Delete the 'Age' column
df = df.drop("Age")
df.show()
To delete more than one column in a single call, pass each column name to drop:

# Delete the 'Age' and 'LastName' columns
df = df.drop("Age", "LastName")
df.show()
from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("Delete Column Example").getOrCreate()

# Sample DataFrame
data = [("Ali", "Khan", 30), ("Sanjay", "Kumar", 20), ("Rahul", "Kumar", 67)]
columns = ["FirstName", "LastName", "Age"]
df = spark.createDataFrame(data, schema=columns)

# Delete the 'Age' column
df = df.drop("Age")

# Alternative: delete 'Age' and 'LastName' in one call
# df = df.drop("Age", "LastName")

df.show()