How to rename a column in a Spark DataFrame using Scala

Renaming columns is a common operation in data processing. In Apache Spark with Scala, the DataFrame method withColumnRenamed renames a single column. This tutorial walks through a small, complete example.

For this example, consider the following sample data:

Sample Data

roll | first_name | age | last_name
-----|------------|-----|----------
1    | rahul      | 30  | yadav
2    | sanjay     | 20  | gupta
3    | ranjan     | 67  | kumar

Step 1: Import Required Libraries

First, you need to import the necessary libraries:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
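The snippets in Steps 2 and 3 use a value named sparkSession. A minimal sketch of creating one for local runs is shown below, assuming Spark runs on the local machine with master "local[*]" (all available cores); the object name LocalSessionSketch and the helper createSession are illustrative, not part of Spark.

```scala
import org.apache.spark.sql.SparkSession

object LocalSessionSketch {
  // Hypothetical helper: builds a local SparkSession, or returns the
  // already-active one, because getOrCreate reuses an existing session.
  def createSession(): SparkSession =
    SparkSession
      .builder()
      .appName("rename a column of spark dataframe scala")
      .master("local[*]") // run locally on all available cores
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val sparkSession = createSession()
    println(s"Spark version: ${sparkSession.version}")
    sparkSession.stop() // release local resources when finished
  }
}
```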
     

Step 2: Create Sample DataFrame

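For demonstration purposes, create a DataFrame from the sample data. This snippet assumes a SparkSession named sparkSession already exists, created as in the complete code at the end of this tutorial: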

val schema = StructType(Array(
  StructField("roll", IntegerType, true),
  StructField("first_name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("last_name", StringType, true)
))
val data = Seq(
  Row(1, "rahul", 30, "yadav"),
  Row(2, "sanjay", 20, "gupta"),
  Row(3, "ranjan", 67, "kumar")
)
val rdd = sparkSession.sparkContext.parallelize(data)
val testDF = sparkSession.createDataFrame(rdd, schema)
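For small test data, the Row-plus-StructType approach can be shortened by importing spark.implicits and calling toDF on a Seq of tuples. This is a sketch; ToDFSketch and sampleDF are hypothetical names. One difference worth knowing: columns built from Scala Int values come out non-nullable, whereas the explicit schema above declares every column nullable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ToDFSketch {
  // Hypothetical helper: builds the same sample data from tuples
  // instead of Row objects plus an explicit StructType.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF into scope for Seq of tuples
    Seq(
      (1, "rahul", 30, "yadav"),
      (2, "sanjay", 20, "gupta"),
      (3, "ranjan", 67, "kumar")
    ).toDF("roll", "first_name", "age", "last_name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("toDF sketch").master("local").getOrCreate()
    // roll and age print as nullable = false, because they come from Scala Int
    sampleDF(spark).printSchema()
    spark.stop()
  }
}
```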
        

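Step 3: Rename the column with withColumnRenamed

withColumnRenamed(existingName, newName) returns a new DataFrame in which the column existingName is called newName; testDF itself is left unchanged. If existingName is not present in the DataFrame, the call does nothing.

val transformedDF = testDF.withColumnRenamed("roll", "roll_number")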
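To rename more than one column, withColumnRenamed can simply be called once per column. The sketch below folds a list of (oldName, newName) pairs over the DataFrame; renameColumns is a hypothetical helper, not a Spark API. A Seq is used rather than a Map so the renames run in a predictable order, which matters if one rename's new name is another rename's old name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameManySketch {
  // Hypothetical helper: applies withColumnRenamed once per (oldName, newName)
  // pair, in order. Spark leaves the DataFrame unchanged for names it lacks.
  def renameColumns(df: DataFrame, renames: Seq[(String, String)]): DataFrame =
    renames.foldLeft(df) { case (acc, (oldName, newName)) =>
      acc.withColumnRenamed(oldName, newName)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rename many").master("local").getOrCreate()
    import spark.implicits._
    val testDF = Seq((1, "rahul", 30, "yadav")).toDF("roll", "first_name", "age", "last_name")
    val renamed = renameColumns(testDF, Seq("roll" -> "roll_number", "age" -> "age_years"))
    renamed.printSchema()
    spark.stop()
  }
}
```

Spark 3.4 and later also provide withColumnsRenamed, which takes a Map from old to new names; check the API documentation for your Spark version before relying on it.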

Complete Code

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object WithColumnRenamedSpark {

  def main(args: Array[String]): Unit = {
    // Create a local SparkSession
    val sparkSession = SparkSession
      .builder()
      .appName("rename a column of spark dataframe scala")
      .master("local")
      .getOrCreate()

    // Explicit schema for the sample data
    val schema = StructType(Array(
      StructField("roll", IntegerType, true),
      StructField("first_name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("last_name", StringType, true)
    ))

    val data = Seq(
      Row(1, "rahul", 30, "yadav"),
      Row(2, "sanjay", 20, "gupta"),
      Row(3, "ranjan", 67, "kumar")
    )

    val rdd = sparkSession.sparkContext.parallelize(data)
    val testDF = sparkSession.createDataFrame(rdd, schema)

    // Rename "roll" to "roll_number"; this returns a new DataFrame
    val transformedDF = testDF.withColumnRenamed("roll", "roll_number")
    transformedDF.show()

    sparkSession.stop()
  }
}
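withColumnRenamed is not the only way to rename columns. A column can also be aliased inside a select, and toDF replaces every column name at once, by position. A sketch of both follows; SelectAliasSketch, renameWithSelect and renameAll are hypothetical names. Note that toDF throws an exception if the number of new names differs from the number of columns.

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectAliasSketch {
  // Renames "roll" by aliasing it inside a select; other columns pass through.
  def renameWithSelect(df: DataFrame): DataFrame =
    df.select(df.columns.toSeq.map {
      case "roll" => col("roll").as("roll_number")
      case other  => col(other)
    }: _*)

  // Replaces all column names positionally; newNames must match the column count.
  def renameAll(df: DataFrame, newNames: Seq[String]): DataFrame =
    df.toDF(newNames: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("select alias").master("local").getOrCreate()
    import spark.implicits._
    val testDF = Seq((1, "rahul", 30, "yadav")).toDF("roll", "first_name", "age", "last_name")
    renameWithSelect(testDF).printSchema()
    renameAll(testDF, Seq("roll_number", "first", "age", "last")).printSchema()
    spark.stop()
  }
}
```

Aliasing in select is convenient when you are already choosing which columns to keep, while toDF suits cases where every name changes.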

That's it! You've renamed the roll column to roll_number with withColumnRenamed, leaving the other columns unchanged.

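Output

Calling transformedDF.show() prints the following; only the first column's name has changed:

+-----------+----------+---+---------+
|roll_number|first_name|age|last_name|
+-----------+----------+---+---------+
|          1|     rahul| 30|    yadav|
|          2|    sanjay| 20|    gupta|
|          3|    ranjan| 67|    kumar|
+-----------+----------+---+---------+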