How to rename a column in a Spark DataFrame using Scala
Renaming columns is a common operation in data processing. In Apache Spark, you can rename a DataFrame column in Scala with the withColumnRenamed method. This tutorial walks through the method with a practical example.
The example uses the following sample data:

Roll | First Name | Age | Last Name |
---|---|---|---|
1 | rahul | 30 | yadav |
2 | sanjay | 20 | gupta |
3 | ranjan | 67 | kumar |
First, you need to import the necessary libraries:
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
For demonstration purposes, let's create a sample DataFrame:
    // Assumes a SparkSession named sparkSession already exists
    // (it is created in the complete program at the end of this tutorial).

    // Schema: four nullable columns
    val schema = StructType(Array(
      StructField("roll", IntegerType, true),
      StructField("first_name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("last_name", StringType, true)
    ))

    // Sample rows matching the schema above
    val data = Seq(
      Row(1, "rahul", 30, "yadav"),
      Row(2, "sanjay", 20, "gupta"),
      Row(3, "ranjan", 67, "kumar")
    )

    val rdd = sparkSession.sparkContext.parallelize(data)
    val testDF = sparkSession.createDataFrame(rdd, schema)
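Now rename the roll column to roll_number. withColumnRenamed takes the existing column name first and the new name second, and returns a new DataFrame; testDF itself is left unchanged:
    val transformedDF = testDF.withColumnRenamed("roll", "roll_number")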
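One way to confirm the rename is to inspect the column names afterwards. The following is a minimal, self-contained sketch, assuming a local SparkSession; the object name RenameCheck and the smaller two-column dataset are illustrative and not part of the example above. It also shows that withColumnRenamed is a no-op when the existing name is not in the schema, so a typo in the column name does not raise an error.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object RenameCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("rename check")
      .master("local[*]")
      .getOrCreate()

    // Enables toDF on local Scala collections.
    import spark.implicits._

    // Small illustrative dataset (a subset of the tutorial's columns).
    val df = Seq((1, "rahul"), (2, "sanjay")).toDF("roll", "first_name")

    // The column exists, so it is renamed in the returned DataFrame.
    val renamed = df.withColumnRenamed("roll", "roll_number")
    println(renamed.columns.mkString(", "))   // roll_number, first_name

    // "rol" is not a column, so the schema comes back unchanged.
    val unchanged = df.withColumnRenamed("rol", "roll_number")
    println(unchanged.columns.mkString(", ")) // roll, first_name

    spark.stop()
  }
}
```

Because a misspelled name fails silently, checking the columns array (or calling printSchema()) after a rename is a cheap safeguard.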
Here is the complete program:
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    object WithColumnRenamedSpark {
      def main(args: Array[String]): Unit = {
        // Local session for demonstration purposes
        val sparkSession = SparkSession
          .builder()
          .appName("rename a column of spark dataframe scala")
          .master("local")
          .getOrCreate()

        val schema = StructType(Array(
          StructField("roll", IntegerType, true),
          StructField("first_name", StringType, true),
          StructField("age", IntegerType, true),
          StructField("last_name", StringType, true)
        ))

        val data = Seq(
          Row(1, "rahul", 30, "yadav"),
          Row(2, "sanjay", 20, "gupta"),
          Row(3, "ranjan", 67, "kumar")
        )

        val rdd = sparkSession.sparkContext.parallelize(data)
        val testDF = sparkSession.createDataFrame(rdd, schema)

        // Rename "roll" to "roll_number"; the other columns are untouched
        val transformedDF = testDF.withColumnRenamed("roll", "roll_number")
        transformedDF.show()

        sparkSession.stop()
      }
    }
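To rename more than one column with the same method, the calls can be chained. The sketch below folds a list of (old name, new name) pairs over withColumnRenamed. The helper renameAll, the object name RenameMultipleColumns, and the new name given_name are hypothetical names introduced for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and helper names, used only for this illustration.
object RenameMultipleColumns {

  // Applies withColumnRenamed once for each (oldName, newName) pair, in order.
  def renameAll(df: DataFrame, renames: Seq[(String, String)]): DataFrame =
    renames.foldLeft(df) { case (acc, (oldName, newName)) =>
      acc.withColumnRenamed(oldName, newName)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("rename multiple columns")
      .master("local[*]")
      .getOrCreate()

    // Enables toDF on local Scala collections.
    import spark.implicits._

    val df = Seq((1, "rahul", 30, "yadav"))
      .toDF("roll", "first_name", "age", "last_name")

    val renamed = renameAll(
      df,
      Seq("roll" -> "roll_number", "first_name" -> "given_name")
    )

    // Columns are now: roll_number, given_name, age, last_name
    renamed.printSchema()

    spark.stop()
  }
}
```

A Seq of pairs is used here rather than a Map so that the renames are applied in a predictable order.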
That's it! You've successfully applied withColumnRenamed to a DataFrame in Spark using Scala.