Guide to Apache Spark DataFrame drop() Method
Deleting a column from a Spark DataFrame in Scala is achieved using methods like drop(), which removes the specified columns from the resulting DataFrame, or select(), which keeps only the columns you list. Suppose we start with the following DataFrame:
| roll | first_name | age | last_name |
|------|------------|-----|-----------|
| 1    | rahul      | 30  | yadav     |
| 2    | sanjay     | 20  | gupta     |
| 3    | ranjan     | 67  | kumar     |
First, you need to import the necessary libraries:
```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
```
For demonstration purposes, let's create a SparkSession and a sample DataFrame:
```scala
val sparkSession = SparkSession
  .builder()
  .appName("delete or drop column from a spark dataframe")
  .master("local")
  .getOrCreate()

val schema = StructType(Array(
  StructField("roll", IntegerType, true),
  StructField("first_name", StringType, true),
  StructField("age", IntegerType, true),
  StructField("last_name", StringType, true)
))

val data = Seq(
  Row(1, "rahul", 30, "yadav"),
  Row(2, "sanjay", 20, "gupta"),
  Row(3, "ranjan", 67, "kumar")
)

val rdd = sparkSession.sparkContext.parallelize(data)
val testDF = sparkSession.createDataFrame(rdd, schema)
```
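To sanity-check the setup, calling testDF.show() should print output along these lines (exact cell padding may vary by Spark version):

```
+----+----------+---+---------+
|roll|first_name|age|last_name|
+----+----------+---+---------+
|   1|     rahul| 30|    yadav|
|   2|    sanjay| 20|    gupta|
|   3|    ranjan| 67|    kumar|
+----+----------+---+---------+
```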
To delete a single column, pass a column name or a Column object to drop():

```scala
// drop a single column by Column reference
val transformedDF = testDF.drop(col("roll"))
```

To delete or drop multiple columns, pass several column names to drop():

```scala
// drop multiple columns by name
val transformedDF2 = testDF.drop("roll", "age")
```
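As mentioned earlier, select() can achieve the same result by keeping every column except the ones you want to remove. Here is a minimal sketch of that approach (the val name selectedDF is arbitrary):

```scala
// keep all columns except "roll" by filtering the column list
val selectedDF = testDF.select(testDF.columns.filter(_ != "roll").map(col): _*)
```

Note that drop() silently ignores column names that do not exist in the DataFrame, so it is safe to call even when you are unsure whether the column is present.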
Here is the complete program:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DroporDeleteColumn {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession
      .builder()
      .appName("delete or drop column from a spark dataframe")
      .master("local")
      .getOrCreate()

    val schema = StructType(Array(
      StructField("roll", IntegerType, true),
      StructField("first_name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("last_name", StringType, true)
    ))

    val data = Seq(
      Row(1, "rahul", 30, "yadav"),
      Row(2, "sanjay", 20, "gupta"),
      Row(3, "ranjan", 67, "kumar")
    )

    val rdd = sparkSession.sparkContext.parallelize(data)
    val testDF = sparkSession.createDataFrame(rdd, schema)

    // drop the "roll" column and print the resulting schema
    val transformedDF = testDF.drop(col("roll"))
    transformedDF.printSchema()

    sparkSession.stop()
  }
}
```
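If everything is wired up correctly, printSchema() should report a schema without the roll column:

```
root
 |-- first_name: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- last_name: string (nullable = true)
```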
That's it! You've successfully deleted (dropped) a column from a Spark DataFrame using Scala.