Replace all occurrences of a String in all columns in a dataframe in scala
Replace all occurrences of a String in all columns in a dataframe in scala
I have a dataframe with 20 Columns and in these columns there is a value XX which i want to replace with Empty String. How do i achieve that in scala. The withColumn function is for a single column, But i want to pass all 20 columns and replace values that have XX in the entire frame with Empty String , Can some one suggest a way.
Thanks
2 Answers
2
You can gather all the stringType
columns in a list and use foldLeft
to apply your removeXX
UDF to each of the columns as follows:
stringType
foldLeft
removeXX
val df = Seq(
(1, "aaXX", "bb"),
(2, "ccXX", "XXdd"),
(3, "ee", "fXXf")
).toDF("id", "desc1", "desc2")
import org.apache.spark.sql.types._
val stringColumns = df.schema.fields.collect{
case StructField(name, StringType, _, _) => name
}
val removeXX = udf( (s: String) =>
if (s == null) null else s.replaceAll("XX", "")
)
val dfResult = stringColumns.foldLeft( df )( (acc, c) =>
acc.withColumn( c, removeXX(df(c)) )
)
dfResult.show
+---+-----+-----+
| id|desc1|desc2|
+---+-----+-----+
| 1| aa| bb|
| 2| cc| dd|
| 3| ee| ff|
+---+-----+-----+
def clearValueContains(dataFrame: DataFrame,token :String,columnsToBeUpdated : List[String])={
columnsToBeUpdated.foldLeft(dataFrame){
(dataset ,columnName) =>
dataset.withColumn(columnName, when(col(columnName).contains(token), "").otherwise(col(columnName)))
}
}
You can use this function .. where you can put token as "XX" . Also the columnsToBeUpdated is the list of columns in which you need to search for the particular column.
dataset.withColumn(columnName, when(col(columnName) === token, "").otherwise(col(columnName)))
you can use the above code to replace on exact match.
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Another way to get StringColumns is testDF.dtypes.filter (._2 == "StringType").map (._1)
– Sudheer Palyam
Mar 22 at 5:41