Spark DataFrame, how to aggregate a sequence of columns?

I have a dataframe and I can aggregate it with static column names, i.e.:
df.groupBy("_c0", "_c1", "_c2", "_c3", "_c4").agg(
  concat_ws(",", collect_list("_c5")),
  concat_ws(",", collect_list("_c6")))
That works fine, but how do I do the same if I instead get a sequence of groupBy columns and a sequence of aggregate columns?
In other words, what if I have
val toGroupBy = Seq("_c0", "_c1", "_c2", "_c3", "_c4")
val toAggregate = Seq("_c5", "_c6")
and want to perform the above?
I tried to make the question a bit clearer; please check that I didn't misunderstand it.
– Shaido
Jul 3 at 9:32
Thanks for editing.
– Spark Scala Developer
Jul 3 at 9:34
Sounds like you want a DF to be an RDD.
– thebluephantom
Jul 3 at 9:36
Please provide sample input; it will help us give a better answer.
– Manoj Kumar Dhakd
Jul 3 at 10:06
1 Answer
To perform the same groupBy and aggregation using the sequences, you can do the following:
val aggCols = toAggregate.map(c => expr(s"""concat_ws(",", collect_list($c))"""))
df.groupBy(toGroupBy.head, toGroupBy.tail:_*).agg(aggCols.head, aggCols.tail:_*)
The expr function takes a SQL expression string and evaluates it into a Column. The varargs variants of groupBy and agg are then applied to the lists of columns.
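For reference, here is a minimal self-contained sketch of the approach. The SparkSession setup and the sample rows are hypothetical, added only so the snippet runs end to end; the expr/groupBy/agg part is the answer above unchanged:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, concat_ws, expr}

val spark = SparkSession.builder().appName("agg-seq-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical rows matching the question's column layout (_c0 .. _c6).
val df = Seq(
  ("a", "b", "c", "d", "e", "x1", "y1"),
  ("a", "b", "c", "d", "e", "x2", "y2")
).toDF("_c0", "_c1", "_c2", "_c3", "_c4", "_c5", "_c6")

val toGroupBy = Seq("_c0", "_c1", "_c2", "_c3", "_c4")
val toAggregate = Seq("_c5", "_c6")

// One aggregate Column per name, built from a SQL expression string.
val aggCols = toAggregate.map(c => expr(s"""concat_ws(",", collect_list($c))"""))

// The head/tail split feeds the varargs overloads of groupBy and agg.
df.groupBy(toGroupBy.head, toGroupBy.tail: _*)
  .agg(aggCols.head, aggCols.tail: _*)
  .show(false)
// With the sample rows above, the single group gets _c5 joined as "x1,x2" and _c6 as "y1,y2".

An equivalent that avoids string expressions is toAggregate.map(c => concat_ws(",", collect_list(col(c)))), which keeps everything as Column objects rather than going through the SQL parser.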
It works, thanks.
– Spark Scala Developer
Jul 3 at 10:29
@SparkScalaDeveloper No problems, happy to help :)
– Shaido
Jul 3 at 10:35