Spark's Column.isin function does not take List


Spark's Column.isin function does not take List



I am trying to filter out rows from my Spark Dataframe.


val sequence = Seq(1,2,3,4,5)
df.filter(df("column").isin(sequence))



Unfortunately, I get an unsupported literal type error


java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(1,2,3,4,5)



according to the documentation it takes a scala.collection.Seq list



I guess I don't want a literal? Then what can I take in, some sort of wrapper class?




2 Answers
2



@JustinPihony's answer is correct but it's incomplete. The isin function takes a repeated parameter for argument, so you'll need to pass it as so :


isin


scala> val df = sc.parallelize(Seq(1,2,3,4,5,6,7,8,9)).toDF("column")
// df: org.apache.spark.sql.DataFrame = [column: int]

scala> val sequence = Seq(1,2,3,4,5)
// sequence: Seq[Int] = List(1, 2, 3, 4, 5)

scala> val result = df.filter(df("column").isin(sequence : _*))
// result: org.apache.spark.sql.DataFrame = [column: int]

scala> result.show
// +------+
// |column|
// +------+
// | 1|
// | 2|
// | 3|
// | 4|
// | 5|
// +------+





this also helped me understand what was going on stackoverflow.com/questions/6051302/…
– Jake Fund
Apr 12 '16 at 12:37





It's all in the Scala Language Specification. :)
– eliasah
Apr 12 '16 at 12:54



This is happening because the underlying Scala implementation uses varargs, so the documentation in Java is not quite correct. It is using the @varargs annotation, so you can just pass in an array.


varargs


@varargs






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

api-platform.com Unable to generate an IRI for the item of type

How to set up datasource with Spring for HikariCP?

Display dokan vendor name on Woocommerce single product pages