Spark ML: Issue in training after using ChiSqSelector for feature selection
I'm new to Spark. I am working on a classification model and want to use ChiSqSelector to choose the important features for model training. However, when I train on the features selected by ChiSqSelector, it throws the following error:
"IllegalArgumentException: u'Feature 0 is marked as Nominal (categorical), but it does not have the number of values specified.'"
Interestingly, I get this error with every tree-based algorithm I tried. With Naive Bayes and logistic regression, I don't get the error.
I get the same result with the sample data from the Spark documentation. The error can be reproduced with the following code, based on the Spark 2.1.1 documentation:
from pyspark.sql import SparkSession
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.linalg import Vectors

# In the pyspark shell `spark` already exists; this makes the script standalone.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    (7, Vectors.dense([0.0, 0.0, 18.0, 1.0]), 1.0,),
    (8, Vectors.dense([0.0, 1.0, 12.0, 0.0]), 0.0,),
    (9, Vectors.dense([1.0, 0.0, 15.0, 0.1]), 0.0,)],
    ["id", "features", "clicked"])

# Select the top 2 features by chi-squared test against the label.
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         outputCol="selectedFeatures", labelCol="clicked")
result = selector.fit(df).transform(df)
print("ChiSqSelector output with top %d features selected" %
      selector.getNumTopFeatures())
result.show()

# Training a tree-based model on the selected features raises the error.
dt = DecisionTreeClassifier(labelCol="clicked",
                            featuresCol="selectedFeatures")
model = dt.fit(result)
Someone reported the same problem on the Apache Spark User List (link below), but nobody responded. http://apache-spark-user-list.1001560.n3.nabble.com/Application-of-ChiSqSelector-results-in-quot-Feature-0-is-marked-as-Nominal-quot-td27040.html
I would greatly appreciate it if someone could shed some light on this. Thanks in advance.
1 Answer
I ran into this problem too. Converting the feature column from SparseVector to DenseVector makes it run.
I don't know if there's a better way to do it.