Python – Unable to access RowMatrix methods in PySpark: columnSimilarities(), computeColumnSummaryStatistics()

I’m trying to use the RowMatrix methods columnSimilarities() and computeColumnSummaryStatistics() in PySpark.

  • In particular, the columnSimilarities() function mentioned in this article:

I’m building the matrix from a list of MLlib sparse vectors:

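from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

sparse_vectors = []

# df is a pandas DataFrame; each group holds the rows for one customer
for cust, group in df.groupby(0):
    # Pair each index (column 1) with its value (column 2) and sort by index
    i_v = sorted(zip(group[1].values, group[2].values))
    indices = [x[0] for x in i_v]
    values = [x[1] for x in i_v]
    sparse_vectors.append(Vectors.sparse(len(df[1].unique()), indices, values))

# sc is the active SparkContext
rows = sc.parallelize(sparse_vectors)
mat = RowMatrix(rows)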

Every time I call one of these methods, I get the error:

AttributeError: 'RowMatrix' object has no attribute

Is this a limitation of PySpark that doesn’t exist in Scala Spark? I also can’t find a documentation page for these RowMatrix methods by googling.



You can’t access these methods because they are not implemented in PySpark as of Spark 1.6.

IndexedRowMatrix.columnSimilarities (see SPARK-12041) is available in the current master, but to use it you have to build Spark from source.
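
If building from source is not an option and the matrix is small enough to collect to the driver, the same quantity can be checked locally. In the Scala API, columnSimilarities() returns the cosine similarity between pairs of columns, upper triangle only. Below is a minimal pure-Python sketch of that computation, not Spark's implementation. It assumes each row is a dict mapping column index to value; the function name column_similarities and the sample rows are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def column_similarities(rows):
    """Cosine similarity for each column pair (i < j) that co-occurs in a row.

    rows: iterable of dicts mapping column index -> value, one dict per row.
    Returns a dict {(i, j): similarity}. Hypothetical helper, not a Spark API.
    """
    norms_sq = defaultdict(float)  # squared L2 norm of each column
    dots = defaultdict(float)      # dot product of each co-occurring column pair

    for row in rows:
        for i, v in row.items():
            norms_sq[i] += v * v
        # Only columns present in the same row contribute to a dot product
        for i, j in combinations(sorted(row), 2):
            dots[(i, j)] += row[i] * row[j]

    return {
        (i, j): dot / math.sqrt(norms_sq[i] * norms_sq[j])
        for (i, j), dot in dots.items()
        if norms_sq[i] > 0 and norms_sq[j] > 0
    }


# Made-up sample data: three sparse rows over three columns
rows = [
    {0: 1.0, 1: 2.0},
    {0: 3.0, 2: 4.0},
    {1: 5.0, 2: 6.0},
]
sims = column_similarities(rows)
for pair in sorted(sims):
    print(pair, sims[pair])
```

With the sparse vectors from the question, the dicts could be built from each vector's indices and values. This only makes sense for data that fits in memory on one machine.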
