Lanczos – feature vector in Mahout
I’m experimenting with machine learning in Java using Mahout. I already have all the data I want in MySQL. Where I’m stuck is the step where my SparseRowMatrix variable is supposed to do all the calculations and rearrangements: I simply don’t understand how to call either of the two classes that seem appropriate:
1) org.apache.mahout.math.decomposer.lanczos.LanczosSolver
2) org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
Any suggestions would help at this point!
Solution
DistributedLanczosSolver implements Hadoop's Tool interface, so you can run it as a regular Hadoop job, for example:
hadoop jar $MAHOUT_HOME/mahout-examples-0.5-job.jar org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver --input /path/to/input --output /path/to/output --numCols 42 --numRows 42 --cleansvd "true" --rank 5
You can also launch it directly from Java with the following call:
ToolRunner.run(new DistributedLanczosSolver().job(), args);
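The `args` in that call are the same flags as on the command line. Here is a small plain-Java sketch that assembles them; the class `LanczosArgs` and its `buildArgs` helper are hypothetical names for illustration, and the actual `ToolRunner.run(...)` call is left as a comment because it needs Mahout and Hadoop on the classpath. Note that `--numRows` and `--numCols` should describe the real dimensions of your matrix, not the placeholder 42 from the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: builds the same arguments as the "hadoop jar" example
 * so they can be passed to ToolRunner from Java code.
 */
class LanczosArgs {

    static String[] buildArgs(String input, String output,
                              int numRows, int numCols, int rank) {
        List<String> args = new ArrayList<>();
        args.add("--input");    args.add(input);
        args.add("--output");   args.add(output);
        args.add("--numCols");  args.add(Integer.toString(numCols));
        args.add("--numRows");  args.add(Integer.toString(numRows));
        args.add("--cleansvd"); args.add("true");
        args.add("--rank");     args.add(Integer.toString(rank));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = buildArgs("/path/to/input", "/path/to/output", 42, 42, 5);
        // With Mahout and Hadoop on the classpath, these would be handed to:
        //   ToolRunner.run(new DistributedLanczosSolver().job(), args);
        System.out.println(String.join(" ", args));
        // prints: --input /path/to/input --output /path/to/output --numCols 42 --numRows 42 --cleansvd true --rank 5
    }
}
```

Building the array in one place keeps the Java entry point and the command-line invocation in sync.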
Or, if you don’t need to do this in a distributed fashion, the LanczosSolver.solve method is the one you’re looking for: you pass it your matrix together with the eigenvector and eigenvalue containers it should fill in. It uses the Lanczos algorithm to do the heavy lifting behind the scenes; I can’t explain the details, so I suggest reading the source code for more clarity.
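To give a feel for what happens "behind the scenes", here is a minimal plain-Java sketch of the textbook Lanczos iteration on a small dense symmetric matrix. It is not Mahout's code: the class `LanczosSketch` and its methods are hypothetical, there is no sparse storage or re-orthogonalization, and it stops at the tridiagonal matrix T. A full solver would then compute the eigenvalues of that small T, whose extreme eigenvalues (Ritz values) approximate those of A.

```java
import java.util.Arrays;

/**
 * Textbook Lanczos tridiagonalization of a small dense symmetric matrix.
 * Illustrative sketch only: not Mahout's implementation, no sparse storage,
 * no re-orthogonalization (which real solvers add for numerical stability).
 */
class LanczosSketch {

    /** Returns y = A * x for a dense square matrix A. */
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) {
                s += a[i][j] * x[j];
            }
            y[i] = s;
        }
        return y;
    }

    /** Dot product of two equal-length vectors. */
    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) {
            s += u[i] * v[i];
        }
        return s;
    }

    /**
     * Runs up to k Lanczos steps from start vector v0 and returns {alpha, beta}:
     * the diagonal and off-diagonal of the tridiagonal matrix T = Q^T A Q.
     * Stops early if the Krylov subspace becomes invariant (beta close to 0).
     */
    static double[][] tridiagonalize(double[][] a, double[] v0, int k) {
        int n = a.length;
        double[] alpha = new double[k];
        double[] beta = new double[Math.max(k - 1, 0)];
        double norm0 = Math.sqrt(dot(v0, v0));
        double[] q = new double[n];        // current Lanczos vector q_j
        for (int i = 0; i < n; i++) {
            q[i] = v0[i] / norm0;
        }
        double[] qPrev = new double[n];    // q_{j-1}; all zeros before the first step
        double betaPrev = 0.0;             // beta_{j-1}
        for (int j = 0; j < k; j++) {
            double[] w = multiply(a, q);
            alpha[j] = dot(q, w);
            // Three-term recurrence: w = A q_j - alpha_j q_j - beta_{j-1} q_{j-1}
            for (int i = 0; i < n; i++) {
                w[i] -= alpha[j] * q[i] + betaPrev * qPrev[i];
            }
            if (j == k - 1) {
                break;
            }
            double b = Math.sqrt(dot(w, w));
            if (b < 1e-12) {
                return new double[][] {Arrays.copyOf(alpha, j + 1), Arrays.copyOf(beta, j)};
            }
            beta[j] = b;
            qPrev = q;
            q = new double[n];
            for (int i = 0; i < n; i++) {
                q[i] = w[i] / b;
            }
            betaPrev = b;
        }
        return new double[][] {alpha, beta};
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
        double[][] t = tridiagonalize(a, new double[] {1, 1, 1}, a.length);
        System.out.println("alpha = " + Arrays.toString(t[0]));
        System.out.println("beta  = " + Arrays.toString(t[1]));
    }
}
```

With k equal to the matrix size, T is (in exact arithmetic) similar to A, so the alphas sum to the trace of A (9 for this example) up to rounding. For a rectangular or non-symmetric matrix, Lanczos-based SVD is usually applied to A^T A without forming that product explicitly.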