Subtract the values of columns from two different data frames in PySpark to find RMSE
I can’t figure it out. I’m trying to calculate RMSE between test and prediction data.
Test
col1 col2
a 2
b 3
Prediction
col1 col2
a 4
b 5
I want to compute test(col2) - prediction(col2) for each row. That is:
2 - 4 = -2
3 - 5 = -2
I tried
test.select("col2").subtract(prediction.select("col2"))
But I didn’t get the result I wanted. I need these differences to compute the RMSE. Is there a built-in function in Spark for finding RMSE?
Thank you.
Solution
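DataFrame.subtract is a set difference: it returns the rows of the first DataFrame that are absent from the second, not an element-wise difference. What you need is a join followed by an arithmetic subtraction: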
test.join(prediction, on="col1").withColumn("sub", test.col2 - prediction.col2)