Processing Power – Mobile vs Desktop – 100x Difference?
Has anyone compared the processing power of mobile devices and PCs? I have a very simple matrix calculation. Coded in Java, it takes about 115 ms on my old PC. The very same function takes about 17,000 ms on my tablet. I was shocked. I didn’t expect the tablet to be close to the PC, but I didn’t expect it to be ~150 times slower either!
Has anyone had a similar experience? Any suggestions? Does it help if I write code in C and use the Android NDK?
Base code in Java:
package mainpackage;
import java.util.Date;
public class mainclass {
public static void main(String[] args){
Date startD = new Date();
double[][] testOut;
double[] v = {1,0,0};
double t;
for (int i = 0; i < 100000; i++) {
t=Math.random();
testOut=rot_mat(v, t);
}
Date endD = new Date();
System.out.println("Time Taken ms: "+(-startD.getTime()+endD.getTime()));
}
public static double[][] rot_mat(double v[], double t)
{
double absolute;
double x[] = new double[3];
double temp[][] = new double[3][3];
double temp_2[][] = new double[3][3];
double sum;
int i;
int k;
int j;
// Normalize the vector v into x
absolute = abs_val_vec(v);
for (i = 0; i < 3; i++)
{
x[i] = v[i] / absolute;
}
// Create 3x3 matrix kx
double kx[][] = {{0, -x[2], x[1]},
{x[2], 0, -x[0]},
{-x[1], x[0], 0}};
// Calculate output
// Calculate third term in output
for (i = 0; i < 3; i++)
{
for (j = 0; j < 3; j++)
{
sum = 0;
for (k = 0; k < 3; k++)
{
sum = sum + kx[i][k] * kx[k][j];
}
temp[i][j] = (1-Math.cos(t))*sum;
}
}
// Calculate second term in output
for (i = 0; i < 3; i++)
{
for (k = 0; k < 3; k++)
{
temp_2[i][k] = Math.sin(t)*kx[i][k];
}
}
// Calculate output: identity + second term + third term
double[][] resOut = new double[3][3];
for (i = 0; i < 3; i++)
{
for (k = 0; k < 3; k++)
{
resOut[i][k] = temp_2[i][k] + temp[i][k] + ((i==k)?1:0);
}
}
return resOut;
}
private static double abs_val_vec (double v[])
{
double output;
output = Math.sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
return output;
}
}
Solution
Any suggestion?
Microbenchmarks only measure the performance of microbenchmarks. Moreover, the only decent way to interpret microbenchmarks is to use micromeasurements. As a result, savvy programmers use tools like Traceview to better understand where their time is being spent.
I suspect that if you run this program through Traceview and look at LogCat, you will find that your time is spent in two ways:
Memory allocation and garbage collection. Your microbenchmark is chewing through ~3MB of heap space. In production code, you’d never do that, at least if you want to keep your job.
Floating-point arithmetic. Depending on your tablet, you may not have a floating-point coprocessor, and floating-point calculations on a CPU without one are very, very slow.
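To make the allocation point concrete, here is a minimal sketch of the same rotation formula that writes into a caller-supplied array and evaluates sin() and cos() once per call. The names RotMatReuse and rotMatInto are made up for this illustration and do not appear in the question, and whether this actually helps on a given tablet is something only profiling can confirm.

```java
// Sketch only: RotMatReuse and rotMatInto are hypothetical names for illustration.
final class RotMatReuse {

    /**
     * Writes the rotation matrix for axis v and angle t (radians) into out,
     * using the same formula as rot_mat: I + sin(t)*K + (1 - cos(t))*K^2.
     * Nothing is allocated here; the caller owns and reuses out.
     */
    static void rotMatInto(double[] v, double t, double[][] out) {
        double norm = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        double x = v[0] / norm, y = v[1] / norm, z = v[2] / norm;

        // Evaluate the trig functions once instead of once per matrix element.
        double s = Math.sin(t);
        double c1 = 1.0 - Math.cos(t);

        // For a unit axis, K^2 = x x^T - I, so every entry has a closed form.
        out[0][0] = 1.0 + c1 * (x * x - 1.0);
        out[0][1] = -s * z + c1 * x * y;
        out[0][2] =  s * y + c1 * x * z;
        out[1][0] =  s * z + c1 * x * y;
        out[1][1] = 1.0 + c1 * (y * y - 1.0);
        out[1][2] = -s * x + c1 * y * z;
        out[2][0] = -s * y + c1 * x * z;
        out[2][1] =  s * x + c1 * y * z;
        out[2][2] = 1.0 + c1 * (z * z - 1.0);
    }

    public static void main(String[] args) {
        double[] v = {1, 0, 0};
        double[][] out = new double[3][3]; // allocated once, reused every iteration
        double checksum = 0;
        for (int i = 0; i < 100000; i++) {
            rotMatInto(v, Math.random(), out);
            checksum += out[1][1]; // use the result so the work cannot be skipped
        }
        System.out.println("checksum: " + checksum);
    }
}
```

The original rot_mat creates a handful of small arrays on every call (x, temp, temp_2, kx, resOut, plus their inner rows); this version creates none inside the loop, which is the kind of change the garbage-collection point is hinting at.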
Does it help if I write the code in C and use Android NDK?
Well, that’s hard to answer until you profile your code under Traceview. For example, if your time is mostly spent in sqrt(), cos(), and sin(), that’s already native code, and it won’t get much faster.
What’s more, even if this microbenchmark improves with native code, all that proves is that this microbenchmark improves with native code. For example, a C translation of it may be faster due to manual heap management (malloc() and free()) instead of garbage collection. But that is more an indictment of how badly the microbenchmark is written than a statement about how much faster C will be, because production Java code would be better optimized than this.
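On the "how the microbenchmark is written" point: the loop in the question times everything with Date and throws each result away. A slightly more careful harness, sketched here for a desktop JVM (Java 8 or later, since it uses java.util.function), runs some untimed warm-up iterations first, times with System.nanoTime(), and keeps a running sum so the work is actually used. TimingSketch and timeMillis are hypothetical names, and this is still no substitute for a profiler such as Traceview.

```java
import java.util.function.DoubleSupplier;

// Sketch only: TimingSketch and timeMillis are hypothetical names for illustration.
final class TimingSketch {

    /**
     * Runs work 'warmup' times untimed, then 'iterations' times timed,
     * and returns the elapsed time of the timed part in milliseconds.
     */
    static double timeMillis(DoubleSupplier work, int warmup, int iterations) {
        double sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += work.getAsDouble(); // untimed: lets the runtime settle first
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += work.getAsDouble(); // accumulate so the result is not discarded
        }
        long elapsedNanos = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN in workload");
        }
        return elapsedNanos / 1e6;
    }

    public static void main(String[] args) {
        // Example workload: one sin() call on a random angle per iteration.
        double ms = timeMillis(() -> Math.sin(Math.random()), 10000, 100000);
        System.out.println("Time taken ms: " + ms);
    }
}
```

Timing the trig calls alone like this, next to the full function, gives a rough first hint of whether sqrt(), cos(), and sin() dominate, but the numbers will vary from run to run and from device to device.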
In addition to learning how to use Traceview, I recommend:
Reading the NDK documentation, because it contains information about when native code makes sense.
Reading up on Renderscript Compute. On some devices, using Renderscript Compute can offload integer math to the GPU, resulting in significant performance gains. This won’t help with your floating-point microbenchmark, but for other matrix calculations, such as image processing, Renderscript Compute may be well worth looking into.