‘numpy.random.normal’ generates different numbers on different systems
I'm comparing the numbers generated by np.random.normal on two different systems (details below), using the following code. (I'm using the legacy np.random.seed because it is used by another program whose output I eventually want to verify.) (1)
import numpy as np
np.random.seed(0)
x = np.random.normal(scale=1e-3, size=10**5)
np.save('test.npy', x)
Then I copied test.npy from one system to the other and compared the two versions:
>>> other = np.load('test.npy')
>>> (x != other).sum(), len(x)
(29, 100000)
>>> mask = x != other
>>> np.abs(x[mask] - other[mask])
array([5.42101086e-20, 1.35525272e-20, 2.71050543e-20, 5.42101086e-20,
       1.08420217e-19, 1.08420217e-19, 2.16840434e-19, 2.16840434e-19,
       1.35525272e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20, 1.08420217e-19, 2.16840434e-19, 5.42101086e-20,
       2.71050543e-20, 2.16840434e-19, 2.16840434e-19, 2.71050543e-20,
       2.71050543e-20, 1.08420217e-19, 1.08420217e-19, 1.08420217e-19,
       5.42101086e-20, 1.08420217e-19, 1.08420217e-19, 5.42101086e-20,
       2.71050543e-20])
>>> x[mask]
array([ 4.52489093e-04,  9.78961454e-05, -1.47113076e-04, -3.67859222e-04,
       -5.33279620e-04,  8.40794952e-04, -7.75987295e-04,  1.34205479e-03,
        6.34459482e-05,  5.07109360e-04, -7.68363366e-04,  3.33350262e-04,
       -2.19367067e-04,  6.11402140e-04, -1.30486526e-03, -4.42699624e-04,
        1.45463287e-04, -1.22491651e-03,  1.05226781e-03, -2.43032730e-04,
       -2.40551279e-04,  4.95396595e-04, -7.25454745e-04, -8.50779215e-04,
       -2.66274662e-04,  7.28854386e-04,  8.38515107e-04,  3.36152654e-04,
       -1.26550328e-04])
So only 29 out of 100,000 elements differ, and only by a tiny amount. However, I don't understand where this difference comes from. I confirmed that the same versions of Python and NumPy are installed on both systems: python==3.9.4 and numpy==1.20.2 (installed via python -m pip install numpy==1.20.2; I also checked the latest version, numpy==1.23.0, and the result is exactly the same). I verified that the RNG state (via np.random.get_state()) was the same on both systems before and after calling np.random.normal.
I saved and copied the test.npy file several times and also verified it with an MD5 checksum, so the difference must stem from the random number generation itself (1). However, I don't see how this is possible, since both systems start from the same random state.
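For reference, the state comparison described above could be done along the following lines. This is only a sketch: the helper name legacy_state_digest is made up for illustration, and it assumes both machines share the same byte order (true for two x86_64 hosts), since it hashes the raw bytes of the key array.

```python
import hashlib
import numpy as np

def legacy_state_digest():
    """Return an MD5 hex digest of the global legacy RNG state (illustrative helper)."""
    # get_state() returns (name, key array, pos, has_gauss, cached_gaussian)
    name, keys, pos, has_gauss, cached_gaussian = np.random.get_state()
    h = hashlib.md5()
    h.update(name.encode("ascii"))
    h.update(keys.tobytes())  # raw bytes; assumes the same byte order on both hosts
    h.update(repr((int(pos), int(has_gauss), float(cached_gaussian))).encode("ascii"))
    return h.hexdigest()

np.random.seed(0)
before = legacy_state_digest()
x = np.random.normal(scale=1e-3, size=10**5)
after = legacy_state_digest()

# Compare these two strings between the systems.
print(before, after)
```

If both digests agree across the two systems while the saved arrays do not, the discrepancy has to arise after the uniform bits are drawn.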
System information
System A (the one where test.npy was saved):
$ uname -a
Linux SystemA 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
(I also tested another system, A2, which has the same OS version as A but a different CPU. The results from A and A2 are identical, which is why I suspect the OS version.)
System B (the system on which test.npy is loaded):
$ uname -a
Linux SystemB 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Footnote (1): When I use the approach recommended in the documentation for np.random.seed, i.e. rs = RandomState(MT19937(SeedSequence(0))), the differences between the two systems are still there. However, when I use np.random.default_rng(seed=0) instead, i.e. the new PCG64 bit generator, the differences disappear.
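Spelled out, the two variants from the footnote might look like the sketch below. The output file names test-legacy.npy and test-new.npy are placeholders chosen here, not names from the original experiment.

```python
import numpy as np
from numpy.random import MT19937, RandomState, SeedSequence

# Legacy interface, seeded the way the np.random.seed documentation recommends.
rs = RandomState(MT19937(SeedSequence(0)))
x_legacy = rs.normal(scale=1e-3, size=10**5)

# New Generator interface (PCG64 is the default bit generator).
rng = np.random.default_rng(seed=0)
x_new = rng.normal(scale=1e-3, size=10**5)

# Placeholder file names; copy these between systems and compare as before.
np.save('test-legacy.npy', x_legacy)
np.save('test-new.npy', x_new)
```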
Solution
Given that the differences are so small, the underlying bit generator is evidently doing the same thing on both systems. This most likely comes down to differences between the underlying math libraries.
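To put "so small" into numbers, the differences can be expressed in units in the last place (ULPs) using np.spacing. The helper diff_in_ulps below is a name invented for this sketch. The sample value and difference are the first entries of the arrays printed in the question.

```python
import numpy as np

def diff_in_ulps(a, b):
    """Absolute difference between a and b, in units of the float64 spacing at a."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.abs(a - b) / np.spacing(np.abs(a))

# First differing value and its reported difference, copied from the question.
value = 4.52489093e-04
reported_diff = 5.42101086e-20
ratio = reported_diff / np.spacing(value)
print(ratio)  # close to 1, i.e. about one ULP

# Sanity check: a value and its immediate float64 neighbour are one ULP apart.
neighbour = np.nextafter(value, np.inf)
print(diff_in_ulps(value, neighbour))
```

With both arrays loaded, diff_in_ulps(x[mask], other[mask]) gives the same measure for every mismatch. Differences of only a unit or two in the last place are what one would expect from two slightly different implementations of the same math function.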
NumPy's legacy generator uses sqrt and log from libm. You can see that it pulls in these symbols by first finding the shared object that provides the generator:
import numpy as np
print(np.random.mtrand.__file__)
Then dump the symbols:
nm -C -gD mtrand.*.so | grep GLIBC
Here the mtrand file name comes from the output above. I get a lot of other symbols in the output as well, which might also explain the difference.
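To make the role of sqrt and log concrete, here is a rough pure-Python sketch of the polar (Marsaglia) method that, as far as I understand it, the legacy generator uses to turn uniforms into normals. The function name polar_normals is made up, and the ordering of the two outputs per pair reflects my reading of the legacy C code, so treat it as an assumption. Python's math.log and math.sqrt stand in for the libm calls.

```python
import math
import numpy as np

def polar_normals(n_pairs):
    """Draw 2*n_pairs standard normals with the polar method,
    consuming uniforms from the global legacy RNG (illustrative sketch)."""
    out = []
    while len(out) < 2 * n_pairs:
        x1 = 2.0 * np.random.random_sample() - 1.0
        x2 = 2.0 * np.random.random_sample() - 1.0
        r2 = x1 * x1 + x2 * x2
        if r2 >= 1.0 or r2 == 0.0:
            continue  # reject points outside the unit disc (and the origin)
        # The only transcendental calls: log and sqrt.
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        # Assumed ordering: f * x2 is returned first, f * x1 is cached for the next call.
        out.extend([f * x2, f * x1])
    return np.array(out)

np.random.seed(0)
mine = polar_normals(5)
np.random.seed(0)
ref = np.random.standard_normal(10)
print(np.max(np.abs(mine - ref)))
```

If this reading is right, the printed maximum difference is zero or on the order of one ULP, and any last-bit disagreement between systems can only enter through log, sqrt, or the floating-point arithmetic around them.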
My guess is that this comes down to the log implementation, which you can test with the following:
import numpy as np
np.random.seed(0)
x = 2 * np.random.rand(2, 10**5) - 1
r2 = np.sum(x * x, axis=0)
np.save('test-log.npy', np.log(r2))
Then compare the resulting test-log.npy files from the two systems.
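A small helper for that comparison could look like this sketch. The function name report_mismatches and the file names in the commented usage are placeholders. Printing the values with float.hex() makes last-bit differences visible that the default repr can hide.

```python
import numpy as np

def report_mismatches(a, b, limit=5):
    """Print how many elements compare unequal and show a few of them in hex."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    idx = np.flatnonzero(a != b)
    print(f"{idx.size} of {a.size} elements differ")
    for i in idx[:limit]:
        print(i, float(a[i]).hex(), float(b[i]).hex())
    return idx

# Hypothetical usage, after copying both files to one machine:
# a = np.load('test-log-A.npy'); b = np.load('test-log-B.npy')
# report_mismatches(a, b)

# Self-contained demonstration with a synthetic last-bit perturbation.
demo = np.log(np.array([0.25, 0.5, 0.75]))
perturbed = demo.copy()
perturbed[2] = np.nextafter(perturbed[2], 0.0)
idx = report_mismatches(demo, perturbed)
```

If the log outputs already differ between the systems, that points at log. If they match, the same check can be repeated for sqrt.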