I have an input array, which is a masked array.
When I check the mean, I get a nonsensical number: less than the reported minimum value!

So, for the raw array: numpy.mean(A) < numpy.min(A). Note that A.dtype is float32.

FIX: A3 = A.astype(float). A3 is still a masked array, but now the mean lies between the minimum and the maximum, so I have some faith it's correct! For some reason A3.dtype is now float64. Why? Why did the cast change the result, and why is the mean correct at 64-bit and wildly incorrect at 32-bit?

Can anyone shed any light on why I needed to recast the array to accurately calculate the mean? (with or without numpy, it turns out).

EDIT: I'm using a 64-bit system, so yes, that's why recasting changed it to 64-bit. It turns out I didn't have this problem if I subsetted the data (extracting from netCDF input using netCDF4 Dataset): smaller arrays did not produce it. So the problem comes from accumulating too many values in float32 (overflow, or more precisely a loss of precision in the running sum), and switching to 64-bit prevented it.
I'm still not clear on why the data initially loaded as float32, but I guess it aims to conserve space even on a 64-bit system. The array itself is 1872x128x256, with non-masked values around 300, which it turns out is enough to cause the problem :)
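
To see why about 61 million values near 300 are enough, here is a small sketch of the mechanism. It assumes every value is exactly 300 and that the sum is accumulated left to right in float32; NumPy's actual summation order depends on the version and the memory layout, so this illustrates the effect rather than reproducing my exact number.

```python
import numpy as np

n_values = 1872 * 128 * 256          # array shape from the question
true_sum = 300.0 * n_values          # about 1.84e10

# The true sum is nowhere near the largest float32, so this is not overflow
# in the strict sense.
float32_max = np.finfo(np.float32).max

# Around 2**33 (about 8.6e9) neighbouring float32 values are 1024 apart.
acc = np.float32(2.0**33)
spacing = np.spacing(acc)            # 1024.0

# Adding 300 is less than half that spacing, so the result rounds back to
# the same value: a float32 running sum stops growing at this point.
stuck = acc + np.float32(300.0)
print(stuck == acc)                  # True

# If the running sum stalls at 2**33, the "mean" comes out well below 300.
stalled_mean = float(acc) / n_values
print(stalled_mean)
```

A running sum that stalls partway gives a mean smaller than every value in the array, which is the symptom described above.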

  • Please show an actual code example demonstrating the problem.
    – BrenBarn
    Apr 10, 2014 at 6:48
  • numpy arrays are completely different from Python arrays, I assume you mean the former? Apr 10, 2014 at 6:51
  • If you are on a 64-bit system, A.astype(float) will return a np.float64 array.
    – ebarr
    Apr 10, 2014 at 6:52
  • I eventually figured it out, will edit post. Didn't add code or array type because I wanted to keep it generic and not bring NetCDF into it :) Apr 11, 2014 at 1:41
  • And @ebarr you're right, the fact that the system was 64-bit was the key, it forced it to go to the preferred precision, not the minimum required. Apr 11, 2014 at 1:50

1 Answer

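If you're working with large arrays, be aware that float32 sums can silently go wrong!
Changing from 32-bit to 64-bit floats in this instance avoids the problem (unflagged as far as I can tell) that led to the anomalous mean calculation. Strictly speaking it is a loss of precision in the running sum rather than overflow: the total here is around 1.8e10, well within float32 range, but float32 carries only about 7 significant decimal digits.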
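
A minimal sketch of two ways to get a 64-bit accumulator, assuming a float32 masked array like the one in the question. The data and mask here are made up for illustration (random values near 300, a hypothetical threshold mask) and the array is much smaller than the real one. Casting with `astype(float)` works because Python's `float` is double precision, so NumPy maps it to float64; passing `dtype` to `mean` does the same for the sum without copying the whole array.

```python
import numpy as np
import numpy.ma as ma

# Made-up stand-in for the netCDF data: float32 values near 300.
rng = np.random.default_rng(0)
data = rng.uniform(290.0, 310.0, size=(64, 128, 256)).astype(np.float32)

# Hypothetical mask, only so that the array is genuinely masked.
A = ma.masked_array(data, mask=data > 309.0)

# Option 1: keep the float32 array, but accumulate the sum in float64.
mean_acc64 = A.mean(dtype=np.float64)

# Option 2: cast the whole array first (what the question's fix did).
A3 = A.astype(np.float64)
mean_cast = A3.mean()

print(A.dtype, A3.dtype)
print(A.min() <= mean_acc64 <= A.max())
```

Option 1 is the cheaper of the two for large arrays, since it avoids holding a second, twice-as-large copy in memory.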
