Friday 27 September 2013

Numpy Load OverflowError: length too large

I have an algorithm that runs through a dataset and creates a scipy sparse
matrix, which is then saved using:
numpy.savez
writing to a file opened as:
open(file, 'wb')
The matrix can take up a considerable amount of disk space (about 20 GB for
a 30-day run).
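Since three arrays (data, indices, indptr) are saved, the matrix is presumably in CSR (or CSC) format. A minimal sketch of the save step, assuming CSR; the small stand-in matrix and the file name are illustrative, not from the original run:

```python
import numpy as np
import scipy.sparse as sp

# A small CSR matrix standing in for the real 20 GB one.
matrix = sp.csr_matrix(np.array([[1, 0, 2],
                                 [0, 0, 3]]))

# Save the three CSR component arrays; savez names them
# arr_0, arr_1, arr_2 when no keywords are given.
with open('matrix.npz', 'wb') as f:
    np.savez(f, matrix.data, matrix.indices, matrix.indptr)
```

Passing keyword arguments to savez (e.g. data=matrix.data) would give the arrays readable names instead of arr_0/arr_1/arr_2.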
After that, those matrices are loaded into other applications like this:
file = open(path_to_file, 'rb')
matrix = load(file)
data = matrix['arr_0']
ind = matrix['arr_1']
indptr = matrix['arr_2']
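A self-contained sketch of this load step, again assuming CSR format. The save at the top only exists to make the example runnable; the matrix shape has to be known (or stored) separately, since the three component arrays alone do not pin it down:

```python
import numpy as np
import scipy.sparse as sp

# Create a small .npz file so the load step below has something to read.
original = sp.csr_matrix(np.eye(3))
with open('matrix.npz', 'wb') as f:
    np.savez(f, original.data, original.indices, original.indptr)

# Load the component arrays back, mirroring the snippet above.
with open('matrix.npz', 'rb') as f:
    npz = np.load(f)
    data = npz['arr_0']
    ind = npz['arr_1']
    indptr = npz['arr_2']

# Rebuild the CSR matrix from its components.
matrix = sp.csr_matrix((data, ind, indptr), shape=original.shape)
```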
For a 10-day dataset this worked fine. For the 30-day dataset the matrix
was also created and saved successfully, but when trying to load it I got
the error:
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/ubuntu/recsys/Scripts/Neighborhood/s3_CRM_neighborhood.py",
line 76, in <module>
data = matrix['arr_0']
File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 241, in
__getitem__
return format.read_array(value)
File "/usr/lib/python2.7/dist-packages/numpy/lib/format.py", line 458,
in read_array
data = fp.read(int(count * dtype.itemsize))
OverflowError: length too large
If I could successfully create and save the matrices, shouldn't it also be
possible to load the result? Is there some overhead that is killing the
loading? Is it possible to work around this issue?
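Judging by the traceback, the failure is in format.py asking fp.read() for the full byte count of the array in one call, which overflows on that Python build when the count exceeds what a C long can hold. One possible workaround (a sketch of the general idea, not from the original post; read_in_chunks is a hypothetical helper) is to read the raw bytes in smaller chunks:

```python
import io

def read_in_chunks(fp, count, chunk=2**28):
    """Read `count` bytes from fp in pieces of at most `chunk` bytes,
    so no single fp.read() call is asked for an overflowing length."""
    parts = []
    remaining = count
    while remaining > 0:
        part = fp.read(min(chunk, remaining))
        if not part:  # EOF reached before `count` bytes were read
            break
        parts.append(part)
        remaining -= len(part)
    return b''.join(parts)

# Illustrative use on an in-memory buffer.
buf = io.BytesIO(b'x' * 1000)
payload = read_in_chunks(buf, 1000, chunk=64)
```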
Thanks in advance,
