MemoryError with more than 1E9 rows #8252
You can try creating each of the columns as a separate Series first, then putting them into a dict and creating the frame from that. However, you might be having a problem finding contiguous memory.
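For readers who want to try this, a minimal sketch of the dict-of-Series idea described above; the column names, dtypes, and row count are illustrative only, not taken from the original report:

```python
# Build each column as its own Series, then assemble the frame from a dict,
# so there is never one giant 2-D allocation covering all columns at once.
import numpy as np
import pandas as pd

n = 10**7  # hypothetical row count; the report used 1.5E9

cols = {
    "key": pd.Series(np.random.randint(0, 100, size=n)),  # int64 column
    "val": pd.Series(np.random.randn(n)),                  # float64 column
}
df = pd.DataFrame(cols)  # columns of different dtypes stay in separate blocks
                         # rather than being packed into one big 2-D array
```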
Finally had time to look at this. I think there was an extra copy going on in certain cases, so try this out using master (once I merge this change). It seems to scale much better, with the following slightly modified code:
@mattdowle I updated the example to give a pretty simplified version that gives pretty good memory performance (e.g. just a bit over 1x the final data size) by not trying to create everything at once.
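A hedged sketch of the "don't create everything at once" approach, assuming a purely numeric frame; `n`, the column names, and the loop are made up for illustration, with the intent that peak memory stays close to one extra column above the final frame size:

```python
# Start from an empty frame and add columns one at a time, dropping each
# temporary array as soon as it has been handed to the frame.
import numpy as np
import pandas as pd

n = 10**7  # illustrative; scale up as memory allows

df = pd.DataFrame(index=pd.RangeIndex(n))
for name in ("a", "b", "c"):
    tmp = np.random.randn(n)
    df[name] = tmp  # hand the column to the frame
    del tmp         # drop our reference so at most one temporary is alive at a time
```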
I have 240GB of RAM and nothing else running on the machine. I'm trying to create 1.5E9 rows, which I think should produce a data frame of around 100GB, but I'm getting this MemoryError. It works fine with 1E9 rows but not 1.5E9. I could understand a limit at about 2^31 (2E9) or 2^32 (4E9), but all 240GB appears exhausted (according to htop) somewhere between 1E9 and 1.5E9 rows. Any ideas? Thanks.
An earlier question on S.O. is here : http://stackoverflow.com/questions/25631076/is-this-the-fastest-way-to-group-in-pandas
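For context, here is a scaled-down sketch of the kind of workload described in this report, assuming a grouping layout along the lines of the linked Stack Overflow question; the column names, group count, and `n` are assumptions, shrunk so the sketch runs on an ordinary machine. Each int64/float64 column costs 8 bytes per row, so with several such columns a frame of 1.5E9 rows reaches the ~100GB range mentioned above:

```python
import numpy as np
import pandas as pd

n = 10**7   # the report used 1.5E9 rows
k = 10**4   # hypothetical number of distinct groups

df = pd.DataFrame({
    "key": np.random.randint(0, k, size=n),
    "val": np.random.randn(n),
})

# Rough check of how much memory the frame itself holds (bytes -> GB).
print(df.memory_usage(deep=True).sum() / 1e9, "GB")

# A grouping step along the lines of the S.O. question.
agg = df.groupby("key")["val"].sum()
```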