Pip install of version 0.24 is broken for platforms without wheels #25193

@mgoldey

Description

Code Sample, a copy-pastable example if possible

The easiest way to reproduce this is in a fresh Docker container, as below:

docker run --rm -it alpine:latest
$ apk update
$ apk add build-base python3 python3-dev py3-pip
$ pip3 install pandas
Collecting pandas
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    100% |████████████████████████████████| 11.8MB 4.2MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 346, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-y3ff_xpr/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-y3ff_xpr/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1136, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 348, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

Problem description

Note that pip installing numpy and then pandas does work

pip3 install numpy
pip3 install pandas

But the standalone command is broken.

pip3 install pandas

Unfortunately, this means that a requirements.txt file is insufficient for setting up a new environment with pandas installed (for example, in a Docker container).

Activity

mgoldey (Author) commented on Feb 6, 2019

I believe this to be a function of recent changes because pandas==0.23.1 does install numpy correctly.

added the Build (Library building on various platforms) and Dependencies (Required and optional dependencies) labels on Feb 7, 2019

gfyoung (Member) commented on Feb 7, 2019

cc @TomAugspurger

This is really weird, as doing pip install pandas (without numpy installed) seems to be just fine...

jorisvandenbossche (Member) commented on Feb 7, 2019

@gfyoung I suppose you tried with binary wheels?

I can reproduce this by running the following in a fresh environment with just python and pip:

(test-pip) joris@joris-XPS-13-9350:~/scipy$ pip install pandas --no-binary :all:
Collecting pandas
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    100% |████████████████████████████████| 11.8MB 1.7MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 357, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-3txnb8eh/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-3txnb8eh/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1142, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/home/joris/miniconda3/envs/test-pip/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-3txnb8eh/pandas/

For pandas 0.23.4 it indeed works.

jorisvandenbossche (Member) commented on Feb 7, 2019

From a quick look, this might be caused by the changes in #21879 (which I don't fully understand though), as it is caused by maybe_cythonize being run in setup.py before setuptools is able to check and install dependencies.

cc @jbrockmendel
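
The ordering problem described in this comment can be illustrated with plain Python, since call arguments are evaluated before the function they are passed to is invoked. The sketch below uses hypothetical stand-ins (`fake_maybe_cythonize`, `fake_setup`, `example_ext`) rather than the real pandas or setuptools code; it only shows that anything computed for `ext_modules=` runs before `setup()` gets a chance to look at declared dependencies.

```python
# Stand-ins only: these are NOT the real pandas or setuptools functions.
events = []


def fake_maybe_cythonize(extensions):
    # In the traceback above, this is the step that looks up numpy's headers.
    events.append("maybe_cythonize called (needs numpy)")
    return extensions


def fake_setup(ext_modules, install_requires):
    # Dependency declarations only become visible once setup() is running.
    events.append("setup called (declares: " + ", ".join(install_requires) + ")")


# Python evaluates the keyword arguments first, then calls fake_setup.
fake_setup(
    ext_modules=fake_maybe_cythonize(["example_ext"]),
    install_requires=["numpy"],
)

for event in events:
    print(event)
```

The first recorded event is the `maybe_cythonize` stand-in, which matches the observation that the numpy lookup fails before any dependency handling can happen.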

added this to the 0.24.2 milestone on Feb 7, 2019
changed the title from "pip installation has broken dependency handling for numpy" to "Pip install of version 0.24 is broken for platforms without wheels" on Feb 7, 2019

snordhausen commented on Feb 7, 2019

@jorisvandenbossche You are right, this is the change that broke things. I checked with this Dockerfile:

FROM alpine:3.8
RUN apk update
RUN apk add g++ libstdc++ python3-dev bash git 
RUN pip3 install --upgrade pip

# With the change
#RUN pip3 install numpy==1.16.1 git+https://github.com/pandas-dev/pandas@df2ccef
# 1 commit before:
RUN pip3 install numpy==1.16.1 git+https://github.com/pandas-dev/pandas@f11738b

jbrockmendel (Member) commented on Feb 7, 2019

maybe_cythonize wraps cython's cythonize, so we may need to get a fix upstream. Shorter-term, I suppose there is something that happens within setuptools.setup that installs numpy (?) that we could shoe-horn into the top of maybe_cythonize

jorisvandenbossche (Member) commented on Feb 7, 2019

What I understand from the diff in #21879 is that before, we just passed extensions directly to setup(ext_modules=...), and now we already call cythonize (or the maybe_cythonize wrapper) on them inside setup.py.
So my question then: what exactly is the reason that we started to use cythonize? Or how do other projects deal with it?

jbrockmendel (Member) commented on Feb 7, 2019

> what exactly is the reason that we started to use cythonize?

cythonize is the recommended way to compile cython modules. It automatically handles a lot of the dependency tracking we used to do manually in setup.py. It also takes care of figuring out which files need to be re-compiled and which are re-usable (I know statsmodels used to do this manually, not sure about pandas).

11 remaining items

jbrockmendel (Member) commented on Feb 11, 2019

@jorisvandenbossche sure. Is the desired behavior just to check if numpy is installed and if not raise?

jorisvandenbossche (Member) commented on Feb 11, 2019

Not sure if it should raise. I think you should be able to call python setup.py .. without having numpy installed. So eg doing a python setup.py egg_info should not call cythonize / require numpy or cython to be installed (as it was before #21879)

jbrockmendel (Member) commented on Feb 12, 2019

> So eg doing a python setup.py egg_info should not call cythonize / require numpy or cython to be installed

OK. The fix that comes to mind is a check inside maybe_cythonize to skip cythonizing based on the command line arguments. At the moment we only skip for clean, but that could be extended to egg_info or whatever else (or the check could be reversed so cythonize is only run for build_ext or whatever)
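
A minimal sketch of the check described in this comment, assuming the decision is made purely from the command-line arguments. The names `SKIP_COMMANDS`, `should_cythonize`, and `maybe_cythonize_sketch` are hypothetical, the skip set beyond `clean` is an assumption, and the `cythonize` callable is passed in so the example does not depend on Cython; this is not the actual pandas `setup.py` code.

```python
import sys

# Assumption: commands that should not trigger cythonize. The thread only
# confirms "clean" as currently skipped and proposes adding "egg_info".
SKIP_COMMANDS = frozenset({"clean", "egg_info"})


def should_cythonize(argv, skip_commands=SKIP_COMMANDS):
    """Return False if any argument after the script name is a skipped command."""
    return not any(arg in skip_commands for arg in argv[1:])


def maybe_cythonize_sketch(extensions, cythonize, argv=None):
    """Call `cythonize` on the extensions unless a skipped command was requested."""
    argv = sys.argv if argv is None else argv
    if not should_cythonize(argv):
        # Return the extensions untouched: no Cython or numpy needed here.
        return extensions
    return cythonize(extensions)


if __name__ == "__main__":
    # Hypothetical stand-in for Cython's cythonize, for demonstration only.
    fake_cythonize = lambda exts: [name + " (cythonized)" for name in exts]
    print(maybe_cythonize_sketch(["ext"], fake_cythonize, ["setup.py", "egg_info"]))
    print(maybe_cythonize_sketch(["ext"], fake_cythonize, ["setup.py", "build_ext"]))
```

The reversed variant mentioned in the comment would instead keep an allow-list (for example `build_ext`) and call `cythonize` only when one of those commands is present.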

jorisvandenbossche (Member) commented on Feb 13, 2019

I don't know all the possible commands of setuptools enough to know if specifically checking for that would be enough. Best might be to simply test it out.

> or the check could be reversed so cythonize is only run for build_ext or whatever

In any case, it seems to me that there is some duplication now in the setup.py, as we have both functionality to cythonize in the build_ext class and in the maybe_cythonize you added. But not familiar enough with it to know how it interacts with the coverage that you were trying to solve.

jbrockmendel (Member) commented on Feb 13, 2019

> But not familiar enough with it to know how it interacts with the coverage that you were trying to solve.

IIRC the coverage implementation isn't quite orthogonal to the usage of cythonize but close. I'm fairly confident that if we had to remove the usage of cythonize, we could do it without removing the coverage implementation.

> In any case, it seems to me that there is some duplication now in the setup.py, as we have both functionality to cythonize in the build_ext class and in the maybe_cythonize you added.

Some of that code is shared. maybe_cythonize calls build_ext.render_templates. I guess the code adding numpy_incl to each ext.include_dirs could be shared. Do you see anything else that could be de-duplicated?
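
The include-directory step mentioned in this paragraph could be factored into one helper used from both places. A rough sketch, assuming extension objects expose an `include_dirs` list; `FakeExtension` and `add_include_dir` are hypothetical names, and the path is a placeholder rather than a real numpy location.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeExtension:
    """Stand-in for an extension object; only the attributes used here."""
    name: str
    include_dirs: List[str] = field(default_factory=list)


def add_include_dir(extensions, include_dir):
    """Append include_dir to every extension that does not already list it."""
    for ext in extensions:
        if include_dir not in ext.include_dirs:
            ext.include_dirs.append(include_dir)
    return extensions


# Placeholder path: the real value would come from numpy at build time.
exts = [FakeExtension("a"), FakeExtension("b", ["/placeholder/numpy/include"])]
add_include_dir(exts, "/placeholder/numpy/include")
print([ext.include_dirs for ext in exts])
```

Because the helper checks for an existing entry first, calling it from both code paths would not duplicate the directory.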

> I don't know all the possible commands of setuptools enough to know if specifically checking for that would be enough

This is partially an upstream problem with cythonize. I implemented maybe_cythonize to wrap cythonize because python setup.py clean would incorrectly call cythonize if we didn't do a check. Unless you want to revert the use of cythonize (I really hope that's not on the table), I'm pretty sure the place to fix this issue is line 476 in maybe_cythonize, where it currently checks for "clean". We should also push the fix upstream (cython/cython#1495).

I also don't know the setuptools API. It shouldn't be that hard to find a complete-ish list of the possible commands and sort them by whether cythonize should be called or not, should it?

jorisvandenbossche (Member) commented on Feb 15, 2019

> Unless you want to revert the use of cythonize (I really hope that's not on the table)

Well, I don't care that much how it is done, I mainly care about it being fixed for 0.24.2. I only know that before #21879 it was working, and that I don't have the time to dive into setuptools inner details right now, so if we don't find another solution, reverting it is an option for me.
In any case, maybe_cythonize is not essential to get a build_ext working. I just removed it, and doing a dev build on master still works fine (but I assume the need for maybe_cythonize is somehow related to the coverage?).

But the suggestion about checking which setup command is run might make sense (at least, e.g. scipy does things like that: https://github.com/scipy/scipy/blob/master/setup.py)

> This is partially an upstream problem with cythonize.

Or you could also say that their recommendation to use cythonize is not a very good one, as mentioned in the last comment on the issue you linked (in their docs, they actually also mention some ways to avoid calling cythonize in certain cases). We already have all the custom build classes (CleanCommand, build_ext, ..) that deal with this, and the cythonize/maybe_cythonize seems to be duplicating that partly (but again, I don't have much knowledge of setuptools/cython interaction)

jbrockmendel (Member) commented on Feb 15, 2019

> But the suggestion about checking which setup command is run might make sense

I maintain this is the thing to do. Go for it.

> In any case, maybe_cythonize is not essential to get a build_ext working. I just removed it, and doing a dev build on master still works fine (but I assume the need for maybe_cythonize is somehow related to the coverage?).

cythonize takes care of figuring out the dependencies that were previously explicitly (and error-proned-ly) listed in setup.py. Using it simplifies that part of setup quite a bit.

jorisvandenbossche (Member) commented on Feb 15, 2019

> > But the suggestion about checking which setup command is run might make sense
>
> I maintain this is the thing to do. Go for it.

I currently don't have time to look more into it.

TomAugspurger (Contributor) commented on Mar 6, 2019

I'll look into this today, as I think this is a blocker for 0.24.2?


Labels: Build (Library building on various platforms), Dependencies (Required and optional dependencies), Regression (Functionality that used to work in a prior pandas version)


Participants: @xhochy, @mgoldey, @jorisvandenbossche, @TomAugspurger, @snordhausen
