Skip to content

BUG: {expanding,rolling}_{cov,corr} functions between objects with different index sets #7512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seth-p opened this issue Jun 19, 2014 · 4 comments · Fixed by #7604
Closed
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@seth-p
Copy link
Contributor

seth-p commented Jun 19, 2014

related #7514

There appears to be a bug in the expanding_{cov,corr} functions when dealing with two objects with different indexes.

First, there is a problem with series. See example below, where I would expect expanding_corr(s1, s2) to produce the result produced by expanding_corr(s1, s2a).

The problem is due to the fact that expanding_corr is implemented in terms of rolling_corr with window = max(len(arg1), len(arg2)), but then rolling_corr resets window to window = min(window, len(arg1), len(arg2)). The end result is that window = min(len(arg1), len(arg2)) -- and these are the raw, unaligned arg1 and arg2. Thus in the expanding_corr(s1, s2) example below, window=2, and so when calculating the third row (index=2) it tries to calculate the correlation between [2, 3] and [NaN, 3], producing NaN -- rather than calculating the correlation between [1, 2, 3] and [1, Nan, 3] and producing 1.

The solution would appear to be simply deleting the window = min(window, len(arg1), len(arg2)) line from rolling_cov and rolling_corr, as I believe the rolling_* functions run fine with a window larger than the data, or at least replacing it with window = min(window, max(len(arg1), len(arg2))).

In [1]: from pandas import Series, expanding_corr

In [2]: s1 = Series([1, 2, 3], index=[0, 1, 2])

In [3]: s2 = Series([1, 3], index=[0, 2])

In [4]: expanding_corr(s1, s2)
Out[4]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [5]: s2a = Series([1, None, 3], index=[0, 1, 2])

In [6]: expanding_corr(s1, s2a)
Out[6]:
0   NaN
1   NaN
2     1
dtype: float64

Next, there is a problem with data frames. [This was originally reported separately in https://github.com//issues/7512, but I've merged it into this issue.]

The problem is with with _flex_binary_moment(). When pairwise=True, it doesn't properly handle two DataFrames with different index sets. In the following example, I believe [6], [7], and [8] should all produce the result in [9].

In [1]: from pandas import DataFrame, expanding_corr

In [2]: df1 = DataFrame([[1,2], [3, 2], [3,4]], columns=['A','B'])

In [3]: df1a = DataFrame([[1,2], [3,4]], columns=['A','B'], index=[0,2])

In [4]: df2 = DataFrame([[5,6], [None,None], [2,1]], columns=['X','Y'])

In [5]: df2a = DataFrame([[5,6], [2,1]], columns=['X','Y'], index=[0,2])

In [6]: expanding_corr(df1, df2, pairwise=True)[2]
Out[6]:
          X         Y
A -1.224745 -1.224745
B -1.224745 -1.224745

In [7]: expanding_corr(df1, df2a, pairwise=True)[2]
Out[7]:
    X   Y
A NaN NaN
B NaN NaN

In [8]: expanding_corr(df1a, df2, pairwise=True)[2]
Out[8]:
    X   Y
A NaN NaN
B NaN NaN

In [9]: expanding_corr(df1a, df2a, pairwise=True)[2]
Out[9]:
   X  Y
A -1 -1
B -1 -1

And there are similar problems with rolling_cov and rolling_corr. For example, continuing with the previous example, [77], [78], and [79] should give the same result as [80].

In [77]: rolling_corr(df1, df2, window=3, pairwise=True, min_periods=2)[2]
Out[77]:
          X         Y
A -1.224745 -1.224745
B -1.224745 -1.224745

In [78]: rolling_corr(df1, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[78]:
    X   Y
A NaN NaN
B NaN NaN

In [79]: rolling_corr(df1a, df2, window=3, pairwise=True, min_periods=2)[2]
Out[79]:
    X   Y
A NaN NaN
B NaN NaN

In [80]: rolling_corr(df1a, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[80]:
   X  Y
A -1 -1
B -1 -1
@jreback
Copy link
Contributor

jreback commented Jun 19, 2014

if you'd like to put a pull-request together with some tests would be great.

I think that in _prep_binary the alignment may not be working correctly

e.g.

x, y = s1.align(s2) then de-factor x==s1 and y==s2a. So may not be propogating properly

@jreback jreback added this to the 0.15.0 milestone Jun 19, 2014
@seth-p
Copy link
Contributor Author

seth-p commented Jun 19, 2014

Note that rolling_cov/corr call _flex_binary_moment(), which does align the two arguments (using prep_binary() in the case of two series). The problem is that rolling_cov/corr shrinks the window before the alignment is done. So I think the "right" solution is simply to delete/change the window = min(len(arg1), len(arg2)) line in rolling{cov,corr}, as it seems completely gratuitous (and erroneous) to me.

Afraid I'm not set up two submit a pull request. ("Not set up" = "don't really know how" -- am a bit new to the whole git / github and even Python thing...)

There are other issues with _flex_binary_moment, for which I think I have a solution, but will submit a separate issue for that.

@jreback
Copy link
Contributor

jreback commented Jun 19, 2014

@seth-p
Copy link
Contributor Author

seth-p commented Jun 19, 2014

Will try a bit later, but don't hold your breath...

@seth-p seth-p changed the title BUG: expanding_{cov,corr} functions between objects with different index sets BUG: {expanding,rolling}_{cov,corr} functions between objects with different index sets Jun 28, 2014
@jreback jreback modified the milestones: 0.14.1, 0.15.0 Jun 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
2 participants