There appears to be a bug in the expanding_{cov,corr} functions when dealing with two objects with different indexes.
First, there is a problem with series. See example below, where I would expect expanding_corr(s1, s2) to produce the result produced by expanding_corr(s1, s2a).
The problem is due to the fact that expanding_corr is implemented in terms of rolling_corr with window = max(len(arg1), len(arg2)), but then rolling_corr resets window to window = min(window, len(arg1), len(arg2)). The end result is that window = min(len(arg1), len(arg2)) -- and these are the raw, unaligned arg1 and arg2. Thus in the expanding_corr(s1, s2) example below, window=2, and so when calculating the third row (index=2) it tries to calculate the correlation between [2, 3] and [NaN, 3], producing NaN -- rather than calculating the correlation between [1, 2, 3] and [1, NaN, 3] and producing 1.
The solution would appear to be simply deleting the window = min(window, len(arg1), len(arg2)) line from rolling_cov and rolling_corr, as I believe the rolling_* functions run fine with a window larger than the data, or at least replacing it with window = min(window, max(len(arg1), len(arg2))).
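To make the window arithmetic concrete, here is a minimal sketch in plain Python. The function names are hypothetical and are not part of pandas; they only mirror the three expressions quoted above, applied to the lengths of s1 (3) and s2 (2) from the example.

```python
# Hypothetical helpers that only reproduce the window expressions quoted in
# this report; they are not pandas functions.

def expanding_window(len1, len2):
    # Window that expanding_corr is described as requesting.
    return max(len1, len2)

def clamp_as_reported(window, len1, len2):
    # The line described as shrinking the window inside rolling_corr.
    return min(window, len1, len2)

def clamp_alternative(window, len1, len2):
    # The suggested replacement, which never drops below the longer input.
    return min(window, max(len1, len2))

len_s1, len_s2 = 3, 2  # raw, unaligned lengths of s1 and s2

requested = expanding_window(len_s1, len_s2)
reported = clamp_as_reported(requested, len_s1, len_s2)
alternative = clamp_alternative(requested, len_s1, len_s2)

print(requested, reported, alternative)
```

With these lengths the requested window is 3, the reported clamp reduces it to 2, and the alternative clamp leaves it at 3.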
In [1]: from pandas import Series, expanding_corr
In [2]: s1 = Series([1, 2, 3], index=[0, 1, 2])
In [3]: s2 = Series([1, 3], index=[0, 2])
In [4]: expanding_corr(s1, s2)
Out[4]:
0 NaN
1 NaN
2 NaN
dtype: float64
In [5]: s2a = Series([1, None, 3], index=[0, 1, 2])
In [6]: expanding_corr(s1, s2a)
Out[6]:
0 NaN
1 NaN
2 1
dtype: float64
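The effect of the shrunken window on the third row can be illustrated without pandas. The sketch below is a simplified stand-in, not the pandas implementation: it assumes the two series have already been aligned on the union of their indexes, and it computes a Pearson correlation over the rows in each trailing window where both values are present. The helper names are made up for this illustration.

```python
import math

NAN = float("nan")

def pearson_complete_pairs(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    if n < 2:
        return NAN  # fewer than two complete pairs: correlation undefined
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return NAN
    return sxy / math.sqrt(sxx * syy)

def trailing_corr(xs, ys, window):
    """Correlation over the last `window` aligned rows at each position."""
    out = []
    for i in range(len(xs)):
        start = max(0, i + 1 - window)
        out.append(pearson_complete_pairs(xs[start:i + 1], ys[start:i + 1]))
    return out

s1 = [1.0, 2.0, 3.0]
s2_aligned = [1.0, NAN, 3.0]  # s2 reindexed to [0, 1, 2]

shrunk = trailing_corr(s1, s2_aligned, window=2)  # window after the clamp
full = trailing_corr(s1, s2_aligned, window=3)    # window the caller asked for

print(shrunk)
print(full)
```

With window=2 the last row only sees [2, 3] against [NaN, 3], which leaves a single complete pair, so every row is NaN. With window=3 the last row sees two complete pairs, (1, 1) and (3, 3), and the correlation is 1.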
Next, there is a problem with data frames. [This was originally reported separately in https://github.com//issues/7512, but I've merged it into this issue.]
The problem is with _flex_binary_moment(). When pairwise=True, it doesn't properly handle two DataFrames with different index sets. In the following example, I believe [6], [7], and [8] should all produce the result in [9].
In [1]: from pandas import DataFrame, expanding_corr
In [2]: df1 = DataFrame([[1,2], [3, 2], [3,4]], columns=['A','B'])
In [3]: df1a = DataFrame([[1,2], [3,4]], columns=['A','B'], index=[0,2])
In [4]: df2 = DataFrame([[5,6], [None,None], [2,1]], columns=['X','Y'])
In [5]: df2a = DataFrame([[5,6], [2,1]], columns=['X','Y'], index=[0,2])
In [6]: expanding_corr(df1, df2, pairwise=True)[2]
Out[6]:
X Y
A -1.224745 -1.224745
B -1.224745 -1.224745
In [7]: expanding_corr(df1, df2a, pairwise=True)[2]
Out[7]:
X Y
A NaN NaN
B NaN NaN
In [8]: expanding_corr(df1a, df2, pairwise=True)[2]
Out[8]:
X Y
A NaN NaN
B NaN NaN
In [9]: expanding_corr(df1a, df2a, pairwise=True)[2]
Out[9]:
X Y
A -1 -1
B -1 -1
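A correlation of -1.224745 lies outside [-1, 1], so it cannot be a correlation computed on one consistent set of observations. As a hedged observation only (this is arithmetic on the example data, not something confirmed against the pandas source), the value in [6] is what one gets by dividing the covariance over the two complete rows by standard deviations that are each taken over that column's own non-missing rows. The helper names below are hypothetical.

```python
import math

def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def sample_cov(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

a_all = [1.0, 3.0, 3.0]  # column A of df1, all three rows
a_paired = [1.0, 3.0]    # column A at rows 0 and 2, where X is present
x_obs = [5.0, 2.0]       # column X of df2 at rows 0 and 2 (row 1 is missing)

cov_ax = sample_cov(a_paired, x_obs)

# Assumption: std of A over all rows, std of X over its observed rows only.
mixed = cov_ax / (sample_std(a_all) * sample_std(x_obs))

# Same observations used for the covariance and both standard deviations.
consistent = cov_ax / (sample_std(a_paired) * sample_std(x_obs))

print(round(mixed, 6), round(consistent, 6))
```

The mixed version gives about -1.224745, matching [6], and the consistent version gives -1, matching [9]. The same arithmetic gives about -1.224745 for the other three column pairs in this example.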
And there are similar problems with rolling_cov and rolling_corr. For example, continuing with the previous example, [77], [78], and [79] should give the same result as [80].
In [77]: rolling_corr(df1, df2, window=3, pairwise=True, min_periods=2)[2]
Out[77]:
X Y
A -1.224745 -1.224745
B -1.224745 -1.224745
In [78]: rolling_corr(df1, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[78]:
X Y
A NaN NaN
B NaN NaN
In [79]: rolling_corr(df1a, df2, window=3, pairwise=True, min_periods=2)[2]
Out[79]:
X Y
A NaN NaN
B NaN NaN
In [80]: rolling_corr(df1a, df2a, window=3, pairwise=True, min_periods=2)[2]
Out[80]:
X Y
A -1 -1
B -1 -1
Note that rolling_cov/corr call _flex_binary_moment(), which does align the two arguments (using prep_binary() in the case of two series). The problem is that rolling_cov/corr shrink the window before the alignment is done. So I think the "right" solution is simply to delete/change the window = min(window, len(arg1), len(arg2)) line in rolling_{cov,corr}, as it seems completely gratuitous (and erroneous) to me.
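The ordering problem can be sketched in plain Python. This is only an illustration under the assumption that alignment means reindexing both inputs to the union of their indexes; the dict-based series and the align_on_union helper are made up for this sketch and are not pandas code.

```python
NAN = float("nan")

def align_on_union(a, b):
    """Reindex two {index: value} mappings to the sorted union of their keys."""
    index = sorted(set(a) | set(b))
    return index, [a.get(k, NAN) for k in index], [b.get(k, NAN) for k in index]

s1 = {0: 1.0, 1: 2.0, 2: 3.0}
s2 = {0: 1.0, 2: 3.0}

# Cap computed on the raw inputs, i.e. before alignment.
raw_cap = min(len(s1), len(s2))

# Cap computed after alignment, when both inputs have the same length.
index, a1, a2 = align_on_union(s1, s2)
aligned_cap = min(len(a1), len(a2))

print(raw_cap, aligned_cap)
```

Before alignment the cap is 2. After alignment both inputs have three rows, so the same expression gives 3 and no longer cuts into the requested window.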
Afraid I'm not set up to submit a pull request. ("Not set up" = "don't really know how" -- am a bit new to the whole git / github and even Python thing...)
There are other issues with _flex_binary_moment, for which I think I have a solution, but will submit a separate issue for that.
seth-p changed the title from "BUG: expanding_{cov,corr} functions between objects with different index sets" to "BUG: {expanding,rolling}_{cov,corr} functions between objects with different index sets" on Jun 28, 2014
related #7514