Closed
Labels: Bug, Datetime, Groupby, Missing-data
Description
The groupby method first includes NaT in its result while it skips NaN. I am expecting that first should skip both NaN and NaT and return the first value for which pandas.isnull is False.
Demonstration of the inconsistency (note that both NaT and NaN in the data frame are produced by np.nan; the only difference is that the d_t column contains date values):
import numpy as np
import pandas as pd
from datetime import datetime as dt
testFrame = pd.DataFrame({'IX': ['A', 'A'], 'num': [np.nan, 100], 'd_t': [np.nan, dt.now()]})
Resulting data frame:
  IX                     d_t  num
0  A                     NaT  NaN
1  A 2015-07-15 22:47:10.635  100
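To confirm that both missing cells count as null for pandas.isnull, a quick check along these lines may help. It is only a sketch: the fixed timestamp and the explicit pd.NaT are assumptions standing in for dt.now() and np.nan, so that the example is reproducible and does not rely on dtype inference.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed timestamp and an explicit pd.NaT replace dt.now() and np.nan
# from the report, so the frame is reproducible.
frame = pd.DataFrame({
    'IX': ['A', 'A'],
    'num': [np.nan, 100],
    'd_t': [pd.NaT, pd.Timestamp('2015-07-15 22:47:10.635')],
})

# isnull treats NaN (float column) and NaT (datetime column) alike.
null_mask = frame[['num', 'd_t']].isnull()
print(frame.dtypes)
print(null_mask)
```

Both cells in row 0 should be flagged as null, which is why I would expect first to treat them the same way.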
Grouping this data frame on the IX column and calling first produces the following data frame, which shows the inconsistency between the d_t and num columns:
testFrame.groupby('IX').first()
Resulting dataframe:
    d_t  num
IX
A   NaT  100
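As a possible workaround, the expected behaviour can be approximated with a custom aggregation that drops nulls before taking the first entry. This is a minimal sketch, not a proposed fix for first itself: first_valid is a hypothetical helper name, and the frame again uses a fixed timestamp and an explicit pd.NaT as assumptions.

```python
import numpy as np
import pandas as pd

# Assumption: same reproducible stand-in for the frame in the report.
frame = pd.DataFrame({
    'IX': ['A', 'A'],
    'num': [np.nan, 100],
    'd_t': [pd.NaT, pd.Timestamp('2015-07-15 22:47:10.635')],
})


def first_valid(series):
    """Hypothetical helper: first entry for which pd.isnull is False."""
    valid = series.dropna()  # dropna removes both NaN and NaT
    # Fall back to NaN when the whole group is null.
    return valid.iloc[0] if len(valid) else np.nan


result = frame.groupby('IX').agg(first_valid)
print(result)
```

Because this calls a Python function once per group and column, it is likely to be slower than the built-in first on large frames.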
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.6.7
lxml: 3.4.2
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None