Version 0.15 MultiIndex forces Datetime.date objects to Timestamp objects #8802

Closed
eoincondron opened this issue Nov 13, 2014 · 4 comments
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions MultiIndex

Comments

@eoincondron

I recently updated pandas and found this strange behaviour, which broke some of my existing code.
I was using a column of datetime.date objects as the second level in a two-level MultiIndex.
However, when setting the index with the latest version, the datetime.date objects are converted to Timestamp objects with 00:00:00 as the time component:

>>> pd.__version__
'0.15.1'
>>> df
          0  ID        date
0  0.486567  10  2014-11-12
1  0.214374  20  2014-11-13
>>> df.date[0]
datetime.date(2014, 11, 12)
>>> df.set_index(['ID', 'date']).index[0]
(10, Timestamp('2014-11-12 00:00:00'))

This doesn't happen with version 0.14 or older.

There is a hack to get around it: set the dates as a single-level index, append the other level, and then swap the levels:

>>> df.set_index('date').set_index('ID', append=True).index.swaplevel(0, 1)[0]
(10, datetime.date(2014, 11, 12))

This seems strange and I wondered was it intentional.

@jreback
Contributor

jreback commented Nov 13, 2014

see #7888 and associated PR.

There was an inconsistency in how date-likes (datetime.date, datetime.datetime, Timestamp) were inferred in a MultiIndex level. This led to the creation of an object-dtyped Index rather than a DatetimeIndex. datetime.date objects are second-class objects in pandas, as they are not efficiently represented. Is there a reason you are not using Timestamp/datetime.datetime?

If you really really want to create this, you can do this:

In [8]: pd.MultiIndex.from_arrays([Index([datetime.date(2013,1,1)]),['a']])
Out[8]: 
MultiIndex(levels=[[2013-01-01], [u'a']],
           labels=[[0], [0]])

@jreback jreback closed this as completed Nov 13, 2014
@jreback jreback added API Design Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions MultiIndex and removed Datetime Datetime data dtype labels Nov 13, 2014
@eoincondron
Author

Thanks for the reply. I looked for previous related issues but didn’t find them. Sorry if I’ve wasted your time.
My reason for using datetime.date objects is that I was using them in conjunction with datetime.time in a MultiIndex (3 levels altogether: (ID, date, time)).
It didn’t seem right to have a Timestamp with a 00:00:00 time component alongside a time column or index level with a different time of day, and I couldn’t see a way to separate date and time using pandas objects. Also, converting the date to a string is a lot messier with a Timestamp if you only want the date component, as you have the unwanted time component to deal with.
I’m pretty new to Python and programming in general (< 6 months), and I made the decision to go about it this way when I was just getting started.
I would appreciate any advice in this regard.

@jreback
Contributor

jreback commented Nov 13, 2014

There is no need to keep separate date/time components, and it is quite inefficient to do so.

You can get at the date or time components in a number of ways, e.g. if you are resampling, or you can just index on the times. A more complete example would help me understand what you are trying to do.
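As a rough illustration of getting at the date or time components, here is a minimal sketch assuming a recent pandas version. The toy values and the names `stamps`, `prices`, and `at_close` are made up for illustration and are not from the reporter's code:

```python
import datetime

import pandas as pd

# Toy data for illustration only.
stamps = pd.DatetimeIndex(
    ["2009-04-22 08:00:00", "2009-04-22 16:35:00", "2009-04-23 08:00:00"]
)
prices = pd.Series([22.75, 22.88, 23.125], index=stamps)

dates = stamps.date                    # array of datetime.date objects
times = stamps.time                    # array of datetime.time objects
midnight = stamps.normalize()          # Timestamps with the time set to 00:00:00
labels = stamps.strftime("%Y-%m-%d")   # date-only strings, no time component

# Select rows by time of day without keeping a separate time level.
at_close = prices.at_time("16:35")
```

The `strftime` call is one way to get date-only strings from Timestamps, and `at_time` is one way to index on the times directly.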

@eoincondron
Author

One example would be using unstack on the time component to convert a column into a DataFrame whose columns correspond to the times and whose index is given by the remaining levels. Is it possible to do this directly with a DatetimeIndex?
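A minimal sketch of that reshaping, assuming a recent pandas version and a single Timestamp column. The names `raw`, `stamp`, and `wide` are hypothetical, and the data is a toy version of the example in this thread. The time of day is derived only at the point of reshaping, while the date level stays a DatetimeIndex:

```python
import datetime

import pandas as pd

# Toy data modelled on the example in this thread.
raw = pd.DataFrame(
    {
        "tid": [34142, 34142, 34142, 34142],
        "stamp": pd.to_datetime(
            [
                "2009-04-22 08:00",
                "2009-04-22 16:35",
                "2009-04-23 08:00",
                "2009-04-23 16:35",
            ]
        ),
        "price": [22.75, 22.88, 23.125, 22.5],
    }
)

# Split the single timestamp into a midnight-normalized date and a time of day.
raw["date"] = raw["stamp"].dt.normalize()
raw["time"] = raw["stamp"].dt.time

# One row per (tid, date), one column per time of day.
wide = raw.set_index(["tid", "date", "time"])["price"].unstack("time")
```

Here the columns of `wide` are datetime.time objects, so only the column labels use plain Python objects.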

Also, consider this example using a DatetimeIndex on the second level of a MultiIndex with integers on the first. I'm trying to locate rows corresponding to a list of index tuples.
With Timestamps, locating two rows at once doesn't work, even though it works with each individual tuple:

In [67]: pairs = [(34142, '20090422'), (34142, '20090423')]

dt_pairs = [(34142, datetime.date(2009, 4, 22)), (34142, datetime.date(2009, 4, 23))]

In [91]: df.loc[pairs]
Out[91]:
                   price  volume  time
(34142, 20090422)    NaN     NaN   NaN
(34142, 20090423)    NaN     NaN   NaN

In [93]: df.loc[dt_pairs]
Out[93]:
                     price  volume  time
(34142, 2009-04-22)    NaN     NaN   NaN
(34142, 2009-04-23)    NaN     NaN   NaN

In [90]: df.loc[pairs[0]]
Out[90]:
                  price  volume      time
tid   date
34142 2009-04-22  22.75   31808  08:00:00
      2009-04-22  22.88  210247  16:35:00

In [94]: df.loc[dt_pairs[0]]
Out[94]:
                  price  volume      time
tid   date
34142 2009-04-22  22.75   31808  08:00:00
      2009-04-22  22.88  210247  16:35:00

However, it works perfectly fine with datetime.date objects in the index:

In [92]: df2.loc[dt_pairs]
Out[92]:
                   price  volume      time
34142 2009-04-22  22.750   31808  08:00:00
      2009-04-22  22.880  210247  16:35:00
      2009-04-23  23.125   12576  08:00:00
      2009-04-23  22.500  248969  16:35:00
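For selecting several (tid, date) pairs at once when the date level holds Timestamps, one possible approach is a boolean mask built with `MultiIndex.isin`, which does not depend on how `.loc` interprets a list of tuples. This is a sketch assuming a recent pandas version, not a statement about how 0.15 behaves. The frame is a toy reconstruction, and the 2009-04-24 row is made up so that there is something to exclude:

```python
import pandas as pd

# Prices from the output above, plus one made-up row (2009-04-24) that should not match.
df = pd.DataFrame(
    {
        "tid": [34142] * 5,
        "date": pd.to_datetime(
            ["2009-04-22", "2009-04-22", "2009-04-23", "2009-04-23", "2009-04-24"]
        ),
        "price": [22.75, 22.88, 23.125, 22.5, 23.0],
    }
).set_index(["tid", "date"])

wanted = [(34142, pd.Timestamp("2009-04-22")), (34142, pd.Timestamp("2009-04-23"))]

# Boolean mask: True where the (tid, date) index tuple is one of the wanted pairs.
mask = df.index.isin(wanted)
selected = df[mask]
```

Duplicate index entries are handled naturally here, since the mask is evaluated row by row.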

I think I will stick to 0.14 for the current project, which already has 2000+ lines of code depending on the use of datetime.date objects, and try to incorporate Timestamps into future projects.
Thanks for the feedback.
