date_range() with closed=left and sub-second granularity returns wrong number of elements #24110

Closed
@gabrielreid

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
# Good case: a closed=left date_range call should return 'periods' - 1 entries
>>> pd.date_range(start='2018-01-01T00:00:01.000Z', end='2018-01-03T00:00:01.000Z', periods=2, closed='left')
DatetimeIndex(['2018-01-01 00:00:01+00:00'], dtype='datetime64[ns, UTC]', freq=None)

# Bad case: if the start and end have sub-second granularity (in some cases), the
# returned DatetimeIndex has too many entries (2 instead of 1). The returned dates are
# also off by 128 ns relative to the start/end dates
>>> pd.date_range(start='2018-01-01T00:00:00.010Z', end='2018-01-03T00:00:00.010Z', periods=2, closed='left')
DatetimeIndex(['2018-01-01 00:00:00.009999872+00:00', '2018-01-03 00:00:00.009999872+00:00'], dtype='datetime64[ns, UTC]', freq=None)

# Unexpected case: this appears to depend on the date being used: it doesn't happen with
# earlier dates (using 2001 instead of 2018 as the year "resolves" the problem)
>>> pd.date_range(start='2001-01-01T00:00:00.010Z', end='2001-01-03T00:00:00.010Z', periods=2, closed='left')
DatetimeIndex(['2001-01-01 00:00:00.010000+00:00'], dtype='datetime64[ns, UTC]', freq=None)

Problem description

As far as I understand it, calling date_range with two absolute endpoints, a number of periods, and closed='left' should return a DatetimeIndex with periods - 1 entries. This works as expected in most cases, but supplying more recent dates with sub-second granularity (e.g. 2018-01-01T00:00:00.010Z) appears to trigger a bug: the returned DatetimeIndex contains periods entries instead of periods - 1, and the returned timestamps are slightly earlier than the requested start and end.

I've verified this on a number of older (pre 0.24) versions, as well as in the current HEAD of the master branch, and it appears to be present in all cases.
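
One possible explanation, not confirmed anywhere in this report, is that the nanosecond timestamps pass through a 64-bit float at some point inside date_range. The sketch below uses only the standard library (no pandas) and a hypothetical helper, to_epoch_ns, to show that the 2018 timestamp cannot survive such a round trip while the 2001 timestamp can. The 128 ns loss matches the gap between the expected .010000000 and the observed .009999872.

```python
from datetime import datetime, timezone


def to_epoch_ns(year):
    """Exact integer nanoseconds since the Unix epoch for <year>-01-01T00:00:00.010Z.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    whole_seconds = int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())
    return whole_seconds * 1_000_000_000 + 10_000_000  # add the 10 ms offset


for year in (2001, 2018):
    exact = to_epoch_ns(year)
    via_float = int(float(exact))  # round trip through a 64-bit float
    # The last column is the number of nanoseconds lost in the round trip.
    print(year, exact, via_float, exact - via_float)
```

If this is indeed the cause, the computed right endpoint would no longer compare equal to end, so a closed='left' filter that drops only an exact match would keep it. That would account for both the extra entry and the misaligned values.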

Expected Output

>>> pd.date_range(start='2018-01-01T00:00:00.010Z', end='2018-01-03T00:00:00.010Z', periods=2, closed='left')
DatetimeIndex(['2018-01-01 00:00:00.010000+00:00'], dtype='datetime64[ns, UTC]', freq=None)
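
The expected semantics can be sketched with plain integer arithmetic. This is a minimal illustration under stated assumptions, not how pandas implements date_range: left_closed_range_ns is a hypothetical helper, and timestamps are represented as integer nanoseconds since the Unix epoch.

```python
def left_closed_range_ns(start_ns, end_ns, periods):
    """Evenly spaced integer timestamps from start_ns to end_ns, excluding end_ns.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    if periods < 2:
        raise ValueError("periods must be at least 2")
    span = end_ns - start_ns
    # Integer arithmetic keeps the first and last points exactly on start and end.
    points = [start_ns + (span * i) // (periods - 1) for i in range(periods)]
    # closed='left' as described in this report: drop the point equal to the end.
    return [p for p in points if p != end_ns]


start = 1_514_764_800_010_000_000           # 2018-01-01T00:00:00.010Z
end = start + 2 * 86_400 * 1_000_000_000    # 2018-01-03T00:00:00.010Z
result = left_closed_range_ns(start, end, periods=2)
print(result)
```

With periods=2 the result holds a single entry equal to start, which corresponds to the one-element DatetimeIndex shown above.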

Output of pd.show_versions()

INSTALLED VERSIONS

commit: d7e96d8
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+1208.gd7e96d830
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.11
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: None
