
ENH: inconsistent naming convention for read_excel column selection (#4988) #16488

Closed
wants to merge 67 commits into from
52f2c11
implement changes request in PR#16488
abarber4gh May 24, 2017
d681a0e
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
a4341de
no message
abarber4gh May 25, 2017
e985488
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
d58669c
change parse_cols to usecols
abarber4gh May 25, 2017
058177b
removed excess blank line.
abarber4gh May 26, 2017
03593a7
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
6649157
TST: ujson tests are not being run (#16499) (#16500)
abarber4gh May 26, 2017
ef487d9
DOC: Remove preference for pytest paradigm in assert_raises_regex (#1…
gfyoung May 27, 2017
e60dc4c
TST: Specify HTML file encoding on PY3 (#16526)
neirbowj May 29, 2017
7efc4e8
BUG: Fixed tput output on windows (#16496)
TomAugspurger May 30, 2017
4ca29f4
BUG: Incorrect handling of rolling.cov with offset window (#16244)
keitakurita May 30, 2017
92d0799
TST: Avoid global state in matplotlib tests (#16539)
TomAugspurger May 31, 2017
fbdae2d
DOC: Update to docstring of DataFrame(dtype) (#14764) (#16487)
VincentLa May 31, 2017
d4f80b0
DOC: correct docstring examples (#3439) (#16432)
ProsperousHeart May 31, 2017
9b0ea41
Fix unbound local with bad engine (#16511)
jtratner May 31, 2017
d31ffdb
return empty MultiIndex for symmetrical difference on equal MultiInde…
Tafkas May 31, 2017
03d44f3
BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (#1…
JosephWagner May 31, 2017
e437ad5
BUG: Bug in .resample() and .groupby() when aggregating on integers (…
jreback May 31, 2017
58f4454
COMPAT: cython str-to-int can raise a ValueError on non-CPython (#16563)
mattip May 31, 2017
ee8346d
CLN: raise correct error for Panel sort_values (#16532)
pepicello May 31, 2017
9d7afa7
BUG: Fixed pd.unique on array of tuples (#16543)
TomAugspurger Jun 1, 2017
a67c7aa
BUG: Allow non-callable attributes in aggregate function. Fixes GH164…
pvomelveny Jun 1, 2017
cab2b6b
Strictly monotonic (#16555)
TomAugspurger Jun 1, 2017
e0a127a
COMPAT: Consider Python 2.x tarfiles file-like (#16533)
gfyoung Jun 1, 2017
e3ee186
BUG: Fixed to_html ignoring index_names parameter
CRP Jun 1, 2017
d419be4
BUG: fixed wrong order of ordered labels in pd.cut()
economy Jun 1, 2017
fb47ee5
fix linting
jreback Jun 1, 2017
7b106e4
TST: writing invalid table names to sqlite (#16464)
Jun 1, 2017
a7760e3
TST: Skip test_database_uri_string if pg8000 importable (#16528)
neirbowj Jun 1, 2017
4ec98d8
DOC: Remove incorrect elements of PeriodIndex docstring (#16553)
tui-rob Jun 1, 2017
a19f9fa
TST: Make HDF5 fspath write test robust (#16575)
TomAugspurger Jun 1, 2017
72e0d1f
ENH: add .ngroup() method to groupby objects (#14026) (#14026)
dsm054 Jun 1, 2017
fc4408b
make null lowercase a missing value (#16534)
OlegShteynbuk Jun 1, 2017
db419bf
MAINT: Drop has_index_names input from read_excel (#16522)
gfyoung Jun 1, 2017
8d092d9
BUG: reimplement MultiIndex.remove_unused_levels (#16565)
rhendric Jun 2, 2017
5f312da
Adding 'n/a' to list of strings denoting missing values (#16079)
chrisgorgo Jun 2, 2017
06f8347
API: Make is_strictly_monotonic_* private (#16576)
TomAugspurger Jun 2, 2017
ff0d1f4
DOC: change doc build to python 3.6 (#16545)
jorisvandenbossche Jun 2, 2017
31e67d5
DOC: whatsnew 0.20.2 edits (#16587)
jreback Jun 2, 2017
9e620bc
DOC: Fix typo in timeseries.rst (#16590)
funnycrab Jun 4, 2017
473615e
PERF: vectorize _interp_limit (#16592)
TomAugspurger Jun 4, 2017
ce3b0c3
DOC: Fix typo in merge doc for validate kwarg (#16595)
benjello Jun 4, 2017
18c316b
BUG: convert numpy strings in index names in HDF #13492 (#16444)
makmanalp Jun 4, 2017
50a62c1
ERRR: Raise error in usecols when column doesn't exist but length mat…
bpraggastis Jun 4, 2017
91057f3
DOC: Whatsnew fixups (#16596)
TomAugspurger Jun 4, 2017
bf99975
DOC: Update release.rst
TomAugspurger Jun 4, 2017
697d026
BUG: pickle compat with UTC tz's (#16611)
jreback Jun 6, 2017
10c17d4
Fix some lgtm alerts (#16613)
jhelie Jun 7, 2017
dfebd8a
BLD: fix numpy on 3.6 build as 1.13 was released but no deps are buil…
jreback Jun 8, 2017
2b44868
BUG: Fix Series.get failure on missing NaN (#8569) (#16619)
dsm054 Jun 8, 2017
722b386
TST: NaN in MultiIndex should not become a string (#7031) (#16625)
dsm054 Jun 8, 2017
73930c5
TST: verify we can add and subtract from indices (#8142) (#16629)
dsm054 Jun 8, 2017
9fdea65
BUG: conversion of Series to Categorical (#16557)
preddy5 Jun 9, 2017
789f7bb
BLD: fix numpy on 2.7 build as 1.13 was released but no deps are buil…
jreback Jun 9, 2017
5aba665
CLN: make license file machine readable (#16649)
tswast Jun 9, 2017
ec6bf6d
fix pytest-xidst version as 1.17 appears buggy (#16652)
jreback Jun 10, 2017
dc716b0
COMPAT: numpy 1.13 test compat (#16654)
jreback Jun 10, 2017
d6c3189
implement changes request in PR#16488
abarber4gh May 24, 2017
5682a05
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
8025c0c
no message
abarber4gh May 25, 2017
f07a002
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
440e6a6
change parse_cols to usecols
abarber4gh May 25, 2017
f299ea2
removed excess blank line.
abarber4gh May 26, 2017
5948c01
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
dd7dc30
Merge branch 'issue#4988' of https://github.com/abarber4gh/pandas int…
abarber4gh Jun 10, 2017
a525222
rebase with #16522 changes.
abarber4gh Jun 10, 2017
implement changes request in PR#16488
- removed usecols mention in Other Enhancements section,
remains in Deprecations.
- removed test_parse_* test methods in favor of test_usecols_* methods.
- changed parse_cols to usecols in test_read_one_empty_col_* instead
of catching warning.
abarber4gh committed Jun 10, 2017
commit d6c318954b2cc2bfe25bf1c98e4c11c2e4bbda3f
2 changes: 1 addition & 1 deletion doc/source/io.rst
@@ -2754,7 +2754,7 @@ to be parsed.

read_excel('path_to_file.xls', 'Sheet1', parse_cols=2)

If `parse_cols` is a list of integers, then it is assumed to be the file column
If `usecols` is a list of integers, characters, or both, then it is assumed to be the file column
indices to be parsed.

.. code-block:: python
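The reworded sentence in io.rst says a `usecols` list may mix integers and column letters. As a rough illustration of what that could mean, the sketch below maps such a list to 0-based column indices. The helper names `excel_col_to_index` and `resolve_usecols` are hypothetical and are not part of pandas; this is not the library's implementation, only one plausible reading of the documented behaviour.

```python
import string


def excel_col_to_index(label):
    """Convert an Excel column label such as 'A' or 'AB' to a 0-based index."""
    cleaned = label.strip().upper()
    if not cleaned:
        raise ValueError("empty column label")
    index = 0
    for char in cleaned:
        if char not in string.ascii_uppercase:
            raise ValueError("invalid column label: %r" % label)
        # Column labels are a bijective base-26 numbering: A=1 ... Z=26, AA=27.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def resolve_usecols(usecols):
    """Hypothetical helper: turn a list of ints and/or letters into sorted indices."""
    resolved = set()
    for item in usecols:
        if isinstance(item, int):
            resolved.add(item)
        else:
            resolved.add(excel_col_to_index(item))
    return sorted(resolved)


print(resolve_usecols([0, "C", "D"]))  # [0, 2, 3]
```

Under this reading, `[0, "C", "D"]` selects the same columns as the `[0, 2, 3]` list used in the tests further down.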
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -65,6 +65,7 @@ Other API Changes
Deprecations
~~~~~~~~~~~~
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with to_excel() (:issue:`10559`).
- :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with other read_* functions (:issue:`4988`).


.. _whatsnew_0210.prior_deprecations:
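The commit list mentions adding `deprecate_kwarg` from `_decorators` to handle the rename. The sketch below shows the general idea of such a decorator using only the standard library. It is a simplified stand-in, not pandas's actual code: `deprecate_kwarg_sketch` and `read_sheet_stub` are names invented for this illustration, and the warning text is an assumption.

```python
import functools
import warnings


def deprecate_kwarg_sketch(old_arg_name, new_arg_name):
    """Return a decorator that forwards a deprecated keyword to its new name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_arg_name in kwargs:
                if new_arg_name in kwargs:
                    # Passing both spellings is ambiguous, so refuse it.
                    raise TypeError(
                        "pass only one of %r and %r"
                        % (old_arg_name, new_arg_name)
                    )
                warnings.warn(
                    "the %r keyword is deprecated, use %r instead"
                    % (old_arg_name, new_arg_name),
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new_arg_name] = kwargs.pop(old_arg_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwarg_sketch("parse_cols", "usecols")
def read_sheet_stub(path, usecols=None):
    """Hypothetical reader that only records which columns were requested."""
    return {"path": path, "usecols": usecols}


print(read_sheet_stub("path_to_file.xls", usecols=[0, 2, 3]))
```

Callers that still pass `parse_cols` keep working but see a `FutureWarning`, which matches the deprecation described in the whatsnew entry.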
32 changes: 17 additions & 15 deletions pandas/tests/io/test_excel.py
@@ -166,36 +166,38 @@ def setup_method(self, method):
self.check_skip()
super(ReadingTestsBase, self).setup_method(method)

def test_parse_cols_int(self):

def test_usecols_int(self):
# GH4988: inconsistent naming convention for read_excel column select
dfref = self.get_csv_refdf('test1')
dfref = dfref.reindex(columns=['A', 'B', 'C'])
df1 = self.get_exceldf('test1', 'Sheet1', index_col=0, parse_cols=3)
df1 = self.get_exceldf('test1', 'Sheet1', index_col=0, usecols=3)
df2 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0,
Review comment (Contributor):
You need to change ALL tests to use the new keyword (usecols), except for a single test that actually hits the deprecation.

parse_cols=3)
# TODO add index to xls file)
tm.assert_frame_equal(df1, dfref, check_names=False)
tm.assert_frame_equal(df2, dfref, check_names=False)

def test_parse_cols_list(self):

def test_usecols_list(self):
# GH4988: inconsistent naming convention for read_excel column select
dfref = self.get_csv_refdf('test1')
dfref = dfref.reindex(columns=['B', 'C'])
df1 = self.get_exceldf('test1', 'Sheet1', index_col=0,
parse_cols=[0, 2, 3])
df2 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0,
parse_cols=[0, 2, 3])
# TODO add index to xls file)
usecols=[0, 2, 3])
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
df3 = self.get_exceldf('test1', 'Sheet1', index_col=0,
parse_cols=[0, 2, 3])

tm.assert_frame_equal(df1, dfref, check_names=False)
tm.assert_frame_equal(df2, dfref, check_names=False)

def test_parse_cols_str(self):
Review comment (Contributor):
Leave the original test structure (sure, you can change the names to conform), but don't change the tests (in THIS PR).

Reply (Contributor Author):
Tests are back to the original, but with changed function and kwarg names.
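The review asks for every test to use `usecols`, with a single test left to exercise the deprecated keyword. A minimal sketch of that split follows, using only the standard `warnings` module rather than pandas's `tm.assert_produces_warning`. `load_columns_stub` and `collect_warnings` are hypothetical stand-ins written for this example.

```python
import warnings


def load_columns_stub(usecols=None, parse_cols=None):
    """Hypothetical reader that renamed parse_cols to usecols."""
    if parse_cols is not None:
        warnings.warn(
            "the 'parse_cols' keyword is deprecated, use 'usecols' instead",
            FutureWarning,
            stacklevel=2,
        )
        usecols = parse_cols
    return usecols


def collect_warnings(func, **kwargs):
    """Call func and return its result together with any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(**kwargs)
    return result, caught


def test_usecols_list():
    # Regular tests use the new keyword and must stay silent.
    result, caught = collect_warnings(load_columns_stub, usecols=[0, 2, 3])
    assert result == [0, 2, 3]
    assert caught == []


def test_parse_cols_deprecated():
    # The one test that deliberately hits the deprecated keyword.
    result, caught = collect_warnings(load_columns_stub, parse_cols=[0, 2, 3])
    assert result == [0, 2, 3]
    assert [w.category for w in caught] == [FutureWarning]
```

Keeping the deprecation check in its own test means the remaining tests fail loudly if the old keyword sneaks back in.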


def test_usecols_str(self):
# GH4988: inconsistent naming convention for read_excel column select
dfref = self.get_csv_refdf('test1')

df1 = dfref.reindex(columns=['A', 'B', 'C'])
df2 = self.get_exceldf('test1', 'Sheet1', index_col=0,
parse_cols='A:D')
df2 = self.get_exceldf('test1', 'Sheet1', index_col=0, usecols='A:D')
df3 = self.get_exceldf('test1', 'Sheet2', skiprows=[1], index_col=0,
parse_cols='A:D')
# TODO add index to xls, read xls ignores index name ?
@@ -465,14 +467,14 @@ def test_read_one_empty_col_no_header(self):
actual_header_none = read_excel(
path,
'no_header',
parse_cols=[0],
usecols=[0],
header=None
)

actual_header_zero = read_excel(
path,
'no_header',
parse_cols=[0],
usecols=[0],
header=0
)
expected = DataFrame()
@@ -494,14 +496,14 @@ def test_read_one_empty_col_with_header(self):
actual_header_none = read_excel(
path,
'with_header',
parse_cols=[0],
usecols=[0],
header=None
)

actual_header_zero = read_excel(
path,
'with_header',
parse_cols=[0],
usecols=[0],
header=0
)
expected_header_none = DataFrame(pd.Series([0], dtype='int64'))