ENH: inconsistent naming convention for read_excel column selection (#4988) #16488

Closed
wants to merge 67 commits into from
67 commits
52f2c11
implement changes request in PR#16488
abarber4gh May 24, 2017
d681a0e
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
a4341de
no message
abarber4gh May 25, 2017
e985488
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
d58669c
change parse_cols to usecols
abarber4gh May 25, 2017
058177b
removed excess blank line.
abarber4gh May 26, 2017
03593a7
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
6649157
TST: ujson tests are not being run (#16499) (#16500)
abarber4gh May 26, 2017
ef487d9
DOC: Remove preference for pytest paradigm in assert_raises_regex (#1…
gfyoung May 27, 2017
e60dc4c
TST: Specify HTML file encoding on PY3 (#16526)
neirbowj May 29, 2017
7efc4e8
BUG: Fixed tput output on windows (#16496)
TomAugspurger May 30, 2017
4ca29f4
BUG: Incorrect handling of rolling.cov with offset window (#16244)
keitakurita May 30, 2017
92d0799
TST: Avoid global state in matplotlib tests (#16539)
TomAugspurger May 31, 2017
fbdae2d
DOC: Update to docstring of DataFrame(dtype) (#14764) (#16487)
VincentLa May 31, 2017
d4f80b0
DOC: correct docstring examples (#3439) (#16432)
ProsperousHeart May 31, 2017
9b0ea41
Fix unbound local with bad engine (#16511)
jtratner May 31, 2017
d31ffdb
return empty MultiIndex for symmetrical difference on equal MultiInde…
Tafkas May 31, 2017
03d44f3
BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (#1…
JosephWagner May 31, 2017
e437ad5
BUG: Bug in .resample() and .groupby() when aggregating on integers (…
jreback May 31, 2017
58f4454
COMPAT: cython str-to-int can raise a ValueError on non-CPython (#16563)
mattip May 31, 2017
ee8346d
CLN: raise correct error for Panel sort_values (#16532)
pepicello May 31, 2017
9d7afa7
BUG: Fixed pd.unique on array of tuples (#16543)
TomAugspurger Jun 1, 2017
a67c7aa
BUG: Allow non-callable attributes in aggregate function. Fixes GH164…
pvomelveny Jun 1, 2017
cab2b6b
Strictly monotonic (#16555)
TomAugspurger Jun 1, 2017
e0a127a
COMPAT: Consider Python 2.x tarfiles file-like (#16533)
gfyoung Jun 1, 2017
e3ee186
BUG: Fixed to_html ignoring index_names parameter
CRP Jun 1, 2017
d419be4
BUG: fixed wrong order of ordered labels in pd.cut()
economy Jun 1, 2017
fb47ee5
fix linting
jreback Jun 1, 2017
7b106e4
TST: writing invalid table names to sqlite (#16464)
Jun 1, 2017
a7760e3
TST: Skip test_database_uri_string if pg8000 importable (#16528)
neirbowj Jun 1, 2017
4ec98d8
DOC: Remove incorrect elements of PeriodIndex docstring (#16553)
tui-rob Jun 1, 2017
a19f9fa
TST: Make HDF5 fspath write test robust (#16575)
TomAugspurger Jun 1, 2017
72e0d1f
ENH: add .ngroup() method to groupby objects (#14026) (#14026)
dsm054 Jun 1, 2017
fc4408b
make null lowercase a missing value (#16534)
OlegShteynbuk Jun 1, 2017
db419bf
MAINT: Drop has_index_names input from read_excel (#16522)
gfyoung Jun 1, 2017
8d092d9
BUG: reimplement MultiIndex.remove_unused_levels (#16565)
rhendric Jun 2, 2017
5f312da
Adding 'n/a' to list of strings denoting missing values (#16079)
chrisgorgo Jun 2, 2017
06f8347
API: Make is_strictly_monotonic_* private (#16576)
TomAugspurger Jun 2, 2017
ff0d1f4
DOC: change doc build to python 3.6 (#16545)
jorisvandenbossche Jun 2, 2017
31e67d5
DOC: whatsnew 0.20.2 edits (#16587)
jreback Jun 2, 2017
9e620bc
DOC: Fix typo in timeseries.rst (#16590)
funnycrab Jun 4, 2017
473615e
PERF: vectorize _interp_limit (#16592)
TomAugspurger Jun 4, 2017
ce3b0c3
DOC: Fix typo in merge doc for validate kwarg (#16595)
benjello Jun 4, 2017
18c316b
BUG: convert numpy strings in index names in HDF #13492 (#16444)
makmanalp Jun 4, 2017
50a62c1
ERRR: Raise error in usecols when column doesn't exist but length mat…
bpraggastis Jun 4, 2017
91057f3
DOC: Whatsnew fixups (#16596)
TomAugspurger Jun 4, 2017
bf99975
DOC: Update release.rst
TomAugspurger Jun 4, 2017
697d026
BUG: pickle compat with UTC tz's (#16611)
jreback Jun 6, 2017
10c17d4
Fix some lgtm alerts (#16613)
jhelie Jun 7, 2017
dfebd8a
BLD: fix numpy on 3.6 build as 1.13 was released but no deps are buil…
jreback Jun 8, 2017
2b44868
BUG: Fix Series.get failure on missing NaN (#8569) (#16619)
dsm054 Jun 8, 2017
722b386
TST: NaN in MultiIndex should not become a string (#7031) (#16625)
dsm054 Jun 8, 2017
73930c5
TST: verify we can add and subtract from indices (#8142) (#16629)
dsm054 Jun 8, 2017
9fdea65
BUG: conversion of Series to Categorical (#16557)
preddy5 Jun 9, 2017
789f7bb
BLD: fix numpy on 2.7 build as 1.13 was released but no deps are buil…
jreback Jun 9, 2017
5aba665
CLN: make license file machine readable (#16649)
tswast Jun 9, 2017
ec6bf6d
fix pytest-xidst version as 1.17 appears buggy (#16652)
jreback Jun 10, 2017
dc716b0
COMPAT: numpy 1.13 test compat (#16654)
jreback Jun 10, 2017
d6c3189
implement changes request in PR#16488
abarber4gh May 24, 2017
5682a05
ENH: inconsistent naming convention for read_csv and read_excel colum…
abarber4gh May 23, 2017
8025c0c
no message
abarber4gh May 25, 2017
f07a002
change tests keyword from parse_cols to usecol.
abarber4gh May 25, 2017
440e6a6
change parse_cols to usecols
abarber4gh May 25, 2017
f299ea2
removed excess blank line.
abarber4gh May 26, 2017
5948c01
add `deprecate_kwarg` from `_decorators`
abarber4gh May 26, 2017
dd7dc30
Merge branch 'issue#4988' of https://github.com/abarber4gh/pandas int…
abarber4gh Jun 10, 2017
a525222
rebase with #16522 changes.
abarber4gh Jun 10, 2017
BUG: convert numpy strings in index names in HDF #13492 (#16444)
* BUG: Handle numpy strings in index names in HDF5 #13492

* REF: refactor to _ensure_str
makmanalp authored and TomAugspurger committed Jun 4, 2017
commit 18c316b6fba1e00ae60b571304ffd1d0a00fc9a7
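
The idea behind the `_ensure_str` refactor in this commit can be sketched in a few lines. This is a minimal Python 3 illustration, not the code from the patch: `NumpyLikeStr` is a hypothetical stand-in for `numpy.str_` (which subclasses `str` on Python 3), and `ensure_str` is an assumed analogue of the helper that uses the built-in `str` in place of `compat.string_types` and `compat.text_type`.

```python
class NumpyLikeStr(str):
    """Hypothetical stand-in for numpy.str_, which is a str subclass."""


def ensure_str(name):
    """Assumed Python 3 analogue of the patch's _ensure_str helper."""
    if isinstance(name, str):
        # Calling str() on a str subclass yields an exact built-in str,
        # so the subclass type is dropped while the text is preserved.
        name = str(name)
    # Non-string names (None, ints, tuples, ...) pass through unchanged.
    return name


raw_name = NumpyLikeStr("rows")
clean_name = ensure_str(raw_name)

print(type(raw_name).__name__)    # NumpyLikeStr
print(type(clean_name).__name__)  # str
print(ensure_str(None))           # None
```

The exact-type check matters because the new test in this commit compares `type(df2.index.name)` against `text_type`, which a string subclass would not satisfy even though it compares equal to the original name.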
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.2.txt
@@ -76,6 +76,7 @@ I/O
 - Bug that raised ``IndexError`` when HTML-rendering an empty ``DataFrame`` (:issue:`15953`)
 - Bug in :func:`read_csv` in which tarfile object inputs were raising an error in Python 2.x for the C engine (:issue:`16530`)
 - Bug where ``DataFrame.to_html()`` ignored the ``index_names`` parameter (:issue:`16493`)
+- Bug where ``pd.read_hdf()`` returns numpy strings for index names (:issue:`13492`)
 
 - Bug in ``HDFStore.select_as_multiple()`` where start/stop arguments were not respected (:issue:`16209`)
 
14 changes: 13 additions & 1 deletion pandas/io/pytables.py
@@ -73,6 +73,18 @@ def _ensure_encoding(encoding):
     return encoding
 
 
+def _ensure_str(name):
+    """Ensure that an index / column name is a str (python 3) or
+    unicode (python 2); otherwise they may be np.string dtype.
+    Non-string dtypes are passed through unchanged.
+
+    https://github.com/pandas-dev/pandas/issues/13492
+    """
+    if isinstance(name, compat.string_types):
+        name = compat.text_type(name)
+    return name
+
+
 Term = Expr
 
 
@@ -2567,7 +2579,7 @@ def read_index_node(self, node, start=None, stop=None):
         name = None
 
         if 'name' in node._v_attrs:
-            name = node._v_attrs.name
+            name = _ensure_str(node._v_attrs.name)
 
         index_class = self._alias_to_class(getattr(node._v_attrs,
                                                    'index_class', ''))
23 changes: 22 additions & 1 deletion pandas/tests/io/test_pytables.py
@@ -16,7 +16,7 @@
                     date_range, timedelta_range, Index, DatetimeIndex,
                     isnull)
 
-from pandas.compat import is_platform_windows, PY3, PY35, BytesIO
+from pandas.compat import is_platform_windows, PY3, PY35, BytesIO, text_type
 from pandas.io.formats.printing import pprint_thing
 
 tables = pytest.importorskip('tables')
Expand Down Expand Up @@ -2920,6 +2920,27 @@ def test_store_index_name_with_tz(self):
recons = store['frame']
tm.assert_frame_equal(recons, df)

@pytest.mark.parametrize('table_format', ['table', 'fixed'])
def test_store_index_name_numpy_str(self, table_format):
# GH #13492
idx = pd.Index(pd.to_datetime([datetime.date(2000, 1, 1),
datetime.date(2000, 1, 2)]),
name=u('cols\u05d2'))
idx1 = pd.Index(pd.to_datetime([datetime.date(2010, 1, 1),
datetime.date(2010, 1, 2)]),
name=u('rows\u05d0'))
df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=idx, index=idx1)

# This used to fail, returning numpy strings instead of python strings.
with ensure_clean_path(self.path) as path:
df.to_hdf(path, 'df', format=table_format)
df2 = read_hdf(path, 'df')

assert_frame_equal(df, df2, check_names=True)

assert type(df2.index.name) == text_type
assert type(df2.columns.name) == text_type

def test_store_series_name(self):
df = tm.makeDataFrame()
series = df['A']
Expand Down