-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: pd.read_json May Not Maintain Numeric String Index #38727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: pd.read_json May Not Maintain Numeric String Index #38727
Conversation
arw2019
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @theoniko for the PR!
Some comments. We'll also need a whatsnew for this (1.3)
pandas/io/json/_json.py
Outdated
| and obj._get_axis(axis_name).dtype == "object" | ||
| and obj.index.inferred_type == "string" | ||
| ): | ||
| index_list = list(obj._get_axis(axis_name).values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to convert to a list here? We should be able to inspect values directly I think
pandas/io/json/_json.py
Outdated
| and obj.index.inferred_type == "string" | ||
| ): | ||
| index_list = list(obj._get_axis(axis_name).values) | ||
| are_all_strings = all(type(element) == str for element in index_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have helpers got these kinds of checks in pandas/core/dtypes/common.py, may as well use those
pandas/io/json/_json.py
Outdated
| are_all_int_or_floats = all( | ||
| are_int_or_floats is True for element in are_int_or_floats | ||
| ) | ||
| if are_all_int_or_floats and are_all_strings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this ever hit? only one of those can be true at any time, no?
| result = True | ||
| except (TypeError, ValueError): | ||
| pass | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls revert this to minimize the diff
pandas/tests/io/json/test_pandas.py
Outdated
| # JSON objects. JSON keys are by definition strings, so there's no way | ||
| # to disambiguate whether those keys actually were strings or numeric | ||
| # beforehand and numeric wins out. | ||
| # TODO: Split should be able to support this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we delete this TODO now?
pandas/io/json/_json.py
Outdated
| obj = self.obj | ||
| assert obj is not None # for mypy | ||
| for axis_name in obj._AXIS_ORDERS: | ||
| if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if condition seems too specific. Is there a more generic way to handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as i understood the minimum preconditions to keep string index should in read_json orient equal to split or index and the index of json format to be a list of int or float strings. Am i right? Should i move the checks in _try_convert_data?
Do you have a more conrete and robust suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a slightly different way. Could you please re-review?
dd46945 to
42f912e
Compare
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be less specialized (as indicated by comments)
28b31fb to
bb21639
Compare
01488e4 to
bfd8e89
Compare
pandas/tests/io/json/test_pandas.py
Outdated
| expected.columns = expected.columns.astype(np.int64) | ||
| elif convert_axes and orient == "split": | ||
| expected.columns = expected.columns.astype(np.int64) | ||
| expected.index = expected.index.astype(str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this conversion? I think the problem is that we are converting when we don't need to
bfd8e89 to
db10b73
Compare
|
All requested changes were made. Please re-review the pr. |
doc/source/whatsnew/v1.3.0.rst
Outdated
| for subclasses of ``DataFrame`` or ``Series`` (:issue:`33748`). | ||
| - Bug in :func:`json_normalize` resulting in the first element of a generator object not being included in the returned ``DataFrame`` (:issue:`35923`) | ||
|
|
||
| - Bug in :func:`read_json` does not maintan numeric string index (:issue:`28556`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not specific enough to the actual case, pls update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/io/json/_json.py
Outdated
| and self.orient == "split" | ||
| and len(data) | ||
| ): | ||
| result = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using result at all, just return on a conversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the condition to the end and removed the result variable.
pandas/io/json/_json.py
Outdated
| if data.dtype == "object": | ||
|
|
||
| if ( | ||
| isinstance(data, Index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think you need to test if an Index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
merge master as well |
3ba5e89 to
0762111
Compare
pandas/io/json/_json.py
Outdated
| pass | ||
|
|
||
| return data, result | ||
| if name == "index" and self.orient == "split" and len(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks a lot better. can you add a comment here and structure this more like
# if we have an index, we want to preserve dtypes
if name == "index" and len(data):
if self.orient == "split":
return data, False
return data, True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
9234406 to
2e5e743
Compare
doc/source/whatsnew/v1.3.0.rst
Outdated
| - Bug in :func:`json_normalize` resulting in the first element of a generator object not being included in the returned ``DataFrame`` (:issue:`35923`) | ||
| - Bug in :func:`read_excel` forward filling :class:`MultiIndex` names with multiple header and index columns specified (:issue:`34673`) | ||
| - :func:`pandas.read_excel` now respects :func:``pandas.set_option`` (:issue:`34252`) | ||
| - Bug in :func:`read_json` when ``orient="split"`` does not maintan numeric string index (:issue:`28556`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in :func:read_json when orient="split" does not maintain numeric string index (:issue:28556)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
2e5e743 to
f2a2785
Compare
|
lgtm pushed a small doc change and can merge on green |
black pandasgit diff upstream/master -u -- "*.py" | flake8 --diff