ENH: general concat with ExtensionArrays through find_common_type #33607
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,11 @@ | ||
import numbers | ||
from typing import TYPE_CHECKING, Tuple, Type, Union | ||
from typing import TYPE_CHECKING, List, Optional, Tuple, Type, Union | ||
import warnings | ||
|
||
import numpy as np | ||
|
||
from pandas._libs import lib, missing as libmissing | ||
from pandas._typing import ArrayLike | ||
from pandas._typing import ArrayLike, DtypeObj | ||
from pandas.compat import set_function_name | ||
from pandas.util._decorators import cache_readonly | ||
|
||
|
@@ -95,6 +95,15 @@ def construct_array_type(cls) -> Type["IntegerArray"]: | |
""" | ||
return IntegerArray | ||
|
||
def _get_common_type(self, dtypes: List[DtypeObj]) -> Optional[DtypeObj]: | ||
Why is this a private method on the Dtype? get_common_type (or get_common_dtype) seems fine. |
||
# for now only handle other integer types | ||
if not all(isinstance(t, _IntegerDtype) for t in dtypes): | ||
return None | ||
np_dtype = np.find_common_type([t.numpy_dtype for t in dtypes], []) | ||
if np.issubdtype(np_dtype, np.integer): | ||
return _dtypes[str(np_dtype)] | ||
return None | ||
|
||
|
||
def __from_arrow__( | ||
self, array: Union["pyarrow.Array", "pyarrow.ChunkedArray"] | ||
) -> "IntegerArray": | ||
|
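The integer-only `_get_common_type` above leans on NumPy's promotion rules and only accepts the result when it is still an integer dtype. The following is a minimal sketch of that check outside pandas, under two assumptions: `common_integer_dtype` is a hypothetical helper name, and `np.result_type` stands in for `np.find_common_type(..., [])`, which newer NumPy releases deprecated and then removed.

```python
from typing import List, Optional

import numpy as np


def common_integer_dtype(names: List[str]) -> Optional[np.dtype]:
    """Hypothetical helper mirroring the integer-only logic in the diff."""
    # Assumption: np.result_type replaces np.find_common_type(dtypes, []),
    # which is not available in newer NumPy releases.
    np_dtype = np.result_type(*[np.dtype(name) for name in names])
    if np.issubdtype(np_dtype, np.integer):
        return np_dtype
    # int64 + uint64 promotes to float64, so no common integer dtype exists.
    return None


print(common_integer_dtype(["int8", "int16"]))
print(common_integer_dtype(["uint8", "int8"]))
print(common_integer_dtype(["int64", "uint64"]))
```

The last call returns `None`, which corresponds to the `(["Int64", "UInt64"], "object")` case in the PR's test parametrization.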
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,6 +6,7 @@ | |
|
||
import numpy as np | ||
|
||
from pandas._typing import DtypeObj | ||
from pandas.errors import AbstractMethodError | ||
|
||
from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries | ||
|
@@ -322,3 +323,29 @@ def _is_boolean(self) -> bool: | |
bool | ||
""" | ||
return False | ||
|
||
def _get_common_type(self, dtypes: List[DtypeObj]) -> Optional[DtypeObj]: | ||
Mmm can we keep the return type as Oh... I suppose tz-naive DatetimeArray might break this, since it wants to return a NumPy dtype...
Yes, that was my first thought as well. But right now, e.g., Categorical can end up with any kind of numpy dtype (depending on the dtype of its categories). As long as not all dtypes have an EA version yet, I don't think it is feasible to require ExtensionDtype here. |
||
""" | ||
Return the common dtype, if one exists. | ||
|
||
Used in `find_common_type` implementation. This is for example used | ||
to determine the resulting dtype in a concat operation. | ||
|
||
If no common dtype exists, return None. If all dtypes in the list | ||
return None, then the common dtype will be "object" dtype. | ||
|
||
Parameters | ||
---------- | ||
dtypes : list of dtypes | ||
The dtypes for which to determine a common dtype. This is a list | ||
of np.dtype or ExtensionDtype instances. | ||
|
||
Returns | ||
------- | ||
Common dtype (np.dtype or ExtensionDtype) or None | ||
|
||
""" | ||
if len(set(dtypes)) == 1: | ||
# only itself | ||
return self | ||
else: | ||
return None |
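The docstring above describes a small protocol: each dtype may propose a common dtype or return None, and "object" is the fallback when every dtype returns None. The sketch below illustrates one plausible way a caller could consume that hook. It uses toy classes rather than pandas itself: `ToyDtype`, `ToyIntDtype`, and `find_common` are hypothetical names, not the pandas API.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for an extension dtype (not a pandas class)."""

    name: str

    def _get_common_type(self, dtypes: List["ToyDtype"]) -> Optional["ToyDtype"]:
        # Default from the diff: only identical dtypes share a common dtype.
        if len(set(dtypes)) == 1:
            return self
        return None


@dataclass(frozen=True)
class ToyIntDtype(ToyDtype):
    """Hypothetical integer dtype that knows how to widen to the largest size."""

    bits: int = 64

    def _get_common_type(self, dtypes: List[ToyDtype]) -> Optional[ToyDtype]:
        # Only handle other toy integer dtypes, like the integer override.
        if not all(isinstance(t, ToyIntDtype) for t in dtypes):
            return None
        return max(dtypes, key=lambda t: t.bits)


def find_common(dtypes: List[ToyDtype]) -> Union[ToyDtype, str]:
    """Ask each dtype in turn; fall back to "object" if all return None."""
    for dtype in dtypes:
        result = dtype._get_common_type(dtypes)
        if result is not None:
            return result
    return "object"


int8 = ToyIntDtype("Int8", 8)
int16 = ToyIntDtype("Int16", 16)
category = ToyDtype("category")

print(find_common([int8, int16]))
print(find_common([category, category]))
print(find_common([int8, category]))
```

Returning None instead of raising lets the other dtypes in the list have a say before the caller gives up and uses "object".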
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
import pytest | ||
|
||
import pandas as pd | ||
import pandas._testing as tm | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"to_concat_dtypes, result_dtype", | ||
[ | ||
(["Int64", "Int64"], "Int64"), | ||
|
||
(["UInt64", "UInt64"], "UInt64"), | ||
(["Int8", "Int8"], "Int8"), | ||
(["Int8", "Int16"], "Int16"), | ||
(["UInt8", "Int8"], "Int16"), | ||
(["Int32", "UInt32"], "Int64"), | ||
# this still gives object (awaiting float extension dtype) | ||
(["Int64", "UInt64"], "object"), | ||
], | ||
) | ||
def test_concat_series(to_concat_dtypes, result_dtype): | ||
|
||
result = pd.concat([pd.Series([1, 2, pd.NA], dtype=t) for t in to_concat_dtypes]) | ||
expected = pd.concat([pd.Series([1, 2, pd.NA], dtype=object)] * 2).astype( | ||
result_dtype | ||
) | ||
tm.assert_series_equal(result, expected) |
Should this be common_type or common_dtype? We've been loose about this distinction so far and I think it has caused ambiguity.
I don't care that much. I mainly used "type" because it is meant to be used in `find_common_type`. (That `find_common_type` name is inspired by the numpy function, which actually handles both dtypes and scalar types, and I assume that is the reason for the name. The pandas version, though, doesn't really make the distinction, so it could have been named "find_common_dtype".)
Renamed to "common_dtype" instead of "common_type". The internal function that uses this is still `find_common_type`, but that name from numpy is actually a misnomer here, since we are only dealing with dtypes, and not scalar types.
Thanks for indulging me on this nitpick