PERF: MultiIndex set and indexing operations #53955

Merged
merged 5 commits into from
Jul 10, 2023

Conversation

lukemanley
Member

Avoids "densifying" the levels of a MultiIndex for various set and indexing ops (MultiIndex.get_indexer_for).
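
For readers unfamiliar with the term, here is a minimal sketch of the difference between the compact (levels plus codes) representation of a MultiIndex and the "dense" one (one tuple per row), together with the public operations the benchmarks below exercise. The data is made up for illustration, and the sketch uses only the public API. It does not show the internal code paths changed in this PR.

```python
import numpy as np
import pandas as pd

# Illustrative data only (not taken from the PR).
mi = pd.MultiIndex.from_product([["a", "b", "c"], [1, 2]], names=["key", "num"])
other = pd.MultiIndex.from_tuples(
    [("b", 2), ("c", 1), ("z", 9)], names=["key", "num"]
)

# Compact representation: one small Index of unique values per level,
# plus one integer code array per level pointing into that Index.
level_sizes = [len(level) for level in mi.levels]  # [3, 2]
code_arrays = [np.asarray(codes) for codes in mi.codes]

# "Dense" representation: an object array holding one tuple per row.
dense = mi.values

# Indexing and set operations of the kind covered by the benchmarks.
positions = mi.get_indexer_for(other)  # -1 marks entries missing from `mi`
membership = mi.isin(other)            # boolean mask over the rows of `mi`
common = mi.intersection(other)
```

With six rows the two representations cost about the same; the difference only matters when the number of rows is much larger than the number of unique values per level.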

> asv continuous -f 1.1 upstream/main mi-set-ops -b ^multiindex_object

       before           after         ratio
     [6eb59b32]       [30e990fd]
     <main>           <mi-set-ops>
-      20.1±0.5ms         17.5±1ms     0.87  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference', False)
-        57.3±4ms       49.3±0.9ms     0.86  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'intersection', None)
-        20.5±1ms       17.6±0.6ms     0.86  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference', None)
-        19.1±2ms       16.4±0.4ms     0.86  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'symmetric_difference', False)
-        19.3±2ms      16.5±0.05ms     0.85  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference', False)
-        20.4±1ms       17.1±0.5ms     0.84  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'symmetric_difference', None)
-        21.9±1ms       18.4±0.5ms     0.84  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection', False)
-      48.0±0.4ms       39.8±0.5ms     0.83  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'union', None)
-        21.0±1ms       17.4±0.4ms     0.83  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'symmetric_difference', None)
-        50.8±3ms         41.8±1ms     0.82  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'union', None)
-        21.6±1ms       17.2±0.5ms     0.80  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection', False)
-        21.9±1ms       17.2±0.2ms     0.79  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'intersection', False)
-      22.0±0.7ms       17.2±0.2ms     0.78  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference', None)
-        28.3±3ms       22.1±0.3ms     0.78  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union', None)
-      14.5±0.9ms       11.1±0.2ms     0.77  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'union', False)
-        21.9±2ms       16.5±0.9ms     0.75  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'symmetric_difference', False)
-        15.7±2ms       11.0±0.4ms     0.70  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'union', False)
-        15.9±1ms       11.1±0.1ms     0.69  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union', False)
-        17.4±1ms       10.9±0.2ms     0.62  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'union', False)
-        30.1±3ms       17.5±0.4ms     0.58  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'symmetric_difference', None)
-      31.2±0.4ms       17.8±0.5ms     0.57  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'intersection', False)
-        30.4±1ms         17.3±1ms     0.57  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'symmetric_difference', None)
-      6.99±0.3ms       3.96±0.1ms     0.57  multiindex_object.Isin.time_isin_small('int')
-        11.6±1ms       6.48±0.3ms     0.56  multiindex_object.Isin.time_isin_large('int')
-        32.3±3ms       17.6±0.8ms     0.55  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'symmetric_difference', None)
-      7.81±0.6ms       4.24±0.8ms     0.54  multiindex_object.Isin.time_isin_small('datetime')
-        22.5±3ms       12.0±0.7ms     0.53  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union', False)
-        31.0±3ms       15.6±0.2ms     0.50  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'symmetric_difference', False)
-        34.0±2ms       17.0±0.7ms     0.50  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'intersection', False)
-        31.5±2ms       15.6±0.2ms     0.50  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'symmetric_difference', False)
-        34.0±2ms      16.4±0.09ms     0.48  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'symmetric_difference', None)
-      11.9±0.6ms       5.57±0.2ms     0.47  multiindex_object.Isin.time_isin_large('datetime')
-        24.0±2ms       11.0±0.4ms     0.46  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'union', False)
-      36.1±0.8ms       15.8±0.7ms     0.44  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'intersection', False)
-      24.4±0.3ms       10.6±0.3ms     0.44  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'union', False)
-      17.1±0.8ms       5.45±0.7ms     0.32  multiindex_object.Isin.time_isin_large('string')
-      12.8±0.8ms      3.59±0.03ms     0.28  multiindex_object.Isin.time_isin_small('string')
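
The `ratio` column is the "after" time divided by the "before" time, so values below 1 are speedups. As a rough way to spot-check an `isin`-style case without asv, something like the sketch below could be run on each branch. The index sizes and level types here are assumptions, not the actual setup of `multiindex_object.Isin`, so the absolute numbers will not match the table.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sizes and level types; the real asv benchmark setup may differ.
mi = pd.MultiIndex.from_product(
    [np.arange(1_000), pd.date_range("2000-01-01", periods=100)]
)
small = mi[:100]  # a small MultiIndex of values to look up

calls = 5
# Best of three repeats, divided by the number of calls per repeat.
best = min(timeit.repeat(lambda: mi.isin(small), number=calls, repeat=3)) / calls
print(f"MultiIndex.isin: {best * 1e3:.2f} ms per call")
```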

@lukemanley added the Performance (memory or execution speed performance) and MultiIndex labels Jul 1, 2023
@lukemanley requested a review from WillAyd as a code owner July 1, 2023 00:18
@lukemanley added this to the 2.1 milestone Jul 1, 2023
@@ -2448,6 +2448,19 @@ def reorder_levels(self, order) -> MultiIndex:
             levels=new_levels, codes=new_codes, names=new_names, verify_integrity=False
         )
 
     def _recode_for_new_levels(self, new_levels, copy: bool = True) -> list[np.ndarray]:
Member

Nit: Could this be a generator?

Member Author

sure - updated.
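
For context, a sketch of what the generator form of such a helper could look like. This is not the code merged in the PR: `recode_for_new_levels` below is a hypothetical free function built on the public `Index.get_indexer`, and it ignores the `copy` argument in the signature above.

```python
from typing import Iterator

import numpy as np
import pandas as pd


def recode_for_new_levels(mi: pd.MultiIndex, new_levels) -> Iterator[np.ndarray]:
    """Hypothetical stand-in: yield one recoded code array per level."""
    for old_level, codes, new_level in zip(mi.levels, mi.codes, new_levels):
        # Position of each old level value inside the new level.
        mapping = pd.Index(new_level).get_indexer(old_level)
        codes = np.asarray(codes)
        # Translate the codes; keep -1 (missing value) as -1.
        yield np.where(codes == -1, -1, mapping[codes])


mi = pd.MultiIndex.from_tuples([("b", 1), ("a", 2)])
new_levels = [pd.Index(["a", "b", "c"]), pd.Index([0, 1, 2])]

# The caller decides whether to materialize the arrays or consume them lazily.
new_codes = list(recode_for_new_levels(mi, new_levels))
rebuilt = pd.MultiIndex(levels=new_levels, codes=new_codes)
```

Yielding the arrays one level at a time means a caller that only iterates over them never has to build an intermediate list.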

@mroeschke mroeschke merged commit ac3153b into pandas-dev:main Jul 10, 2023
mroeschke
Member

Thanks @lukemanley

@lukemanley deleted the mi-set-ops branch July 12, 2023 00:12
Labels
MultiIndex, Performance (memory or execution speed performance)

2 participants