Series.map should return default dictionary values rather than NaN #15999

dhimmel · 2017-04-14T18:11:49Z

collections.Counter and collections.defaultdict both have default values. However, pandas.Series.map does not respect these defaults and instead returns missing values.

The issue is illustrated below:

import pandas
from collections import Counter, defaultdict
input = pandas.Series(range(5))
counter = Counter()
counter[1] += 1
output = input.map(counter)
expected = input.map(lambda x: counter[x])
pandas.DataFrame({
    'input': input,
    'output': output,
    'expected': expected,
})

Here's the output:

   expected  input  output
0         0      0     NaN
1         1      1     1.0
2         0      2     NaN
3         0      3     NaN
4         0      4     NaN

The workaround is rather easy (lambda x: dictionary[x]), and this shouldn't be too hard to implement. Are people on board with the change? Is there a performance concern with looking up each key independently?
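On the performance question, one possible middle ground (an assumption on my part, not something proposed in this thread) is to look up each *distinct* value once and then map with a plain dict that covers every value, so the per-row work stays vectorized. The variable names below are illustrative.

```python
from collections import Counter

import pandas as pd

counter = Counter()
counter[1] += 1
series = pd.Series([0, 1, 1, 2, 4, 4, 4])

# Per-element workaround: one Python-level Counter lookup per row.
per_element = series.map(lambda x: counter[x])

# Alternative: one Counter lookup per distinct value, then map with a plain
# dict that has every value as a key, so no row can fall through to NaN.
complete = {value: counter[value] for value in series.unique()}
per_unique = series.map(complete)

print(per_element.tolist())  # [0, 1, 1, 0, 0, 0, 0]
print(per_unique.tolist())   # [0, 1, 1, 0, 0, 0, 0]
```

The saving only matters when the column has many repeated values. With mostly unique values, both versions do roughly one Python lookup per row.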


jreback · 2017-04-14T18:16:13Z

why would you do this?

dhimmel · 2017-04-14T18:23:41Z

I've run into this issue several times with collections.Counter; most recently, see cell 6 of this notebook. With counters, a key that hasn't been observed defaults to zero (since they're used for counting occurrences).

By using a defaultdict or Counter, the user has chosen to have default values. If they don't want defaults, they should convert to a plain dict or use one in the first place.
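One detail worth keeping in mind if defaults are honoured through `__getitem__` (as in the lambda workaround): the two classes differ in whether a missed lookup is stored. `Counter` returns 0 without inserting the key, while `defaultdict` calls its `default_factory` and inserts the result, so mapping a column through a defaultdict grows the dict as a side effect. A small demonstration, with illustrative names:

```python
from collections import Counter, defaultdict

import pandas as pd

series = pd.Series([0, 1, 2])

counts = Counter({1: 5})
lengths = defaultdict(int, {1: 5})

# Counter.__missing__ returns 0 but does not store the missing key.
series.map(lambda x: counts[x])
# defaultdict.__missing__ calls default_factory and stores the new entry.
series.map(lambda x: lengths[x])

print(len(counts))   # 1
print(len(lengths))  # 3
```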

jreback · 2017-04-14T18:25:18Z

.map does not accept a Counter; sure, it's dict-like, but not sure why you would actually do this anyhow.

jreback · 2017-04-14T18:25:55Z

Looks like you should just do .groupby(...).value_counts() anyhow.
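When all the data is already in one in-memory column, a simpler single-column variant of that suggestion is to map the column through its own `value_counts()`, which is a Series indexed by the distinct values. This is a sketch with a hypothetical `gene` column; it only applies when the counts can be computed from the same frame, which is the limitation raised in the next comment.

```python
import pandas as pd

# Hypothetical data: a column whose occurrences we want to count.
df = pd.DataFrame({"gene": ["a", "b", "a", "c", "a"]})

# value_counts() is indexed by every distinct value in the column, so each
# row finds its key and no NaN is introduced.
df["gene_count"] = df["gene"].map(df["gene"].value_counts())
print(df["gene_count"].tolist())  # [3, 1, 3, 1, 3]
```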

dhimmel · 2017-04-14T18:45:16Z

> not sure why you would actually do this anyhow

Because I have a counter of occurrences that I want to add as a column to a dataframe. In many cases the counter cannot be created in pandas using .value_counts(), for example when:

- the counter is created by iteratively reading a file that won't fit in memory
- the code must deal with a counter that is returned by another function
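The first case might look like the sketch below. It assumes a CSV with a hypothetical `gene` column, read in chunks via `read_csv(..., chunksize=...)`; an in-memory `io.StringIO` stands in for the large file so the example runs on its own.

```python
import io
from collections import Counter

import pandas as pd

# Stand-in for a file too large to load at once (hypothetical contents).
big_file = io.StringIO("gene\na\nb\na\nc\na\n")

counter = Counter()
# With chunksize, read_csv yields DataFrames of at most 2 rows each.
for chunk in pd.read_csv(big_file, chunksize=2):
    counter.update(chunk["gene"].tolist())

# A separate frame to annotate; "d" never appears in the file.
genes = pd.Series(["a", "c", "d"])
annotated = genes.map(lambda gene: counter[gene])
print(annotated.tolist())  # [3, 1, 0]
```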

Now, you could always use series.map(counter).fillna(0).astype(int), but this forces the user to deal with ints being converted to floats when there's missing data (which is one of the most frustrating aspects of pandas and should be avoided when possible).
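The float round-trip can be seen with a plain dict, which has no default and therefore behaves the way a Counter did in the report above (newer pandas versions may treat the Counter itself differently, so the sketch converts it with `dict(...)` to stay version-independent):

```python
from collections import Counter

import pandas as pd

counter = Counter({1: 1})
series = pd.Series(range(5))

plain = dict(counter)              # drop the default-value behaviour
with_missing = series.map(plain)   # unmatched keys become NaN
print(with_missing.dtype)          # float64, because NaN forces a float column

filled = with_missing.fillna(0).astype(int)
print(filled.tolist())             # [0, 1, 0, 0, 0]
```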

> .map does not accept a Counter

.map does accept a Counter, since it's a subclass of dict, and it gives no warning.

chris-b1 · 2017-04-14T19:18:06Z

Right now we take a fastpath, building an Index out of the dict keys.

pandas/core/series.py, line 2137 in 614a48e:

arg = self._constructor(arg, index=arg.keys())

So we should probably either add a slowpath that respects full semantics if passed a dict subclass, or just raise.
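To see why the defaults are lost on that fast path, here is a rough approximation of the idea (not the actual pandas code, which differs between versions): the mapping becomes a Series indexed by its keys, each value is located in that index with `get_indexer`, and a value that is not a key gets position -1 and comes back as NaN. The Counter's `__missing__` is never called.

```python
from collections import Counter

import pandas as pd

counter = Counter({1: 1})
values = pd.Series(range(5))

# Mirror the quoted line: a Series built from the dict, indexed by its keys.
lookup = pd.Series(counter, index=list(counter.keys()))

# Position of each value in the key index; -1 means "not a key".
positions = lookup.index.get_indexer(values)
print(positions.tolist())  # [-1, 0, -1, -1, -1]

# Label-based lookup gives the same outcome: NaN wherever the position is -1.
result = lookup.reindex(values.tolist()).to_numpy()
print(result)
```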

dhimmel · 2017-04-14T19:23:14Z

> ...add a slowpath that respects full semantics if passed a dict subclass, or just raise.

What about adding the following to the head of the function?

if isinstance(arg, (collections.Counter, collections.defaultdict)):
    dictionary = arg
    arg = lambda x: dictionary[x]

Note that there are other ways to simplify the function's code that I would also explore.
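A possible variation on the snippet above, sketched here as a standalone helper: instead of listing `Counter` and `defaultdict` explicitly, check for `__missing__`, the hook `dict.__getitem__` calls for absent keys. That would also cover user-defined dict subclasses with defaults. The helper name `map_respecting_defaults` is hypothetical and is not part of pandas.

```python
from collections import Counter, defaultdict

import pandas as pd


def map_respecting_defaults(series, mapping):
    """Hypothetical helper: dicts that define __missing__ are looked up via
    __getitem__ so their defaults apply; anything else goes to Series.map."""
    if isinstance(mapping, dict) and hasattr(mapping, "__missing__"):
        return series.map(lambda key: mapping[key])
    return series.map(mapping)


series = pd.Series(range(5))
print(map_respecting_defaults(series, Counter({1: 1})).tolist())
# [0, 1, 0, 0, 0]
print(map_respecting_defaults(series, defaultdict(lambda: -1, {1: 1})).tolist())
# [-1, 1, -1, -1, -1]
print(map_respecting_defaults(series, {1: 1}).tolist())  # plain dict: NaN for misses
```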

I'm happy to submit a PR and add tests if this is an enhancement that would be accepted.
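Tests for this behaviour could look roughly like the pytest-style functions below. This is a sketch that assumes a pandas release including the change from #16002; the test names are illustrative.

```python
from collections import Counter, defaultdict

import pandas as pd


def test_map_counter_uses_default():
    series = pd.Series(range(5))
    result = series.map(Counter({1: 1}))
    # Unobserved keys should count as 0, and the column should stay integer.
    assert result.tolist() == [0, 1, 0, 0, 0]
    assert result.dtype.kind == "i"


def test_map_defaultdict_uses_default_factory():
    series = pd.Series(range(3))
    result = series.map(defaultdict(lambda: "missing", {1: "one"}))
    assert result.tolist() == ["missing", "one", "missing"]


if __name__ == "__main__":
    test_map_counter_uses_default()
    test_map_defaultdict_uses_default_factory()
    print("ok")
```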

* series.map: support dicts with defaults closes #15999

dhimmel mentioned this issue Apr 14, 2017

Support dicts with default values in series.map #16002

Merged

jreback added the Enhancement and Missing-data labels Apr 14, 2017

jreback modified the milestones: Next Major Release, 0.20.0 Apr 14, 2017

jreback closed this as completed in #16002 Apr 15, 2017

jreback pushed a commit that referenced this issue Apr 15, 2017

Support dicts with default values in series.map (#16002)

61d84db
