read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

BreitA · 2015-11-27T10:34:00Z

Pandas fails to load an excel file as a dict fo dataframe when the last sheet is empty when sheetname=None.
Deleting the last sheet manually fix the problem.
Maybe Pandas should be improved to be robust to this common case.
Also I post this to raise awareness of this issue in case someone is scratching his head wondering why pandas is not loading his excel file

In [151]:

df_dict=pd.read_excel('D:\trash_test.xlsx',sheetname=None)

print df_dict.keys() # pandas fails to load the file

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads first sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads second sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')

print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

Index([], dtype='object')
Empty DataFrame
Columns: [some_stuff, 1]
Index: []
Empty DataFrame
Columns: [some_stuff, 1]
Index: []

XLRDError Traceback (most recent call last)
in ()
6 df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')
7 print df # pandas loads second sheet normally
----> 8 df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')
9 print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, *_kwds)
168 date_parser=date_parser, na_values=na_values, thousands=thousands,
169 convert_float=convert_float, has_index_names=has_index_names,
--> 170 skip_footer=skip_footer, converters=converters, *_kwds)
171
172 class ExcelFile(object):

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in _parse_excel(self, sheetname, header, skiprows, skip_footer, index_col, has_index_names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, verbose, **kwds)
370
371 if isinstance(asheetname, compat.string_types):
--> 372 sheet = self.book.sheet_by_name(asheetname)
373 else: # assume an integer if not a string
374 sheet = self.book.sheet_by_index(asheetname)

C:\Anaconda\lib\site-packages\xlrd\book.pyc in sheet_by_name(self, sheet_name)
439 sheetx = self._sheet_names.index(sheet_name)
440 except ValueError:
--> 441 raise XLRDError('No sheet named <%r>' % sheet_name)
442 return self.sheet_by_index(sheetx)
443

XLRDError: No sheet named <'not_default'>

jreback · 2015-11-27T13:02:38Z

pls show pd.show_versions() and use an example that is created / known to exist with 1 or more sheets (e.g. create it programatically).

sheetname=None returns ALL the sheets in a dict (but IIRC this is after 0.16.0), which should be indicated in the doc-string but is not.

BreitA · 2015-11-27T13:39:55Z

Sorry it's my first time reporting issues on GitHub here is a full example with :

1)- a case where I generate a file with 2 sheets with data and an empty one at the end
2)- a case where I generate a file with 2 sheets with data and NO empty one at the end.
in case 2) the read_excel function works well, but in case 1) it returns an empty dict as I mentionned earlier.
Here is the full code and the print of the output (with pd.show_versions)

import pandas as pd
import numpy as np
from pandas import ExcelWriter

# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')
writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
writer.save()
df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df_dict.keys: Index([], dtype='object')

jreback · 2015-11-27T13:43:15Z

df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None

jreback · 2015-11-27T13:44:55Z

can you check if saving an empty frame in-the-middle (e.g. not the end) work?

BreitA · 2015-11-27T14:37:59Z

import pandas as pd
import numpy as np
from pandas import ExcelWriter

print 'empty sheet end'
# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file with an empty sheet in the middle
print 'empty sheet middle'
path='D:\\testmiddle.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_middle')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_middle')
print 'df/empty_sheet_at_middle:', df # pandas loads the empty sheets that is empty


# generate a file with an empty sheet at the start
print 'empty sheet start'
path='D:\\teststart.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_start')
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_start')
print 'df/empty_sheet_at_start:', df # pandas loads the empty sheets that is empty

print 'without empty sheet'
# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
print pd.show_versions()

# This is the Output 

empty sheet end
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
empty sheet middle
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_middle: Empty DataFrame
Columns: []
Index: []
empty sheet start
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_start: Empty DataFrame
Columns: []
Index: []
without empty sheet
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
None

# Comment 

Ok so it appears the problem touch any excel file with an empty sheet

jreback · 2015-11-29T17:29:49Z

@BreitA pls use triple-back ticks to format the output as this is very difficult to read.

I guess this is a bug, pull-requests to put in a proper test and fix are welcomed.

…as-dev#11711

BUG: read_excel fails when empty sheets exist and sheetname=None #11711

jreback · 2016-01-30T15:38:17Z

closed by #11819

jreback added the IO Excel read_excel, to_excel label Nov 27, 2015

jreback added Bug Compat pandas objects compatability with Numpy or Python functions labels Nov 29, 2015

jreback added this to the Next Major Release milestone Nov 29, 2015

jreback added Difficulty Novice labels Nov 29, 2015

sxwang mentioned this issue Dec 11, 2015

BUG: read_excel fails when empty sheets exist and sheetname=None #11711 #11819

Merged

sxwang added a commit to sxwang/pandas that referenced this issue Dec 11, 2015

BUG: read_excel fails when empty sheets exist and sheetname=None pand…

9df5fb8

…as-dev#11711

jreback modified the milestones: 0.18.0, Next Major Release Dec 11, 2015

jreback added a commit that referenced this issue Dec 12, 2015

Merge pull request #11819 from sxwang/read_excel_empty

4979f7a

BUG: read_excel fails when empty sheets exist and sheetname=None #11711

jreback closed this as completed Jan 30, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

BreitA commented Nov 27, 2015

jreback commented Nov 27, 2015

Uh oh!

BreitA commented Nov 27, 2015

Uh oh!

jreback commented Nov 27, 2015

Uh oh!

jreback commented Nov 27, 2015

Uh oh!

BreitA commented Nov 27, 2015

Uh oh!

jreback commented Nov 29, 2015

Uh oh!

jreback commented Jan 30, 2016

Uh oh!

Uh oh!

read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

Comments

BreitA commented Nov 27, 2015

jreback commented Nov 27, 2015

Uh oh!

BreitA commented Nov 27, 2015

Uh oh!

jreback commented Nov 27, 2015

Uh oh!

jreback commented Nov 27, 2015

Uh oh!

BreitA commented Nov 27, 2015

Uh oh!

jreback commented Nov 29, 2015

Uh oh!

jreback commented Jan 30, 2016

Uh oh!