Skip to content

read_excel fails to read excel file when last sheet is empty and sheetname=None #11711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
BreitA opened this issue Nov 27, 2015 · 7 comments
Closed
Labels
Bug Compat pandas objects compatability with Numpy or Python functions IO Excel read_excel, to_excel
Milestone

Comments

@BreitA
Copy link

BreitA commented Nov 27, 2015

Pandas fails to load an excel file as a dict fo dataframe when the last sheet is empty when sheetname=None.
Deleting the last sheet manually fix the problem.
Maybe Pandas should be improved to be robust to this common case.
Also I post this to raise awareness of this issue in case someone is scratching his head wondering why pandas is not loading his excel file

In [151]:

df_dict=pd.read_excel('D:\trash_test.xlsx',sheetname=None)

print df_dict.keys() # pandas fails to load the file

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads first sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')

print df # pandas loads second sheet normally

df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')

print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

Index([], dtype='object')
Empty DataFrame
Columns: [some_stuff, 1]
Index: []
Empty DataFrame
Columns: [some_stuff, 1]
Index: []


XLRDError Traceback (most recent call last)
in ()
6 df=pd.read_excel('D:\trash_test.xlsx',sheetname='a_test')
7 print df # pandas loads second sheet normally
----> 8 df=pd.read_excel('D:\trash_test.xlsx',sheetname='not_default')
9 print df # pandas fails to load 'not_default' the last sheet which is empty (to be expected?)

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, *_kwds)
168 date_parser=date_parser, na_values=na_values, thousands=thousands,
169 convert_float=convert_float, has_index_names=has_index_names,
--> 170 skip_footer=skip_footer, converters=converters, *_kwds)
171
172 class ExcelFile(object):

C:\Anaconda\lib\site-packages\pandas\io\excel.pyc in _parse_excel(self, sheetname, header, skiprows, skip_footer, index_col, has_index_names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, verbose, **kwds)
370
371 if isinstance(asheetname, compat.string_types):
--> 372 sheet = self.book.sheet_by_name(asheetname)
373 else: # assume an integer if not a string
374 sheet = self.book.sheet_by_index(asheetname)

C:\Anaconda\lib\site-packages\xlrd\book.pyc in sheet_by_name(self, sheet_name)
439 sheetx = self._sheet_names.index(sheet_name)
440 except ValueError:
--> 441 raise XLRDError('No sheet named <%r>' % sheet_name)
442 return self.sheet_by_index(sheetx)
443

XLRDError: No sheet named <'not_default'>

@jreback
Copy link
Contributor

jreback commented Nov 27, 2015

pls show pd.show_versions() and use an example that is created / known to exist with 1 or more sheets (e.g. create it programatically).

sheetname=None returns ALL the sheets in a dict (but IIRC this is after 0.16.0), which should be indicated in the doc-string but is not.

@jreback jreback added the IO Excel read_excel, to_excel label Nov 27, 2015
@BreitA
Copy link
Author

BreitA commented Nov 27, 2015

Sorry it's my first time reporting issues on GitHub here is a full example with :

1)- a case where I generate a file with 2 sheets with data and an empty one at the end
2)- a case where I generate a file with 2 sheets with data and NO empty one at the end.
in case 2) the read_excel function works well, but in case 1) it returns an empty dict as I mentionned earlier.
Here is the full code and the print of the output (with pd.show_versions)

import pandas as pd
import numpy as np
from pandas import ExcelWriter

​# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')
writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
writer.save()
df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df_dict.keys: Index([], dtype='object')

@jreback
Copy link
Contributor

jreback commented Nov 27, 2015

df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None

@jreback
Copy link
Contributor

jreback commented Nov 27, 2015

can you check if saving an empty frame in-the-middle (e.g. not the end) work?

@BreitA
Copy link
Author

BreitA commented Nov 27, 2015

import pandas as pd
import numpy as np
from pandas import ExcelWriter

print 'empty sheet end'
# generate a file with an empty sheet at the end
path='D:\\test.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_end')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_end')
print 'df/empty_sheet_at_end:', df # pandas loads the empty sheets that is empty

# generate a file with an empty sheet in the middle
print 'empty sheet middle'
path='D:\\testmiddle.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_middle')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_middle')
print 'df/empty_sheet_at_middle:', df # pandas loads the empty sheets that is empty


# generate a file with an empty sheet at the start
print 'empty sheet start'
path='D:\\teststart.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame()
df.to_excel(writer,'empty_sheet_at_start')
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
df=pd.read_excel(path,sheetname='empty_sheet_at_start')
print 'df/empty_sheet_at_start:', df # pandas loads the empty sheets that is empty

print 'without empty sheet'
# generate a file WITHOUT an empty sheet at the end
path='D:\\test2.xlsx'
writer = pd.ExcelWriter(path)
df=pd.DataFrame(columns=['A','B'],data=np.ones([1,2]))
df.to_excel(writer,'sheet1')
df=pd.DataFrame(columns=['C','D'],data=2*np.ones([1,2]))
df.to_excel(writer,'sheet2')

writer.save()

df_dict=pd.read_excel(path,sheetname=None)
print 'df_dict.keys:', df_dict.keys() # pandas fails to load the file
df=pd.read_excel(path,sheetname='sheet1')
print 'df/sheet1:', df# pandas loads first sheet normally
df=pd.read_excel(path,sheetname='sheet2')
print 'df/sheet2:',df # pandas loads second sheet normally
print pd.show_versions()
# This is the Output 

empty sheet end
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_end: Empty DataFrame
Columns: []
Index: []
empty sheet middle
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_middle: Empty DataFrame
Columns: []
Index: []
empty sheet start
df_dict.keys: Index([], dtype='object')
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
df/empty_sheet_at_start: Empty DataFrame
Columns: []
Index: []
without empty sheet
df_dict.keys: [u'sheet1', u'sheet2']
df/sheet1:    A  B
0  1  1
df/sheet2:    C  D
0  2  2
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
None
# Comment 

Ok so it appears the problem touch any excel file with an empty sheet 

@jreback
Copy link
Contributor

jreback commented Nov 29, 2015

@BreitA pls use triple-back ticks to format the output as this is very difficult to read.

I guess this is a bug, pull-requests to put in a proper test and fix are welcomed.

@jreback jreback added Bug Compat pandas objects compatability with Numpy or Python functions labels Nov 29, 2015
@jreback jreback added this to the Next Major Release milestone Nov 29, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 11, 2015
jreback added a commit that referenced this issue Dec 12, 2015
BUG: read_excel fails when empty sheets exist and sheetname=None #11711
@jreback
Copy link
Contributor

jreback commented Jan 30, 2016

closed by #11819

@jreback jreback closed this as completed Jan 30, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

2 participants