EHN encoding parameter for to_latex #11914


Closed

nbonnotte

closes #7061

I'm working on solving issue #7061: currently, with Python 2 it is not possible to use .to_latex() when the dataframe contains utf-8 strings. This PR adds an encoding parameter to make it possible.
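To illustrate what an encoding parameter buys when writing LaTeX output to a file, here is a minimal standard-library sketch. It is not the pandas implementation: `write_latex_table` is a hypothetical helper invented for this example, and it only shows that opening the target file with an explicit encoding lets non-ASCII text such as `u'au\xdfgangen'` round-trip.

```python
import io
import os
import tempfile


def write_latex_table(rows, path, encoding=None):
    """Hypothetical helper (not pandas code): write rows as a LaTeX tabular."""
    column_format = "l" * len(rows[0])
    lines = [u"\\begin{tabular}{%s}" % column_format]
    for row in rows:
        lines.append(u" & ".join(row) + u" \\\\")
    lines.append(u"\\end{tabular}")
    # io.open encodes the text on write, so non-ASCII characters are safe
    # as long as the chosen encoding can represent them.
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(u"\n".join(lines) + u"\n")


path = os.path.join(tempfile.mkdtemp(), "test.tex")
write_latex_table([[u"au\xdfgangen"]], path, encoding="utf-8")

# Read the file back with the same encoding to check the round trip.
with io.open(path, "r", encoding="utf-8") as handle:
    content = handle.read()
```

Reading the file back with the same encoding mirrors the shape of the test discussed in this PR: write with `encoding='utf-8'`, then compare against the decoded file contents.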

The work is not completely done though, and I'm making this PR to start a discussion here. In its current state, the PR quickly fixes the issue and provides tests, so it is kind of a minimal PR.

@jreback in the discussion on issue #7061, you made the following suggestion:

you may want to add a LatexFormatter (as a sub-class of DataFrameFormatter) as this will allow some re-factoring to be internally done later on.

I would be glad to work in that direction, but I'm a bit confused by the current organization of the code. This LatexFormatter seems to me to be similar to HTMLFormatter, which inherits from TableFormatter (just as DataFrameFormatter does). In that case, DataFrameFormatter would use LatexFormatter for .to_latex() exactly as it uses HTMLFormatter for .to_html(). Is that what you had in mind?

    df = DataFrame([[u'au\xdfgangen']])
    with tm.ensure_clean('test.tex') as path:
        df.to_latex(path, encoding='utf-8')
        import codecs

put the import at the top of the module

@jreback

jreback commented Dec 28, 2015

@nbonnotte what I mean is that I think a LatexFormatter should be created in format.py, which is like HTMLFormatter (it's a subclass of TableFormatter). That way you don't have all of the latex formatting code living in to_latex itself (which instead just instantiates a class and calls .write_result, similarly to to_html).

This will allow easier refactoring later.
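The delegation described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: `FrameFormatter` is a hypothetical stand-in for `DataFrameFormatter`, the string columns are passed in ready-made, and the rendering is reduced to a bare `tabular` environment. The point is the shape: `to_latex` holds no LaTeX logic and just instantiates a formatter class and calls `write_result`.

```python
import io


class TableFormatter(object):
    """Stand-in for the shared base class named in the discussion."""

    def write_result(self, buf):
        raise NotImplementedError


class FrameFormatter(TableFormatter):
    """Hypothetical stand-in for DataFrameFormatter."""

    def __init__(self, columns):
        # columns: one list of already-formatted strings per column
        self.columns = columns

    def _to_str_columns(self):
        return self.columns

    def to_latex(self, buf, column_format=None):
        # No LaTeX logic here: build the renderer and delegate to it.
        LatexFormatter(self, column_format=column_format).write_result(buf)


class LatexFormatter(TableFormatter):
    """Illustrative only: owns all of the LaTeX rendering."""

    def __init__(self, formatter, column_format=None):
        self.fmt = formatter
        self.column_format = column_format

    def write_result(self, buf):
        strcols = self.fmt._to_str_columns()
        column_format = self.column_format or "l" * len(strcols)
        buf.write(u"\\begin{tabular}{%s}\n" % column_format)
        # Transpose the columns into rows and emit one LaTeX row each.
        for row in zip(*strcols):
            buf.write(u" & ".join(row) + u" \\\\\n")
        buf.write(u"\\end{tabular}\n")


buf = io.StringIO()
FrameFormatter([[u"a", u"b"], [u"1", u"2"]]).to_latex(buf)
result = buf.getvalue()
```

Keeping the rendering in its own class means later changes to the LaTeX output touch only `LatexFormatter`, which is the refactoring benefit mentioned above.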

@jreback added the IO LaTeX (to_latex) and Unicode (Unicode strings) labels Dec 28, 2015
@nbonnotte

BTW, this does not solve the missing decimal parameter I mentioned in the discussion of issue #7061

@jreback

jreback commented Dec 29, 2015

@nbonnotte what was the decimal issue? (can't seem to find it in that issue)

@nbonnotte

I just meant the decimal parameter is still lacking, I didn't add it. I should do another PR to do that.

@@ -302,6 +302,9 @@ Other API Changes

- ``.memory_usage`` now includes values in the index, as does memory_usage in ``.info`` (:issue:`11597`)

- ``DataFrame.to_latex()`` now supports non-ascii encodings (eg utf-8) in Python 2 with the parameter ``encoding``

add the issue number (7061)

@jreback jreback added this to the 0.18.0 milestone Jan 11, 2016
@jreback

jreback commented Jan 11, 2016

@nbonnotte create another issue for the decimal parameter (we are going to close #7061 with this PR)

@jreback

jreback commented Jan 11, 2016

comments, ping when green.

@nbonnotte

I've created another issue for decimal (#12031)

Working on the docstring, I realized there was something odd: I was initializing LatexFormatter with a DataFrameFormatter and a variable strcols containing the result of DataFrameFormatter._to_str_columns.

I think it is simpler if the parameter strcols is removed and the call to DataFrameFormatter._to_str_columns is performed in LatexFormatter. You can see the changes in the 2nd commit (dc425cb0e959ccab2941cab27f329ec41cb768d7).

There is still one thing that bothers me: now I'm calling a "private" method (_to_str_columns) from outside (that is, from LatexFormatter). Is that a problem?

@jreback

jreback commented Jan 13, 2016

@nbonnotte no that's ok

@nbonnotte

I've squashed my commits then.

All green!

    def __init__(self, formatter, column_format=None, longtable=False):
        self.fmt = formatter
        self.frame = self.fmt.frame
        self.columns = self.fmt.tr_frame.columns

are you using these variables (e.g. .columns)? I'd rather you not define them, or use a cached property if it's clearer (e.g. for .frame)
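One way to read this suggestion, sketched under assumptions: drop the unused attributes from `__init__` and expose `.frame` lazily. The example uses `functools.cached_property` (Python 3.8+) as a standard-library stand-in, since the project may well use its own caching decorator, and `FakeFormatter` is a hypothetical placeholder for the formatter object passed in.

```python
from functools import cached_property  # standard library, Python 3.8+


class FakeFormatter(object):
    """Hypothetical stand-in for the formatter passed to LatexFormatter."""

    def __init__(self, frame):
        self.frame = frame


class LatexFormatter(object):
    def __init__(self, formatter, column_format=None, longtable=False):
        # Only store what was passed in; no eagerly derived attributes.
        self.fmt = formatter
        self.column_format = column_format
        self.longtable = longtable

    @cached_property
    def frame(self):
        # Looked up on first access, then stored on the instance.
        return self.fmt.frame


latex = LatexFormatter(FakeFormatter(frame={"a": [1, 2]}))
frame = latex.frame
```

With this shape nothing is computed unless `.frame` is actually used, and an attribute such as `.columns` that no code reads is simply not defined.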

@jreback

jreback commented Jan 14, 2016

pls git diff master | flake8 --diff

@nbonnotte

@jreback should be good now ^^

@jreback

jreback commented Jan 15, 2016

merged via 3a832df

thanks!

@jreback jreback closed this Jan 15, 2016
@nbonnotte nbonnotte deleted the to_latex-encoding-7061 branch January 15, 2016 16:27
Labels
IO LaTeX to_latex Unicode Unicode strings
Development

Successfully merging this pull request may close these issues.

Unicode handling in to_latex. Needs encoding?
2 participants