@@ -300,7 +300,7 @@ Expression Evaluation via :func:`~pandas.eval` (Experimental)
300300
301301.. versionadded :: 0.13
302302
303- The top-level function :func: `~ pandas.eval ` implements expression evaluation of
303+ The top-level function :func: `pandas.eval ` implements expression evaluation of
304304:class: `~pandas.Series ` and :class: `~pandas.DataFrame ` objects.
305305
306306.. note ::
@@ -336,11 +336,11 @@ engine in addition to some extensions available only in pandas.
336336Supported Syntax
337337~~~~~~~~~~~~~~~~
338338
339- These operations are supported by :func: `~ pandas.eval `:
339+ These operations are supported by :func: `pandas.eval `:
340340
341341- Arithmetic operations except for the left shift (``<< ``) and right shift
342342 (``>> ``) operators, e.g., ``df + 2 * pi / s ** 4 % 42 - the_golden_ratio ``
343- - Comparison operations, e.g., ``2 < df < df2 ``
343+ - Comparison operations, including chained comparisons, e.g., ``2 < df < df2 ``
344344- Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool ``
345345- ``list `` and ``tuple `` literals, e.g., ``[1, 2] `` or ``(1, 2) ``
346346- Attribute access, e.g., ``df.a ``
@@ -373,9 +373,9 @@ This Python syntax is **not** allowed:
373373:func: `~pandas.eval ` Examples
374374~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
375375
376- :func: `~ pandas.eval ` works wonders for expressions containing large arrays
376+ :func:`pandas.eval` works well with expressions containing large arrays.
377377
378- First let's create 4 decent-sized arrays to play with:
378+ First let's create a few decent-sized arrays to play with:
379379
380380.. ipython :: python
381381
@@ -441,8 +441,10 @@ Now let's do the same thing but with comparisons:
441441The ``DataFrame.eval `` method (Experimental)
442442~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
443443
444- In addition to the top level :func: `~pandas.eval ` function you can also
445- evaluate an expression in the "context" of a ``DataFrame ``.
444+ .. versionadded :: 0.13
445+
446+ In addition to the top level :func: `pandas.eval ` function you can also
447+ evaluate an expression in the "context" of a :class: `~pandas.DataFrame `.
446448
447449.. ipython :: python
448450 :suppress:
@@ -462,10 +464,10 @@ evaluate an expression in the "context" of a ``DataFrame``.
462464 df = DataFrame(randn(5 , 2 ), columns = [' a' , ' b' ])
463465 df.eval(' a + b' )
464466
465- Any expression that is a valid :func: `~ pandas.eval ` expression is also a valid
466- `` DataFrame.eval `` expression, with the added benefit that * you don't have to
467- prefix the name of the * `` DataFrame `` * to the column(s) you're interested in
468- evaluating * .
467+ Any expression that is a valid :func: `pandas.eval ` expression is also a valid
468+ :meth: ` DataFrame.eval ` expression, with the added benefit that you don't have to
469+ prefix the name of the :class: ` ~pandas. DataFrame ` to the column(s) you're
470+ interested in evaluating.
469471
470472In addition, you can perform assignment of columns within an expression.
471473This allows for *formulaic evaluation *. Only a single assignment is permitted.
@@ -480,55 +482,75 @@ it must be a valid Python identifier.
480482 df.eval(' a = 1' )
481483 df
482484
485+ The equivalent in standard Python would be
486+
487+ .. ipython :: python
488+
489+ df = DataFrame(dict (a = range (5 ), b = range (5 , 10 )))
490+ df[' c' ] = df.a + df.b
491+ df[' d' ] = df.a + df.b + df.c
492+ df[' a' ] = 1
493+ df
494+
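As a rough sketch of the equivalence described above, the following compares column assignment through ``DataFrame.eval`` with the plain-Python version. It assumes a recent pandas release, where ``DataFrame.eval`` returns a new object by default rather than modifying the frame in place; the names ``via_eval`` and ``plain`` are illustrative only.

```python
import pandas as pd

# Small illustrative frame; the column names mirror the example above.
df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Assumption: DataFrame.eval returns a new DataFrame by default (recent
# pandas), so each result is rebound to a name explicitly.
via_eval = df.eval("c = a + b")
via_eval = via_eval.eval("d = a + b + c")
via_eval = via_eval.eval("a = 1")

# The same three steps written as ordinary column assignment.
plain = df.copy()
plain["c"] = plain.a + plain.b
plain["d"] = plain.a + plain.b + plain.c
plain["a"] = 1

# Compare the two results element by element.
same_values = bool((via_eval == plain).all().all())
print(same_values)
```

Each expression still contains a single assignment, as the text requires.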
483495 Local Variables
484496~~~~~~~~~~~~~~~
485497
486- You can refer to local variables the same way you would in vanilla Python
498+ The local variable API changed in pandas 0.14. In pandas 0.13.x,
499+ you could refer to local variables the same way you would in standard Python.
500+ For example,
487501
488- .. ipython :: python
502+ .. code-block :: python
489503
490504 df = DataFrame(randn(5 , 2 ), columns = [' a' , ' b' ])
491505 newcol = randn(len (df))
492506 df.eval(' b + newcol' )
493507
494- .. note ::
508+ UndefinedVariableError: name 'newcol' is not defined
495509
496- The one exception is when you have a local (or global) with the same name as
497- a column in the ``DataFrame ``
510+ As you can see from the exception generated, this syntax is no longer allowed.
511+ You must *explicitly reference * any local variable that you want to use in an
512+ expression by placing the ``@ `` character in front of the name. For example,
498513
499- .. code-block :: python
514+ .. ipython :: python
500515
501- df = DataFrame(randn(5 , 2 ), columns = [ ' a ' , ' b ' ] )
502- a = randn(len (df))
503- df.eval(' a + b ' )
504- NameResolutionError: resolvers and locals overlap on names [ ' a ' ]
516+ df = DataFrame(randn(5 , 2 ), columns = list ( ' ab ' ) )
517+ newcol = randn(len (df))
518+ df.eval(' b + @newcol ' )
519+ df.query( ' b < @newcol ' )
505520
521+ If you don't prefix the local variable with ``@ ``, pandas will raise an
522+ exception telling you the variable is undefined.
506523
507- To deal with these conflicts, a special syntax exists for referring
508- variables with the same name as a column
524+ When using :meth: `DataFrame.eval ` and :meth: `DataFrame.query `, this allows you
525+ to have a local variable and a :class: `~pandas.DataFrame ` column with the same
526+ name in an expression.
509527
510- .. ipython :: python
511- :suppress:
512528
513- a = randn( len (df))
529+ .. ipython :: python
514530
515- .. ipython :: python
531+ a = randn()
532+ df.query(' @a < a' )
533+ df.loc[a < df.a] # same as the previous expression
516534
517- df.eval(' @a + b' )
535+ With :func: `pandas.eval ` you cannot use the ``@ `` prefix *at all *, because it
536+ isn't defined in that context. ``pandas `` will let you know this if you try to
537+ use ``@ `` in a top-level call to :func: `pandas.eval `. For example,
518538
519- The same is true for :meth: `~pandas.DataFrame.query `
539+ .. ipython :: python
540+ :okexcept:
520541
521- .. ipython :: python
542+ a, b = 1 , 2
543+ pd.eval(' @a + b' )
522544
523- df.query(' @a < b' )
545+ In this case, you should simply refer to the variables like you would in
546+ standard Python.
524547
525- .. ipython :: python
526- :suppress:
548+ .. ipython :: python
527549
528- del a
550+ pd.eval( ' a + b ' )
529551
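The contrast between the two calling styles can be sketched as follows, assuming a recent pandas release. The exact exception type raised for the ``@`` prefix is deliberately not assumed, so the sketch catches ``Exception`` and only records that the call was rejected.

```python
import pandas as pd

a, b = 1, 2

# Top-level pd.eval resolves plain names from the calling scope.
total = pd.eval("a + b")

# The "@" prefix is rejected in a top-level call.
try:
    pd.eval("@a + b")
    prefix_rejected = False
except Exception as exc:
    prefix_rejected = True
    print(type(exc).__name__, exc)

print(total, prefix_rejected)
```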
530552
531- :func: `~ pandas.eval ` Parsers
553+ :func: `pandas.eval ` Parsers
532554~~~~~~~~~~~~~~~~~~~~~~~~~~~~
533555
534556There are two different parsers and two different engines you can use as
@@ -568,7 +590,7 @@ The ``and`` and ``or`` operators here have the same precedence that they would
568590in vanilla Python.
569591
570592
571- :func: `~ pandas.eval ` Backends
593+ :func: `pandas.eval ` Backends
572594~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
573595
574596There's also the option to make :func:`~pandas.eval` operate identically to plain
@@ -577,12 +599,12 @@ ol' Python.
577599.. note ::
578600
579601 Using the ``'python' `` engine is generally *not * useful, except for testing
580- other :func: `~pandas.eval ` engines against it. You will acheive **no **
581- performance benefits using :func: `~pandas.eval ` with ``engine='python' ``.
602+ other evaluation engines against it. You will achieve **no** performance
603+ benefits using :func:`~pandas.eval` with ``engine='python'`` and in fact may
604+ incur a performance hit.
582605
583- You can see this by using :func: `~pandas.eval ` with the ``'python' `` engine is
584- actually a bit slower (not by much) than evaluating the same expression in
585- Python:
606+ You can see this by using :func:`pandas.eval` with the ``'python'`` engine. It
607+ is a bit slower (not by much) than evaluating the same expression in Python:
586608
587609.. ipython :: python
588610
@@ -593,15 +615,15 @@ Python:
593615 % timeit pd.eval(' df1 + df2 + df3 + df4' , engine = ' python' )
594616
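Timing aside, the ``'python'`` engine should produce the same result as the plain expression. A small sketch of that check, assuming a recent pandas release and frames much smaller than those timed above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Two small frames with matching shape and default integer labels.
df1 = pd.DataFrame(rng.standard_normal((100, 4)))
df2 = pd.DataFrame(rng.standard_normal((100, 4)))

# Evaluate the same expression through the 'python' engine and directly.
by_eval = pd.eval("df1 + df2", engine="python")
by_python = df1 + df2

print(by_eval.equals(by_python))
```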
595617
596- :func: `~ pandas.eval ` Performance
618+ :func: `pandas.eval ` Performance
597619~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
598620
599621:func: `~pandas.eval ` is intended to speed up certain kinds of operations. In
600622particular, those operations involving complex expressions with large
601- `` DataFrame ``/`` Series `` objects should see a significant performance benefit.
602- Here is a plot showing the running time of :func: ` ~pandas.eval ` as function of
603- the size of the frame involved in the computation. The two lines are two
604- different engines.
623+ :class: ` ~pandas. DataFrame `/ :class: ` ~pandas. Series ` objects should see a
624+ significant performance benefit. Here is a plot showing the running time of
625+ :func:`pandas.eval` as a function of the size of the frame involved in the
626+ computation. The two lines represent two different engines.
605627
606628
607629.. image :: _static/eval-perf.png
@@ -618,19 +640,31 @@ different engines.
618640This plot was created using a ``DataFrame `` with 3 columns each containing
619641floating point values generated using ``numpy.random.randn() ``.
620642
621- Technical Minutia
622- ~~~~~~~~~~~~~~~~~
623- - Expressions that would result in an object dtype (including simple
624- variable evaluation) have to be evaluated in Python space. The main reason
625- for this behavior is to maintain backwards compatbility with versions of
626- numpy < 1.7. In those versions of ``numpy `` a call to ``ndarray.astype(str) ``
627- will truncate any strings that are more than 60 characters in length. Second,
628- we can't pass ``object `` arrays to ``numexpr `` thus string comparisons must
629- be evaluated in Python space.
630- - The upshot is that this *only * applies to object-dtype'd expressions. So,
631- if you have an expression--for example--that's a string comparison
632- ``and ``-ed together with another boolean expression that's from a numeric
633- comparison, the numeric comparison will be evaluated by ``numexpr ``. In fact,
634- in general, :func: `~pandas.query `/:func: `~pandas.eval ` will "pick out" the
635- subexpressions that are ``eval ``-able by ``numexpr `` and those that must be
636- evaluated in Python space transparently to the user.
643+ Technical Minutiae Regarding Expression Evaluation
644+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
645+
646+ Expressions that would result in an object dtype or involve datetime operations
647+ (because of ``NaT``) must be evaluated in Python space. The main reason for
648+ this behavior is to maintain backwards compatibility with versions of numpy <
649+ 1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)`` will
650+ truncate any strings that are more than 60 characters in length. Second, we
651+ can't pass ``object`` arrays to ``numexpr``, so string comparisons must be
652+ evaluated in Python space.
653+
654+ The upshot is that this *only * applies to object-dtype'd expressions. So, if
655+ you have an expression--for example
656+
657+ .. ipython :: python
658+
659+ df = DataFrame({' strings' : np.repeat(list (' cba' ), 3 ),
660+ ' nums' : np.repeat(range (3 ), 3 )})
661+ df
662+ df.query(' strings == "a" and nums == 1' )
663+
664+ the numeric part of the comparison (``nums == 1 ``) will be evaluated by
665+ ``numexpr ``.
666+
667+ In general, :meth: `DataFrame.query `/:func: `pandas.eval ` will
668+ evaluate the subexpressions that *can * be evaluated by ``numexpr `` and those
669+ that must be evaluated in Python space transparently to the user. This is done
670+ by inferring the result type of an expression from its arguments and operators.
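Whichever engine handles each subexpression, the result of a mixed string and numeric query should match ordinary boolean indexing. The sketch below checks this under the assumption of a recent pandas release; it compares against ``nums == 2`` instead of ``nums == 1`` so that the filter selects some rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"strings": np.repeat(list("cba"), 3),
                   "nums": np.repeat(range(3), 3)})

# One string comparison and one numeric comparison joined with ``and``.
via_query = df.query('strings == "a" and nums == 2')

# The same filter written with explicit boolean masks.
via_mask = df[(df.strings == "a") & (df.nums == 2)]

print(via_query.equals(via_mask))
```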