Hi Anatol,
Anatol Belski wrote:
> Hi Christoph,
>
>> -----Original Message-----
>> From: Christoph Becker [mailto:[email protected]]
>> Sent: Saturday, July 25, 2015 12:09 AM
>> To: Anatol Belski <[email protected]>; 'Pierre Joye'
>> <[email protected]>
>> Cc: 'PHP internals' <[email protected]>
>> Subject: Re: [PHP-DEV] PCRE JIT stack size limit
>>
>> Now please consider the following simple expression:
>>
>> preg_match('/^(foo)+$/', str_repeat('foo', $n))
>>
>> This will fail (i.e. yield FALSE) independently of pcre.jit for large enough $n.
>> However, a user can change pcre.recursion_limit what will affect the $n limit
>> (the expression will fail for smaller or larger $n), if pcre.jit=0. If pcre.jit=1 the
>> user can't influence this boundary in any way, currently.
>>
>> And maybe even worse, with pcre.jit=0 the boundary is 50,000, but with
>> pcre.jit=1 it is only 1,366. Of course, one can argue that this is a contrived
>> example, and that such usage is crazy, but why do we have a default
>> pcre.recursion_limit of 100,000 then? A recursion_limit of
>> 2,734 would be sufficient to have a boundary of $n == 1,366.
>
> The 100000 is an empirical value by my guess. It regards to the default stack sizes on
> different platforms and to an average pattern. There is no prediction that PCRE will not exhaust the
> stack available to the binary.
ACK. However, the user is able to tune this value according to the
environment. And yes, I'm aware that pcre.recursion_limit may be
necessary to prevent a stack overflow, what can't happen with JIT
compilation (if the JIT stack is exhausted, pcre_exec() returns
gracefully), but nonetheless giving users some control over the size of
the JIT stack seems to be appropriate (at least in the long run, aka.
PHP 7.1).
>> All in all, as this example already suggests, classic execution of matching is done
>> by recursive calls (using normal stack frames), while JIT execution of matching is
>> iterative, using a special JIT stack.[1] I don't think it is justified to give users
>> a
>> setting to adjust for the former, but not for the latter (except to disable JIT,
>> albeit JIT might bring quite some boost especially for such cases).
>
> JIT without custom stack will use "32k of machine stack", by the doc. We currently
> don't really give users a choice to choose between iterative and recursive PCRE execution,
> it's a compile time decision.
Well, one can enforce non JIT execution by setting pcre.jit=0.
pcre.jit=1 indeed may not enforce JIT execution, depending on the
platform[1] and the libpcre version.
> And this is again because an iterative execution will be safer, but will affect an average case
> with unnecessary thrift. In this JIT case, one could give this choice, you're right, but we
> should evaluate it carefully. Fe what happens to the PHP pattern cache if JIT stack is exhausted?
That is unrelated. It is possible to use a single custom PCRE JIT stack
for all patterns (actually, pcre_exec() calls). Only if multi-threading
comes into play, each thread should have its own stack (that's not
absolutely necessary, but it simplifies things a lot).[2] Therefore
having a single PCRE_G(jit_stack) would be sufficient. I've committed a
naive draft to my php-src fork[3].
> What I was more precisely talking about is like
>
> If (false === preg_match(",evil pattern,", ...)) {
> Ini_set("pcre.jit", 0);
> // retry
> }
>
> So received error - no JIT. And no additional logic/overhead in ext/pcre.
Yes, that is a viable way for userland (even if I would suggest to test
against pcre_last_error(); preg_match() may return FALSE for several
error conditions, such as PHP_PCRE_BAD_UTF8_ERROR, and then retrying
with pcre.jit=0 wouldn't help). We could as well put the same logic
into ext/pcre (but I do not suggest to do so, even though I mentioned
this as option in the OP).
> Maybe custom JIT were eligible in this case, but according to the PCRE doc it can easily bring
> issues as the global JIT memory can't be resized/migrated just by one's finger click. Say
> one would be forced to either use the custom JIT stack or default JIT stack from the start on. This
> can end up with the over complication using a custom JIT stack for a particular pattern.
I would use the same JIT stack for all patterns (whether that is the
default JIT stack, or a custom one). The libpcre manual states[2]:
| You may safely use the same JIT stack for more than one pattern
| (either by assigning directly or by callback), as long as the
| patterns are all matched sequentially in the same thread.
To my knowledge in ext/pcre all patterns are matched sequentially, but
I'll double-check, and I'll inquire on the libpcre mailing list.
>> As we're pretty late in the game for PHP 7.0, it might be best to postpone a new
>> ini setting or other changes to PHP 7.1, but at the very least I would introduce a
>> new error constant, say PHP_PCRE_JIT_STACKLIMIT_ERROR[2], so users get a
>> more meaningful result when calling preg_last_error() than
>> PHP_PCRE_INTERNAL_ERROR. And it seems to be appropriate to add a note to
>> UPGRADING that pcre.jit=1 may cause some preg_*() to fail which would work
>> with pcre.jit=0.
>
> Yes, instead of returning false one could return an explicit error, or indicate in any other
> ways.
I would not suggest to change the return value of preg_match() and
friends. FALSE seems to be appropriate here. Instead I suggest to add
a new error constant which would be returned from preg_last_error().
> Thanks for bringing it up and maybe we'll have more info after your further investigation.
> But documentation is appropriate in any cases.
I'll make a respective PR after having checked back with the libpcre devs.
[1] <http://www.pcre.org/original/doc/html/pcrejit.html#SEC3>
[2] <http://www.pcre.org/original/doc/html/pcrejit.html#SEC8>
[3]
<https://github.com/cmb69/php-src/commit/78ad15d589c25b5d10aa68f5b419333f4040c16a>
--
Christoph M. Becker