[ruby-core:117013] [Ruby master Bug#20315] Quantifier expansion leads to different results in Regexp.

From: "jirkamarsik (Jirka Marsik) via ruby-core" <ruby-core@...>
Date: 2024-02-29 16:59:34 UTC
List: ruby-core #117013
Issue #20315 has been reported by jirkamarsik (Jirka Marsik).

----------------------------------------
Bug #20315: Quantifier expansion leads to different results in Regexp.
https://bugs.ruby-lang.org/issues/20315

* Author: jirkamarsik (Jirka Marsik)
* Status: Open
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Consider the following series of regular expression matches:
```
irb(main):001:0> /(|a)(|a)(|a)(|a)(|a)b/.match("aaaab")
=> #<MatchData "aaaab" 1:"" 2:"a" 3:"a" 4:"a" 5:"a">
irb(main):002:0> /(|a)(|a)(|a)(|a)b/.match("aaab")
=> #<MatchData "aaab" 1:"" 2:"a" 3:"a" 4:"a">
irb(main):003:0> /(|a)(|a)(|a)b/.match("aab")
=> #<MatchData "aab" 1:"" 2:"a" 3:"a">
irb(main):004:0> /(|a)(|a)b/.match("ab")
=> #<MatchData "ab" 1:"" 2:"a">
```

Let `X^{N}` mean `N` concatenated repetitions of `X`. When matching the pattern `/(|a)^{N}b/` against `a^{N-1}b`, the first group will match the empty string and the last `N-1` groups will match `a`.
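The first series can also be generated programmatically rather than typed out by hand. This is only a minimal sketch: `expanded_pattern` and `expanded_captures` are hypothetical helper names introduced here for illustration, not part of the original report.

```ruby
# Build /(|a)(|a)...(|a)b/ with n capture groups.
# (Hypothetical helper, introduced for illustration only.)
def expanded_pattern(n)
  Regexp.new("(|a)" * n + "b")
end

# Match the n-group pattern against "a" * (n - 1) + "b"
# and return the captures of all n groups as an array.
def expanded_captures(n)
  subject = "a" * (n - 1) + "b"
  expanded_pattern(n).match(subject).captures
end

(2..5).each do |n|
  puts "N=#{n}: #{expanded_captures(n).inspect}"
end
```

For each `N`, the printed array should show an empty string for the first group followed by `N-1` copies of `"a"`, as in the irb session above.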

Now, let's look at this series of similar expressions, in which `(|a)^{N}` is replaced with `(|a){N}`, i.e. a counted quantifier.

```
irb(main):001:0> /(|a){5}b/.match("aaaab")
=> #<MatchData "aaaab" 1:"">
irb(main):002:0> /(|a){4}b/.match("aaab")
=> #<MatchData "aaab" 1:"">
irb(main):003:0> /(|a){3}b/.match("aab")
=> #<MatchData "aab" 1:"">
irb(main):004:0> /(|a){2}b/.match("ab")
=> #<MatchData "ab" 1:"a">
```

When matching the pattern `/(|a){N}b/` against `a^{N-1}b`, the first `N-1` iterations will match `a` and the `N`-th iteration will match the empty string (compare this with the behavior of the first series of expressions). However, something strange happens when `N` is 2: the result is not consistent with this series, and instead looks like a result that belongs to the first series discussed above.
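The two series can be compared side by side for a range of `N`. The following is a rough sketch, assuming the helper names `counted_capture` and `expanded_last_capture` (both hypothetical, not from the report). What it prints depends on the Ruby version in use; on the ruby 3.1.2 build used for this report, the counted form is expected to agree with the expanded form only for `N` = 2.

```ruby
# Final value of group 1 under the counted quantifier /(|a){n}b/.
# (Hypothetical helper, introduced for illustration only.)
def counted_capture(n)
  subject = "a" * (n - 1) + "b"
  Regexp.new("(|a){#{n}}b").match(subject)[1]
end

# Capture of the last group in the hand-expanded pattern /(|a)...(|a)b/.
# (Hypothetical helper, introduced for illustration only.)
def expanded_last_capture(n)
  subject = "a" * (n - 1) + "b"
  Regexp.new("(|a)" * n + "b").match(subject)[n]
end

(2..5).each do |n|
  counted = counted_capture(n)
  expanded = expanded_last_capture(n)
  # Report both captures; whether they agree is version dependent.
  puts "N=#{n}: counted=#{counted.inspect} expanded=#{expanded.inspect}"
end
```

Group 1 of the counted form reports the capture of its last iteration, so comparing it with the last group of the expanded form is the like-for-like check.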

This is due to quantifier expansion done by the regexp compiler (see usages of `QUANTIFIER_EXPAND_LIMIT_SIZE` in `regcomp.c`). This optimization removes the overhead of managing a counted repetition during regex execution, at the cost of increasing the size of the compiled regex bytecode. The inconsistency arises because this optimization can actually change the semantics of the regular expression: as we have seen above, `X^{N}` (`X` repeated `N` times) can have different semantics from `X{N}` (a single `X` with an `{N}` quantifier).


Compilation and execution of `/(|a){3}b/.match("aab")` with `ONIG_DEBUG_PARSE_TREE`, `ONIG_DEBUG_COMPILE` and `ONIG_DEBUG_MATCH`:
```
PATTERN: /(|a){3}b/ (US-ASCII)
<list:556161bb11f0>
   <quantifier:556161bb1230>{3,3}
      <enclose:556161bb1330> memory:1
         <alt:556161bb11b0>
            <string:556161bb12f0>
            <string:556161bb12b0>a
   <string:556161bb13b0>b
optimize: EXACT
  anchor: []
  sub anchor: []

exact: [b]: length: 1
code length: 37
0:[repeat:0:27] 7:[null-check-start:0] 10:[mem-start:1] 13:[push:(+5)] 18:[jump:(+2)]
23:[exact1:a] 25:[mem-end:1] 28:[null-check-end-memst:0] 31:[repeat-inc:0] 34:[exact1:b]
36:[end]
match_at: str: 140607416406704 (0x7fe1b71b92b0), end: 140607416406706 (0x7fe1b71b92b2), start: 140607416406704 (0x7fe1b71b92b0), sprev: 0 ((nil))
size: 2, start offset: 0

 ofs> str                   stk:type   addr:opcode
   0> "ab"                    0:Alt       0:[repeat:0:27]
   0> "ab"                    1:Rep       7:[null-check-start:0]
   0> "ab"                    2:NulChS   10:[mem-start:1]
   0> "ab"                    2:NulChS   13:[push:(+5)]
   0> "ab"                    3:Alt      18:[jump:(+2)]
   0> "ab"                    3:Alt      25:[mem-end:1]
   0> "ab"                    3:Alt      28:[null-check-end-memst:0]
NULL_CHECK_END_MEMST: skip  id:0, s:140607416406704 (0x7fe1b71b92b0)
   0> "ab"                    3:Alt      34:[exact1:b]
   0> "ab"                    2:NulChS   23:[exact1:a]
   1> "b"                     2:NulChS   25:[mem-end:1]
   1> "b"                     2:NulChS   28:[null-check-end-memst:0]
   1> "b"                     2:NulChS   31:[repeat-inc:0]
   1> "b"                     3:RepInc    7:[null-check-start:0]
   1> "b"                     4:NulChS   10:[mem-start:1]
   1> "b"                     4:NulChS   13:[push:(+5)]
   1> "b"                     5:Alt      18:[jump:(+2)]
   1> "b"                     5:Alt      25:[mem-end:1]
   1> "b"                     5:Alt      28:[null-check-end-memst:0]
NULL_CHECK_END_MEMST: skip  id:0, s:140607416406705 (0x7fe1b71b92b1)
   1> "b"                     5:Alt      34:[exact1:b]
   2> ""                      5:Alt      36:[end]
```


Compilation and execution of `/(|a){2}b/.match("ab")` with `ONIG_DEBUG_PARSE_TREE`, `ONIG_DEBUG_COMPILE` and `ONIG_DEBUG_MATCH`:
```
PATTERN: /(|a){2}b/ (US-ASCII)
<list:55bbc826c1f0>
   <quantifier:55bbc826c230>{2,2}
      <enclose:55bbc826c330> memory:1
         <alt:55bbc826c1b0>
            <string:55bbc826c2f0>
            <string:55bbc826c2b0>a
   <string:55bbc826c3b0>b
optimize: EXACT
  anchor: []
  sub anchor: []

exact: [b]: length: 1
code length: 39
0:[mem-start:1] 3:[push:(+5)] 8:[jump:(+2)] 13:[exact1:a] 15:[mem-end:1]
18:[mem-start:1] 21:[push:(+5)] 26:[jump:(+2)] 31:[exact1:a] 33:[mem-end:1]
36:[exact1:b] 38:[end]
match_at: str: 140139875963504 (0x7f74db869270), end: 140139875963506 (0x7f74db869272), start: 140139875963504 (0x7f74db869270), sprev: 0 ((nil))
size: 2, start offset: 0

 ofs> str                   stk:type   addr:opcode
   0> "ab"                    0:Alt       0:[mem-start:1]
   0> "ab"                    0:Alt       3:[push:(+5)]
   0> "ab"                    1:Alt       8:[jump:(+2)]
   0> "ab"                    1:Alt      15:[mem-end:1]
   0> "ab"                    1:Alt      18:[mem-start:1]
   0> "ab"                    1:Alt      21:[push:(+5)]
   0> "ab"                    2:Alt      26:[jump:(+2)]
   0> "ab"                    2:Alt      33:[mem-end:1]
   0> "ab"                    2:Alt      36:[exact1:b]
   0> "ab"                    1:Alt      31:[exact1:a]
   1> "b"                     1:Alt      33:[mem-end:1]
   1> "b"                     1:Alt      36:[exact1:b]
   2> ""                      1:Alt      38:[end]
```


