From: lsegal@...
Date: 2020-03-06T09:42:31+00:00
Subject: [ruby-core:97380] [Ruby master Bug#16675] Regression on Ripper in Ruby 2.7 when parsing new line

Issue #16675 has been updated by lsegal (Loren Segal).

Status changed from Closed to Open

I'm not sure why this was closed so quickly. There is a real bug here affecting many users who indirectly rely on this core library.

It may be the case that comments can be placed between dots, but that should not affect parsing order. As it stands there are multiple important points:

1) This is either a regression from 2.6.3 or a breaking change in 2.7.0, but in either case, that is a pretty big deal. Users of the ripper library, part of core, get one behavior on one version and another on the next. Given that Ruby 2.x was supposed to be free of breaking changes, I think this deserves more attention.

2) If this is "expected behavior", it is a fundamentally broken API in which an end user cannot actually determine the correct order of tokens while parsing. In other words, this change makes Ripper completely unreliable for parsing comments, which is really odd. Looking purely at the output from the example's PARSING section, it seems clear that the output generated using the API is not representative of the source provided. That to me is a clear reproduction of a bug. Ripper should be able to provide an API that represents source correctly, and this is not happening.

3) Most importantly, I'm not entirely clear on the justification for the change. Scanner events in Ripper represent tokens, not semantic AST nodes, and should therefore represent the token stream as-is. Even *if* comments can separate an expression, the _tokens_ should not require extra context to tokenize-- this is the whole premise of grammar tokenization. Specifically, you can still have an AST generated in which `on_comment` is called between AST nodes, since `on_comment` is not actually used to determine where the newline lives.

4) Finally, comments have always been allowed to separate expressions, but this only broke as of 2.7.0. Ripper even all the way back in Ruby 2.3.3 was able to handle the "fluent dot" scenario just fine, so I'm not sure why this is an issue all of a sudden:

```
irb(main):006:0> Ripper.sexp("foo. # xxx\nbar")
=> [:program, [[:call, [:vcall, [:@ident, "foo", [1, 0]]], :".", [:@ident, "bar", [2, 0]]]]]
irb(main):007:0> Ripper.lex("foo. # xxx\nbar")
=> [[[1, 0], :on_ident, "foo"], [[1, 3], :on_period, "."], [[1, 4], :on_sp, " "], [[1, 5], :on_comment, "# xxx\n"], [[2, 0], :on_ident, "bar"]]
irb(main):008:0> RUBY_VERSION
=> "2.3.3"
```

I think this should be revisited as a regression given that the example above shows a clear case of something not working as intended.

If there is no intention to fix this, I'm curious what the correct way is to use the Ripper API to determine comment order alongside AST nodes is that works without behavioral change across all Ripper releases?

----------------------------------------
Bug #16675: Regression on Ripper in Ruby 2.7 when parsing new line 
https://bugs.ruby-lang.org/issues/16675#change-84506

* Author: Benoit_Tigeot (Benoit Tigeot)
* Status: Open
* Priority: Normal
* ruby -v: 2.7
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Hello

While using migrating RSpec documentation to last Yard. I noticed an issue in code parsing and Ripper. The regression appears on Ruby 2.7 and Head.

``` ruby
require 'pp'
require 'ripper'

SOURCE = "def name\n  # comment\nend"

class RipperParser < Ripper
  attr_accessor :tokens

  SCANNER_EVENTS.each do |event|
    define_method("on_#{event}") do |*args|
      puts "TOKEN:     #{event}"
      (@tokens ||= []) << [event, args]
      super(*args)
    end
  end
end

parser = RipperParser.new(SOURCE, '(stdin)')

puts "PARSING:"
parser.parse

puts "\nTOKENS:"
pp parser.tokens

puts "\nRIPPER SAYS"
pp Ripper.lex(SOURCE)
```


```diff
--- a/2-6-3_ripper_lex.txt
+++ b/2-7-0_ripper_lex.txt
@@ -1,27 +1,27 @@
->> RUBY_VERSION: 2.6.3
+>> RUBY_VERSION: 2.7.0
 PARSING:
 TOKEN:     kw
 TOKEN:     sp
 TOKEN:     ident
-TOKEN:     nl
 TOKEN:     sp
 TOKEN:     comment
+TOKEN:     nl
 TOKEN:     kw

 TOKENS:
 [[:kw, ["def"]],
  [:sp, [" "]],
  [:ident, ["name"]],
- [:nl, ["\n"]],
  [:sp, ["  "]],
  [:comment, ["# comment\n"]],
+ [:nl, ["\n"]],
  [:kw, ["end"]]]

 RIPPER SAYS
-[[[1, 0], :on_kw, "def", EXPR_FNAME],
- [[1, 3], :on_sp, " ", EXPR_FNAME],
- [[1, 4], :on_ident, "name", EXPR_ENDFN],
- [[1, 8], :on_nl, "\n", EXPR_BEG],
- [[2, 0], :on_sp, "  ", EXPR_BEG],
- [[2, 2], :on_comment, "# comment\n", EXPR_BEG],
- [[3, 0], :on_kw, "end", EXPR_END]]
+[[[1, 0], :on_kw, "def", FNAME],
+ [[1, 3], :on_sp, " ", FNAME],
+ [[1, 4], :on_ident, "name", ENDFN],
+ [[1, 8], :on_nl, "\n", BEG],
+ [[2, 0], :on_sp, "  ", ENDFN],
+ [[2, 2], :on_comment, "# comment\n", ENDFN],
+ [[3, 0], :on_kw, "end", END]]
```

As Loren Segal [mentionned](https://github.com/lsegal/yard/issues/1313#issuecomment-595458928)
> Note that "comment" is detected before "nl" in both the event and the collected tokens, which is different from the results in Ripper.lex


-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>