Documentation ¶
Overview ¶
Package regonaut is an implementation of ECMAScript Regular Expressions.
Example ¶
re := MustCompile(".+(?<foo>bAr)", FlagIgnoreCase)
m := re.FindMatch([]byte("_Bar_"))
fmt.Printf("Groups[0] - %q\n", m.Groups[0].Data())
fmt.Printf("Groups[1] - %q\n", m.Groups[1].Data())
fmt.Printf("NamedGroups[\"foo\"] - %q\n", m.NamedGroups["foo"].Data())
Output:
Groups[0] - "_Bar"
Groups[1] - "Bar"
NamedGroups["foo"] - "Bar"
Example (Utf8_vs_Utf16) ¶
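U+1F431 CAT FACE (🐱) lies outside the Basic Multilingual Plane, so UTF-16 encodes it as a surrogate pair (0xD83D, 0xDC31). Against UTF-16 input without 'u', each '.' matches a single code unit, so the two surrogates are captured separately. With 'u', the pair is matched as one code point. Against UTF-8 input, '.' captures the whole code point with or without 'u'.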
var pattern = "c(.)(.)"
var patternUtf16 = []uint16{'c', '(', '.', ')', '(', '.', ')'}
var source = []byte("c🐱at")
var sourceUtf16 = []uint16{'c', 0xD83D, 0xDC31, 'a', 't'}
reUtf8 := MustCompile(pattern, 0)
m1 := reUtf8.FindMatch(source)
fmt.Printf("UTF-8: %q, %q\n", m1.Groups[1].Data(), m1.Groups[2].Data())
reUtf8Unicode := MustCompile(pattern, FlagUnicode)
m2 := reUtf8Unicode.FindMatch(source)
fmt.Printf("UTF-8 (with 'u' flag): %q, %q\n", m2.Groups[1].Data(), m2.Groups[2].Data())
reUtf16 := MustCompileUtf16(patternUtf16, 0)
m3 := reUtf16.FindMatch(sourceUtf16)
fmt.Printf("UTF-16: %#v, %#v\n", m3.Groups[1].Data(), m3.Groups[2].Data())
reUtf16Unicode := MustCompileUtf16(patternUtf16, FlagUnicode)
m4 := reUtf16Unicode.FindMatch(sourceUtf16)
fmt.Printf("UTF-16 (with 'u' flag): %#v, %#v\n", m4.Groups[1].Data(), m4.Groups[2].Data())
Output:
UTF-8: "🐱", "a"
UTF-8 (with 'u' flag): "🐱", "a"
UTF-16: []uint16{0xd83d}, []uint16{0xdc31}
UTF-16 (with 'u' flag): []uint16{0xd83d, 0xdc31}, []uint16{0x61}
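The UTF-16 pattern and source in the example above are written out by hand as code units. As a convenience, they could also be derived from Go strings with the standard library's unicode/utf16 package. The helper name toUTF16 below is hypothetical and not part of regonaut; this is a minimal sketch only.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// toUTF16 is a hypothetical helper (not part of regonaut) that converts a
// Go string into UTF-16 code units using the standard library.
func toUTF16(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	units := toUTF16("c🐱at")
	// The cat face becomes the surrogate pair 0xd83d, 0xdc31.
	fmt.Printf("%#v\n", units)
	// The same text is 7 bytes in UTF-8 but 5 code units in UTF-16.
	fmt.Println(len([]byte("c🐱at")), len(units))
}
```

A slice built this way has the same contents as sourceUtf16 in the example, so it should be usable wherever a []uint16 source or pattern is expected.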
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Flag ¶
type Flag uint16
Flag is a bitmask of RegExp options. The zero value corresponds to /pattern/ with no flags. Combine flags with bitwise OR, e.g. FlagIgnoreCase|FlagMultiline.
const (
	// Case-insensitive matching ("i" flag).
	FlagIgnoreCase Flag = 1 << iota
	// "^" and "$" match line boundaries ("m" flag).
	FlagMultiline
	// "." matches line terminators ("s" flag).
	FlagDotAll
	// Unicode-aware mode ("u" flag).
	// If this flag is set, FlagAnnexB is ignored.
	FlagUnicode
	// Unicode set notation and string properties ("v" flag).
	// If this flag is set, FlagAnnexB is ignored.
	FlagUnicodeSets
	// Sticky match from current position ("y" flag).
	FlagSticky
	// Enables Annex B web-compat features.
	// When FlagUnicode or FlagUnicodeSets is set,
	// this flag is cleared automatically by the compiler.
	FlagAnnexB
)
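To make the bitmask behavior concrete, the following sketch redeclares the Flag constants locally so that it runs without the package. The helpers has and normalize are hypothetical. normalize only models the documented rule that FlagAnnexB is cleared when FlagUnicode or FlagUnicodeSets is set, and is not the compiler's actual code.

```go
package main

import "fmt"

// Flag is a local copy of the documented bitmask type, redeclared here
// only so the sketch is self-contained.
type Flag uint16

const (
	FlagIgnoreCase Flag = 1 << iota
	FlagMultiline
	FlagDotAll
	FlagUnicode
	FlagUnicodeSets
	FlagSticky
	FlagAnnexB
)

// has reports whether every bit in want is set in f (hypothetical helper).
func has(f, want Flag) bool { return f&want == want }

// normalize is a hypothetical model of the documented rule: FlagAnnexB is
// dropped whenever FlagUnicode or FlagUnicodeSets is present.
func normalize(f Flag) Flag {
	if f&(FlagUnicode|FlagUnicodeSets) != 0 {
		f &^= FlagAnnexB
	}
	return f
}

func main() {
	flags := FlagIgnoreCase | FlagMultiline
	fmt.Println(has(flags, FlagIgnoreCase), has(flags, FlagDotAll))
	fmt.Println(has(normalize(FlagUnicode|FlagAnnexB), FlagAnnexB))
}
```

Because each constant occupies its own bit, any combination can be passed as a single Flag value.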
type Group ¶
type Group struct {
// Start is the inclusive start index of the captured substring,
// or -1 if the group did not participate in the match.
Start int
// End is the exclusive end index of the captured substring,
// or -1 if the group did not participate in the match.
End int
// Name is the group name if defined, otherwise empty.
Name string
// contains filtered or unexported fields
}
Group represents a single captured substring from a regular expression match against UTF-8 encoded input. It is safe for concurrent use by multiple goroutines.
type GroupUtf16 ¶
type GroupUtf16 struct {
// Start is the inclusive start index of the captured substring,
// or -1 if the group did not participate in the match.
Start int
// End is the exclusive end index of the captured substring,
// or -1 if the group did not participate in the match.
End int
// Name is the group name if defined, otherwise empty.
Name string
// contains filtered or unexported fields
}
GroupUtf16 represents a single captured substring from a regular expression match against UTF-16 encoded input. It is safe for concurrent use by multiple goroutines.
func (GroupUtf16) Data ¶
func (g GroupUtf16) Data() []uint16
Data returns the captured substring as a slice of UTF-16 code units. If the group did not participate in the match (Start == -1), it returns nil.
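The relationship between Start, End, and the captured data can be sketched as a half-open slice of the input. The group type and the dataOf helper below are hypothetical stand-ins. They assume that the captured data equals source[Start:End], which matches the inclusive/exclusive field descriptions but is not confirmed as the package's implementation.

```go
package main

import "fmt"

// group is a hypothetical stand-in carrying only the documented
// Start and End fields.
type group struct{ Start, End int }

// dataOf sketches the assumed relationship between the indices and the
// captured data: a half-open slice, or nil for a non-participating group.
func dataOf(source []uint16, g group) []uint16 {
	if g.Start == -1 {
		return nil
	}
	return source[g.Start:g.End]
}

func main() {
	source := []uint16{'c', 0xD83D, 0xDC31, 'a', 't'}
	// Code units 1 and 2 form the surrogate pair for U+1F431.
	fmt.Printf("%#v\n", dataOf(source, group{1, 3}))
	// A group that did not participate yields nil.
	fmt.Println(dataOf(source, group{-1, -1}) == nil)
}
```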
type Match ¶
type Match struct {
// Groups is the ordered list of captures.
// Groups[0] is the full match; subsequent entries correspond to
// the capturing groups in the pattern.
Groups []Group
// NamedGroups maps a group name to its captured group.
NamedGroups map[string]Group
}
Match holds the result of a successful match against UTF-8 input. It is safe for concurrent use by multiple goroutines.
type MatchUtf16 ¶
type MatchUtf16 struct {
// Groups is the ordered list of captures.
// Groups[0] is the full match; subsequent entries correspond to
// the capturing groups in the pattern.
Groups []GroupUtf16
// NamedGroups maps a group name to its captured group.
NamedGroups map[string]GroupUtf16
}
MatchUtf16 holds the result of a successful match against UTF-16 input. It is safe for concurrent use by multiple goroutines.
type RegExp ¶
type RegExp struct {
// contains filtered or unexported fields
}
RegExp represents a compiled regular expression. It is safe for concurrent use by multiple goroutines, because no method on RegExp mutates internal state.
func Compile ¶
Compile parses a regular expression pattern and returns a RegExp that can be applied against UTF-8 encoded input.
The pattern must be a valid ECMAScript regular expression.
func MustCompile ¶
MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables containing regular expressions.
func (*RegExp) FindMatch ¶
FindMatch applies r to a UTF-8 encoded byte slice and returns the first match. If no match is found, it returns nil.
func (*RegExp) FindMatchStartingAt ¶
FindMatchStartingAt applies r to a UTF-8 encoded byte slice beginning the search at pos, where pos is a byte index into source. It returns the first match found at or after pos. If pos is out of range or no match is found, it returns nil.
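Since pos is a byte index here, whereas the UTF-16 variant counts code units, the same logical position can have different numeric values in the two encodings. The sketch below uses only the standard library. The helper utf16Offset is hypothetical and assumes the byte offset falls on a rune boundary.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf16Offset is a hypothetical helper that converts a byte offset in s
// into the equivalent offset in UTF-16 code units. byteOff must fall on
// a rune boundary.
func utf16Offset(s string, byteOff int) int {
	return len(utf16.Encode([]rune(s[:byteOff])))
}

func main() {
	s := "c🐱at"
	// strings.IndexRune returns a byte index, the unit used for UTF-8 input.
	byteOff := strings.IndexRune(s, 'a')
	fmt.Println(byteOff, utf16Offset(s, byteOff))
}
```

For this input, the letter 'a' sits at byte index 5 but at code unit index 3, because the cat face takes four bytes in UTF-8 and two code units in UTF-16.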
func (*RegExp) FindNextMatch ¶
FindNextMatch searches for the next match of r in the same UTF-8 encoded input as a previously returned match.
The search begins at match.Groups[0].End. If the previous match was zero-length (Start == End), the search position is advanced by one input position before matching again to avoid returning the same empty match repeatedly.
If match is nil, or if no further match is found, FindNextMatch returns nil.
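The advancement rule for zero-length matches can be illustrated without the package itself. In the sketch below, span, nextSearchPos, and findEmptyAt are all hypothetical. findEmptyAt is a toy matcher standing in for a pattern that matches the empty string everywhere, and it assumes a position equal to the input length is still in range.

```go
package main

import "fmt"

// span is a hypothetical stand-in for the Start/End pair of Groups[0].
type span struct{ Start, End int }

// nextSearchPos applies the documented rule: resume at End, and step one
// position further after a zero-length match.
func nextSearchPos(prev span) int {
	if prev.Start == prev.End {
		return prev.End + 1
	}
	return prev.End
}

// findEmptyAt is a toy matcher that reports a zero-length match at pos.
// Treating pos == len(source) as in range is an assumption of this sketch.
func findEmptyAt(source []byte, pos int) (span, bool) {
	if pos < 0 || pos > len(source) {
		return span{}, false
	}
	return span{pos, pos}, true
}

// collectEmptyMatches walks the input the way a FindNextMatch loop would,
// recording where each zero-length match starts.
func collectEmptyMatches(source []byte) []int {
	var starts []int
	m, ok := findEmptyAt(source, 0)
	for ok {
		starts = append(starts, m.Start)
		m, ok = findEmptyAt(source, nextSearchPos(m))
	}
	return starts
}

func main() {
	fmt.Println(collectEmptyMatches([]byte("ab")))
}
```

Without the extra step after an empty match, the loop would keep finding the same match at position 0 and never terminate.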
type RegExpUtf16 ¶
type RegExpUtf16 struct {
// contains filtered or unexported fields
}
RegExpUtf16 represents a compiled regular expression. It is safe for concurrent use by multiple goroutines, because no method on RegExpUtf16 mutates internal state.
func CompileUtf16 ¶
func CompileUtf16(pattern []uint16, flags Flag) (*RegExpUtf16, error)
CompileUtf16 parses a regular expression pattern expressed as UTF-16 code units and returns a RegExpUtf16 that can be applied against UTF-16 encoded input.
The pattern must be a valid ECMAScript regular expression.
func MustCompileUtf16 ¶
func MustCompileUtf16(pattern []uint16, flags Flag) *RegExpUtf16
MustCompileUtf16 is like CompileUtf16 but panics if the expression cannot be parsed. It simplifies safe initialization of global variables containing regular expressions.
func (*RegExpUtf16) FindMatch ¶
func (r *RegExpUtf16) FindMatch(source []uint16) *MatchUtf16
FindMatch applies r to a UTF-16 slice and returns the first match. If no match is found, it returns nil.
func (*RegExpUtf16) FindMatchStartingAt ¶
func (r *RegExpUtf16) FindMatchStartingAt(source []uint16, pos int) *MatchUtf16
FindMatchStartingAt applies r to a UTF-16 slice beginning the search at pos, where pos is an index in UTF-16 code units. It returns the first match found at or after pos. If pos is out of range or no match is found, it returns nil.
func (*RegExpUtf16) FindMatchStartingAtSticky ¶
func (r *RegExpUtf16) FindMatchStartingAtSticky(source []uint16, pos int) *MatchUtf16
FindMatchStartingAtSticky applies r to a UTF-16 slice requiring the match to start exactly at pos (sticky behavior), where pos is an index in UTF-16 code units. If the input at pos does not begin a match, or if pos is out of range, it returns nil.
This method is particularly useful for JavaScript engine implementers. The ECMAScript specification defines RegExp.prototype[Symbol.split] to create a new RegExp with the "y" (sticky) flag in order to constrain matching to the current position. By calling FindMatchStartingAtSticky instead, it is possible to avoid the overhead of allocating and compiling a new RegExp object, while still honoring the sticky semantics.
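The difference between searching at or after pos and matching exactly at pos can be shown with a toy example. In the sketch below, a literal sequence of code units stands in for a compiled pattern. searchFrom and hasPrefixAt are hypothetical helpers, not regonaut functions.

```go
package main

import "fmt"

// hasPrefixAt reports whether needle occurs in source exactly at pos,
// which models sticky matching for a literal pattern.
func hasPrefixAt(source, needle []uint16, pos int) bool {
	if pos < 0 || pos+len(needle) > len(source) {
		return false
	}
	for i, u := range needle {
		if source[pos+i] != u {
			return false
		}
	}
	return true
}

// searchFrom returns the first index at or after pos where needle occurs,
// or -1 if there is none, which models non-sticky searching.
func searchFrom(source, needle []uint16, pos int) int {
	if pos < 0 {
		return -1
	}
	for i := pos; i+len(needle) <= len(source); i++ {
		if hasPrefixAt(source, needle, i) {
			return i
		}
	}
	return -1
}

func main() {
	source := []uint16{'a', ',', 'b', ',', 'c'}
	comma := []uint16{','}
	// Non-sticky: scanning forward from 2 finds the comma at index 3.
	fmt.Println(searchFrom(source, comma, 2))
	// Sticky: index 2 holds 'b', so there is no match there, but there is at 3.
	fmt.Println(hasPrefixAt(source, comma, 2), hasPrefixAt(source, comma, 3))
}
```

A split-style loop probes each position in turn, so the sticky form avoids scanning ahead past the position being tested.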
func (*RegExpUtf16) FindNextMatch ¶
func (r *RegExpUtf16) FindNextMatch(match *MatchUtf16) *MatchUtf16
FindNextMatch searches for the next match of r in the same UTF-16 encoded input as a previously returned match.
The search begins at match.Groups[0].End. If the previous match was zero-length (Start == End), the search position is advanced by one input position before matching again to avoid returning the same empty match repeatedly.
If match is nil, or if no further match is found, FindNextMatch returns nil.