Commit 2054757

committed Sep 5, 2019
WIP
1 parent fc0b185 commit 2054757

12 files changed

+371
-181
lines changed


‎9-regular-expressions/02-regexp-character-classes/article.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ Most used are:
4141
: A digit: a character from `0` to `9`.
4242

4343
`pattern:\s` ("s" is from "space")
44-
: A space symbol: includes spaces, tabs `\t`, newlines `\n` and few other rare characters: `\v`, `\f` and `\r`.
44+
: A space symbol: includes spaces, tabs `\t`, newlines `\n` and a few other rare characters, such as `\v`, `\f` and `\r`.
4545

4646
`pattern:\w` ("w" is from "word")
4747
: A "wordly" character: either a letter of the Latin alphabet, a digit, or an underscore `_`. Non-Latin letters (like Cyrillic or Hindi) do not belong to `pattern:\w`.
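
The class descriptions in this hunk can be checked with a short sketch. The variable names are illustrative and not part of the commit:

```js
// Whitespace characters named in the text: space, tab, newline, \v, \f, \r
const whitespace = " \t\n\v\f\r";

// Each of them is matched by \s
const allAreSpaces = [...whitespace].every((ch) => /\s/.test(ch));

// \w covers only Latin letters, digits and the underscore
const latinWord = /^\w+$/.test("snake_case_42");
const cyrillicLetter = /\w/.test("я");

console.log(allAreSpaces, latinWord, cyrillicLetter); // true true false
```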

‎9-regular-expressions/08-regexp-character-sets-and-ranges/article.md

Lines changed: 102 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ So the example below gives no matches:
2222
alert( "Voila".match(/V[oi]la/) ); // null, no matches
2323
```
2424

25-
The pattern assumes:
25+
The pattern searches for:
2626

2727
- `pattern:V`,
2828
- then *one* of the letters `pattern:[oi]`,
@@ -42,23 +42,56 @@ In the example below we're searching for `"x"` followed by two digits or letters
4242
alert( "Exception 0xAF".match(/x[0-9A-F][0-9A-F]/g) ); // xAF
4343
```
4444

45-
Please note that in the word `subject:Exception` there's a substring `subject:xce`. It didn't match the pattern, because the letters are lowercase, while in the set `pattern:[0-9A-F]` they are uppercase.
45+
Here `pattern:[0-9A-F]` has two ranges: it searches for a character that is either a digit from `0` to `9` or a letter from `A` to `F`.
4646

47-
If we want to find it too, then we can add a range `a-f`: `pattern:[0-9A-Fa-f]`. The `pattern:i` flag would allow lowercase too.
47+
If we'd like to look for lowercase letters as well, we can add the range `a-f`: `pattern:[0-9A-Fa-f]`. Or add the flag `pattern:i`.
4848

49-
**Character classes are shorthands for certain character sets.**
49+
We can also use character classes inside `[…]`.
5050

51+
For instance, if we'd like to look for a wordly character `pattern:\w` or a hyphen `pattern:-`, then the set is `pattern:[\w-]`.
52+
53+
Combining multiple classes is also possible, e.g. `pattern:[\s\d]` means "a space character or a digit".
54+
55+
```smart header="Character classes are shorthands for certain character sets"
5156
For instance:
5257
5358
- **\d** -- is the same as `pattern:[0-9]`,
5459
- **\w** -- is the same as `pattern:[a-zA-Z0-9_]`,
55-
- **\s** -- is the same as `pattern:[\t\n\v\f\r ]` plus few other unicode space characters.
60+
- **\s** -- is the same as `pattern:[\t\n\v\f\r ]`, plus a few other rare Unicode space characters.
61+
```
62+
63+
### Example: multi-language \w
64+
65+
As the character class `pattern:\w` is a shorthand for `pattern:[a-zA-Z0-9_]`, it can't find Chinese characters, Cyrillic letters, etc.
66+
67+
We can write a more universal pattern that looks for wordly characters in any language. That's easy with Unicode properties: `pattern:[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]`.
5668

57-
We can use character classes inside `[…]` as well.
69+
Let's decipher it. Similar to `pattern:\w`, we're making a set of our own that includes characters with the following Unicode properties:
5870

59-
For instance, we want to match all wordly characters or a dash, for words like "twenty-third". We can't do it with `pattern:\w+`, because `pattern:\w` class does not include a dash. But we can use `pattern:[\w-]`.
71+
- `Alphabetic` (`Alpha`) - for letters,
72+
- `Mark` (`M`) - for accents,
73+
- `Decimal_Number` (`Nd`) - for digits,
74+
- `Connector_Punctuation` (`Pc`) - for the underscore `'_'` and similar characters,
75+
- `Join_Control` (`Join_C`) - two special codes `200c` and `200d`, used in ligatures, e.g. in Arabic.
6076

61-
We also can use several classes, for example `pattern:[\s\S]` matches spaces or non-spaces -- any character. That's wider than a dot `"."`, because the dot matches any character except a newline (unless `pattern:s` flag is set).
77+
An example of use:
78+
79+
```js run
80+
let regexp = /[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]/gu;
81+
82+
let str = `Hi 你好 12`;
83+
84+
// finds all letters and digits:
85+
alert( str.match(regexp) ); // H,i,你,好,1,2
86+
```
87+
88+
Of course, we can edit this pattern: add Unicode properties or remove them. Unicode properties are covered in more detail in the article <info:regexp-unicode>.
89+
90+
```warn header="Unicode properties aren't supported in Edge and Firefox"
91+
Unicode properties `pattern:\p{…}` are not yet implemented in Edge and Firefox. If we really need them, we can use the [XRegExp](http://xregexp.com/) library.
92+
93+
Or just use ranges of characters in a language that interests us, e.g. `pattern:[а-я]` for Cyrillic letters.
94+
```
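
To see how the Unicode-aware set differs from plain `\w`, here is a minimal sketch. It assumes an engine that supports Unicode property escapes with the `u` flag, and the variable names are illustrative:

```js
const sample = "Hi 你好 12";

// Set built from Unicode properties, as in the added text
const wordChar = /[\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_C}]/gu;
const unicodeMatches = sample.match(wordChar);

// Plain \w only knows Latin letters, digits and the underscore
const asciiMatches = sample.match(/\w/g);

console.log(unicodeMatches.join(",")); // H,i,你,好,1,2
console.log(asciiMatches.join(","));   // H,i,1,2
```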
6295

6396
## Excluding ranges
6497

@@ -78,22 +111,20 @@ The example below looks for any characters except letters, digits and spaces:
78111
alert( "alice15@gmail.com".match(/[^\d\sA-Z]/gi) ); // @ and .
79112
```
80113

81-
## No escaping in []
114+
## Escaping in []
82115

83-
Usually when we want to find exactly the dot character, we need to escape it like `pattern:\.`. And if we need a backslash, then we use `pattern:\\`.
116+
Usually when we want to find exactly a special character, we need to escape it like `pattern:\.`. And if we need a backslash, then we use `pattern:\\`, and so on.
84117

85-
In square brackets the vast majority of special characters can be used without escaping:
118+
In square brackets we can use the vast majority of special characters without escaping:
86119

87-
- A dot `pattern:'.'`.
88-
- A plus `pattern:'+'`.
89-
- Parentheses `pattern:'( )'`.
90-
- Dash `pattern:'-'` in the beginning or the end (where it does not define a range).
91-
- A caret `pattern:'^'` if not in the beginning (where it means exclusion).
92-
- And the opening square bracket `pattern:'['`.
120+
- Symbols `pattern:. + ( )` never need escaping.
121+
- A hyphen `pattern:-` is not escaped in the beginning or the end (where it does not define a range).
122+
- A caret `pattern:^` is only escaped in the beginning (where it means exclusion).
123+
- The closing square bracket `pattern:]` is always escaped (if we need to look for that symbol).
93124

94-
In other words, all special characters are allowed except where they mean something for square brackets.
125+
In other words, all special characters are allowed without escaping, except when they mean something for square brackets.
95126

96-
A dot `"."` inside square brackets means just a dot. The pattern `pattern:[.,]` would look for one of characters: either a dot or a comma.
127+
A dot `.` inside square brackets means just a dot. The pattern `pattern:[.,]` would look for one of the characters: either a dot or a comma.
97128

98129
In the example below the regexp `pattern:[-().^+]` looks for one of the characters `-().^+`:
99130

@@ -112,3 +143,55 @@ let reg = /[\-\(\)\.\^\+]/g;
112143

113144
alert( "1 + 2 - 3".match(reg) ); // also works: +, -
114145
```
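
The escaping rules listed in this hunk can be compared side by side. This is a sketch with illustrative variable names; the `]` case is an extra check that is not taken from the commit:

```js
const text = "1 + 2 - 3";

// Unescaped: the hyphen is first, the caret is not first
const plainMatches = text.match(/[-().^+]/g);

// Escaping everything is allowed and finds the same characters
const escapedMatches = text.match(/[\-\(\)\.\^\+]/g);

// The closing bracket has to be escaped inside a set
const closingBracket = "a]b".match(/[\]]/g);

console.log(plainMatches, escapedMatches, closingBracket);
// [ '+', '-' ] [ '+', '-' ] [ ']' ]
```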
146+
147+
## Ranges and flag "u"
148+
149+
If there are surrogate pairs in the set, flag `pattern:u` is required for them to work correctly.
150+
151+
For instance, let's look for `pattern:[𝒳𝒴]` in the string `subject:𝒳`:
152+
153+
```js run
154+
alert( '𝒳'.match(/[𝒳𝒴]/) ); // shows a strange character, like [?]
155+
// (the search was performed incorrectly, half-character returned)
156+
```
157+
158+
The result is incorrect, because by default regular expressions "don't know" about surrogate pairs.
159+
160+
The regular expression engine thinks that `[𝒳𝒴]` is not two, but four characters:
161+
1. left half of `𝒳` `(1)`,
162+
2. right half of `𝒳` `(2)`,
163+
3. left half of `𝒴` `(3)`,
164+
4. right half of `𝒴` `(4)`.
165+
166+
We can see their codes like this:
167+
168+
```js run
169+
for(let i=0; i<'𝒳𝒴'.length; i++) {
170+
alert('𝒳𝒴'.charCodeAt(i)); // 55349, 56499, 55349, 56500
171+
};
172+
```
173+
174+
So, the example above finds and shows the left half of `𝒳`.
175+
176+
If we add flag `pattern:u`, then the behavior will be correct:
177+
178+
```js run
179+
alert( '𝒳'.match(/[𝒳𝒴]/u) ); // 𝒳
180+
```
181+
182+
A similar situation occurs when looking for a range, such as `[𝒳-𝒴]`.
183+
184+
If we forget to add flag `pattern:u`, there will be an error:
185+
186+
```js run
187+
'𝒳'.match(/[𝒳-𝒴]/); // Error: Invalid regular expression
188+
```
189+
190+
The reason is that without flag `pattern:u` surrogate pairs are perceived as two characters, so `[𝒳-𝒴]` is interpreted as `[<55349><56499>-<55349><56500>]` (every surrogate pair is replaced with its codes). Now it's easy to see that the range `56499-55349` is invalid: its starting code `56499` is greater than the end `55349`. That's the formal reason for the error.
191+
192+
With the flag `pattern:u` the pattern works correctly:
193+
194+
```js run
195+
// look for characters from 𝒳 to 𝒵
196+
alert( '𝒴'.match(/[𝒳-𝒵]/u) ); // 𝒴
197+
```
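
The behaviour described in the new section can be gathered into one sketch. The characters are written as code point escapes and the patterns are built with `new RegExp`, so that the invalid range fails at run time inside `try` rather than when the script is parsed. All names are illustrative:

```js
const X = "\u{1D4B3}"; // 𝒳
const Y = "\u{1D4B4}"; // 𝒴
const Z = "\u{1D4B5}"; // 𝒵

// A surrogate pair occupies two code units
const codeUnits = X.length;

// Without "u" the set holds four halves, so only the first half of 𝒳 is found
const withoutU = X.match(new RegExp(`[${X}${Y}]`))[0];

// With "u" the whole character is found
const withU = X.match(new RegExp(`[${X}${Y}]`, "u"))[0];

// Without "u" the range is read as <56499>-<55349>, which is out of order
let rangeError = null;
try {
  new RegExp(`[${X}-${Y}]`);
} catch (err) {
  rangeError = err;
}

// With "u" the range works on whole characters
const inRange = new RegExp(`[${X}-${Z}]`, "u").test(Y);

console.log(codeUnits, withoutU.length, withU === X, rangeError instanceof SyntaxError, inRange);
// 2 1 true true true
```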

‎9-regular-expressions/09-regexp-quantifiers/article.md

Lines changed: 33 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
Let's say we have a string like `+7(903)-123-45-67` and want to find all numbers in it. But unlike before, we are interested not in single digits, but full numbers: `7, 903, 123, 45, 67`.
44

5-
A number is a sequence of 1 or more digits `pattern:\d`. To mark how many we need, we need to append a *quantifier*.
5+
A number is a sequence of 1 or more digits `pattern:\d`. To mark how many we need, we can append a *quantifier*.
66

77
## Quantity {n}
88

@@ -12,7 +12,7 @@ A quantifier is appended to a character (or a character class, or a `[...]` set
1212

1313
It has a few advanced forms, let's see examples:
1414

15-
The exact count: `{5}`
15+
The exact count: `pattern:{5}`
1616
: `pattern:\d{5}` denotes exactly 5 digits, the same as `pattern:\d\d\d\d\d`.
1717

1818
The example below looks for a 5-digit number:
@@ -23,7 +23,7 @@ The exact count: `{5}`
2323

2424
We can add `\b` to exclude longer numbers: `pattern:\b\d{5}\b`.
2525

26-
The range: `{3,5}`, match 3-5 times
26+
The range: `pattern:{3,5}`, match 3-5 times
2727
: To find numbers from 3 to 5 digits we can put the limits into curly braces: `pattern:\d{3,5}`
2828

2929
```js run
@@ -54,8 +54,8 @@ alert(numbers); // 7,903,123,45,67
5454

5555
There are shorthands for most used quantifiers:
5656

57-
`+`
58-
: Means "one or more", the same as `{1,}`.
57+
`pattern:+`
58+
: Means "one or more", the same as `pattern:{1,}`.
5959

6060
For instance, `pattern:\d+` looks for numbers:
6161

@@ -65,8 +65,8 @@ There are shorthands for most used quantifiers:
6565
alert( str.match(/\d+/g) ); // 7,903,123,45,67
6666
```
6767

68-
`?`
69-
: Means "zero or one", the same as `{0,1}`. In other words, it makes the symbol optional.
68+
`pattern:?`
69+
: Means "zero or one", the same as `pattern:{0,1}`. In other words, it makes the symbol optional.
7070

7171
For instance, the pattern `pattern:ou?r` looks for `match:o` followed by zero or one `match:u`, and then `match:r`.
7272

@@ -78,16 +78,16 @@ There are shorthands for most used quantifiers:
7878
alert( str.match(/colou?r/g) ); // color, colour
7979
```
8080

81-
`*`
82-
: Means "zero or more", the same as `{0,}`. That is, the character may repeat any times or be absent.
81+
`pattern:*`
82+
: Means "zero or more", the same as `pattern:{0,}`. That is, the character may repeat any number of times or be absent.
8383

84-
For example, `pattern:\d0*` looks for a digit followed by any number of zeroes:
84+
For example, `pattern:\d0*` looks for a digit followed by any number of zeroes (may be many or none):
8585

8686
```js run
8787
alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1
8888
```
8989

90-
Compare it with `'+'` (one or more):
90+
Compare it with `pattern:+` (one or more):
9191

9292
```js run
9393
alert( "100 10 1".match(/\d0+/g) ); // 100, 10
@@ -98,43 +98,45 @@ There are shorthands for most used quantifiers:
9898

9999
Quantifiers are used very often. They serve as the main "building block" of complex regular expressions, so let's see more examples.
100100

101-
Regexp "decimal fraction" (a number with a floating point): `pattern:\d+\.\d+`
102-
: In action:
103-
```js run
104-
alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345
105-
```
101+
**Regexp for decimal fractions (a number with a floating point): `pattern:\d+\.\d+`**
106102

107-
Regexp "open HTML-tag without attributes", like `<span>` or `<p>`: `pattern:/<[a-z]+>/i`
108-
: In action:
103+
In action:
104+
```js run
105+
alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345
106+
```
107+
108+
**Regexp for an "opening HTML-tag without attributes", such as `<span>` or `<p>`.**
109+
110+
1. The simplest one: `pattern:/<[a-z]+>/i`
109111

110112
```js run
111113
alert( "<body> ... </body>".match(/<[a-z]+>/gi) ); // <body>
112114
```
113115

114-
We look for character `pattern:'<'` followed by one or more Latin letters, and then `pattern:'>'`.
116+
The regexp looks for character `pattern:'<'` followed by one or more Latin letters, and then `pattern:'>'`.
117+
118+
2. Improved: `pattern:/<[a-z][a-z0-9]*>/i`
115119

116-
Regexp "open HTML-tag without attributes" (improved): `pattern:/<[a-z][a-z0-9]*>/i`
117-
: Better regexp: according to the standard, HTML tag name may have a digit at any position except the first one, like `<h1>`.
120+
According to the standard, HTML tag name may have a digit at any position except the first one, like `<h1>`.
118121

119122
```js run
120123
alert( "<h1>Hi!</h1>".match(/<[a-z][a-z0-9]*>/gi) ); // <h1>
121124
```
122125

123-
Regexp "opening or closing HTML-tag without attributes": `pattern:/<\/?[a-z][a-z0-9]*>/i`
124-
: We added an optional slash `pattern:/?` before the tag. Had to escape it with a backslash, otherwise JavaScript would think it is the pattern end.
126+
**Regexp "opening or closing HTML-tag without attributes": `pattern:/<\/?[a-z][a-z0-9]*>/i`**
125127

126-
```js run
127-
alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
128-
```
128+
We added an optional slash `pattern:/?` near the beginning of the pattern. We had to escape it with a backslash, otherwise JavaScript would think it is the end of the pattern.
129+
130+
```js run
131+
alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
132+
```
129133

130134
```smart header="To make a regexp more precise, we often need to make it more complex"
131135
We can see one common rule in these examples: the more precise is the regular expression -- the longer and more complex it is.
132136
133-
For instance, for HTML tags we could use a simpler regexp: `pattern:<\w+>`.
134-
135-
...But because `pattern:\w` means any Latin letter or a digit or `'_'`, the regexp also matches non-tags, for instance `match:<_>`. So it's much simpler than `pattern:<[a-z][a-z0-9]*>`, but less reliable.
137+
For instance, for HTML tags we could use a simpler regexp: `pattern:<\w+>`. But as HTML has stricter restrictions for a tag name, `pattern:<[a-z][a-z0-9]*>` is more reliable.
136138
137-
Are we ok with `pattern:<\w+>` or we need `pattern:<[a-z][a-z0-9]*>`?
139+
Can we use `pattern:<\w+>` or do we need `pattern:<[a-z][a-z0-9]*>`?
138140
139-
In real life both variants are acceptable. Depends on how tolerant we can be to "extra" matches and whether it's difficult or not to filter them out by other means.
141+
In real life both variants are acceptable. It depends on how tolerant we can be to "extra" matches and whether it's difficult or not to remove them from the result by other means.
140142
```
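
The trade-off between the two tag patterns can be seen on a string that contains a non-tag such as `<_>`. This is a sketch; the sample string and variable names are illustrative:

```js
const html = "<_> <h1> <body>";

// Simpler pattern: also accepts "<_>", which is not a valid tag name
const loose = html.match(/<\w+>/g);

// Stricter pattern: a letter first, then letters or digits
const strict = html.match(/<[a-z][a-z0-9]*>/gi);

console.log(loose.join(" "));  // <_> <h1> <body>
console.log(strict.join(" ")); // <h1> <body>
```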

‎9-regular-expressions/10-regexp-greedy-and-lazy/3-find-html-comments/solution.md

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,11 @@
11
We need to find the beginning of the comment `match:<!--`, then everything till the end of `match:-->`.
22

3-
The first idea could be `pattern:<!--.*?-->` -- the lazy quantifier makes the dot stop right before `match:-->`.
3+
An acceptable variant is `pattern:<!--.*?-->` -- the lazy quantifier makes the dot stop right before `match:-->`. We also need to add flag `pattern:s` for the dot to include newlines.
44

5-
But a dot in JavaScript means "any symbol except the newline". So multiline comments won't be found.
6-
7-
We can use `pattern:[\s\S]` instead of the dot to match "anything":
5+
Otherwise multiline comments won't be found:
86

97
```js run
10-
let reg = /<!--[\s\S]*?-->/g;
8+
let reg = /<!--.*?-->/gs;
119

1210
let str = `... <!-- My -- comment
1311
test --> .. <!----> ..

‎9-regular-expressions/10-regexp-greedy-and-lazy/article.md

Lines changed: 21 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ Let's take the following task as an example.
88

99
We have a text and need to replace all quotes `"..."` with guillemet marks: `«...»`. They are preferred for typography in many countries.
1010

11-
For instance: `"Hello, world"` should become `«Hello, world»`. Some countries prefer other quotes, like `„Witam, świat!”` (Polish) or `「你好,世界」` (Chinese), but for our task let's choose `«...»`.
11+
For instance: `"Hello, world"` should become `«Hello, world»`. There exist other quotes, such as `„Witam, świat!”` (Polish) or `「你好,世界」` (Chinese), but for our task let's choose `«...»`.
1212

1313
The first thing to do is to locate quoted strings, and then we can replace them.
1414

@@ -35,7 +35,7 @@ That can be described as "greediness is the cause of all evil".
3535
To find a match, the regular expression engine uses the following algorithm:
3636

3737
- For every position in the string
38-
- Match the pattern at that position.
38+
- Try to match the pattern at that position.
3939
- If there's no match, go to the next position.
4040

4141
These common words do not make it obvious why the regexp fails, so let's elaborate how the search works for the pattern `pattern:".+"`.
@@ -44,7 +44,7 @@ These common words do not make it obvious why the regexp fails, so let's elabora
4444

4545
The regular expression engine tries to find it at the zero position of the source string `subject:a "witch" and her "broom" is one`, but there's `subject:a` there, so there's immediately no match.
4646

47-
Then it advances: goes to the next positions in the source string and tries to find the first character of the pattern there, and finally finds the quote at the 3rd position:
47+
Then it advances: goes to the next positions in the source string and tries to find the first character of the pattern there, fails again, and finally finds the quote at the 3rd position:
4848

4949
![](witch_greedy1.svg)
5050

@@ -54,23 +54,23 @@ These common words do not make it obvious why the regexp fails, so let's elabora
5454

5555
![](witch_greedy2.svg)
5656

57-
3. Then the dot repeats because of the quantifier `pattern:.+`. The regular expression engine builds the match by taking characters one by one while it is possible.
57+
3. Then the dot repeats because of the quantifier `pattern:.+`. The regular expression engine adds to the match one character after another.
5858

59-
...When does it become impossible? All characters match the dot, so it only stops when it reaches the end of the string:
59+
...Until when? All characters match the dot, so it only stops when it reaches the end of the string:
6060

6161
![](witch_greedy3.svg)
6262

63-
4. Now the engine finished repeating for `pattern:.+` and tries to find the next character of the pattern. It's the quote `pattern:"`. But there's a problem: the string has finished, there are no more characters!
63+
4. Now the engine finished repeating `pattern:.+` and tries to find the next character of the pattern. It's the quote `pattern:"`. But there's a problem: the string has finished, there are no more characters!
6464

6565
The regular expression engine understands that it took too many `pattern:.+` and starts to *backtrack*.
6666

6767
In other words, it shortens the match for the quantifier by one character:
6868

6969
![](witch_greedy4.svg)
7070

71-
Now it assumes that `pattern:.+` ends one character before the end and tries to match the rest of the pattern from that position.
71+
Now it assumes that `pattern:.+` ends one character before the string end and tries to match the rest of the pattern from that position.
7272

73-
If there were a quote there, then that would be the end, but the last character is `subject:'e'`, so there's no match.
73+
If there were a quote there, then the search would end, but the last character is `subject:'e'`, so there's no match.
7474

7575
5. ...So the engine decreases the number of repetitions of `pattern:.+` by one more character:
7676

@@ -84,19 +84,19 @@ These common words do not make it obvious why the regexp fails, so let's elabora
8484

8585
7. The match is complete.
8686

87-
8. So the first match is `match:"witch" and her "broom"`. The further search starts where the first match ends, but there are no more quotes in the rest of the string `subject:is one`, so no more results.
87+
8. So the first match is `match:"witch" and her "broom"`. If the regular expression has flag `pattern:g`, then the search will continue from where the first match ends. There are no more quotes in the rest of the string `subject:is one`, so no more results.
8888

8989
That's probably not what we expected, but that's how it works.
9090

91-
**In the greedy mode (by default) the quantifier is repeated as many times as possible.**
91+
**In the greedy mode (by default) a quantifier is repeated as many times as possible.**
9292

93-
The regexp engine tries to fetch as many characters as it can by `pattern:.+`, and then shortens that one by one.
93+
The regexp engine adds to the match as many characters as it can for `pattern:.+`, and then shortens that one by one, if the rest of the pattern doesn't match.
9494

95-
For our task we want another thing. That's what the lazy quantifier mode is for.
95+
For our task we want another thing. That's where a lazy mode can help.
9696

9797
## Lazy mode
9898

99-
The lazy mode of quantifier is an opposite to the greedy mode. It means: "repeat minimal number of times".
99+
The lazy mode of quantifiers is the opposite of the greedy mode. It means: "repeat a minimal number of times".
100100

101101
We can enable it by putting a question mark `pattern:'?'` after the quantifier, so that it becomes `pattern:*?` or `pattern:+?` or even `pattern:??` for `pattern:'?'`.
102102

@@ -149,20 +149,19 @@ Other quantifiers remain greedy.
149149
For instance:
150150

151151
```js run
152-
alert( "123 456".match(/\d+ \d+?/g) ); // 123 4
152+
alert( "123 456".match(/\d+ \d+?/) ); // 123 4
153153
```
154154

155-
1. The pattern `pattern:\d+` tries to match as many numbers as it can (greedy mode), so it finds `match:123` and stops, because the next character is a space `pattern:' '`.
156-
2. Then there's a space in pattern, it matches.
155+
1. The pattern `pattern:\d+` tries to match as many digits as it can (greedy mode), so it finds `match:123` and stops, because the next character is a space `pattern:' '`.
156+
2. Then there's a space in the pattern, it matches.
157157
3. Then there's `pattern:\d+?`. The quantifier is in lazy mode, so it finds one digit `match:4` and tries to check if the rest of the pattern matches from there.
158158

159159
...But there's nothing in the pattern after `pattern:\d+?`.
160160

161161
The lazy mode doesn't repeat anything without a need. The pattern finished, so we're done. We have a match `match:123 4`.
162-
4. The next search starts from the character `5`.
163162

164163
```smart header="Optimizations"
165-
Modern regular expression engines can optimize internal algorithms to work faster. So they may work a bit different from the described algorithm.
164+
Modern regular expression engines can optimize internal algorithms to work faster. So they may work a bit differently from the described algorithm.
166165
167166
But to understand how regular expressions work and to build regular expressions, we don't need to know about that. They are only used internally to optimize things.
168167
@@ -264,7 +263,7 @@ That's what's going on:
264263
2. Then it looks for `pattern:.*?`: takes one character (lazily!), check if there's a match for `pattern:" class="doc">` (none).
265264
3. Then takes another character into `pattern:.*?`, and so on... until it finally reaches `match:" class="doc">`.
266265

267-
But the problem is: that's already beyond the link, in another tag `<p>`. Not what we want.
266+
But the problem is: that's already beyond the link `<a...>`, in another tag `<p>`. Not what we want.
268267

269268
Here's the picture of the match aligned with the text:
270269

@@ -273,11 +272,9 @@ Here's the picture of the match aligned with the text:
273272
<a href="link1" class="wrong">... <p style="" class="doc">
274273
```
275274

276-
So the laziness did not work for us here.
275+
So, we need the pattern to look for `<a href="...something..." class="doc">`, but both greedy and lazy variants have problems.
277276

278-
We need the pattern to look for `<a href="...something..." class="doc">`, but both greedy and lazy variants have problems.
279-
280-
The correct variant would be: `pattern:href="[^"]*"`. It will take all characters inside the `href` attribute till the nearest quote, just what we need.
277+
The correct variant can be: `pattern:href="[^"]*"`. It will take all characters inside the `href` attribute till the nearest quote, just what we need.
281278

282279
A working example:
283280

@@ -301,4 +298,4 @@ Greedy
301298
Lazy
302299
: Enabled by the question mark `pattern:?` after the quantifier. The regexp engine tries to match the rest of the pattern before each repetition of the quantifier.
303300

304-
As we've seen, the lazy mode is not a "panacea" from the greedy search. An alternative is a "fine-tuned" greedy search, with exclusions. Soon we'll see more examples of it.
301+
As we've seen, the lazy mode is not a "panacea" for the greedy search. An alternative is a "fine-tuned" greedy search, with exclusions, as in the pattern `pattern:"[^"]+"`.
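
The three approaches discussed in this article can be run against the sample string from the walkthrough. A minimal sketch with illustrative variable names:

```js
const text = 'a "witch" and her "broom" is one';

// Greedy: .+ runs to the end of the string and backtracks to the last quote
const greedy = text.match(/".+"/g);

// Lazy: .+? stops at the nearest closing quote
const lazy = text.match(/".+?"/g);

// Greedy with an exclusion: [^"]+ cannot run past a quote
const exclusion = text.match(/"[^"]+"/g);

console.log(greedy);    // [ '"witch" and her "broom"' ]
console.log(lazy);      // [ '"witch"', '"broom"' ]
console.log(exclusion); // [ '"witch"', '"broom"' ]
```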

‎9-regular-expressions/11-regexp-groups/1-find-webcolor-3-or-6/solution.md

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,10 @@
11
A regexp to search 3-digit color `#abc`: `pattern:/#[a-f0-9]{3}/i`.
22

3-
We can add exactly 3 more optional hex digits. We don't need more or less. Either we have them or we don't.
3+
We can add exactly 3 more optional hex digits. We don't need more or less. The color has either 3 or 6 digits.
44

5-
The simplest way to add them -- is to append to the regexp: `pattern:/#[a-f0-9]{3}([a-f0-9]{3})?/i`
5+
Let's use the quantifier `pattern:{1,2}` for that: we'll have `pattern:/#([a-f0-9]{3}){1,2}/i`.
66

7-
We can do it in a smarter way though: `pattern:/#([a-f0-9]{3}){1,2}/i`.
8-
9-
Here the regexp `pattern:[a-f0-9]{3}` is in parentheses to apply the quantifier `pattern:{1,2}` to it as a whole.
7+
Here the pattern `pattern:[a-f0-9]{3}` is enclosed in parentheses to apply the quantifier `pattern:{1,2}`.
108

119
In action:
1210

‎9-regular-expressions/11-regexp-groups/1-find-webcolor-3-or-6/task.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,4 +11,4 @@ let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
1111
alert( str.match(reg) ); // #3f3 #AA00ef
1212
```
1313

14-
P.S. This should be exactly 3 or 6 hex digits: values like `#abcd` should not match.
14+
P.S. This should be exactly 3 or 6 hex digits. Values with 4 digits, such as `#abcd`, should not match.
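
The 4-digit case mentioned in the P.S. is worth checking directly. In the sketch below, the pattern from the solution hunk is run as shown there, and then with a trailing `\b`. The `\b` is an assumption here, since the visible part of the diff does not show how the solution rejects `#abcd`:

```js
const colors = "color: #3f3; background-color: #AA00ef; and: #abcd";

// Pattern as shown in the solution hunk: it still finds "#abc" inside "#abcd"
const withoutBoundary = colors.match(/#([a-f0-9]{3}){1,2}/gi);

// Assumed fix: require a word boundary after the hex digits
const withBoundary = colors.match(/#([a-f0-9]{3}){1,2}\b/gi);

console.log(withoutBoundary.join(" ")); // #3f3 #AA00ef #abc
console.log(withBoundary.join(" "));    // #3f3 #AA00ef
```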

‎9-regular-expressions/11-regexp-groups/2-find-decimal-numbers/solution.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
A positive number with an optional decimal part is (per previous task): `pattern:\d+(\.\d+)?`.
22

3-
Let's add an optional `-` in the beginning:
3+
Let's add the optional `pattern:-` in the beginning:
44

55
```js run
66
let reg = /-?\d+(\.\d+)?/g;

‎9-regular-expressions/11-regexp-groups/5-parse-expression/solution.md

Lines changed: 11 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,19 @@
11
A regexp for a number is: `pattern:-?\d+(\.\d+)?`. We created it in previous tasks.
22

3-
An operator is `pattern:[-+*/]`.
3+
An operator is `pattern:[-+*/]`. The hyphen `pattern:-` goes first in the square brackets, because in the middle it would mean a character range, while we just want a character `-`.
44

5-
Please note:
6-
- Here the dash `pattern:-` goes first in the brackets, because in the middle it would mean a character range, while we just want a character `-`.
7-
- A slash `/` should be escaped inside a JavaScript regexp `pattern:/.../`, we'll do that later.
5+
The slash `/` should be escaped inside a JavaScript regexp `pattern:/.../`, we'll do that later.
86

97
We need a number, an operator, and then another number. And optional spaces between them.
108

119
The full regular expression: `pattern:-?\d+(\.\d+)?\s*[-+*/]\s*-?\d+(\.\d+)?`.
1210

13-
To get a result as an array let's put parentheses around the data that we need: numbers and the operator: `pattern:(-?\d+(\.\d+)?)\s*([-+*/])\s*(-?\d+(\.\d+)?)`.
11+
It has 3 parts, with `pattern:\s*` between them:
12+
1. `pattern:-?\d+(\.\d+)?` - the first number,
13+
1. `pattern:[-+*/]` - the operator,
14+
1. `pattern:-?\d+(\.\d+)?` - the second number.
15+
16+
To make each of these parts a separate element of the result array, let's enclose them in parentheses: `pattern:(-?\d+(\.\d+)?)\s*([-+*/])\s*(-?\d+(\.\d+)?)`.
1417

1518
In action:
1619

@@ -29,11 +32,11 @@ The result includes:
2932
- `result[4] == "12"` (fourth group `(-?\d+(\.\d+)?)` -- the second number)
3033
- `result[5] == undefined` (fifth group `(\.\d+)?` -- the last decimal part is absent, so it's undefined)
3134

32-
We only want the numbers and the operator, without the full match or the decimal parts.
35+
We only want the numbers and the operator, without the full match or the decimal parts, so let's "clean" the result a bit.
3336

34-
The full match (the arrays first item) can be removed by shifting the array `pattern:result.shift()`.
37+
The full match (the array's first item) can be removed by shifting the array with `result.shift()`.
3538

36-
The decimal groups can be removed by making them into non-capturing groups, by adding `pattern:?:` to the beginning: `pattern:(?:\.\d+)?`.
39+
Groups that contain decimal parts (numbers 2 and 5) `pattern:(\.\d+)` can be excluded from the result by adding `pattern:?:` to the beginning: `pattern:(?:\.\d+)?`.
3740

3841
The final solution:
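
The diff context ends here. As a rough sketch of how the pieces described above might fit together (the function name `parseExpression` and the empty-array fallback are assumptions, not taken from the commit):

```js
// Hypothetical helper: returns [firstNumber, operator, secondNumber] as strings
function parseExpression(expr) {
  // Decimal parts use non-capturing groups (?:...), the slash is escaped
  const regexp = /(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)/;

  const result = expr.match(regexp);
  if (!result) return []; // assumption: no match yields an empty array

  result.shift(); // drop the full match, keep the three groups
  return result;
}

const parsed = parseExpression("-1.23 * 3.45");
console.log(parsed.join(" | ")); // -1.23 | * | 3.45
```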
3942

‎9-regular-expressions/11-regexp-groups/article.md

Lines changed: 194 additions & 86 deletions
Large diffs are not rendered by default.
Lines changed: 1 addition & 0 deletions
