Lists: | pgsql-hackers |
---|
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | change regexp_substr first argument make tests more easier to understand. |
Date: | 2023-12-27 16:13:14 |
Message-ID: | CACJufxH4gw3Ej-90xjDz_7Xk=XfynRqpz6UyS0oQkCWQqY=d=g@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
SELECT regexp_substr('abcabcabc', 'a.c');
SELECT regexp_substr('abcabcabc', 'a.c', 2);
SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
they all return 'abc', there are 3 'abc ' in string 'abcabcabc'
except IS NULL query.
maybe we can change regexp_substr first argument from "abcabcabc" to
"abcaXcaYc".
so the result would be more easier to understand.
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: change regexp_substr first argument make tests more easier to understand. |
Date: | 2023-12-28 07:17:00 |
Message-ID: | CACJufxGiE0tuWioYgMvRCtd8J4RhqRK+pXCQQEEZkWdqRmwKJw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 28, 2023 at 12:13 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> hi.
> https://git.postgresql.org/cgit/postgresql.git/tree/src/test/regress/expected/strings.out#n928
>
> SELECT regexp_substr('abcabcabc', 'a.c');
> SELECT regexp_substr('abcabcabc', 'a.c', 2);
> SELECT regexp_substr('abcabcabc', 'a.c', 1, 3);
> SELECT regexp_substr('abcabcabc', 'a.c', 1, 4) IS NULL AS t;
> SELECT regexp_substr('abcabcabc', 'A.C', 1, 2, 'i');
>
> they all return 'abc', there are 3 'abc ' in string 'abcabcabc'
> except IS NULL query.
> maybe we can change regexp_substr first argument from "abcabcabc" to
> "abcaXcaYc".
> so the result would be more easier to understand.
anyway here is the minor patch to change string from "abcabcabc" to
"abcaXcaYc".
Attachment | Content-Type | Size |
---|---|---|
v1-0001-make-regex-result-more-easier-to-understand.patch | text/x-patch | 2.8 KB |
From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: change regexp_substr first argument make tests more easier to understand. |
Date: | 2024-07-18 21:49:16 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
If I understand correctly, the problem is that it's not clear which of
the 'abc' substrings is matched/returned by the function, right?
I wonder if this is a problem only for understanding the test, or if it
makes the tests a bit weaker. I mean, what if the function returns the
wrong substring? How would we know?
Also, if we tweak this, shouldn't we tweak also the regext_instr() calls
a bit earlier in the test script?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | jian he <jian(dot)universality(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: change regexp_substr first argument make tests more easier to understand. |
Date: | 2024-07-26 00:00:00 |
Message-ID: | CACJufxFFfd057tTSEN=-Sq4K-NYVDDkkgp6EfQ+CeuyKRYAPqA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Jul 19, 2024 at 5:49 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> If I understand correctly, the problem is that it's not clear which of
> the 'abc' substrings is matched/returned by the function, right?
>
> I wonder if this is a problem only for understanding the test, or if it
> makes the tests a bit weaker. I mean, what if the function returns the
> wrong substring? How would we know?
>
this is for understanding the test.
personally, sometimes, I feel the documentation is too dry, hard to follow.
so i can based on regress tests better understand the documentation.
that was my intention for the changes.
we have more sophisticated regex test at
https://git.postgresql.org/cgit/postgresql.git/tree/src/test/modules/test_regex
> Also, if we tweak this, shouldn't we tweak also the regext_instr() calls
> a bit earlier in the test script?
>
sure.
please check attached.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-refactor-regex-related-tests.patch | text/x-patch | 5.0 KB |
From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: change regexp_substr first argument make tests more easier to understand. |
Date: | 2024-08-13 09:40:00 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi everybody
Current tests with regexp_instr() and regexp_substr() with string
'abcabcabc' are really unreadable and you would spend time to understand
that happens in these tests and if they are really correct. I'd better
change them into "abcdefghi" just like in query
SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
Regards
Ilia Evdokimov,
Tantor Labs LLC.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: change regexp_substr first argument make tests more easier to understand. |
Date: | 2024-09-05 17:45:29 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> writes:
> Current tests with regexp_instr() and regexp_substr() with string
> 'abcabcabc' are really unreadable and you would spend time to understand
> that happens in these tests and if they are really correct. I'd better
> change them into "abcdefghi" just like in query
> SELECT regexp_substr('abcdefghi', 'd.q') IS NULL AS t;
On looking more closely at these test cases, I think the point of them
is exactly to show the behavior of the functions with multiple copies
of the target substring. Thus, what Jian is proposing breaks the
tests: it's no longer perfectly clear whether the result is because
the function did what we expect, or because the pattern failed to
match anywhere else. (Sure, "a.c" *should* match "aXc", but if it
didn't, you wouldn't discover that from this test.) What Ilia
proposes would break them worse.
I think we should just reject this patch, or at least reject the
parts of it that change existing test cases. I have no opinion
about whether the new test cases add anything useful.
regards, tom lane