Error in calculating length of encoded base64 string

Lists: pgsql-hackers
From: o(dot)tselebrovskiy(at)postgrespro(dot)ru
To: pgsql-hackers(at)postgresql(dot)org
Cc: Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Error in calculating length of encoded base64 string
Date: 2023-06-08 07:53:28
Message-ID: [email protected]

Greetings, everyone!

While working on an extension I've found an error in how the length of
an encoded base64 string is calculated.

This error is present in 3 files across all supported versions:

/src/common/base64.c, function pg_b64_enc_len;
/src/backend/utils/adt/encode.c, function pg_base64_enc_len;
/contrib/pgcrypto/pgp-armor.c, function pg_base64_enc_len (copied from
encode.c).

In all three cases the length is calculated as follows:

(srclen + 2) * 4 / 3; (plus linefeed in latter two cases)

There's also a comment /* 3 bytes will be converted to 4 */

This formula is wrong. Let's calculate the encoded length for different
starting lengths (using C integer division):

starting length 2: (2 + 2) * 4 / 3 = 5,
starting length 3: (3 + 2) * 4 / 3 = 6,
starting length 4: (4 + 2) * 4 / 3 = 8,
starting length 6: (6 + 2) * 4 / 3 = 10,
starting length 10: (10 + 2) * 4 / 3 = 16,

when it should be 4, 4, 8, 8, 16.

So the suggestion is to change the formula to the correct one:
(srclen + 2) / 3 * 4;
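
To make the arithmetic above easy to reproduce, here is a minimal standalone sketch of the two formulas. The function names (old_enc_len, new_enc_len, check_enc_len_examples) are hypothetical and are not the names used in the PostgreSQL sources; the linefeed term that encode.c and pgp-armor.c add is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Formula as quoted above: multiplying before dividing also scales the "+ 2". */
static size_t
old_enc_len(size_t srclen)
{
    return (srclen + 2) * 4 / 3;
}

/* Suggested formula: round up to whole 3-byte groups, then 4 bytes per group. */
static size_t
new_enc_len(size_t srclen)
{
    return (srclen + 2) / 3 * 4;
}

/* Re-checks the five starting lengths listed in the message. */
static void
check_enc_len_examples(void)
{
    assert(old_enc_len(2) == 5 && new_enc_len(2) == 4);
    assert(old_enc_len(3) == 6 && new_enc_len(3) == 4);
    assert(old_enc_len(4) == 8 && new_enc_len(4) == 8);
    assert(old_enc_len(6) == 10 && new_enc_len(6) == 8);
    assert(old_enc_len(10) == 16 && new_enc_len(10) == 16);
}
```

Calling check_enc_len_examples() from a small main() should pass silently if the numbers above are right.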

The patch is attached.

Oleg Tselebrovskiy, Postgres Pro

Attachment Content-Type Size
base64_encoded_length.patch text/x-diff 1.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: o(dot)tselebrovskiy(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Re: Error in calculating length of encoded base64 string
Date: 2023-06-08 14:35:26
Message-ID: [email protected]

o(dot)tselebrovskiy(at)postgrespro(dot)ru writes:
> While working on an extension I've found an error in how length of
> encoded base64 string is calculated;

Yeah, I think you're right. It's not of huge significance, because
it just overestimates by 1 or 2 bytes, but we might as well get
it right. Thanks for the report and patch!

regards, tom lane


From: Gurjeet Singh <gurjeet(at)singh(dot)im>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: o(dot)tselebrovskiy(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Re: Error in calculating length of encoded base64 string
Date: 2023-06-09 06:10:03
Message-ID: CABwTF4X2qMDBeoB3TX_6oVJHfA6OAwrL553a2ow_X21SRNeSfg@mail.gmail.com

On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> o(dot)tselebrovskiy(at)postgrespro(dot)ru writes:
> > While working on an extension I've found an error in how length of
> > encoded base64 string is calculated;
>
> Yeah, I think you're right. It's not of huge significance, because
> it just overestimates by 1 or 2 bytes, but we might as well get
> it right. Thanks for the report and patch!

From your commit d98ed080bb

> This bug is very ancient, dating to commit 79d78bb26 which
> added encode.c. (The other instances were presumably copied
> from there.) Still, it doesn't quite seem worth back-patching.

Is it worth investing time in trying to unify these 3 occurrences of
base64 length (and possibly other relevant) code to one place? If yes,
I can volunteer for it.

The common code facility under src/common/ did not exist back when
pgcrypto was added, but since it does now, it may be worth making the
others depend on the implementation in src/common/.

Best regards,
Gurjeet
http://Gurje.et


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gurjeet Singh <gurjeet(at)singh(dot)im>
Cc: o(dot)tselebrovskiy(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Re: Error in calculating length of encoded base64 string
Date: 2023-06-09 06:13:45
Message-ID: [email protected]

Gurjeet Singh <gurjeet(at)singh(dot)im> writes:
> On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> This bug is very ancient, dating to commit 79d78bb26 which
>> added encode.c. (The other instances were presumably copied
>> from there.) Still, it doesn't quite seem worth back-patching.

> Is it worth investing time in trying to unify these 3 occurrences of
> base64 length (and possibly other relevant) code to one place? If yes,
> I can volunteer for it.

I wondered about that too. It seems really silly that we made
a copy in src/common and did not replace the others with calls
to that.

regards, tom lane


From: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gurjeet Singh <gurjeet(at)singh(dot)im>, o(dot)tselebrovskiy(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Re: Error in calculating length of encoded base64 string
Date: 2023-06-09 10:26:38
Message-ID: [email protected]

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gurjeet Singh <gurjeet(at)singh(dot)im> writes:
>> On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> This bug is very ancient, dating to commit 79d78bb26 which
>>> added encode.c. (The other instances were presumably copied
>>> from there.) Still, it doesn't quite seem worth back-patching.
>
>> Is it worth investing time in trying to unify these 3 occurrences of
>> base64 length (and possibly other relevant) code to one place? If yes,
>> I can volunteer for it.
>
> I wondered about that too. It seems really silly that we made
> a copy in src/common and did not replace the others with calls
> to that.

Also, while we're at it, how about some unit tests that both encode and
calculate the encoded length of strings of various lengths and check
that they match?
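
A rough sketch of what such a check could look like, assuming a padded encoder that emits no linefeeds. The encoder here is a hypothetical stand-in written for this example (sketch_b64_encode, sketch_b64_enc_len and sketch_count_length_mismatches are not PostgreSQL functions); a real test would call the actual routines instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char sketch_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Corrected length formula from this thread (no linefeed term). */
static size_t
sketch_b64_enc_len(size_t srclen)
{
    return (srclen + 2) / 3 * 4;
}

/* Stand-in encoder: padded base64, no linefeeds. Returns bytes written. */
static size_t
sketch_b64_encode(const unsigned char *src, size_t srclen, char *dst)
{
    size_t i = 0;
    size_t out = 0;

    while (i < srclen)
    {
        size_t remaining = srclen - i;
        unsigned int buf = (unsigned int) src[i] << 16;

        if (remaining > 1)
            buf |= (unsigned int) src[i + 1] << 8;
        if (remaining > 2)
            buf |= (unsigned int) src[i + 2];

        dst[out++] = sketch_alphabet[(buf >> 18) & 0x3f];
        dst[out++] = sketch_alphabet[(buf >> 12) & 0x3f];
        dst[out++] = (remaining > 1) ? sketch_alphabet[(buf >> 6) & 0x3f] : '=';
        dst[out++] = (remaining > 2) ? sketch_alphabet[buf & 0x3f] : '=';

        i += (remaining >= 3) ? 3 : remaining;
    }
    return out;
}

/* Encodes inputs of every length 0..maxlen and counts length disagreements. */
static int
sketch_count_length_mismatches(size_t maxlen)
{
    unsigned char src[64];
    char dst[128];
    int mismatches = 0;

    assert(maxlen <= sizeof(src));
    memset(src, 0xA5, sizeof(src));

    for (size_t len = 0; len <= maxlen; len++)
    {
        if (sketch_b64_encode(src, len, dst) != sketch_b64_enc_len(len))
            mismatches++;
    }
    return mismatches;
}
```

With the corrected formula the mismatch count should be zero for every tested length; swapping in (srclen + 2) * 4 / 3 makes it nonzero.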

- ilmari


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gurjeet Singh <gurjeet(at)singh(dot)im>, o(dot)tselebrovskiy(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org, Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>
Subject: Re: Error in calculating length of encoded base64 string
Date: 2023-08-26 17:43:31
Message-ID: [email protected]

On 2023-Jun-09, Tom Lane wrote:

> Gurjeet Singh <gurjeet(at)singh(dot)im> writes:
> > On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> This bug is very ancient, dating to commit 79d78bb26 which
> >> added encode.c. (The other instances were presumably copied
> >> from there.) Still, it doesn't quite seem worth back-patching.
>
> > Is it worth investing time in trying to unify these 3 occurrences of
> > base64 length (and possibly other relevant) code to one place? If yes,
> > I can volunteer for it.
>
> I wondered about that too. It seems really silly that we made
> a copy in src/common and did not replace the others with calls
> to that.

I looked into this. It turns out that there is a difference in newline
handling in the other routines compared to what was added for SCRAM,
which doesn't have any (and complains if you supply them). Peter E
did suggest to unify them at the time:
https://www.postgresql.org/message-id/947b9aff-8fdb-dbf5-a99c-0ffd4523a73f%402ndquadrant.com

We could add a boolean "whitespace" flag to both of
src/common/base64.c's pg_b64_encode() and pg_b64_decode(); with that I
think it could serve the three places that need it.
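
To illustrate the idea only, here is a minimal sketch of a decoder with such a flag. Everything in it is an assumption made for the example: sketch_b64_decode and its allow_whitespace parameter are hypothetical, and this is not the actual signature or behavior of pg_b64_decode().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical decoder. Returns the number of decoded bytes, or -1 on
 * invalid input. With allow_whitespace = false, any whitespace is an error
 * (the strict behavior described above); with true, it is skipped.
 */
static int
sketch_b64_decode(const char *src, size_t srclen, unsigned char *dst,
                  bool allow_whitespace)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned int buf = 0;
    int nsym = 0;               /* symbols seen in the current 4-char group */
    int npad = 0;               /* '=' characters seen so far */
    int out = 0;

    for (size_t i = 0; i < srclen; i++)
    {
        char c = src[i];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
        {
            if (allow_whitespace)
                continue;
            return -1;
        }

        if (c == '=')
        {
            if (nsym < 2)
                return -1;      /* padding only in position 3 or 4 */
            npad++;
            buf <<= 6;
        }
        else
        {
            const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;

            if (p == NULL || npad > 0)
                return -1;      /* bad symbol, or data after padding */
            buf = (buf << 6) | (unsigned int) (p - alphabet);
        }

        if (++nsym == 4)
        {
            dst[out++] = (unsigned char) ((buf >> 16) & 0xff);
            if (npad < 2)
                dst[out++] = (unsigned char) ((buf >> 8) & 0xff);
            if (npad < 1)
                dst[out++] = (unsigned char) (buf & 0xff);
            buf = 0;
            nsym = 0;
        }
    }
    return (nsym == 0) ? out : -1;
}
```

A single code path with one boolean keeps the strict and lenient callers on the same alphabet and padding logic, which seems to be the point of the unification.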

--
Álvaro Herrera Breisgau, Deutschland - https://www.EnterpriseDB.com/