Skip to content

Conversation

@emilypi
Copy link
Member

@emilypi emilypi commented Apr 24, 2020

This removes the odd inconsistency between failure modes for URL where degenerate inputs like ZE= pass decode, but not decodeUnpadded and decodePadded. Addresses #35

before:

П> U.decode "ZE="
Right "d"
П> U.decodeUnpadded  "ZE="
Left "Base64-encoded bytestring required to be unpadded"
П> U.decodePadded  "ZE="
Left "Base64-encoded bytestring required to be padded"

after (note the subtlety in the messages returned):

П> U.decode "ZE="
Left "Base64-encoded bytestring has invalid padding"
П> U.decodeUnpadded  "ZE="
Left "Base64-encoded bytestring required to be unpadded"
П> U.decodePadded  "ZE="
Left "Base64-encoded bytestring has invalid padding"
П> U.decode "ZE=="
Right "d"
П> U.decodeUnpadded  "ZE=="
Left "Base64-encoded bytestring required to be unpadded"
П> U.decodePadded  "ZE=="
Right "d"
П> U.decode "ZE"
Right "d"
П> U.decodeUnpadded  "ZE"
Right "d"
П> U.decodePadded  "ZE"
Left "Base64-encoded bytestring required to be padded"

TODO:

  • property tests for interplay between decodes

@emilypi emilypi changed the title Add check for degenerate padded case in decode [WIP] Add check for degenerate padded case in decode Apr 24, 2020
@emilypi emilypi changed the title [WIP] Add check for degenerate padded case in decode Add check for degenerate padded case in decode May 14, 2020
@emilypi emilypi requested review from 23Skidoo and hvr May 28, 2020 19:59
add better padding validation + note about validation strategy
@emilypi emilypi force-pushed the emily/check-decode-padding branch from d2402e1 to 5d36588 Compare May 28, 2020 22:31
@emilypi
Copy link
Member Author

emilypi commented May 29, 2020

@hvr @23Skidoo i think i've settled on an error msg reporting scheme I'm okay with. Could you guys give a review?

| otherwise -> err "Base64-encoded bytestring has invalid size"
| r == 0 -> validateLastPad bs noPad $ go bs
| r == 2 -> validateLastPad bs noPad $ go (B.append bs (B.replicate 2 0x3d))
| r == 3 -> validateLastPad bs noPad $ go (B.append bs (B.replicate 1 0x3d))
Copy link
Member Author

@emilypi emilypi May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all written down in the $Validation note, but just to recap:

Let bs be a bytestring of length l. Then the following properties hold:

  • l == 0 mod 4: The input bytestring is assumed to be well-formed. This will always be the expected case for padded Base64 and Base64url values, or for unpadded Base64url values that happen to have a pre-encoded length multiple of 6. In any case, these will go through the standard decode routine, and any existing padding chars will be validated in the final quanta (see: finalChunk).

  • l == 1 mod 4: This is never a valid length for Base64 or Base64url-encoded values. The specification requires that the unpadded length of the encoded string be l == 0 mod 4, l == 2 mod 4, or l == 3 mod 4. There will never be a valid unpadded input of length l == 1 mod 4 as a result. This can be rejected outright.

  • l == 2 mod 4: In this case, two padding chars must appear in the final quanta. If any additional padding chars exist in the string, then they will fail as final quanta, as we require the final four bytes (say, (a b '=' '=')) to have that a /= '=' and b /= '='. Additional pads will fail that clause of finalChunk. Thus, it's safe to add 2 padding chars to the end of a supposedly unpadded input of length l == 2 mod 4, since the addition will never form a well-formed input if the unpadded string is already malformed.

  • l == 3 mod 4: This is the only tricky case. When inputs have this length, then we expect that that adding padding chars will result in the form (a b c '='). However, if the unpadded input has '=' in the c position, it is possible that adding padding chars to the string "completes" the input in the sense that it forms a valid input where the unpadded fragment can be seen as a bytestring of length l == 2 mod 4. This could potentially be an attack vector, and constitutes a security risk. Thankfully, this is also easy to check, since, we only need to validate that the last char of an unpadded bytestring of length l == 3 mod 4 is not '='. If any additional padding chars are present, then there is no risk that they will contribute to a well-formed input, since they will fail as final quanta in the a and b positions. So really, the requirement with padding bytestrings of length l == 3 mod 4 is that they are of the form (a b c '='), c /= '=' after padding.

@emilypi
Copy link
Member Author

emilypi commented Jun 4, 2020

@23Skidoo and @hvr thoughts?

Copy link
Member

@23Skidoo 23Skidoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, changes in this PR look good.

@emilypi
Copy link
Member Author

emilypi commented Jun 5, 2020

Thanks @23Skidoo, merging.

@emilypi emilypi merged commit 593ed26 into master Jun 5, 2020
@emilypi emilypi deleted the emily/check-decode-padding branch June 5, 2020 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants