Problems with Cherokee characters & toCaseFold #277

Closed
@felixonmars

Ref haskellari/binary-instances#7:

Cherokee letters should fold to upper case, but right now they don't converge:

Prelude Data.Char Data.Text> toCaseFold "\43929"
"\5065"
Prelude Data.Char Data.Text> toCaseFold "\5065"
"\43929"

The docs say:

toCaseFold :: Text -> Text
O(n) Convert a string to folded case. Subject to fusion.

This function is mainly useful for performing caseless (also known as case insensitive) string comparisons.

A string x is a caseless match for a string y if and only if:

toCaseFold x == toCaseFold y
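
The definition above can be written down as a small predicate, which makes the reported problem easy to state. This is only a sketch: `caselessMatch` and `cherokeePairMatches` are names made up here for illustration and are not part of `text`. Going by the GHCi session above, the two code points swap under `toCaseFold` instead of converging, so `cherokeePairMatches` would be `False` on the affected version even though the pair should be a caseless match.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- | Caseless match as described in the docs: compare the folded forms.
-- (Hypothetical helper, not exported by Data.Text.)
caselessMatch :: Text -> Text -> Bool
caselessMatch x y = T.toCaseFold x == T.toCaseFold y

-- | The pair from the report: '\43929' is U+AB99 (lowercase Cherokee)
-- and '\5065' is U+13C9 (its uppercase counterpart).
-- This should hold if case folding converges for the pair.
cherokeePairMatches :: Bool
cherokeePairMatches =
  caselessMatch (T.singleton '\43929') (T.singleton '\5065')
```

Loading this in GHCi and evaluating `cherokeePairMatches` gives a one-line check for whether a given `text` version is affected.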

The Unicode FAQ at https://unicode.org/faq/casemap_charprop.html says:

Q: What happens if the uppercase letter is the one that is already encoded?
A: That situation is more complicated. When the existing encoded letter is an uppercase letter and the proposal is to encode a new lowercase letter case pair for it, that is normally disallowed. The case folding for the existing uppercase letter would change, and that is blocked by the requirement for case folding stability. In exceptional situations, if a lowercase letter must be added, it would need to be case-folded to the existing uppercase letter, rather than changing the case folding for that existing letter. Such an exceptional situation did, in fact, apply for the addition of Cherokee lowercase syllables in Version 8.0. Cherokee case folding rules were specified to map to the old uppercase syllables, to preserve case folding stability for them.
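
Read that way, both U+13C9 and U+AB99 should fold to the uppercase U+13C9, and folding a second time should change nothing. A rough way to check this against an installed `text` is sketched below. The helper names (`codePoints`, `foldIsStable`, `foldsTo`, `cherokeeFoldsToUpper`) are invented for this example, and the expected target U+13C9 is taken from the FAQ text, not from the `text` sources.

```haskell
import Data.Char (ord)
import qualified Data.Text as T
import Text.Printf (printf)

-- | Show each character as U+XXXX, which is easier to compare against
-- the Unicode data files than GHCi's decimal escapes.
codePoints :: T.Text -> [String]
codePoints = map (printf "U+%04X" . ord) . T.unpack

-- | Folding is stable for a string when folding it again changes nothing.
foldIsStable :: T.Text -> Bool
foldIsStable t = T.toCaseFold folded == folded
  where
    folded = T.toCaseFold t

-- | Does the single character c fold to exactly the character target?
foldsTo :: Char -> Char -> Bool
foldsTo target c = T.toCaseFold (T.singleton c) == T.singleton target

-- | Assumption from the FAQ: both letters of the pair fold to U+13C9.
cherokeeFoldsToUpper :: Bool
cherokeeFoldsToUpper = all (foldsTo '\x13C9') ['\x13C9', '\xAB99']
```

For instance, `codePoints (T.toCaseFold (T.singleton '\x13C9'))` shows which code point the uppercase letter folds to, and `foldIsStable (T.singleton '\xAB99')` reports whether a second fold would move it again.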

Labels: question (Requires more investigation)
