Bug #101556 CP1252 is not Latin-1
Submitted: 11 Nov 2020 8:53 Modified: 11 Nov 2020 13:18
Reporter: Marc Masó
Status: Verified
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: CP1252, encoding, Latin1, tildes

[11 Nov 2020 8:53] Marc Masó
Description:
Latin-1 is not equal to Latin-1, also known as ISO-8859-1.

This poses a big problem for SQL client libraries, such as those for Python, that try to decode the strings as real Latin-1. In the best case this loses information: characters come out wrongly decoded, such as 'open' accents (e.g. à, è...), the single quotation mark, etc.

Also, users get the wrong idea of which encoding they are using.
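
The decoding problem described above can be sketched in Python. This is a minimal illustration, not taken from any particular SQL library: the sample string and the variable names are assumptions, and the bytes are produced locally with Python's `cp1252` codec rather than read from a MySQL server.

```python
# Hypothetical sample text "l’àvia", written with escapes so the
# snippet does not depend on the source file's own encoding.
text = "l\u2019\u00e0via"

# Bytes as a CP1252-based column would hold them (assumption: produced
# locally here, not fetched from a server).
stored = text.encode("cp1252")

# A client that treats the bytes as real ISO-8859-1 decodes them like this.
decoded = stored.decode("latin-1")

print(decoded == text)       # whether the round trip preserved the text
print(hex(ord(decoded[1])))  # code point the quotation mark came back as
print(decoded[2] == "\u00e0")  # whether the accented letter survived
```

The quotation mark is stored as byte 0x92, which ISO-8859-1 maps to a control code, so the decoded string no longer matches the original even though no error is raised.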

How to repeat:
- Add the ISO-8859-1 encoding
- Change the name of the CP1252 encoding to its real name
[11 Nov 2020 12:45] MySQL Verification Team
Hi Mr. Masó,

Thank you for your bug report.

However, we cannot change the existing names of the character sets, as this would break millions of applications. We also cannot change the definitions of the character sets, for the same reason: millions of applications depend on the charsets / collations remaining as they are.

Also, we do not understand what "Latin-1 is not Latin-1" means. If you wish to have an ISO-8859-1 character set added, please provide the justification and the exact definition. In that case, it could be considered as a feature request.

We are waiting on your feedback.
[11 Nov 2020 12:57] Marc Masó
Sorry, a typo. What I meant to say is:
"CP1252 is not equal to Latin-1, also known as ISO-8859-1."

The big problem is in the range 0x80-0x9F. ISO-8859-1 (Latin-1) encodes control codes there, but CP1252 encodes tildes and other characters that are important in languages like Catalan or French.
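
The difference in that range can be checked mechanically. The sketch below is only an illustration using Python's built-in `latin-1` and `cp1252` codecs; the helper name `differing_bytes` is hypothetical, and bytes that Python's `cp1252` codec leaves undefined are reported as `None`.

```python
def differing_bytes():
    """Map each byte value to (latin-1 char, cp1252 char) where they differ."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for this byte.
            cp1252_char = None
        if latin1_char != cp1252_char:
            diffs[value] = (latin1_char, cp1252_char)
    return diffs


diffs = differing_bytes()
print(hex(min(diffs)), hex(max(diffs)), len(diffs))
for value in (0x80, 0x92, 0x9C):
    latin1_char, cp1252_char = diffs[value]
    print(hex(value), repr(latin1_char), repr(cp1252_char))
```

With Python's codecs, every byte the function reports falls in 0x80-0x9F; outside that range the two encodings agree.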

It would be really good to add this encoding.
Here is a more detailed comparison between ISO-8859-1 and CP-1252: https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
[11 Nov 2020 13:18] MySQL Verification Team
Hi Mr. Masó,

We have analysed your request and we think that this feature makes sense to implement. Hence, we verify that ISO-8859-1 should be implemented as a separate character set.