MySQL Bugs: #101556: CP1252 is not Latin-1

Bug #101556	CP1252 is not Latin-1
Submitted:	11 Nov 2020 8:53	Modified:	11 Nov 2020 13:18
Reporter:	Marc Masó
Status:	Verified
Category:	MySQL Server: Charsets	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	CP1252, encoding, Latin1, tildes

Description:
Latin-1 is not equal to Latin-1, also known as ISO-8859-1.

This is a big problem for SQL client libraries, such as those for Python, that try to decode the strings as real Latin-1. In the best case this causes loss of information: characters come out wrongly decoded, such as grave accents (e.g. à, è...), the single quotation mark, etc.

Also, users get a wrong impression of the encoding they are using.
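
The mismatch described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample string is made up, and the bytes are produced locally with Python's `cp1252` codec rather than read from a real MySQL connection.

```python
# Hypothetical sample: bytes as a CP1252-based "latin1" column might deliver them.
# "\u2019" is the right single quotation mark, "\u20ac" is the euro sign.
raw = "l\u2019euro: 5 \u20ac".encode("cp1252")

as_cp1252 = raw.decode("cp1252")    # what the stored data actually means
as_latin1 = raw.decode("latin-1")   # what a strict ISO-8859-1 decoder produces

# ascii() keeps the output readable on any console.
print(ascii(as_cp1252))
print(ascii(as_latin1))
print(as_cp1252 == as_latin1)
```

Decoding as ISO-8859-1 never raises an error here, because every byte maps to some code point. The quotation mark and the euro sign silently become C1 control characters instead, which is why the problem can go unnoticed.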

How to repeat:
- Add the ISO-8859-1 encoding
- Change the name of the CP1252 encoding to its real name

Hi Mr. Masó,

Thank you for your bug report.

However, we cannot change the existing names of the character sets, as this would break millions of applications. We also cannot change the definitions of the character sets, for the same reason. Many millions of applications depend on the charsets / collations remaining as they are.

Also, we do not understand what "Latin-1 is not Latin-1" means. If you wish to add an ISO-8859-1 character set, please provide the justification and the exact definition. In that case, it could be considered as a feature request.

We are waiting for your feedback.

Sorry, a typo. What I meant to say is:
"Latin-1, also known as ISO-8859-1, is not equal to CP1252."

The big problem is in the range 0x80-0x9F. ISO-8859-1 (Latin-1) assigns control codes to these bytes, but CP1252 assigns them tildes and other characters that are important in languages like Catalan or French.

It would be really good to add this encoding.
Here is a more detailed comparison between ISO-8859-1 and CP1252: https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
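
The difference in the 0x80-0x9F range can also be listed with a short Python sketch. It relies on Python's built-in `latin-1` and `cp1252` codecs. The helper name `compare_c1_range` is made up for this illustration, and Python's `cp1252` codec leaves a few bytes unassigned, which is handled explicitly.

```python
import unicodedata

def compare_c1_range():
    """Return (byte, latin-1 char, cp1252 char or None) for bytes 0x80-0x9F."""
    rows = []
    for byte in range(0x80, 0xA0):
        b = bytes([byte])
        latin1 = b.decode("latin-1")       # maps straight to U+0080..U+009F
        try:
            cp1252 = b.decode("cp1252")
        except UnicodeDecodeError:
            cp1252 = None                  # unassigned in Python's cp1252 codec
        rows.append((byte, latin1, cp1252))
    return rows

for byte, latin1, cp1252 in compare_c1_range():
    if cp1252 is None:
        shown = "(undefined)"
    else:
        shown = f"U+{ord(cp1252):04X} {unicodedata.name(cp1252)}"
    print(f"0x{byte:02X}  latin-1: U+{ord(latin1):04X}  cp1252: {shown}")
```

Every byte in the range decodes to a control character under ISO-8859-1, while most of them decode to printable characters under CP1252.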

Hi Mr. Masó,

We have analysed your request and we think that this feature makes sense to implement. Hence, we verify that ISO-8859-1 should be implemented as a separate character set.