Bug #50915 Romansh language patch for MySQL server
Submitted: 4 Feb 2010 14:24 Modified: 10 Aug 2010 14:17
Reporter: Beat Vontobel (Silver Quality Contributor) (OCA)
Status: Closed
Category: MySQL Server: General
Severity: S4 (Feature request)
Version: any
OS: Any
Assigned to: Alexander Barkov
Tags: Contribution, date, i8n, language, locale, qc, Romansh, strings, Switzerland
Triage: Needs Triage: D5 (Feature request)

[4 Feb 2010 14:24] Beat Vontobel
Description:
This patch adds Romansh language support to MySQL's date/time functions.

Romansh is one of the four official (legal) languages in Switzerland (besides German, French and Italian). The ISO 639-1 code is "rm".

The translations included in the patch were double-checked by Gion-Andri Cantieni, a native speaker who also worked on the Romansh localization of Mozilla Firefox (I am not a native speaker myself). I'd be glad if you could add this to the official source tree: websites in Switzerland that use MySQL's internal functions to localize date/time strings can currently support only three of the four languages (or have to stop using those functions and move everything up to a programming language like PHP -- or patch the server themselves).

Just to stress the importance of the language again :) -- not only Firefox but also Microsoft Office and Google Search exist in localized versions for Romansh.

How to repeat:
mysql1.intern-(none) [admin] > SET lc_time_names = 'de_CH';
Query OK, 0 rows affected (0.01 sec)

mysql1.intern-(none) [admin] > SET lc_time_names = 'fr_CH';
Query OK, 0 rows affected (0.00 sec)

mysql1.intern-(none) [admin] > SET lc_time_names = 'it_CH';
Query OK, 0 rows affected (0.00 sec)

mysql1.intern-(none) [admin] > SET lc_time_names = 'rm_CH';
ERROR 1105 (HY000): Unknown locale: 'rm_CH'

Suggested fix:
--- sql_locale.cc.orig	2009-03-31 16:39:52.000000000 +0200
+++ sql_locale.cc	2009-11-13 20:47:54.000000000 +0100
@@ -1847,6 +1847,38 @@
);
/***** LOCALE END zh_TW *****/

+/***** LOCALE BEGIN rm_CH: Romansh - Switzerland *****/
+static const char *my_locale_month_names_rm_CH[13] =
+ {"schaner","favrer","mars","avrigl","matg","zercladur","fanadur","avust","settember","october","november","december", NullS };
+static const char *my_locale_ab_month_names_rm_CH[13] =
+ {"schan","favr","mars","avr","matg","zercl","fan","avust","sett","oct","nov","dec", NullS };
+static const char *my_locale_day_names_rm_CH[8] =
+ {"glindesdi","mardi","mesemna","gievgia","venderdi","sonda","dumengia", NullS };
+static const char *my_locale_ab_day_names_rm_CH[8] =
+ {"gli","ma","me","gie","ve","so","du", NullS };
+static TYPELIB my_locale_typelib_month_names_rm_CH =
+ { array_elements(my_locale_month_names_rm_CH)-1, "", my_locale_month_names_rm_CH, NULL };
+static TYPELIB my_locale_typelib_ab_month_names_rm_CH =
+ { array_elements(my_locale_ab_month_names_rm_CH)-1, "", my_locale_ab_month_names_rm_CH, NULL };
+static TYPELIB my_locale_typelib_day_names_rm_CH =
+ { array_elements(my_locale_day_names_rm_CH)-1, "", my_locale_day_names_rm_CH, NULL };
+static TYPELIB my_locale_typelib_ab_day_names_rm_CH =
+ { array_elements(my_locale_ab_day_names_rm_CH)-1, "", my_locale_ab_day_names_rm_CH, NULL };
+MY_LOCALE my_locale_rm_CH
+(
+  109,
+  "rm_CH",
+  "Romansh - Switzerland",
+  FALSE,
+  &my_locale_typelib_month_names_rm_CH,
+  &my_locale_typelib_ab_month_names_rm_CH,
+  &my_locale_typelib_day_names_rm_CH,
+  &my_locale_typelib_ab_day_names_rm_CH,
+  9,
+  9
+);
+/***** LOCALE END rm_CH *****/
+
/***** LOCALE BEGIN ar_DZ: Arabic - Algeria *****/
MY_LOCALE my_locale_ar_DZ
(
@@ -2797,6 +2829,7 @@
    &my_locale_no_NO,
    &my_locale_sv_FI,
    &my_locale_zh_HK,
+    &my_locale_rm_CH,
    NULL
  };
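
For readers who don't have the server source at hand, here is a minimal, self-contained sketch of what the two hunks above achieve: a locale record holding the Romansh names, registered in a NULL-terminated array that is searched by name. DemoLocale, demo_locales and find_locale are hypothetical names made up for this illustration; the real MY_LOCALE type and the server's lookup code are not reproduced here and may well work differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-in for a locale record. The real MY_LOCALE
// in sql_locale.cc carries more fields (abbreviations, TYPELIBs, lengths).
struct DemoLocale {
    unsigned number;                 // numeric id, 109 in the patch
    const char *name;                // e.g. "rm_CH"
    const char *const *month_names;  // NULL-terminated
    const char *const *day_names;    // NULL-terminated, Monday first
};

// Name lists copied from the patch.
static const char *const month_names_rm_CH[] = {
    "schaner", "favrer", "mars", "avrigl", "matg", "zercladur", "fanadur",
    "avust", "settember", "october", "november", "december", nullptr};
static const char *const day_names_rm_CH[] = {
    "glindesdi", "mardi", "mesemna", "gievgia", "venderdi", "sonda",
    "dumengia", nullptr};

static const DemoLocale demo_rm_CH = {109, "rm_CH", month_names_rm_CH,
                                      day_names_rm_CH};

// NULL-terminated registry, mirroring the array that the second hunk extends.
static const DemoLocale *const demo_locales[] = {&demo_rm_CH, nullptr};

// Linear scan by exact name; returns nullptr for an unknown locale.
const DemoLocale *find_locale(const char *name) {
    assert(name != nullptr);
    for (const DemoLocale *const *p = demo_locales; *p != nullptr; ++p) {
        if (std::strcmp((*p)->name, name) == 0) return *p;
    }
    return nullptr;
}
```

In this sketch, find_locale("rm_CH")->month_names[2] is "mars", and find_locale("xx_XX") returns nullptr, which corresponds to the "Unknown locale" case shown under "How to repeat".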
[4 Feb 2010 14:40] Beat Vontobel
Patch on mysql-5.1.43

Attachment: mysql_romansh.patch (application/octet-stream, text), 1.91 KiB.

[4 Feb 2010 15:06] Lenz Grimmer
Thanks a lot, Beat! We appreciate it.

For the reviewer: SCA has been signed, everything from the administrative side of things is in place.
[4 Feb 2010 15:47] Valerii Kravchuk
Thank you for the patch contributed.
[4 Feb 2010 16:14] Beat Vontobel
Test cases to maybe speed things up -- and make Giuseppe happy :)

Attachment: romansh-locale-tests.tar.gz (application/x-gzip, text), 781 bytes.

[4 Feb 2010 20:35] Beat Vontobel
Patch to existing variables.test/variables.result

Attachment: mysql_romansh_variables_test.patch (application/octet-stream, text), 1.27 KiB.

[4 Feb 2010 20:41] Beat Vontobel
I just added an additional patch for one of the existing tests: t/variables.test and r/variables.result, respectively. This test explicitly checked the "last" LC_TIME locale, which was zh_HK (108) before and is now rm_CH (109), of course. With this patch, the default build (./configure && make && cd mysql-test && ./mysql-test-run) tests without a failure again.
[7 Feb 2010 17:27] Peter Gulutzan
The month and day names correspond to what anybody
would expect for Grischun, and I see that the same
abbreviations also are suggested for Ruby on Rails
http://svn.openstreetmap.org/sites/rails_port/vendor/plugins/rails-i18n/locale/rm.yml
Nevertheless I wonder whether the abbreviations are
due to some official Swiss standard, or not.
We once had confusion due to a contribution for
another locale, when it turned out that the
suggested abbreviations were uncertain.
[8 Feb 2010 9:47] Beat Vontobel
Peter, the short answer is: Yes, the abbreviations used in this patch are those suggested by the "Pledari Grond" for "Rumantsch Grischun", as maintained by the "Lia Rumantscha" (http://www.liarumantscha.ch/). "Rumantsch Grischun" is what's now officially used on the national and state (Canton of Graubuenden) level in writing. The vocabulary can be accessed online at http://www.pledarigrond.ch/ - enter the full German or Romansh words for month/weekday names and you'll also get the Romansh abbreviations for your own verification.

The long answer would probably exceed the space of this text box and thus prove that you indeed asked the right question. While preparing this tiny patch, I spent more time on this single question than on all the other steps combined (including getting the SCA from Sun-now-Oracle, and that took some time…). I specifically tried to find a standard for fixed-length abbreviation strings, as that's what's used in the locales for most other languages. Unfortunately there's none: If we want to follow any standard at all, it's the strings now in the patch.
[8 Feb 2010 19:45] Peter Gulutzan
Seeing Beat Vontobel's explanation, I realize the abbreviations are okay.
[23 Mar 2010 8:13] Lenz Grimmer
A related WorkLog entry was created for this: WL#5303
Alexander Barkov will incorporate this patch.
[23 Mar 2010 8:18] Alexander Barkov
http://forge.mysql.com/worklog/task.php?id=5303
[23 Mar 2010 19:09] Peter Gulutzan
Beat, consider this en_US example:

"
mysql> CREATE TABLE t (s1 DATETIME);
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO t VALUES ('2010-03-23 11:00:00');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO t VALUES ('2010-03-23 13:00:00');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT DATE_FORMAT(s1,'%h %p') FROM t;
+-------------------------+
| DATE_FORMAT(s1,'%h %p') |
+-------------------------+
| 11 AM                   |
| 01 PM                   |
+-------------------------+
2 rows in set (0.00 sec)
"

We need the Romansh for 'AM' and 'PM'.
I have guessed 'AM' (avantmezdi) and 'SM' (suentermezdi).

But we need something better than a guess, if possible.
[23 Mar 2010 21:03] Beat Vontobel
Hi Peter, the answer would probably be "it doesn't really exist, but it's also AM/PM", as for German.

It's just that written text always uses 24h notation for times, never 12h (only in whole sentences, usually when the minutes are left out, might you find something like "um 5 Uhr nachmittags" ("at 5 o'clock in the afternoon"), but then it's always with an adverb, which is not suitable here).

As the abbreviations stand for Latin "ante/post meridiem", they are also valid in these languages, as for German:

mysql1.intern-(none) [admin] > SET @@lc_time_names='de_DE';
Query OK, 0 rows affected (0.00 sec)

mysql1.intern-(none) [admin] > SELECT DATE_FORMAT(s1,'%h %p') FROM (SELECT NOW() AS s1 UNION SELECT NOW() - INTERVAL 12 HOUR) AS t;
+-------------------------+
| DATE_FORMAT(s1,'%h %p') |
+-------------------------+
| 08 PM                   |
| 08 AM                   |
+-------------------------+
2 rows in set (0.00 sec)

So, I'm 99% sure that would also be the best solution for Romansh (just keep AM/PM, do not try to translate it), but just to get the 100% affirmation, I double checked with "my experts and native speakers". Hope to get a reply by tomorrow and will post it here.
[23 Mar 2010 22:36] Beat Vontobel
Yep, got a reply already, we should stick with AM/PM.
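
To make the outcome concrete, here is a small sketch of 12-hour formatting with the Latin markers left untranslated, as agreed above. format_hour_12 is a hypothetical helper written for this comment only; it is not the server's DATE_FORMAT() implementation, and the handling of hours 0 and 12 follows the usual 12-hour clock convention.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders an hour (0-23) as a two-digit 12-hour value
// followed by "AM" or "PM". The marker does not depend on any locale, which
// is the behaviour agreed on for rm_CH (and already seen for de_DE).
std::string format_hour_12(int hour24) {
    assert(hour24 >= 0 && hour24 <= 23);
    int h12 = hour24 % 12;
    if (h12 == 0) h12 = 12;  // midnight and noon are shown as 12
    const char *marker = (hour24 < 12) ? "AM" : "PM";
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02d %s", h12, marker);
    return std::string(buf);
}
```

With this helper, 11 gives "11 AM" and 13 gives "01 PM", matching the two rows of Peter's en_US example.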
[24 Mar 2010 10:18] Alexander Barkov
Beat, thanks for confirmation!

Can you please also verify that number formatting
should work similarly to de_CH and it_CH, not like en_US:

mysql> select format(1234545, 2, 'de_CH') as de_CH,format(1234545, 2, 'it_CH') as it_CH,format(1234545, 2, 'en_US') as en_US;
+--------------+--------------+--------------+
| de_CH        | it_CH        | en_US        |
+--------------+--------------+--------------+
| 1'234'545.00 | 1'234'545,00 | 1,234,545.00 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)

Thanks!
[24 Mar 2010 14:14] Beat Vontobel
You're absolutely right, it should definitely work "like" de_CH or it_CH and definitely not like en_US. The short answer is: What you have for de_CH should also be used for rm_CH (and it_CH and fr_CH!), i.e. 1'234'545.00

Now, the complicated answer is, we probably also have to file a "bug" report for it_CH (or de_CH, or fr_CH or all of them...) -- I learn a lot by just doing research for this tiny Romansh patch. :) Actually the number format for *_CH (de, fr, it, rm) would be:

  1 234 545.00 for monetary values
  1 234 545,00 for everything else

As the non-breaking space is no solution for some character sets and FORMAT() is probably most of the time used for financial values, I'd stick to the above mentioned 1'234'545.00 for all of *_CH.
 
I'll try to translate/paraphrase/combine the three standards documents for German, Italian and French from the Swiss Federal Chancellery (http://www.bk.admin.ch/dokumentation/sprachen/) for you in a more RFC-like "syntax" (unfortunately they do not have a document for Romansh, but this should also be valid: "we all" usually write numbers the same way in Switzerland):

"A comma (,) MUST be used to separate decimal fractions, unless it's a financial (monetary) value, then a point (.) MUST be used. Thousands (digits in groups of three) SHOULD be separated by a non-breaking space (NBSP), the apostrophe (') SHOULD NOT be used to separate thousands. Comma or point (,/.) MUST NOT be used to separate digit groups."

I have copied the relevant sections from the documents below in the original languages for reference. The above also corresponds to what I see in most documents in Switzerland, even if the French version doesn't e.g. explicitly mention the point (.) instead of comma (,) for monetary values, that's what's used everywhere. The documents agree on not using apostrophes but non-breaking spaces for group separation, but there we probably have to follow the technical restrictions.
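
As a rough illustration of the difference between the rule paraphrased above and the pragmatic 1'234'545.00 form, here is a self-contained sketch. All three function names are hypothetical and this is not how FORMAT() is implemented in the server. It assumes non-negative amounts given in hundredths, UTF-8 output for the non-breaking space, and that numbers of up to four digits stay ungrouped in the "standard" variant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: formats a non-negative amount given in hundredths.
// Digits are grouped in threes from the right, but only if the integer part
// has at least min_grouped_digits digits.
std::string format_grouped(unsigned long long hundredths,
                           const std::string &group_sep, char decimal_sep,
                           std::size_t min_grouped_digits) {
    assert(decimal_sep == '.' || decimal_sep == ',');
    const std::string digits = std::to_string(hundredths / 100);
    const std::size_t n = digits.size();
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        // Separator before each complete group of three, counted from the right.
        if (n >= min_grouped_digits && i != 0 && (n - i) % 3 == 0) {
            out += group_sep;
        }
        out += digits[i];
    }
    const unsigned frac = static_cast<unsigned>(hundredths % 100);
    out += decimal_sep;
    out += static_cast<char>('0' + frac / 10);
    out += static_cast<char>('0' + frac % 10);
    return out;
}

// The paraphrased rule: point for monetary values, comma otherwise, groups
// separated by a non-breaking space (U+00A0, written as UTF-8 bytes).
std::string format_swiss_standard(unsigned long long hundredths, bool monetary) {
    return format_grouped(hundredths, "\xC2\xA0", monetary ? '.' : ',', 5);
}

// The pragmatic variant suggested for all of *_CH: apostrophe and point.
std::string format_swiss_pragmatic(unsigned long long hundredths) {
    return format_grouped(hundredths, "'", '.', 1);
}
```

format_swiss_pragmatic(123454500) yields 1'234'545.00, the same string MySQL shows for de_CH above. The standard variant produces the same digits with non-breaking spaces and, for non-monetary values, a decimal comma.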

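de_CH:
"Dezimalstellen werden durch das Dezimalkomma abgetrennt. Bei Geldbeträgen ist zwischen der Währungseinheit und der Untereinheit anstelle des Dezimalkommas der Dezimalpunkt zu setzen. [...] Ziffern werden in Dreiergruppen zusammengefasst. Besteht eine Zahl aus vier Ziffern, so wird die erste nicht abgesetzt, sondern eine Vierergruppe gebildet. Zahlen, die aus mehr als vier Ziffern bestehen, werden von der Endziffer aus in Dreiergruppen zerlegt [...]. Zur Gliederung wird ein Festabstand verwendet, damit die Zahlengruppen beim Zeilensprung nicht auseinandergerissen werden. Die früher gebräuchliche Schreibung mit Apostroph sollte nicht mehr angewendet werden [...]. Nicht korrekt ist die im angelsächsischen Raum übliche Gliederung mit Punkten."
(English: "Decimal places are separated by the decimal comma. For monetary amounts, the decimal point is to be used instead of the decimal comma between the currency unit and the subunit. [...] Digits are combined into groups of three. If a number consists of four digits, the first one is not set apart; a group of four is formed instead. Numbers consisting of more than four digits are split into groups of three, starting from the last digit [...]. A fixed space is used for grouping, so that the digit groups are not torn apart at a line break. The formerly common notation with an apostrophe should no longer be used [...]. The grouping with points that is customary in the Anglo-Saxon world is not correct.")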

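it_CH:
"I numeri frazionari sono scritti in cifre con la virgola. [...] Le frazioni di unità monetarie sono invece precedute da un punto. [...] I numeri con più di quattro cifre si scrivono unendo le cifre a gruppi di tre partendo dalla cifra finale, senza segni di separazione tra un gruppo e l’altro (inserendo però uno spazio protetto [...])"
(English: "Fractional numbers are written in digits with a comma. [...] Fractions of monetary units, by contrast, are preceded by a point. [...] Numbers with more than four digits are written by joining the digits in groups of three, starting from the final digit, without separator characters between one group and the next (but inserting a non-breaking space [...])")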

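fr_CH:
"Dans les nombres comprenant plus de quatre chiffres, on sépare par un espace insécable chaque tranche de trois chiffres à partir de la droite. [...] les décimales sont séparées par une virgule."
(English: "In numbers comprising more than four digits, each group of three digits, counted from the right, is separated by a non-breaking space. [...] decimals are separated by a comma.")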
[25 Mar 2010 5:57] Alexander Barkov
Hi Beat,

thank you very much for these details.

Note that we took the locale data from POSIX.
You can find it in these files on a Linux machine:
/usr/share/i18n/locales/it_CH and /usr/share/i18n/locales/de_CH.

So there's possibly a mistake in the POSIX locale data.

We'll check this issue separately. In the meantime, we have everything
we need to add the Romansh locale.

Thanks!
[25 Mar 2010 6:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/104279
[21 Jun 2010 6:12] Bugs System
Pushed into mysql-next-mr (revid:bar@mysql.com-20100617072236-vx5eqygof70izuho) (version source revid:bar@mysql.com-20100617072236-vx5eqygof70izuho) (pib:16)
[4 Aug 2010 8:09] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804080001-bny5271e65xo34ig) (version source revid:bar@mysql.com-20100617072236-vx5eqygof70izuho) (merge vers: 5.6.99-m4) (pib:18)
[4 Aug 2010 8:24] Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@ibmvm-20100804081533-c1d3rbipo9e8rt1s) (version source revid:bar@mysql.com-20100617072236-vx5eqygof70izuho) (merge vers: 5.6.99-m4) (pib:18)
[10 Aug 2010 14:17] Paul Dubois
Noted in 5.6.0 changelog.

The Romansh locale 'rm_CH' is now a permissible value for the
lc_time_names system variable.