Bug #118222 Add Arabic collation with Alef normalization (ا, أ, إ, آ) and case-insensitive support for accurate matching
Submitted: 18 May 11:41
Modified: 19 May 6:31
Reporter: Mostafa Rabia
Status: Verified
Category: MySQL Server: Charsets
Severity: S4 (Feature request)
Version: 8.0, 8.4, 9.3
OS: Any
Assigned to:
CPU Architecture: Any

[18 May 11:41] Mostafa Rabia
Description:
MySQL currently lacks a collation that supports case-insensitive and diacritic-insensitive comparison for Arabic text, in particular for the variants of the letter "Alef":

- ا (U+0627)
- أ (U+0623)
- إ (U+0625)
- آ (U+0622)

Existing collations such as `utf8mb4_general_ci` or `utf8mb4_unicode_ci` treat these characters as distinct, so queries return results that native Arabic speakers do not expect. For example:

    SELECT * FROM users WHERE name = 'احمد';

This query fails to match:
- "أحمد"
- "إحمد"
- "آحمد"
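
To check how a particular collation treats these variants, the two spellings can be compared directly as literals. This is a minimal sketch, assuming the connection character set is utf8mb4; the column aliases are only illustrative, and the three collations are examples of ones shipped with MySQL 8.0. Each expression yields 1 if the collation considers the spellings equal and 0 otherwise.

```sql
-- Compare plain Alef (U+0627) with Alef with Hamza above (U+0623)
-- under several collations. COLLATE binds to the right-hand literal
-- and decides the collation used for the comparison.
SELECT
  _utf8mb4'احمد' = _utf8mb4'أحمد' COLLATE utf8mb4_general_ci AS equal_general_ci,
  _utf8mb4'احمد' = _utf8mb4'أحمد' COLLATE utf8mb4_unicode_ci AS equal_unicode_ci,
  _utf8mb4'احمد' = _utf8mb4'أحمد' COLLATE utf8mb4_0900_ai_ci AS equal_0900_ai_ci;
```

The same pattern can be repeated for the other variants (إ and آ) to see which of them a collation folds together.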

### Feature Request:

Introduce a new collation like `utf8mb4_arabic_ai_ci` or extend the current Arabic collations to:

- Normalize all Alef variants to a base form.
- Optionally ignore diacritics (tashkeel).
- Support case-insensitivity.
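
The first point, normalizing Alef variants to a base form, can be emulated at query time until such a collation exists. The following is only a sketch of the intended behavior, not the requested collation: it assumes the `users` table from the example, a utf8mb4 connection, and precomposed characters (U+0622, U+0623, U+0625) in the stored data.

```sql
-- Sketch only: fold the Alef variants to plain Alef (U+0627)
-- before comparing, so all four spellings compare equal.
SELECT *
FROM users
WHERE REPLACE(REPLACE(REPLACE(name, 'أ', 'ا'), 'إ', 'ا'), 'آ', 'ا') = 'احمد';
```

Because the function is applied to the column, an ordinary index on `name` cannot be used for this lookup, which is one reason a collation-level solution is preferable.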

This would bring MySQL closer to true Arabic linguistic handling, improve search relevance, and remove a common source of frustration in Arabic applications.

### References:

- Full article with technical explanation and proposed solution:  
https://ahmadessamdev.medium.com/arabic-case-insensitive-in-database-systems-how-to-solve-...

Thank you!

How to repeat:
-- Create table
CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
);

-- Insert sample Arabic names with Alef variants
INSERT INTO users (name) VALUES
  ('احمد'),  -- Alef
  ('أحمد'),  -- Alef with Hamza above
  ('إحمد'),  -- Alef with Hamza below
  ('آحمد');  -- Alef with Madda

-- Now try to search using plain Alef:
SELECT * FROM users WHERE name = 'احمد';

✅ Expected:
The query should return all 4 rows, treating all Alef variants as equal.

❌ Actual:
It only returns the exact match 'احمد'.

🎯 Why this matters
Arabic users expect that searching for "احمد" should match "أحمد", "إحمد", and "آحمد", which all represent the same logical name in Arabic. Current collations treat these as completely different letters.

[19 May 6:31] MySQL Verification Team
Hello Mostafa Rabia,

Thank you for the feature request!

regards,
Umesh