| Bug #94203 | REGEXP_REPLACE adds zero-bytes | ||
|---|---|---|---|
| Submitted: | 5 Feb 2019 4:31 | Modified: | 25 Apr 2019 5:40 |
| Reporter: | Rick James | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S3 (Non-critical) |
| Version: | 8.0.15 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | REGEXP_REPLACE | ||
[5 Feb 2019 5:55]
Rick James
Could it be that the intent is to convert to CHARACTER SET ucs2, but not convert back?
[5 Feb 2019 6:10]
MySQL Verification Team
Hello Rick James, Thank you for the report and feedback. regards, Umesh
[5 Feb 2019 21:22]
Ujjwal Singh
For now the workaround is: an additional REPLACE of CHAR(0) with ''
[24 Apr 2019 22:01]
Jon Stephens
Documented fix as follows in the MySQL 8.0.17 changelog:
The results returned by the functions REGEXP_REPLACE() and
REGEXP_SUBSTR() used UTF-16 rather than the character set of the
expression searched for matches.
Also noted the change in the descriptions of these functions in the Manual.
Closed.
[25 Apr 2019 5:40]
Rick James
Are these two functions the only string functions that deliberately use a CHARACTER SET other than what the connection is using?

Description: Notice the spurius NULs: mysql> SELECT REGEXP_REPLACE('abcd', 'c', '') = 'abd'; +-----------------------------------------+ | REGEXP_REPLACE('abcd', 'c', '') = 'abd' | +-----------------------------------------+ | 1 | +-----------------------------------------+ mysql> SELECT HEX(REGEXP_REPLACE('abcd', 'c', '')), HEX('abd'); +--------------------------------------+------------+ | HEX(REGEXP_REPLACE('abcd', 'c', '')) | HEX('abd') | +--------------------------------------+------------+ | 610062006400 | 616264 | +--------------------------------------+------------+ mysql> SELECT LENGTH(REGEXP_REPLACE('abcd', 'c', '')), LENGTH('abd'); +-----------------------------------------+---------------+ | LENGTH(REGEXP_REPLACE('abcd', 'c', '')) | LENGTH('abd') | +-----------------------------------------+---------------+ | 6 | 3 | +-----------------------------------------+---------------+ mysql> SELECT @@version; +-----------+ | @@version | +-----------+ | 8.0.15 | +-----------+ How to repeat: SELECT REGEXP_REPLACE('abcd', 'c', '') = 'abd'; -- Expect 1, get 1 SELECT HEX(REGEXP_REPLACE('abcd', 'c', '')), HEX('abd'); -- Expect same hex, but no SELECT LENGTH(REGEXP_REPLACE('abcd', 'c', '')), LENGTH('abd'); -- Expect 3 and 3; get 6 and 3