| Bug #77473 | Truncated data with subquery & UTF8 | ||
|---|---|---|---|
| Submitted: | 24 Jun 2015 12:58 | Modified: | 20 Jul 2015 14:58 |
| Reporter: | T. M. | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.5.39, 5.6.25 | OS: | Windows (x64) |
| Assigned to: | Tor Didriksen | CPU Architecture: | Any |
| Tags: | Subquery UTF8 truncated | ||
[24 Jun 2015 16:01]
Hartmut Holzgraefe
select length(data) as len from ( select rpad('a', 32767, 'é') as data ) as Sub;
returns the correct result 65533, starting with a rpad argument of 32768 up to 65535 the result is 65535, then with a parameter of >=65536 the result is correct again
looks as if a too small two-byte integer type is used for the subquery result so that the result value is truncatd by integer overflow ...
Also reproduced on MySQL 5.6.14 and 5.7.4
[25 Jun 2015 7:08]
MySQL Verification Team
Hello! Thank you for the report and test case. Thanks, Umesh
[26 Jun 2015 14:11]
Hartmut Holzgraefe
The problem is not restricted to checking the length in the outer query, it also occurs when just passing through the inner result ... and it's not only RPAD() that is affected, I get the same with REPEAT():
$ mysql -Ne "set names utf8; select data from (select rpad('', 10000, 'é') as data ) as Sub;" | wc
1 1 20001
$ mysql -Ne "set names utf8; select data from (select rpad('', 40000, 'é') as data ) as Sub;" | wc
1 1 65535
$ mysql -Ne "set names utf8; select data from (select rpad('', 70000, 'é') as data ) as Sub;" | wc
1 1 140001
$ mysql -Ne "set names utf8; select data from (select repeat('é', 40000) as data ) as Sub;" | wc
1 1 65535
[30 Jun 2015 19:23]
Hartmut Holzgraefe
Truncated results happen even earlier when using a character with 3-byte UTF-8 sequence:
select length(data) as len from ( select repeat('☃', ...) as data ) as Sub;
parameter -> result
21844 -> 65532
21845 -> 65535
21846 -> 65535
...
65535 -> 65535
65536 -> 196608
[20 Jul 2015 14:58]
Paul DuBois
Noted in 5.7.9, 5.8.0 changelogs. For some string functions, data was truncated when evaluated in subqueries due to incorrect space calculations when creating temporary tables to hold intermediate results.

Description: Hi, We are facing a problem with the following queries using utf8. Data is truncated with utf8 charset with subquery. I think there is a misleading between length and character length in Mysql code. How to repeat: SET NAMES utf8; SET character_set_results=NULL; -- this works fine select length(rpad('a', 65536, 'é')) as data; -- >> 131071 -- Same query with sub query works fine select length(data) as len from ( select rpad('a', 65536, 'é') as data ) as Sub; -- >> 131071 -- now with a smaller size -- This works fine select length(rpad('a', 65535, 'é')) as data; -- >> 131069 -- Same query with subquery truncates the data select length(data) as len from ( select rpad('a', 65535, 'é') as data ) as Sub; -- >> 65535