Bug #77473 Truncated data with subquery & UTF8
Submitted: 24 Jun 2015 12:58 Modified: 20 Jul 2015 14:58
Reporter: T. M. Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.39, 5.6.25 OS:Windows (x64)
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: Subquery UTF8 truncated

[24 Jun 2015 12:58] T. M.
Description:
Hi,

We are facing a problem with the following queries using utf8.

Data is truncated with utf8 charset with subquery.

I think there is a misleading between length and character length in Mysql code.

How to repeat:
SET NAMES utf8;
SET character_set_results=NULL;

-- this works fine
select length(rpad('a', 65536, 'é')) as data;
-- >> 131071

-- Same query with sub query works fine
select length(data) as len from (
  select rpad('a', 65536, 'é') as data
) as Sub;
-- >> 131071

-- now with a smaller size
-- This works fine
select length(rpad('a', 65535, 'é')) as data;
-- >> 131069
-- Same query with subquery truncates the data
select length(data) as len from (
  select rpad('a', 65535, 'é') as data
) as Sub;
-- >> 65535
[24 Jun 2015 16:01] Hartmut Holzgraefe
select length(data) as len from ( select rpad('a', 32767, 'é') as data ) as Sub;

returns the correct result 65533, starting with a rpad argument of 32768 up to 65535 the result is 65535, then with a parameter of >=65536 the result is correct again 

looks as if a too small two-byte integer type is used for the subquery result so that the result value is truncatd by integer overflow ...

Also reproduced on MySQL 5.6.14 and 5.7.4
[25 Jun 2015 7:08] MySQL Verification Team
Hello!

Thank you for the report and test case.

Thanks,
Umesh
[26 Jun 2015 14:11] Hartmut Holzgraefe
The problem is not restricted to checking the length in the outer query, it also occurs when just passing through the inner result ... and it's not only RPAD() that is affected, I get the same with REPEAT():

$ mysql -Ne "set names utf8; select data from (select rpad('', 10000, 'é') as data ) as Sub;" | wc
      1       1   20001

$ mysql -Ne "set names utf8; select data from (select rpad('', 40000, 'é') as data ) as Sub;" | wc
      1       1   65535

$ mysql -Ne "set names utf8; select data from (select rpad('', 70000, 'é') as data ) as Sub;" | wc
      1       1  140001

$ mysql -Ne "set names utf8; select data from (select repeat('é', 40000) as data ) as Sub;" | wc
      1       1   65535
[30 Jun 2015 19:23] Hartmut Holzgraefe
Truncated results happen even earlier when using a character with 3-byte UTF-8 sequence:

select length(data) as len from ( select repeat('☃', ...) as data ) as Sub;

parameter -> result

21844 -> 65532
21845 -> 65535
21846 -> 65535 
...
65535 -> 65535
65536 -> 196608
[20 Jul 2015 14:58] Paul DuBois
Noted in 5.7.9, 5.8.0 changelogs.

For some string functions, data was truncated when evaluated in
subqueries due to incorrect space calculations when creating
temporary tables to hold intermediate results.