Bug #86898 Suboptimal String::append() usage in JSON processing
Submitted: 2 Jul 2017 14:33 Modified: 5 Jul 2017 14:47
Reporter: Alexey Kopytov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: JSON Severity:S3 (Non-critical)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:Any

[2 Jul 2017 14:33] Alexey Kopytov
Description:
double_quote() is on a critical code path in workloads involving massive
JSON retrieval. The only reason it exists is that a relatively small
subset of characters in the 0-255 range must be escaped properly
according to the JSON spec.

I'm trying to optimize a read-only workload where double_quote() is the
top-most function in perf reports. Since the set of characters that need
escaping is relatively small compared to those that can be copied as is,
it is safe to assume that in most JSON documents there are long segments
of characters that don't have to be escaped. In which case, copying them
one-by-one with the String::append(char) function is inefficient. Using
String::append(const char *s, size_t arg_length) would be more
efficient, since it uses memcpy() internally.

I'm going to contribute a patch implementing that idea. Results from a
simple benchmark:

- baseline 5.7.17: 12132 TPS
- 5.7.17 with reserve() calls removed (see bug #86894): 12694 TPS TPS (+4.63%)
- 5.7.17 with String::append() calls optimized: 13055 TPS (+7.61%)

How to repeat:
Read the code in double_quote().

Suggested fix:
Don't call String::append() for every single character in the input
string. In many cases, output string can be generated by appending
larger character sequences.
[3 Jul 2017 14:57] OCA Admin
Contribution submitted via Github - Bug #86898: Suboptimal String::append() usage in JSON processing 
(*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/153#issuecomment-312630209): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_128523160.txt (text/plain), 2.57 KiB.

[4 Jul 2017 8:44] Knut Anders Hatlen
Thanks for the patch, Alexey!

I'll try to get it incorporated into the code.
[4 Jul 2017 18:35] Knut Anders Hatlen
Posted by developer:
 
The contributed patch also partially fixes Bug#25977595 - JSON OUTPUT DOES NOT QUOTE ASCII 31.

select cast(cast(cast('"\\u001f"' as json) as char) as json) now works without raising a parse error in non-debug builds. In debug builds this query now hits an assertion.

I'll add a test case and fix the assertion.
[5 Jul 2017 14:47] Jon Stephens
Documented fix in the MySQL 8.0.3 changelog as follows:

    Creating a representation of a JSON string now optimizes for the
    common case, that the string contains no special characters that
    need to be escaped, scanning for the first special character in
    the string, and copying each sequence of characters which do not
    require escaping in a single memcpy() call, rather than checking
    each character and then copying it one by one, as was done
    previously.
    
    This fix also corrects failure to escape the special character
    1F (Unit Separator).

Closed.
[6 Oct 2017 15:17] Jon Stephens
Also fixed in MySQL 5.7.21, documentation as above.

See also BUG#87722.
[8 Oct 2017 16:22] Knut Anders Hatlen
Just to clarify: Only the part of the patch that fixes bug#87722 has been backported to 5.7. The part that improves the performance has not yet been backported to 5.7.