| Bug #68476 | Suboptimal code in my_strnxfrm_simple() | ||
|---|---|---|---|
| Submitted: | 24 Feb 2013 5:41 | Modified: | 3 Dec 2014 15:44 |
| Reporter: | Alexey Kopytov | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.1, 5.5, 5.6 | OS: | Any |
| Assigned to: | Tor Didriksen | CPU Architecture: | Any |
[26 Feb 2013 16:30]
MySQL Verification Team
Thank you for the bug report.
[15 Apr 2013 8:22]
Tor Didriksen
It depends on the input data. For long strings (100 bytes, like sysbench?) the special handling of dst == src is actually much faster than the other branch. For short strings the extra if/else overhead becomes noticeable.
[15 Apr 2013 8:55]
Alexey Kopytov
Tor,
Let's rewrite that code to an equivalent form for clarity:
if (dest != src)
{
const uchar *end;
for ( end=src+len; src < end ; dest++, src++ )
*dest= map[*src];
}
else
{
const uchar *end;
for ( end=src+len; src < end ; src++)
*src= map[*src];
dest= src;
}
With this in mind, is there really any difference between the branches?
[15 Apr 2013 9:05]
Tor Didriksen
I agree that the code can be re-written, I'm just pointing out that this is much faster for long strings:
const uchar *end;
for (end= dst + frmlen; dst < end; ++dst)
*dst= map[(uchar) *dst];
(and I'm assuming that this is actually the code you are executing running sysbench)
For short strings, something like this is better on my machine
uint ix;
for (ix= 0; ix < frmlen; ++ix)
dst[ix] = map[src[ix]];
[25 Nov 2014 10:21]
Laurynas Biveinis
Bug 68476 patch for 5.7 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug68476.patch (application/octet-stream, text), 1.14 KiB.
[25 Nov 2014 10:22]
Laurynas Biveinis
Patch author Alexey Kopytov
[27 Nov 2014 17:09]
Tor Didriksen
Posted by developer: Benchmark results (see unit test) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcDst/0 (147 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcDst/1 (239 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcDst/2 (1173 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcDst/3 (9310 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcDst/0 (150 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcDst/1 (231 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcDst/2 (753 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcDst/3 (5908 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcDst/0 (145 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcDst/1 (233 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcDst/2 (1165 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcDst/3 (9420 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcDst/0 (151 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcDst/1 (220 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcDst/2 (767 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcDst/3 (5948 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcSrc/0 (146 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcSrc/1 (225 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcSrc/2 (992 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalSrcSrc/3 (7384 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcSrc/0 (146 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcSrc/1 (221 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcSrc/2 (732 ms) [ OK ] Strnxfrm/StrnxfrmTest.OriginalUnrolledSrcSrc/3 (6043 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcSrc/0 (145 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcSrc/1 (247 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcSrc/2 (1170 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedSrcSrc/3 (9286 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcSrc/0 (151 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcSrc/1 (235 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcSrc/2 (757 ms) [ OK ] Strnxfrm/StrnxfrmTest.ModifiedUnrolledSrcSrc/3 (5900 ms)
[1 Dec 2014 10:06]
Tor Didriksen
Bug#16403708 SUBOPTIMAL CODE IN MY_STRNXFRM_SIMPLE()
Based on a contribution from Alexey Kopytov
Add benchmarking tests showing that my_strnxfrm_simple is not suboptimal
compared to the proposed patch.
However: if we take the proposed patch, and do some loop unrolling,
we get improved results for both (src == dst) and (src != dst) transformations.
[3 Dec 2014 15:44]
Paul DuBois
Noted in 5.7.6 changelog. The code in my_strnxfrm_simple() was suboptimal and was improved. Thanks to Alexey Kopytov for the patch.

Description: This ancient code in my_strnxfrm_simple() is suboptimal: if (dest != src) { const uchar *end; for ( end=src+len; src < end ; ) *dest++= map[*src++]; } else { const uchar *end; for ( end=dest+len; dest < end ; dest++) *dest= (char) map[(uchar) *dest]; } I can't see any difference between these branches. It's probably remnants from _bin collations, where we don't have to do table translation and thus the "dest == src" case is degenerate, e.g.: static size_t my_strnxfrm_bin(CHARSET_INFO *cs __attribute__((unused)), uchar *dest, size_t dstlen, const uchar *src, size_t srclen) { if (dest != src) memcpy(dest, src, min(dstlen,srclen)); if (dstlen > srclen) bfill(dest + srclen, dstlen - srclen, 0); return dstlen; } But the "if()" statement in my_strnxfrm_simple() is redundant. It is executed for each row processed by filesort and shows high branch misprediction numbers in sysbench RO runs. How to repeat: Look at my_strnxfrm_simple().