Bug #45000 | Make .NET connector faster | ||
---|---|---|---|
Submitted: | 21 May 2009 2:43 | Modified: | 21 Aug 2009 10:52 |
Reporter: | Dennis Haney | Email Updates: | |
Status: | Open | Impact on me: | |
Category: | Connector / NET | Severity: | S4 (Feature request) |
Version: | 6.0.3 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[21 May 2009 2:43]
Dennis Haney
[21 May 2009 2:52]
Dennis Haney
Ok, managed to google my way to figuring out how to use the binary protocol. Who chose to bind this to prepared statements? I shouldn't have to do another server round trip to get the desired speed. But that doesn't really help everyone else, so binary packets should definitely be the default.
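[For reference, forcing the binary protocol through a prepared statement looks roughly like this with Connector/NET. This is a sketch, not code from the report; the connection string, credentials, and table name are placeholders, and it assumes the MySql.Data package is referenced:]

```csharp
using MySql.Data.MySqlClient;

// Hypothetical connection string. IgnorePrepare=false makes Prepare()
// perform a real server-side prepare instead of the default client-side
// "faked" one, so execution and result sets use the binary protocol.
var connStr = "server=localhost;database=test;uid=user;pwd=pass;IgnorePrepare=false";

using (var conn = new MySqlConnection(connStr))
{
    conn.Open();
    using (var cmd = new MySqlCommand("SELECT d FROM big_table", conn))
    {
        // Preparing the statement is what switches the wire format.
        cmd.Prepare();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Values now arrive in binary form (e.g. 8 raw bytes for a
                // DOUBLE), so no text parsing happens on the client.
                double d = reader.GetDouble(0);
            }
        }
    }
}
```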
[19 Aug 2009 11:56]
Tonci Grgin
Hi Dennis and thanks for your report. Unfortunately, you are not telling us about your use case but rather dropping a code fragment and making suggestions. That is nothing I can start working on. So an off-the-cuff answer from me would be "Implementing Double.Parse makes no sense. If you're after max performance please use integers". Of course, you can not make use of this suggestion, right? Your second comment seems to suggest you're using prepared statements and you complain they are "faked" by default, right? Well, they are! Unless IgnorePrepare is set to False (it is True by default). This remained from the times when the server itself had problems with prepared statements. Now you see how much guessing I did. Please describe your case better and attach a test case supporting your claims.
[19 Aug 2009 13:17]
Tonci Grgin
Or is it that you want binary protocol for plain statements? If so, I think it's implemented in 5.4 by Kostja ("execute direct" or something).
[20 Aug 2009 2:15]
Dennis Haney
Hi Tonci Grgin. I thought I was rather specific with my use case: "I am trying to load a huge table into memory (29000 rows and 275 cols), and it takes a couple of seconds."

The problem occurs primarily because data is sent from MySQL to the client using a TEXT protocol, and I gave some hints to speed this up. I did manage to figure out how to force the binary protocol and get around this problem, and it also gives me almost double the speed, making my use case acceptable. However, for everyone else who does not have the time to figure out how to enable the binary protocol, the existing TEXT code could be optimized with a few simple things, as I mentioned.

All of the IMySqlValue implementations follow similar patterns; let's take the double for instance. The time-consuming part of this code is this line:

double.Parse(packet.ReadString(length))

There are three problems here, as I also mentioned above:

1. double.Parse is SLOOOOOW. It can parse a HUGE range of number formats, whereas the format sent from MySQL matches a very specific subset of those. Thus you would gain speed if you replaced double.Parse with a MySqlDoubleParser.Parse specialized for the format that is sent from the server.

2. packet.ReadString is also a generalized function. It will read UTF-8 strings, even in the case of a double or an int. This obviously takes time since it needs to check for multibyte encodings for every char. Specializing the function for plain ASCII reads will also give a speed boost.

3. packet.ReadString depends on BufferedStream, which is a good thing since most reads are done in single bytes. However, it is using the standard 4k buffer, which means that it will rather often have to refill itself. You should check whether changing the buffer size to 64k improves speed or not.
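[Point 3 above amounts to something like the following. This is an illustrative sketch, not the connector's actual code; a MemoryStream stands in for the network stream:]

```csharp
using System.IO;

class BufferSizeSketch
{
    static void Main()
    {
        // Stand-in for the connector's network stream (illustrative only).
        Stream raw = new MemoryStream(new byte[256 * 1024]);

        // The connector wraps the stream in a BufferedStream with the
        // standard 4 KB buffer, which refills often when a large result
        // set is scanned byte by byte. Dennis's suggestion: try 64 KB
        // and measure whether it helps.
        var buffered = new BufferedStream(raw, 64 * 1024);

        // Single-byte reads now hit the 64 KB buffer instead of the
        // underlying stream on every refill boundary.
        int b = buffered.ReadByte();
    }
}
```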
[20 Aug 2009 6:31]
Tonci Grgin
Hi Dennis and thanks for the explanation. You leave me little choice but to set this to "Verified" so we can check on your claims and see if things can get any faster... Reggie?
[20 Aug 2009 15:43]
Reggie Burnett
Dennis, we are always looking to make the connector faster. I've spent some time looking into your points. Here is what I found.

I ran a test where I timed Double.Parse parsing 1 billion double values. It took something like 0.02 seconds to parse 1 billion values. I don't think the parsing is causing you any performance headaches. If you have an implementation of Double.Parse that shows significant perf gains, I'll be happy to include it.

I then ran a test where I created a table with 1000 rows. Each row had 3 double values in it. I created a console app that read this table 1000 times (for a total of 1 million row reads). I timed it with the existing packet.ReadString and then with a version that read the bytes and created a string with ASCII encoding. There was no significant increase in perf. I did notice one small perf gain when I used the tempBuffer in ReadString instead of newing up a new buffer every time.

And, finally, increasing the buffer size in BufferedStream isn't always the right thing to do. The buffered stream will try to fill itself and can cause more delay before getting the initial data. And if the user is paging data or just wants the first few rows, this will cause a degradation for them. I'm not convinced that this is the right play.

We really appreciate your input and always welcome concrete improvements to our product. Keep up the good work and we'll keep profiling and looking for ways to speed things up.
[21 Aug 2009 10:42]
Dennis Haney
I am not quite sure exactly how you tested the speed of double.Parse, but it is way slower than what you write.

Stopwatch sw = new Stopwatch();
sw.Start();
double x = 0;
for (int i = 0; i < 5000000; i++)
    x = double.Parse("123.12", NumberFormatInfo.InvariantInfo);
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalSeconds);

This takes from 2200 to 2400 milliseconds on my machine in release mode. I implemented a quick and dirty double parser:

/// <summary>
/// Parses a double in the format -?(?:[0-9]*(?:.[0-9]*)?)?
/// </summary>
public static double SimpleDoubleParser(string org)
{
    char[] s00 = org.ToCharArray();
    int s00Length = s00.Length;
    int index = 0;
    long intpart = 0, floatpart = 0;
    if (s00Length == 0)
        return 0.0;
    bool sign = false;
    if (s00[0] == '-')
    {
        sign = true;
        index = 1;
    }
    if (index == s00Length)
        return 0.0;
    for (; index < s00Length; index++)
    {
        if (s00[index] < '0' || s00[index] > '9')
            break;
        intpart = intpart * 10 + (s00[index] - '0');
    }
    if (index == s00Length)
        return sign ? -intpart : intpart;
    // TODO: add assert if there is not a '.' at this point
    int decimalpoints = 1;
    for (index++; index < s00Length; index++, decimalpoints *= 10)
    {
        floatpart = floatpart * 10 + (s00[index] - '0');
    }
    double ret = intpart + (double)floatpart / decimalpoints;
    return sign ? -ret : ret;
}

It does the job in 600 milliseconds.
[21 Aug 2009 10:52]
Dennis Haney
And since you in this context are reading the bytes one by one anyway, you can inline that and avoid the ToCharArray. Then it does the job in 300 msec... almost 9 times faster than the original.
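[A byte-level variant of the parser above, reading digits straight from a byte buffer as this last comment suggests, might look like the following. It is a sketch under the same "quick and dirty" assumptions as Dennis's parser; the class and method names are illustrative, not the connector's internals:]

```csharp
using System;
using System.Text;

static class ByteDoubleParser
{
    // Sketch: parse -?[0-9]*(.[0-9]*)? straight from a byte buffer,
    // skipping both ReadString's string allocation and ToCharArray.
    public static double Parse(byte[] buf, int offset, int length)
    {
        int end = offset + length;
        int i = offset;
        long intPart = 0, fracPart = 0;
        bool negative = false;

        if (i < end && buf[i] == (byte)'-') { negative = true; i++; }

        // Integer digits.
        while (i < end && buf[i] >= (byte)'0' && buf[i] <= (byte)'9')
            intPart = intPart * 10 + (buf[i++] - (byte)'0');

        double result = intPart;

        // Optional fractional digits after the decimal point.
        if (i < end && buf[i] == (byte)'.')
        {
            long scale = 1;
            for (i++; i < end && buf[i] >= (byte)'0' && buf[i] <= (byte)'9'; i++)
            {
                fracPart = fracPart * 10 + (buf[i] - (byte)'0');
                scale *= 10;
            }
            result += (double)fracPart / scale;
        }
        return negative ? -result : result;
    }

    static void Main()
    {
        byte[] payload = Encoding.ASCII.GetBytes("-123.12");
        Console.WriteLine(Parse(payload, 0, payload.Length)); // -123.12
    }
}
```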