Bug #68826 | Terribly bad insert performance | | |
---|---|---|---|
Submitted: | 1 Apr 2013 18:29 | Modified: | 26 Oct 2013 14:50 |
Reporter: | Louis at oracle | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | Connector / NET | Severity: | S3 (Non-critical) |
Version: | 6.x | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | insert row, performance |
[1 Apr 2013 18:29]
Louis at oracle
[25 Jun 2013 22:17]
Francisco Alberto Tirado Zavala
Hello Louis.

Please, can you share more details about the flow that you are using? I have a table with more than 1 million records and I'm using the ODBC driver to get the data. My doubts are:

1 - Do I just insert that data into another table, or what else do I need to do?
2 - How are you inserting the data? Are you using OdbcCommand or something else?
3 - Are you inserting new records into the same table using DataAdapter.Update()?
4 - How many records do you send at a time: 1000, 5000, 10000?

If you can share the functions with the code that you are using, that would be great.

Thanks for your time.
[26 Jun 2013 7:34]
Louis at oracle
Hello, see my reactions below.

Please, can you share more details about the flow that you are using?
> I think I described the essence of the code; I cannot share the real code with you.

1 - Do I just insert that data into another table, or what else do I need to do?
> I was using completely ODBC-based code in the past.
> I changed that into .NET code (which is far less comfortable and universal than ODBC) hoping for better performance; the opposite was true.

2 - How are you inserting the data? Are you using OdbcCommand or something else?
> No! At least not when I wrote the ticket. As written before: datatable.Rows.Add(datarecord) followed by dataadapter.Update(datatable).

3 - Are you inserting new records into the same table using DataAdapter.Update()?
> There are two tables relevant in this example: the source table, and the destination table where the records are inserted (or updated if there is a collision).

4 - How many records do you send at a time: 1000, 5000, 10000?
> It is a continuous loop: reading data from the source table, analysing/processing that data, and inserting the result as one or many records into the destination table.
> So I push the button and the reading and inserting process starts, as fast as the VB.NET application + MySQL (local) database can go, up to the moment all source records are processed.
> The total number of inserted records can be >> 10 million.
> As a result of one read there can be 1 ... 10,000 inserts or updates in the destination table.

If you can share the functions with the code that you are using, that would be great.
> No, I cannot. Also note that I have been changing the routines all the time trying to find out where the bottlenecks are, adding lots of debug code.
> I got reasonable performance in the end, based on a combination of techniques, mainly ODBC.

Thanks for your time.
> It is relatively long ago that I was working on this.
> Sincerely, Louis
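P.S. To be concrete, the row-by-row pattern I refer to under question 2 is roughly this (a sketch from memory; the table and column names are made up, and dataadapter is a MySqlDataAdapter with a CommandBuilder attached):

```vbnet
' Sketch from memory of the row-by-row flow; names are made up.
Dim datarecord As DataRow = datatable.NewRow()
datarecord("id") = 100
datarecord("value") = "Value 1"
datatable.Rows.Add(datarecord)   ' queue the new row in the DataTable
dataadapter.Update(datatable)    ' one INSERT round trip per added row
```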
[26 Jun 2013 21:47]
Francisco Alberto Tirado Zavala
Hello Louis.

The performance is a consequence of how DataAdapter works: it processes one row at a time, so the time will be proportional to the number of records to insert or update. Visit this link for reference: http://msdn.microsoft.com/en-us/library/system.data.common.dataadapter.update.aspx

However, DataAdapter has a property called UpdateBatchSize to update/insert the records in batches rather than one by one. Note that this option just sends a bunch of instructions together rather than all the records in a single instruction, so whether performance improves depends on the number of records sent per batch. Visit this link for reference: http://msdn.microsoft.com/en-us/library/kbbwt18a(v=vs.80).aspx

Currently MySqlDataAdapter works very similarly to the .NET Framework DataAdapter, so you will face the same performance situation. But you can change the way you insert the data into the other table. What I suggest is to create your own insert statement rather than use the CommandBuilder object, and construct it so that you send multiple inserts in a single statement. Since MySQL supports that, you will notice an important increase in performance. You need to create the query with the following format:

```sql
INSERT INTO example (id, value, other_value)
VALUES
  (100, 'Value 1', 'Other 1'),
  (101, 'Value 2', 'Other 2'),
  (102, 'Value 3', 'Other 3'),
  [HERE YOU PUT AS MANY VALUES AS YOU WANT]
;  -- Don't forget the semicolon at the end.
```

Then execute the generated script with MySqlCommand.ExecuteNonQuery().

Also check the max_allowed_packet variable on your server to know how much data you can send to the server without receiving the error "MySQL Error 1153 - Got a packet bigger than 'max_allowed_packet' bytes". Visit this link for reference: http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_max_allowed_pac...

Here is a function to generate the commands for your custom insert; I hope this helps:

```vbnet
' Requires Imports System.Data, System.Linq and System.Text.
Private Function _GenerateInsertBatch(ByVal Data As DataTable, ByVal BatchSize As Integer,
                                      ByVal TableName As String,
                                      ParamArray ColumnNames() As String) As List(Of String)
    Dim strb As StringBuilder = New StringBuilder()
    Dim row As DataRow
    Dim result As List(Of String) = New List(Of String)
    ' Build the column list, e.g. "col1, col2, col3".
    Dim cols = ColumnNames.Aggregate(Function(current, [next]) Convert.ToString(current) & ", " & Convert.ToString([next]))
    Dim values As String = String.Empty

    strb.Append(String.Format("INSERT INTO {0} ({1}) VALUES", TableName, cols))

    Dim ctr As Integer
    For ctr = 0 To Data.Rows.Count - 1 Step 1
        row = Data.Rows(ctr)
        ' Quote every value as a string.
        ColumnNames.ToList().ForEach(Sub(col) values += "'" & row(col) & "',")
        values = values.Substring(0, values.LastIndexOf(","))
        strb.Append(String.Format("({0})", values))
        values = String.Empty
        ' Close the statement at each batch boundary and start a new one.
        If (ctr > 0 AndAlso CType(ctr, Double) Mod CType(BatchSize, Double) = 0) Or (ctr = Data.Rows.Count - 1) Then
            strb.Append(";")
            result.Add(strb.ToString())
            strb.Clear()
            strb.Append(String.Format("INSERT INTO {0} ({1}) VALUES", TableName, cols))
        Else
            strb.Append(",")
        End If
    Next

    Return result
End Function
```

The function returns a List of String; each item in the list holds a single statement with multiple value tuples to insert, and the BatchSize parameter specifies the number of inserts per statement. I tested the function inserting 100,000 records (1,000 records per batch) and the process finished in less than 10 seconds, which I think is acceptable for that number of records. This is an example of how to use the function:
```vbnet
' Here is the code to create the MySqlConnection object, fill the DataTable, etc.

Dim cmds As List(Of String) = _GenerateInsertBatch(_datatable, 1000, "TestTable",
                                                   "ColumnID", "ColumnName", "ColumnDescription")
Dim mysqlCmd As MySqlCommand = New MySqlCommand()
mysqlCmd.Connection = _connection

_connection.Open()
For Each cmd As String In cmds
    mysqlCmd.CommandText = cmd
    mysqlCmd.ExecuteNonQuery()
Next
_connection.Close()

' More code to finish the process.
```

Thanks for your time. Happy coding.
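If you would rather keep the DataAdapter flow, here is a minimal sketch of the UpdateBatchSize approach mentioned above. Take it as an outline only: the connection string and table name are placeholders, and whether the batching actually reduces round trips depends on the Connector/NET version.

```vbnet
' Outline of DataAdapter batching; connection string and table are placeholders.
Imports System.Data
Imports MySql.Data.MySqlClient

Module BatchAdapterSketch
    Sub Main()
        Using connection As New MySqlConnection("server=localhost;database=test;uid=user;pwd=pass")
            Dim adapter As New MySqlDataAdapter("SELECT id, value, other_value FROM example", connection)
            Dim builder As New MySqlCommandBuilder(adapter)

            ' ADO.NET batching requires that the command does not fetch results per row.
            adapter.InsertCommand = builder.GetInsertCommand()
            adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None
            adapter.UpdateBatchSize = 1000   ' Send up to 1000 commands per batch.

            Dim table As New DataTable()
            adapter.Fill(table)
            ' ... add or modify rows in "table" here ...
            adapter.Update(table)            ' Changed rows are sent in batches of 1000.
        End Using
    End Sub
End Module
```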
[28 Jun 2013 16:20]
Francisco Alberto Tirado Zavala
Hello Louis. Is your problem solved, or do you have any additional comments? Saludos
[28 Jun 2013 19:51]
Louis at oracle
Francisco,

As said before, the way ODBC is designed is IMHO far more logical than the way the .NET adapter is designed. However, since the .NET adapter is closer to the design of ".NET", I expected it to provide significantly better performance. Up to now my experience is the opposite. Perhaps my expectations were too high.

To react to your mail:
- It is clear that processing time is linear in the number of records processed.
- Updating batch-wise performs better; however, I chose not to do it that way because you need a lot of error handling in case one or more inserts cause key violations (which can happen; the database keys are designed for that purpose).
- I am aware of the possibility to insert records in groups, but as above that would lead to very complex error handling.
- Related to the ExecuteNonQuery with the "home-built SQL string": actually I am using this code right now (cn = ODBC connection):

```vbnet
Try
    MyStopWatch.Start()
    cn.Execute(sqlInsertRow, NoInserted)
    MyStopWatch.Stop()
```

- I did not yet try MySqlCommand.ExecuteNonQuery(), but I think it is more or less equivalent (see the sketch in the P.S. below).

I think there are a couple of ways to improve performance:
1) Better performance from the connector, but also from VB.NET itself.
   @ That is what I hoped for.
2) Batch processing, but that is a crime due to the complex error handling.
   @ That is what I regarded as the way not to go.
3) Multi-threading, perhaps the best option; however, multi-threaded inserting into and updating the same table, with the same indexes, does not sound good either (and the code is more complex, since you have to divide the task over multiple threads).
   @ I have been thinking about this option, because both processor and disk resources are hardly used, despite an infinite loop without natural wait points. The reason is quite simple: processor 1 is executing VB.NET, then it hands the data to processor 2 executing MySQL, so effectively only one out of 8 processors is doing something, then back to VB.NET, etc. I did not do it yet because of the programming overhead, and also because part of what I would gain will be lost because thread one affects the same indexes as thread two etc., probably causing lots of index reorganization activity inside InnoDB.

Louis
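P.S. The MySqlCommand variant would presumably look something like this (an untested sketch; conn is assumed to be an open MySqlConnection, and sqlInsertRow is the same home-built SQL string as above):

```vbnet
' Untested sketch of the MySqlCommand equivalent of the ODBC timing code above.
Imports System.Diagnostics
Imports MySql.Data.MySqlClient

Public Sub TimedInsert(ByVal conn As MySqlConnection, ByVal sqlInsertRow As String)
    Dim myStopWatch As New Stopwatch()
    Using cmd As New MySqlCommand(sqlInsertRow, conn)
        myStopWatch.Start()
        Dim noInserted As Integer = cmd.ExecuteNonQuery()   ' affected-row count
        myStopWatch.Stop()
        Console.WriteLine("Inserted {0} row(s) in {1} ms", noInserted, myStopWatch.ElapsedMilliseconds)
    End Using
End Sub
```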
[28 Jun 2013 21:26]
Francisco Alberto Tirado Zavala
Hello Louis.

If your concern is the key violations, you can use the IGNORE or ON DUPLICATE KEY UPDATE options of the INSERT command; just by adding one of them to the generated INSERT command for the batch you can handle the key collisions. Please refer to the following link for more information: http://dev.mysql.com/doc/refman/5.5/en/insert.html

Saludos
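For a single row, the two options look like this (using the sample "example" table from my earlier comment):

```sql
-- Skip a colliding row silently:
INSERT IGNORE INTO example (id, value, other_value)
VALUES (100, 'Value 1', 'Other 1');

-- Or turn the collision into an update of the existing row:
INSERT INTO example (id, value, other_value)
VALUES (100, 'Value 1', 'Other 1')
ON DUPLICATE KEY UPDATE value = 'Value 1', other_value = 'Other 1';
```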
[2 Jul 2013 18:41]
Louis at oracle
Hello,

Thank you very much for the "ON DUPLICATE KEY UPDATE" tip. I was not aware of that SQL option. That way of error handling is no doubt a lot faster, but with the disadvantage that the error-handling possibilities are very limited. I also wonder if you can use "ON DUPLICATE KEY UPDATE" in a batch insert query.

I also had a look at the _GenerateInsertBatch code you provided. Clever, but very nasty to read. The code is not correct either, because it cannot handle NULL and numerical values correctly; in fact it assumes everything is a non-NULL string.

So, good tips in general!! If "ON DUPLICATE KEY UPDATE" can be used to handle the key violations with enough accuracy (I have to think about that, and at least do the error handling differently) and it can be combined with a batch insert, inserting will be a lot faster. However, the basis of it all is of course a fast .NET adapter, quite apart from the other things you can do to speed things up even more.

Louis
[2 Jul 2013 22:41]
Francisco Alberto Tirado Zavala
Hello Louis.

Yes, you can use "ON DUPLICATE KEY UPDATE" in a batch insert; please refer to this link for more information: http://dev.mysql.com/doc/refman/5.5/en/insert.html

Thanks for your comments; we'll keep them in mind for future features.

Saludos.
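Combined with the multi-row insert from earlier, it looks like the sketch below (again using the sample "example" table). The VALUES() function refers to the value that would have been inserted into that column for the colliding row:

```sql
INSERT INTO example (id, value, other_value)
VALUES
  (100, 'Value 1', 'Other 1'),
  (101, 'Value 2', 'Other 2'),
  (102, 'Value 3', 'Other 3')
ON DUPLICATE KEY UPDATE
  value = VALUES(value),
  other_value = VALUES(other_value);
```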
[4 Jul 2013 16:07]
Francisco Alberto Tirado Zavala
Hello Louis.

I did some extra tests looking for another approach that does not use batch inserts, and maybe what I found could be useful for you without needing a big change in your code. You can use a transaction combined with the data adapter; that way you don't need to change anything in your logic to handle errors and perform validations against the data. You just need to start the transaction before updating the data with the data adapter, and commit the transaction after the update finishes, or roll back in case of error. For more information please visit this link: http://dev.mysql.com/doc/refman/5.0/es/connector-net-examples-mysqltransaction.html

Saludos
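In outline it looks like the sketch below. Take it as a sketch only: the connection string and table name are placeholders, and the commands generated by the CommandBuilder are attached to the transaction explicitly.

```vbnet
' Sketch: wrap DataAdapter.Update() in a transaction; placeholders throughout.
Imports System.Data
Imports MySql.Data.MySqlClient

Public Sub UpdateInTransaction(ByVal table As DataTable)
    Using connection As New MySqlConnection("server=localhost;database=test;uid=user;pwd=pass")
        connection.Open()
        Dim transaction As MySqlTransaction = connection.BeginTransaction()
        Try
            Dim adapter As New MySqlDataAdapter("SELECT id, value, other_value FROM example", connection)
            Dim builder As New MySqlCommandBuilder(adapter)
            ' Enlist the generated commands in the transaction.
            adapter.InsertCommand = builder.GetInsertCommand()
            adapter.InsertCommand.Transaction = transaction
            adapter.UpdateCommand = builder.GetUpdateCommand()
            adapter.UpdateCommand.Transaction = transaction

            adapter.Update(table)    ' One INSERT/UPDATE per changed row, all in one transaction.
            transaction.Commit()     ' Everything succeeded.
        Catch ex As MySqlException
            transaction.Rollback()   ' Undo all inserts/updates from this run.
            Throw
        End Try
    End Using
End Sub
```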
[27 Oct 2013 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".