Bug #18972 | Dataset to large | ||
---|---|---|---|
Submitted: | 10 Apr 2006 17:37 | Modified: | 10 Apr 2006 21:19 |
Reporter: | Thijs Sanders | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | Connector / J | Severity: | S2 (Serious) |
Version: | Mysql connector 3.1.12 | OS: | Linux (Redhat 9) |
Assigned to: | CPU Architecture: | Any |
[10 Apr 2006 17:37]
Thijs Sanders
[10 Apr 2006 18:53]
Mark Matthews
This is covered in the manual, see the "ResultSet" section of: http://dev.mysql.com/doc/refman/5.0/en/cj-implementation-notes.html
[10 Apr 2006 20:11]
Thijs Sanders
Thank you Mark for the reply, streaming is also a really good option (which I surely implement, because it consumes less memory and releaves stress of the GC). But I really would like to know is the reason the size of the Resultset stayed the same after converting the table. The length of the Strings where shorter and I took less consuming primitive types. I might have overlooked it, but could you please answer this??
[10 Apr 2006 20:22]
Mark Matthews
If you're using "plain" java.sql.Statements, MySQL returns the values as strings, which the JDBC driver stores as byte[]s. Therefore the "client storage size" for plain statements is related to the string representation of the value, not the binary representation of the same value. If you're using MySQL-4.1 or newer, when using prepared statements the server will return "native" binary representations of the values, which in some cases will be smaller. For example, the integer value '1', encoded as a string will of course be smaller than 32 bits (8 bytes), while '1234567891234' will be larger than 8 bytes. The JDBC driver will use these "native" binary representations when storing the values, and thus in many cases, the memory required will be less.
[10 Apr 2006 20:31]
Thijs Sanders
Ahhh yes thank you Mark, that was what I'm searching for!! That was really hard to find. . Only 32 bits are 4 bytes and not 8, but thanks again!
[10 Apr 2006 20:41]
Mark Matthews
Sorry for the typo!
[10 Apr 2006 20:42]
Mark Matthews
Realize also that there is overhead for every java object instance, so your calculations are quite simplistic. Arrays have 16 bytes or so of overhead, and each field value in the JDBC driver is an array of byte[]. Storing it as one byte[] with offsets would actually take more memory in most cases, since each int "offset" into the array would take 8 bytes _each_ (in general, most primatives have an 8 byte overhead). Here's your calculations taking that fact into account: Float 4 bytes * 6500000 = 26000000 bytes (+ 52000000 bytes JVM overhead) unsigned mediumint = long 8 bytes * 6500000 = 52000000 bytes (+ 52000000 JVM overhead) Array for each row: 104000000 overhead 286,000,000 (272M) bytes for the data alone (not including other intermediate objects created by the java class libraries themselves when doing I/O and numeric conversions, as well as internal object instances the driver creates).
[10 Apr 2006 21:19]
Thijs Sanders
ahhh yes that would explain a lot. That is one big overhead, I guess streaming in this case is the only good option because it takes to much memory and resources to save all the rows and fields in byte arrays and it stresses the GC and needs to malloc() too many times