Bug #113138 Optimization of MySQL query with LIMIT
Submitted: 20 Nov 2023 7:43      Modified: 20 Nov 2023 13:15
Reporter: Ivan Nesic             Status: Not a Bug
Category: MySQL Server           Severity: S3 (Non-critical)
Version: 8.2.0                   OS: Any
Assigned to:                     CPU Architecture: Any

[20 Nov 2023 7:43] Ivan Nesic
Description:
I have two tables:

CREATE TABLE `invoice` (
    `invoice_id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `offer_id` BIGINT(20) UNSIGNED NOT NULL,
    `currency_id` INT(11) NOT NULL DEFAULT '0',
    `insert_date_invoice` DATE NOT NULL,    
    `propagation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
    PRIMARY KEY (`invoice_id`) USING BTREE,
    UNIQUE INDEX `propagation_id` (`propagation_id`) USING BTREE,
    INDEX `FK_invoice_offer` (`offer_id`) USING BTREE,
    INDEX `insert_date_invoice` (`insert_date_invoice`) USING BTREE,
    INDEX `FK_invoice_currency` (`currency_id`) USING BTREE,
    CONSTRAINT `FK_invoice_currency` FOREIGN KEY (`currency_id`) REFERENCES `db_frost`.`currency` (`currency_id`) ON UPDATE RESTRICT ON DELETE RESTRICT,
    CONSTRAINT `FK_invoice_offer` FOREIGN KEY (`offer_id`) REFERENCES `db_frost`.`offer` (`offer_id`) ON UPDATE RESTRICT ON DELETE RESTRICT
)
COLLATE='utf8mb3_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1
;

CREATE TABLE `currency` (
    `currency_id` INT(11) NOT NULL AUTO_INCREMENT,
    `currency` VARCHAR(100) NOT NULL COLLATE 'utf8mb3_unicode_ci',
    PRIMARY KEY (`currency_id`) USING BTREE
)
COLLATE='utf8mb3_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=1
;

The `invoice` table has around 70K records and the `currency` table has 2 records.

Running this query:

SELECT invoice.invoice_id, currency.currency_id
FROM invoice
INNER JOIN currency ON (currency.currency_id = invoice.currency_id) 
ORDER BY invoice.invoice_id DESC
LIMIT 30

EXPLAIN ANALYZE returns this result:

-> Limit: 30 row(s)  (cost=16867 rows=30) (actual time=0.0448..0.0555 rows=30 loops=1)
    -> Nested loop inner join  (cost=16867 rows=30) (actual time=0.0439..0.0532 rows=30 loops=1)
        -> Index scan on invoice using PRIMARY  (cost=0.14 rows=30) (actual time=0.0346..0.0377 rows=30 loops=1)
        -> Single-row covering index lookup on currency using PRIMARY (currency_id=invoice.currency_id)  (cost=0.25 rows=1) (actual time=355e-6..375e-6 rows=1 loops=30)

EXPLAIN FORMAT=JSON returns this result:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "30667.05"
    },
    "ordering_operation": {
      "using_filesort": false,
      "nested_loop": [
        {
          "table": {
            "table_name": "invoice",
            "access_type": "index",
            "possible_keys": [
              "FK_invoice_currency"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "invoice_id"
            ],
            "key_length": "8",
            "rows_examined_per_scan": 30,
            "rows_produced_per_join": 67454,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "312.75",
              "eval_cost": "6745.40",
              "prefix_cost": "7058.15",
              "data_read_per_join": "477M"
            },
            "used_columns": [
              "invoice_id",
              "currency_id"
            ]
          }
        },
        {
          "table": {
            "table_name": "currency",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "currency_id"
            ],
            "key_length": "4",
            "ref": [
              "db_frost.invoice.currency_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 67454,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "16863.50",
              "eval_cost": "6745.40",
              "prefix_cost": "30667.05",
              "data_read_per_join": "26M"
            },
            "used_columns": [
              "currency_id"
            ]
          }
        }
      ]
    }
  }
}

If I add one column from the `currency` table (currency.currency), the query becomes:
SELECT invoice.invoice_id, currency.currency_id, currency.currency
FROM invoice
INNER JOIN currency ON (currency.currency_id = invoice.currency_id) 
ORDER BY invoice.invoice_id DESC
LIMIT 30

I get degraded performance.

EXPLAIN ANALYZE returns this result:
-> Limit: 30 row(s)  (actual time=43.8..43.8 rows=30 loops=1)
    -> Sort: invoice.invoice_id, limit input to 30 row(s) per chunk  (actual time=43.8..43.8 rows=30 loops=1)
        -> Stream results  (cost=27295 rows=67454) (actual time=19.6..38.2 rows=69067 loops=1)

EXPLAIN FORMAT=JSON returns this result:
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "27295.37"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "nested_loop": [
        {
          "table": {
            "table_name": "invoice",
            "access_type": "index",
            "possible_keys": [
              "FK_invoice_currency"
            ],
            "key": "FK_invoice_currency",
            "used_key_parts": [
              "currency_id"
            ],
            "key_length": "4",
            "rows_examined_per_scan": 67454,
            "rows_produced_per_join": 67454,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "312.75",
              "eval_cost": "6745.40",
              "prefix_cost": "7058.15",
              "data_read_per_join": "477M"
            },
            "used_columns": [
              "invoice_id",
              "currency_id"
            ]
          }
        },
        {
          "table": {
            "table_name": "currency",
            "access_type": "ALL",
            "possible_keys": [
              "PRIMARY"
            ],
            "rows_examined_per_scan": 3,
            "rows_produced_per_join": 67454,
            "filtered": "33.33",
            "using_join_buffer": "hash join",
            "cost_info": {
              "read_cost": "1.02",
              "eval_cost": "6745.40",
              "prefix_cost": "27295.37",
              "data_read_per_join": "26M"
            },
            "used_columns": [
              "currency_id",
              "currency"
            ],
            "attached_condition": "(`db_frost`.`currency`.`currency_id` = `db_frost`.`invoice`.`currency_id`)"
          }
        }
      ]
    }
  }
}
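
For reference, a commonly suggested rewrite for this kind of plan regression (a sketch, not verified against this data set) is to apply the LIMIT inside a derived table, so that only the 30 selected invoice rows are joined to `currency`:

SELECT inv.invoice_id, currency.currency_id, currency.currency
FROM (
    -- limit first, so the optimizer can keep the backward scan on the PRIMARY key of invoice
    SELECT invoice_id, currency_id
    FROM invoice
    ORDER BY invoice_id DESC
    LIMIT 30
) AS inv
INNER JOIN currency ON (currency.currency_id = inv.currency_id)
ORDER BY inv.invoice_id DESC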

How to repeat:
The setup, the queries, and the observed results are described in the 'Description' section above; following those steps should reproduce the issue.
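
A self-contained data-population sketch (synthetic data; it assumes the tables from the Description already exist and that rows in the `offer` table are not needed to reproduce the plan change):

SET SESSION foreign_key_checks = 0;            -- offer rows are irrelevant to the plan change
SET SESSION cte_max_recursion_depth = 100000;  -- allow the CTE below to generate ~70K rows

INSERT INTO currency (currency) VALUES ('EUR'), ('RSD');

-- roughly 70K synthetic invoice rows spread evenly over the two currencies
INSERT INTO invoice (offer_id, currency_id, insert_date_invoice)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 70000
)
SELECT n, 1 + (n MOD 2), CURDATE()
FROM seq;

SET SESSION foreign_key_checks = 1;
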
[20 Nov 2023 13:15] MySQL Verification Team
Hi Mr. Nesic,

Thank you very much for your bug report.

However, this is not a bug, but the expected behaviour.

Your query can only be resolved by using a temporary table.

When you add any column from the character domain, the temporary table can become larger. In your case it becomes significantly larger, since each row that has to be examined can carry an additional 200 to 400 bytes, particularly if you use an esoteric alphabet, like the ones from the Balkan countries.

Hence, the performance will definitely be impacted. This is expected behaviour.

Not a bug.
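
One way to confirm the sorting and temporary-table work described above (a sketch using standard session status counters; the counter deltas before and after the query are what matter, not the absolute values):

-- note the baseline values
SHOW SESSION STATUS WHERE Variable_name IN ('Created_tmp_tables', 'Created_tmp_disk_tables', 'Sort_rows', 'Sort_scan');

SELECT invoice.invoice_id, currency.currency_id, currency.currency
FROM invoice
INNER JOIN currency ON (currency.currency_id = invoice.currency_id)
ORDER BY invoice.invoice_id DESC
LIMIT 30;

-- read them again: the deltas in Sort_scan/Sort_rows and Created_tmp_%
-- show how much sorting and temporary-table work the query needed
SHOW SESSION STATUS WHERE Variable_name IN ('Created_tmp_tables', 'Created_tmp_disk_tables', 'Sort_rows', 'Sort_scan');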