Bug #48611 NOT DETERMINISTIC function in query disables optimization
Submitted: 7 Nov 2009 3:15 Modified: 8 Nov 2009 14:20
Reporter: Van Stokes Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.4.3 OS:Windows (XP x64)
Assigned to: CPU Architecture:Any

[7 Nov 2009 3:15] Van Stokes
Description:
Using a NON DETERMINISTIC function in a query WHERE clause appears to disable the query optimization. If the function is made DETERMINISTIC then the query runs quickly. We are using the function in a VIEW.

How to repeat:
This is a very SIMPLISTIC example trying to use one table. Our actual problem occurs when using five tables with five views and the "inventory_view" is the top most view. I hope this example is clear, if not, we can provide our database to you.

The following table references itself that is, if "ParentInvKey" is not ZERO then it references a record of "InventoryKey".

CREATE TABLE  `inventory` (
  `InventoryKey` int(10) unsigned NOT NULL DEFAULT '0',
  `ParentInvKey` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`InventoryKey`),
  KEY `Inventory_IDX01` (`ParentInvKey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Inventory table.';

This function returns the parents inventory key from the given InventoryKey.
It's NOT DETERMINISTIC because the parent CAN CHANGE!

CREATE FUNCTION InventoryGetParentInvKey`( pInventoryKey INT UNSIGNED ) RETURNS int(10) unsigned
NOT DETERMINISTIC
MAIN: BEGIN
	DECLARE mInventoryKey INT UNSIGNED DEFAULT 0;
	IF ISNULL( pInventoryKey ) THEN
		RETURN mInventoryKey;
	END IF;
	IF pInventoryKey > 0 THEN
		-- Get the Parent/Containing Inventory Key.
		SELECT ParentInvKeyINTO mInventoryKey FROM Inventory WHERE InventoryKey = pInventoryKey;
	END IF;
	RETURN mInventoryKey;
END MAIN;

This function returns the TOP MOST InventoryKey in the Hierarchy (i.e. where ParentInvKey is ZERO). Again, this function *SHOULD* be NOT DETERMINISTIC because the parent(s) CAN CHANGE!

CREATE FUNCTION `InventoryGetTopInvKey`( pInventoryKey INT UNSIGNED ) RETURNS int(10) unsigned
NOT DETERMINISTIC
MAIN: BEGIN
	DECLARE mInventoryKey INT UNSIGNED DEFAULT 0;
	DECLARE mLastInventoryKey INT UNSIGNED DEFAULT 0;
	IF ISNULL( pInventoryKey ) THEN
		RETURN mLastInventoryKey;
	END IF;
	SET mLastInventoryKey = pInventoryKey;
	SEARCH: LOOP
		-- Get the Parent/Containing Inventory Key.
		SET mInventoryKey = InventoryGetParentInvKey( mLastInventoryKey );
		IF mInventoryKey < 1 THEN
			-- mLastInventoryKey has no parent and therefore
			-- is the top most inventory item in the hierarchy.
			LEAVE SEARCH;
		END IF;
		SET mLastInventoryKey = mInventoryKey;
	END LOOP SEARCH;
	RETURN mLastInventoryKey;
END MAIN;

(This view is not correct because you would not really do it this way but we are trying to display a simplistic example).

CREATE VIEW inventory_view AS SELECT
A.InventoryKey AS InventoryKey,
A.ParentInvKey AS ParentInvKey,
B.InventoryKey AS TopInvKey
FROM Inventory A, Inventory B
WHERE B.InventoryKey = InventoryGetTopInvKey( A.InventoryKey );

Populate the Inventory table (we have 1M+ rows). Make sure you have inventory items two to four levels deep in the hierarchy (four levels is as deep as we go).

Suggested fix:
The problem is this:
If InventoryGetTopInvKey() is DETERMINISTIC, then the query runs extremely fast. If the InventoryGetTopInvKey() is NOT DETERMINISTIC thenthe query takes 60+ seconds to run. We stumbled onto this error by accident because and inventory item appeared NOT to have been relocated from the "INVENTORY_VIEW" point of view but of you looked directly at the table it was correct. The InventoryGetTopInvKey() was defined incorrectly as DETERMINISTIC when it SHOULD have been NOT DETERMINISTIC.

Not sure of the fix. Why it disables the optimization I have not idea. It really shouldn't.
[7 Nov 2009 8:50] Valeriy Kravchuk
Thank you for the problem report. Can you present the query that is not properly optimized? Even better, also with EXPLAIN results for it.
[7 Nov 2009 13:24] Van Stokes
Here is the information you requested. The problem is occurring with the function InventoryGetLocationInvKey(). If the function is defined as DETERMINISTIC (incorrect) then the view returns the data in a few milliseconds. If the function is defined as NOT DETERMINISTIC (correct) then the view takes over 60 seconds to return the same data.

Here is the function InventoryGetLocationInvKey() CREATE statement:

CREATE DEFINER=`root`@`localhost` FUNCTION `InventoryGetLocationInvKey`( pInventoryKey INT UNSIGNED ) RETURNS int(10) unsigned
 {NOT} DETERMINISTIC
MAIN: BEGIN

	DECLARE mInventoryKey INT UNSIGNED DEFAULT 0;
	DECLARE mLastInventoryKey INT UNSIGNED DEFAULT 0;

	IF ISNULL( pInventoryKey ) THEN
		CALL ServerLogEntry( 3, "InventoryGetLocationInvKey", "VALIDATION: pInventoryKey is NULL." );
		RETURN mLastInventoryKey;
	END IF;

	SET mLastInventoryKey = pInventoryKey;

	SEARCH: LOOP

		-- Get the Parent/Containing Inventory Key.
		SET mInventoryKey = InventoryGetParentInvKey( mLastInventoryKey );

		IF mInventoryKey < 1 THEN
			-- mLastInventoryKey is not contained and therefore
			-- is an inventory item that appears in the zone.
			-- That is, it's the HIGHEST most item in the hierarchy.
			LEAVE SEARCH;
		END IF;

		SET mLastInventoryKey = mInventoryKey;

	END LOOP SEARCH;

	RETURN mLastInventoryKey;

END MAIN $$

This is the statement used to CREATE the view (as we define it):

CREATE OR REPLACE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `inventory_view` AS SELECT
Inventory.InventoryKey AS InventoryKey,
Inventory.ContInvKey AS InventoryContKey,
Inventory.ContAs AS InventoryContAs,
Inventory.OwnerType AS InventoryOwnerType,
Inventory.OwnerKey AS InventoryOwnerKey,
OwnerGetName( Inventory.OwnerType, Inventory.OwnerKey ) AS InventoryOwnerName,
Inventory.Units AS InventoryUnits,
InventoryLocation.QuadrantX AS InventoryQuadrantX,
InventoryLocation.QuadrantY AS InventoryQuadrantY,
InventoryLocation.QuadrantZ AS InventoryQuadrantZ,
InventoryLocation.SectorX AS InventorySectorX,
InventoryLocation.SectorY AS InventorySectorY,
InventoryLocation.SectorZ AS InventorySectorZ,
InventoryLocation.PositionX AS InventoryPositionX,
InventoryLocation.PositionY AS InventoryPositionY,
InventoryLocation.PositionZ AS InventoryPositionZ,
InventoryLocation.OrientationW AS InventoryOrientationW,
InventoryLocation.OrientationX AS InventoryOrientationX,
InventoryLocation.OrientationY AS InventoryOrientationY,
InventoryLocation.OrientationZ AS InventoryOrientationZ,
Item_View.*
FROM Inventory, InventoryLocation, Item_View
WHERE InventoryLocation.InventoryKey = InventoryGetLocationInvKey( Inventory.InventoryKey )
AND Item_View.ItemKey = Inventory.ItemKey;

This is how the SQL Server returns the CREATE statement:

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW  `origins`.`inventory_view` AS SELECT
`inventory`.`InventoryKey` AS `InventoryKey`,
`inventory`.`ContInvKey` AS `InventoryContKey`,
`inventory`.`ContAs` AS `InventoryContAs`,
`inventory`.`OwnerType` AS `InventoryOwnerType`,
`inventory`.`OwnerKey` AS `InventoryOwnerKey`,
`OwnerGetName`(`inventory`.`OwnerType`,`inventory`.`OwnerKey`) AS `InventoryOwnerName`,
`inventory`.`Units` AS `InventoryUnits`,
`inventorylocation`.`QuadrantX` AS `InventoryQuadrantX`,
`inventorylocation`.`QuadrantY` AS `InventoryQuadrantY`,
`inventorylocation`.`QuadrantZ` AS `InventoryQuadrantZ`,
`inventorylocation`.`SectorX` AS `InventorySectorX`,
`inventorylocation`.`SectorY` AS `InventorySectorY`,
`inventorylocation`.`SectorZ` AS `InventorySectorZ`,
`inventorylocation`.`PositionX` AS `InventoryPositionX`,
`inventorylocation`.`PositionY` AS `InventoryPositionY`,
`inventorylocation`.`PositionZ` AS `InventoryPositionZ`,
`inventorylocation`.`OrientationW` AS `InventoryOrientationW`,
`inventorylocation`.`OrientationX` AS `InventoryOrientationX`,
`inventorylocation`.`OrientationY` AS `InventoryOrientationY`,
`inventorylocation`.`OrientationZ` AS `InventoryOrientationZ`,
`item_view`.`ItemKey` AS `ItemKey`,
`item_view`.`ItemName` AS `ItemName`,
`item_view`.`ItemDesc` AS `ItemDesc`,
`item_view`.`ItemMass` AS `ItemMass`,
`item_view`.`ItemSize` AS `ItemSize`,
`item_view`.`ItemIsContainer` AS `ItemIsContainer`,
`item_view`.`ItemIsEnabled` AS `ItemIsEnabled`,
`item_view`.`ItemTechLevel` AS `ItemTechLevel`,
`item_view`.`ItemCategorySubKey` AS `ItemCategorySubKey`,
`item_view`.`ItemCategorySubName` AS `ItemCategorySubName`,
`item_view`.`ItemCategorySubDesc` AS `ItemCategorySubDesc`,
`item_view`.`ItemCategoryKey` AS `ItemCategoryKey`,
`item_view`.`ItemCategoryName` AS `ItemCategoryName`,
`item_view`.`ItemCategoryDesc` AS `ItemCategoryDesc`,
`item_view`.`ListMarket` AS `ListMarket`,
`item_view`.`ObjectTypeKey` AS `ObjectTypeKey`,
`item_view`.`ObjectTypeName` AS `ObjectTypeName`,
`item_view`.`ObjectTypeLabel` AS `ObjectTypeLabel`,
`item_view`.`ObjectTypeMUT` AS `ObjectTypeMUT`,
`item_view`.`GUIImage` AS `GUIImage`
FROM ( (`inventory` JOIN `inventorylocation`) JOIN `item_view`)
WHERE ( (`inventorylocation`.`InventoryKey` = `InventoryGetLocationInvKey`(`inventory`.`InventoryKey`) )
AND (`item_view`.`ItemKey` = `inventory`.`ItemKey`) );

This is the EXPLAIN from the view when InventoryGetLocationInvKey() is using DETERMINISTIC :

EXPLAIN SELECT * from inventory_view WHERE inventoryKey = 1016785;

1, 'SIMPLE', 'inventory', 'const', 'Inventory_UIDX01,Inventory_FK02', 'Inventory_UIDX01', '4', 'const', 1, ''
1, 'SIMPLE', 'inventorylocation', 'const', 'PRIMARY,InventoryLocation_FK01', 'PRIMARY', '4', 'const', 1, ''
1, 'SIMPLE', 'item', 'const', 'PRIMARY,Item_FK01,Item_FK02', 'PRIMARY', '4', 'const', 1, ''
1, 'SIMPLE', 'itemcategorysub', 'const', 'PRIMARY,ItemCategorySub_FK01', 'PRIMARY', '4', 'const,const', 1, ''
1, 'SIMPLE', 'itemcategory', 'const', 'PRIMARY', 'PRIMARY', '2', 'const', 1, ''
1, 'SIMPLE', 'objecttype', 'const', 'PRIMARY', 'PRIMARY', '4', 'const', 1, ''

This is the EXPLAIN from the view when InventoryGetLocationInvKey() is using NOT DETERMINISTIC :

1, 'SIMPLE', 'inventory', 'const', 'Inventory_UIDX01,Inventory_FK02', 'Inventory_UIDX01', '4', 'const', 1, ''
1, 'SIMPLE', 'item', 'const', 'PRIMARY,Item_FK01,Item_FK02', 'PRIMARY', '4', 'const', 1, ''
1, 'SIMPLE', 'itemcategorysub', 'const', 'PRIMARY,ItemCategorySub_FK01', 'PRIMARY', '4', 'const,const', 1, ''
1, 'SIMPLE', 'itemcategory', 'const', 'PRIMARY', 'PRIMARY', '2', 'const', 1, ''
1, 'SIMPLE', 'objecttype', 'const', 'PRIMARY', 'PRIMARY', '4', 'const', 1, ''
1, 'SIMPLE', 'inventorylocation', 'ALL', '', '', '', '', 1034853, 'Using where'
[7 Nov 2009 20:44] Valeriy Kravchuk
Please, send the results of

show create table inventorylocation\G

but, in general, if you have index on column c1 of some time and compare this column to non-deterministic function in the WHERE clause, this index can NOT be used (as every function call with the same parameters may return different result, you are NOT comparing to constant, but rather to a "variable" that is different for every row). 

This may negatively affect query execution plan. Exactly what happened in your case. Entire table is scanned, and order of join is changed.
[7 Nov 2009 20:52] Van Stokes
The information you requested:

SHOW CREATE TABLE inventorylocation;

CREATE TABLE `inventorylocation` (
  `InventoryKey` int(10) unsigned NOT NULL,
  `QuadrantX` bigint(20) NOT NULL,
  `QuadrantY` bigint(20) NOT NULL,
  `QuadrantZ` bigint(20) NOT NULL,
  `SectorX` bigint(20) NOT NULL,
  `SectorY` bigint(20) NOT NULL,
  `SectorZ` bigint(20) NOT NULL,
  `PositionX` decimal(10,4) NOT NULL,
  `PositionY` decimal(10,4) NOT NULL,
  `PositionZ` decimal(10,4) NOT NULL,
  `OrientationW` decimal(5,4) NOT NULL DEFAULT '1.0000',
  `OrientationX` decimal(5,4) NOT NULL DEFAULT '0.0000',
  `OrientationY` decimal(5,4) NOT NULL DEFAULT '0.0000',
  `OrientationZ` decimal(5,4) NOT NULL DEFAULT '0.0000',
  `RecLastModified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`InventoryKey`),
  KEY `InventoryLocation_IDX01` (`QuadrantX`,`QuadrantY`,`QuadrantZ`,`SectorX`,`SectorY`,`SectorZ`,`PositionX`,`PositionY`,`PositionZ`),
  KEY `InventoryLocation_FK01` (`InventoryKey`),
  CONSTRAINT `InventoryLocation_FK01` FOREIGN KEY (`InventoryKey`) REFERENCES `inventory` (`InventoryKey`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Only scene items should be in this table.'

I understand what you are inferring in your last comment about the function be NOT DETERMINISTIC. However, since the table is properly indexed it should still happen quicker than 60 seconds.
[7 Nov 2009 20:58] Van Stokes
Perhaps we (our group) has a miss-understanding of DETERMINISTIC vs NOT DETERMINISTIC.

Is DETERMINISTIC for the life of the query or the life of the SQL server? In other words, does "DETERMINISTIC" only matter during the query instance?
What we want to make sure is the server is NOT caching the function result and reusing it for future queries that may be made AFTER the current query has finished.
[7 Nov 2009 22:46] Peter Laursen
I took myself the freedom to steal some considerations from here to report this:
http://bugs.mysql.com/bug.php?id=48615

I think clarity about DETERMINISTIC on one side and your performance problems on the other side should have each their report.

However if MySQL people find it appropriate they can make them duplicates.
[8 Nov 2009 8:05] Valeriy Kravchuk
The server does NOT cache the function result and does NOT reuse it for future queries in any case. Depending on the execution plan for the query, function may be called more that once. DETERMINISTIC is used to tell optimizer that all these calls with the same parameters will produce the same result, thus you can call function once and then optimize query as if we have a constant instead of function call. 

This is a "hint" for optimizer (and binary logging, if configured). There is no good way, in general, to find out efficiently is function really deterministic or not, so server just trust this your explicit declaration.

I think this is NOT a bug. Check http://dev.mysql.com/doc/refman/5.4/en/create-procedure.html.
[8 Nov 2009 14:20] Van Stokes
Thank you very much for the clarification! We agree that this is NOT a bug. This is our miss-understanding of the role of DETERMINISTIC. The following statement is from the documentation and is somewhat ambiguous:

"A procedure or function is considered “deterministic” if it always produces the same result for the same input parameters, and “not deterministic” otherwise. If neither DETERMINISTIC nor NOT DETERMINISTIC is given in the routine definition, the default is NOT DETERMINISTIC."

I suggest that you place your clarification into the documentation. 

Thank you.