estimated # of rows in a table could become inaccurate after deletes

Description

TokuDB uses TokuFT to provide an estimate of the number of rows in a fractal tree. TokuFT’s row estimate can be wildly wrong after a lot of deletes have occurred on a table. For example, we have seen a customer delete MOST of the rows from a table, and the row estimate was still 69M rows.

Since MySQL computes cardinality = #rows / #unique rows per key, if #rows is wrong, then cardinality is wrong.

Environment

None

Activity

Show:
George Lorch
July 13, 2015, 11:39 PM

Moved to FT project since this is really an underlying ft-index issue and not a TokuDB issue.

Reinis Rozītis
July 28, 2015, 12:13 PM

Any update/ETA on this?

We are getting to a point where a heavily used table (lots of inserts/deletes) gets to a point where a SELECT on a PRIMARY_KEY > 2342345 or ORDER by PK get's way slow even the table has like 10 or even less rows - I imagine that MySQL starts to use different query execution plans based on the (wrong) rowcount or the engine goes haywire.

Just for example:

As stated ANALYZE/OPTIMIZE doesn't help so for online "repair" we alter the table engine.

Enverex
September 16, 2015, 1:30 PM

Any movement on this? It pretty much rules out the use of TokuDB on insert/delete heavy environments due to queries acting as if the tables are as large as it (wrongly) thinks they are (as Reinis mentioned above). A query on a 10 row table will take as long as a query on a 100 million row table if the table has previously had 100 million rows deleted.

George Lorch
September 16, 2015, 3:48 PM

The row count accuracy issue will likely be fixed in the next release of Percona Server. The row counts are currently inaccurate because they are reporting the number of physical leaf entries (rows) in the tree and not the logical count that you would expect from a "SELECT COUNT".

This will not address the situation you describe of having 100 million rows worth of 'garbage' sitting within a fractal tree with only 10 real rows. An optimize table would be needed to 'flatten' the tree and remove the deleted rows in order to regain query performance. This issue typically happens on timeseries type data in an "insert to the right, delete from the left" use case. Joe Laflamme describe the issue and remedy here : https://www.percona.com/blog/2015/02/11/tokudb-table-optimization-improvements

The primary issue with reporting the wrong type of row count is in a situation with a partitioned table. The inaccuracy can cause the partition engine to choose the wrong partition to obtain index cardinality statistics from and use a sub-optimal query plan.

If you need additional help in tuning your application/server I would suggest that you contact Percona Support : https://www.percona.com/services/support/mysql-support

Assignee

George Lorch

Reporter

Abdelhak Errami

Labels

None

External issue ID

None

Freshdesk Tickets

None

Fix versions

Priority

Major
Configure