TokuDB not using index for specific queries on percona server
example query and explain output
show index from
show create table
After change engine to innodb query works as expected
example query and explain output with tokudb
I test this mostly on percona server (different versions) but similar issue occurred also on mysql and mariadb.
Hello George! Thanks for the quick response and detailed explanations.
TokuDB was our choice mainly because of the high level of compression (CPU/IO rate) and due to fact that we use analytics application (piwik) and collect large amounts of data.
Despite of the small inconvenience we are still happy and continue to use TokuDB.
Hello Jacek, I knew your name was familiar and I now remember working with you on https://tokutek.atlassian.net/browse/DB-988 and this is somewhat related to that issue.
ICP is rather poorly implemented in MySQL and forces the storage engines to do tricks for it to work perfectly in all but the simplest cases, which was the cause of the previous issue due to the descending scan.
TokuDB ICP is simply not as sophisticated as InnodB in using these tricks to remember key parts and downward scan limits and this could be precisely one of those cases, why this is important will become clearer below.
In debugging this, it seems that the server is basing the 'explain' result on what the storage engine returns from idx_cond_push, which is the function that server uses to give the storage engine the ICP condition. When the server sends the condition to the storage engine, the storage engine is expected to return the portion of the condition that it might not handle bach to the server so the server can perform that condition on the result set returned from the storage engine.
InnoDB returns NULL which tells the server that InnoDB has agreed to handle the entire condition precisely and that the server need not serve as a 'backup' to evaluate the returned results and ensure that they are correctly within the range(s) specified.
TokuDB on the other hand, returns the full condition set back to the server, telling it that it has accepted the hint and will use the condition hint where it can, but provides no guarantee that all conditions and rows have been prescisely filtered in the result set, forcing the server to validate the results by applying the condition to the rows in the result set.
So TokuDB will use the condition to filter in this case but since it is not guaranteeing to the server that it will handle the condition 100%, the server must report in explain that is is using the where condition even though TokuDB is in fact doing some amount of ICP filtering on its own.
I have confirmed in this specific use case, using your table structure and with a few hundred records that align roughly with your key uniqueness (from your ANALYZE) that TokuDB is in fact using the ICP condition to return the filtered rowset, which will then also be re-filtered by the server to ensure the results are precisely within the specified condition.
I doubt this behavior will be changed much in the near future.
This is a curious difference though, it looks like it should be invoking the TokuDB ICP implementation here. It is choosing the correct key, but using a different condition. This should be a simple forward range scan which should use the TokuDB ICP functionality and not fall back to the servers 'where' condition, giving a little bit of a performance boost on the query by allowing the engine to execute the conditional in a bulk fetching mode rather then the server scanning the index. I'll have to fire this up in the debugger and see if I can figure out why it is deciding to use 'where' in this case instead of ICP.
I see some of the difference, in InnoDB it is using index condition pushdown optimization (ICP) where TokuDB at various points did not fully support ICP in the same ways as InnoDB. TokuDB supports ICP for certain forward scanning queries only and has some fixes that exist only in Percona Server 5.6.30-76.3 and greater (5.6.32-78.0 is even better).
TokuDB is not a 'perfect' drop in replacement for InnoDB and may exhibit different performance behaviors in different cases. The general question I like to ask is why do you choose TouDB and do you understand your workload and what you trade off when you switch to it. If you can not answer that question clearly, then TokuDB may not be providing you with the expected benefit and in fact may not be performing as well as InnoDB might for the same workload.
Results after ANALYZE TABLE
example query and explain output with tokudb
show index from with tokudb
example query and explain output with innodb
show index from with innodb