Covered Indexes

Covered Indexes are like the greatest thing in the world.
This is the chief reason #1 that SQL Server is better than Access, mySQL, Excel, etc

http://use-the-index-luke.com/sql/clustering/index-only-scan-covering-index
Index-Only Scan: Avoiding Table Access

——————————————————————————–

The index-only scan belongs to the most powerful tuning methods at all. It utilizes not only the clustering of rows with similar values, but also exploits the fact that the index has a copy of the indexed data. That can avoid the table access that is usually following an index scan.

To cover an entire query, an index must contain all columns from the SQL statement. That means, the index must not only cover the where clause, but also the columns from other parts of the SQL statement—that means, also the select clause:

CREATE INDEX sales_sub_eur
ON sales
( subsidiary_id, eur_value );

SELECT SUM(eur_value)
FROM sales
WHERE subsidiary_id = ?;
Indexing the where clause takes precedence over the other clauses, of course. Thus, SUBSIDIARY_ID at first position to qualify as an access predicate.

The execution plan shows the index scan but no table access (TABLE ACCESS BY INDEX ROWID).

———————————————————-
| Id | Operation | Name | Rows | Cost |
———————————————————-
| 0 | SELECT STATEMENT | | 1 | 104 |
| 1 | SORT AGGREGATE | | 1 | |
|* 2 | INDEX RANGE SCAN| SALES_SUB_EUR | 40388 | 104 |
———————————————————-

Predicate Information (identified by operation id):
—————————————————
2 – access(“SUBSIDIARY_ID”=TO_NUMBER(:A))
The index covers the entire query. In this case, the index is also called a “Covering Index”.

Note

An index that prevents a table access is also called “Covering Index”.

The term is misleading, however, because it sounds like an index property—although it describes the way the index is used.

If the index would not cover the EUR_VALUE column, the database would need to fetch all selected rows from the table to calculate the total. But the table access is not necessary because the index has a copy of the required column.

An index-only scan can improve performance dramatically. Just look at the Rows estimate in the execution plan: the optimizer expects to sum about forty thousand rows. That means that the index-only scan prevents forty thousand table fetches—if each row is in a different table block. If the index has a good clustering factor—that is, if the respective rows are well clustered to a few table blocks—the advantage may be significantly lower.

The number of accessed rows is another factor that limits the performance gain of an index-only scan. Selecting a single row, for example, saves one table access only. Considering that the index tree traversal needs to fetch a few blocks as well, the saved table access might become negligible.

Important

The performance advantage of an index-only scans depends on the number of accessed rows and the index clustering factor.

The index-only scan is an aggressive indexing strategy. Do not design an index for an index-only scan on suspicion only. It needs more space and increases the index maintenance effort. See Chapter 9, Insert, Delete and Update (not yet published). In practice, you should first index without considering the select clause and extend the index later, if really needed.

Index-only scans can also cause unpleasant performance behaviour. If you extend the query, for example:

SELECT SUM(eur_value)
FROM sales
WHERE subsidiary_id = ?
AND sale_date > ?;
The where clause refers to a column that is not in the index. The database can therefore not use the index with an index-only scan anymore:

————————————————————–
|Id | Operation | Name | Rows |Cost |
————————————————————–
| 0 | SELECT STATEMENT | | 1 | 371 |
| 1 | SORT AGGREGATE | | 1 | |
|*2 | TABLE ACCESS BY INDEX ROWID| SALES | 2019 | 371 |
|*3 | INDEX RANGE SCAN | SALES_DATE| 10541 | 30 |
————————————————————–

Predicate Information (identified by operation id):
—————————————————
2 – filter(“SUBSIDIARY_ID”=TO_NUMBER(:A))
3 – access(“SALE_DATE”>:B)
Adding columns to an SQL statement can disable an index-only scan, so that the execution becomes much slower. That might seem illogical, if the user expects improved performance when fetching fewer rows.

Warning

Accessing more columns may lead to unexpected performance behaviour. Check the execution plan before extending queries.

Anything can happen if new columns prevent an index-only scan. The index previously used for an index-only scan becomes less attractive, so that the optimizer might take a different index for the new query. It actually happened above. The extended query uses an index on SALE_DATE, which is a leftover from the join chapter.

This index has two advantages for the extended query: First, it matches less rows. The optimizer estimates that about ten thousand rows match the SALE_DATE filter. The filter on SUBSIDIARY_ID matches forty thousand rows, on the other hand. The estimates are visible in the Rows column of the execution plan, but they are purely arbitrary because of the bind parameters. The SALE_DATE condition could, for example, select the entire table, when filtering for a very early date.

The second advantage of the SALES_DATE index is its clustering factor. This is a valid reason because the SALES table grows chronologically. New rows are appended to the end of the table, if there are no delete statements. The same is true for the index, if the SALE_DATE column has the current date. In other words, the table and index grow at the end. The table order will therefore correspond to the index order. The index has a good clustering factor.

A better index clustering factor means that the subsequent table access is less expensive. The new filter on SALE_DATE reduces the number of accessed rows additionally. That means that the query extension affects both factors that limit the performance gain of an index-only. It reduces the number of rows and opens a new access path that benefits from a better clustering factor. The extended query might therefore execute fast enough, even if it needs to perform the table access. Consider removing unnecessary columns or dropping the entire index in that case.

Note

Some indexes have a good clustering factor automatically. An index-only scan gains little for them.

This particular example is a lucky coincidence. The new filter does not only prevent an index-only scan, but opens a new access path at the same time. The optimizer takes the new opportunity to keep the performance impact low. The impact would be huge, on the other hand, if there wouldn’t be an index on SALE_DATE.

The same is true, if new select clause columns prevent an index-only scan. The optimizer will fall back to the next best execution plan. But there is a catch: Adding columns to the select clause cannot open a new access path.

Consider the following example. You have a query that is well indexed but still too slow. There are only few columns in the select clause, so you go for an index-only scan and add the columns form the select clause to the index. Problem solved. A year later, somebody adds another column to the select clause but doesn’t update the index. Guess what is the next best execution plan? It is the plan that was too slow before introducing the index-only scan.

Tip

Maintain your index-only scans.

Add comments that remind you about the index-only scan. Refer to this page, so everybody can read about it.

Function based indexes are another trap you might fall into when aiming for an index-only scan. An index-only scan will not work if you select the LAST_NAME column, but index the UPPER(last_name) expression. That is why the index in the previous section should have used the LAST_NAME column itself, so that it can be used to cover the select clause with an index-only scan.

Tip

Always aim to index the original data. That is often the most useful information you can put into an index.

Avoid function-based indexing if queries cannot use the expression as access predicate anyway.

Aggregate queries, like above, make good candidates for index-only scans. They query many rows, but only a few columns. That means that even a small index can be used for an index-only scan. The more columns you query, the more columns must be indexed for an index-only scan. As developer, you should therefore query only the columns you need.

Tip

Avoid select * and fetch only the columns you need.

Indexing many columns needs more space, but pushes you also towards database limits. Most databases impose rather rigid limits on the key length and the number of columns per index. That means that you cannot index an arbitrary number of columns, nor arbitrary long columns. The following overview lists the most important limits. Regardless of those limits, there are indexes that cover an entire table, as we see in the next section.

This entry was posted in SQL Server and tagged . Bookmark the permalink.