A single index scan can only use query clauses that use the index's
columns with operators of its operator class and are joined with
AND
. For example, given an index on (a, b)
a query condition like WHERE a = 5 AND b = 6
could
use the index, but a query like WHERE a = 5 OR b = 6
could not
directly use the index.
Fortunately,
LightDB has the ability to combine multiple indexes
(including multiple uses of the same index) to handle cases that cannot
be implemented by single index scans. The system can form AND
and OR
conditions across several index scans. For example,
a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99
could be broken down into four separate scans of an index on x
,
each scan using one of the query clauses. The results of these scans are
then ORed together to produce the result. Another example is that if we
have separate indexes on x
and y
, one possible
implementation of a query like WHERE x = 5 AND y = 6
is to
use each index with the appropriate query clause and then AND together
the index results to identify the result rows.
To combine multiple indexes, the system scans each needed index and
prepares a bitmap in memory giving the locations of
table rows that are reported as matching that index's conditions.
The bitmaps are then ANDed and ORed together as needed by the query.
Finally, the actual table rows are visited and returned. The table rows
are visited in physical order, because that is how the bitmap is laid
out; this means that any ordering of the original indexes is lost, and
so a separate sort step will be needed if the query has an ORDER
BY
clause. For this reason, and because each additional index scan
adds extra time, the planner will sometimes choose to use a simple index
scan even though additional indexes are available that could have been
used as well.
In all but the simplest applications, there are various combinations of
indexes that might be useful, and the database developer must make
trade-offs to decide which indexes to provide. Sometimes multicolumn
indexes are best, but sometimes it's better to create separate indexes
and rely on the index-combination feature. For example, if your
workload includes a mix of queries that sometimes involve only column
x
, sometimes only column y
, and sometimes both
columns, you might choose to create two separate indexes on
x
and y
, relying on index combination to
process the queries that use both columns. You could also create a
multicolumn index on (x, y)
. This index would typically be
more efficient than index combination for queries involving both
columns, but as discussed in Section 11.3, it
would be almost useless for queries involving only y
, so it
should not be the only index. A combination of the multicolumn index
and a separate index on y
would serve reasonably well. For
queries involving only x
, the multicolumn index could be
used, though it would be larger and hence slower than an index on
x
alone. The last alternative is to create all three
indexes, but this is probably only reasonable if the table is searched
much more often than it is updated and all three types of query are
common. If one of the types of query is much less common than the
others, you'd probably settle for creating just the two indexes that
best match the common types.