Optimize Sql That Uses Between Clause

November 15, 2024 Post a Comment

Consider the following 2 tables: Table A: id event_time Table B id start_time end_time Every record in table A is mapped to exactly 1 record in table B. This means table B has no

Solution 1:

You may want to try something like this

Select A.ID,
(SELECT B.IDFROMB
WHERE A.EventTime BETWEEN B.start_time AND B.end_time LIMIT 1) AS B_ID
FROMA

If you have an index on the Start_Time,End_Time fields for B, then this should work quite well.

Solution 2:

I'm not sure this can be optimized fully. I tried it on MySQL 5.1.30. I also added an index on {B.start_time, B.end_time} as suggested by other folks. Then I got a report from EXPLAIN, but the best I could get is a Range Access Method:

EXPLAIN SELECT A.id, B.id FROM A JOIN B 
ON A.event_time BETWEEN B.start_time AND B.end_time;

+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+| id | select_type |table| type | possible_keys | key  | key_len |ref|rows| Extra                                          |+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+|1| SIMPLE      | A     |ALL| event_time    |NULL|NULL|NULL|8|||1| SIMPLE      | B     |ALL| start_time    |NULL|NULL|NULL|96|Range checked foreach record (index map: 0x4) |+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+

See the note on the far right. The optimizer thinks it might be able to use the index on {B.start_time, B.end_time} but it ended up deciding not to use that index. Your results may vary, because your data distribution is more representative.

Compare with the index usage if you compare A.event_time to a constant range:

EXPLAIN SELECT A.id FROM A
WHERE A.event_time BETWEEN'2009-02-17 09:00'and'2009-02-17 10:00';

+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+| id | select_type |table| type  | possible_keys | key        | key_len |ref|rows| Extra       |+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+|1| SIMPLE      | A     |range| event_time    | event_time |8|NULL|1|Usingwhere|+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+

And compare with the dependent sub-query form given by @Luke and @Kibbee, which seems to make use of indexes more effectively:

EXPLAIN SELECT A.id AS id_from_a,
    (
        SELECT B.id
        FROM B
        WHERE A.id BETWEEN B.start_time AND B.end_time
        LIMIT 0, 1
    ) AS id_from_b
FROM A;

+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+| id | select_type        |table| type  | possible_keys | key     | key_len |ref|rows| Extra       |+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+|1|PRIMARY| A     | index |NULL|PRIMARY|8|NULL|8|Using index ||2| DEPENDENT SUBQUERY | B     |ALL| start_time    |NULL|NULL|NULL|384|Usingwhere|+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+

Weirdly, EXPLAIN lists possible_keys as NULL (i.e. no indexes could be used) but then decides to use the primary key after all. Could be an idiosyncrasy of MySQL's EXPLAIN report?

Solution 3:

I wouldn't normally recommend a query like this, but...

Since you've specified that table A only has about 980 rows and that each row maps to exactly one row in table B, then you could do the following and it will most likely be a lot faster than a cartesian join:

SELECT A.id AS id_from_a,
    (
        SELECT B.idFROMB
        WHERE A.event_time BETWEEN B.start_time AND B.end_time
        LIMIT 0, 1
    ) AS id_from_b
FROMA

Solution 4:

I have made some tests for a similar problem - calculating a country based on an ip address (given as a number). Here are my data and results:

Table A (that contains users and IP addresses) contains about 20 records.
Table B (that contains the IP ranges for each country) contains about 100000 records.

The JOIN query using "between" takes about 10 seconds; The SELECT inside a SELECT query, using "between", takes about 5.5 seconds; The SELECT inside a SELECT query, using a spatial index, takes about 6.3 seconds. The JOIN query using a spatial index takes 0 seconds!

Solution 5:

Notice that when running this query, you actually create 980x130000 records in memory before applying the condition. Such JOIN is not very recommended, and I can see why it'll give you performance issues.

comprasconencanto1