Optimize Sql That Uses Between Clause
Solution 1:
You may want to try something like this
Select A.ID,
(SELECT B.IDFROMB
WHERE A.EventTime BETWEEN B.start_time AND B.end_time LIMIT 1) AS B_ID
FROMA
If you have an index on the Start_Time,End_Time fields for B, then this should work quite well.
Solution 2:
I'm not sure this can be optimized fully. I tried it on MySQL 5.1.30. I also added an index on {B.start_time, B.end_time}
as suggested by other folks. Then I got a report from EXPLAIN
, but the best I could get is a Range Access Method:
EXPLAIN SELECT A.id, B.id FROM A JOIN B
ON A.event_time BETWEEN B.start_time AND B.end_time;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+| id | select_type |table| type | possible_keys | key | key_len |ref|rows| Extra |+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+|1| SIMPLE | A |ALL| event_time |NULL|NULL|NULL|8|||1| SIMPLE | B |ALL| start_time |NULL|NULL|NULL|96|Range checked foreach record (index map: 0x4) |+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
See the note on the far right. The optimizer thinks it might be able to use the index on {B.start_time, B.end_time}
but it ended up deciding not to use that index. Your results may vary, because your data distribution is more representative.
Compare with the index usage if you compare A.event_time
to a constant range:
EXPLAIN SELECT A.id FROM A
WHERE A.event_time BETWEEN'2009-02-17 09:00'and'2009-02-17 10:00';
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+| id | select_type |table| type | possible_keys | key | key_len |ref|rows| Extra |+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+|1| SIMPLE | A |range| event_time | event_time |8|NULL|1|Usingwhere|+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
And compare with the dependent sub-query form given by @Luke and @Kibbee, which seems to make use of indexes more effectively:
EXPLAIN SELECT A.id AS id_from_a,
(
SELECT B.id
FROM B
WHERE A.id BETWEEN B.start_time AND B.end_time
LIMIT 0, 1
) AS id_from_b
FROM A;
+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+| id | select_type |table| type | possible_keys | key | key_len |ref|rows| Extra |+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+|1|PRIMARY| A | index |NULL|PRIMARY|8|NULL|8|Using index ||2| DEPENDENT SUBQUERY | B |ALL| start_time |NULL|NULL|NULL|384|Usingwhere|+----+--------------------+-------+-------+---------------+---------+---------+------+------+-------------+
Weirdly, EXPLAIN lists possible_keys
as NULL (i.e. no indexes could be used) but then decides to use the primary key after all. Could be an idiosyncrasy of MySQL's EXPLAIN report?
Solution 3:
I wouldn't normally recommend a query like this, but...
Since you've specified that table A only has about 980 rows and that each row maps to exactly one row in table B, then you could do the following and it will most likely be a lot faster than a cartesian join:
SELECT A.id AS id_from_a,
(
SELECT B.idFROMB
WHERE A.event_time BETWEEN B.start_time AND B.end_time
LIMIT 0, 1
) AS id_from_b
FROMA
Solution 4:
I have made some tests for a similar problem - calculating a country based on an ip address (given as a number). Here are my data and results:
- Table A (that contains users and IP addresses) contains about 20 records.
- Table B (that contains the IP ranges for each country) contains about 100000 records.
The JOIN query using "between" takes about 10 seconds; The SELECT inside a SELECT query, using "between", takes about 5.5 seconds; The SELECT inside a SELECT query, using a spatial index, takes about 6.3 seconds. The JOIN query using a spatial index takes 0 seconds!
Solution 5:
Notice that when running this query, you actually create 980x130000 records in memory before applying the condition. Such JOIN is not very recommended, and I can see why it'll give you performance issues.
Post a Comment for "Optimize Sql That Uses Between Clause"