mysql分组后返回每组最新的一条数据

问题背景

有张monitor表,存了所有设备的资源使用信息(比如CPU、内存、磁盘占用,用itemMonitor字段区分)。每分钟存一条,有300w条记录。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
+--------------+-------------+------------+--------------+---------------------+
| indexMonitor | itemMonitor | numMonitor | ip | collectTime |
+--------------+-------------+------------+--------------+---------------------+
| 19656428 | 2 | 0.14 | 172.16.0.168 | 2021-09-15 19:07:01 |
| 19656419 | 2 | 0.14 | 172.16.0.236 | 2021-09-15 19:07:01 |
| 19656473 | 2 | 0.2 | 172.16.0.11 | 2021-09-15 19:07:01 |
| 19656465 | 2 | 0.34 | 172.16.0.226 | 2021-09-15 19:07:01 |
| 19656417 | 2 | 0.14 | 172.16.0.33 | 2021-09-15 19:07:01 |
| 19656457 | 2 | 0.18 | 172.16.0.123 | 2021-09-15 19:07:01 |
| 19656432 | 2 | 0.25 | 172.16.0.181 | 2021-09-15 19:07:01 |
| 19656486 | 2 | 0.23 | 172.16.0.112 | 2021-09-15 19:07:01 |
| 19656430 | 2 | 0.31 | 172.16.0.252 | 2021-09-15 19:07:01 |
| 19656463 | 2 | 0.28 | 172.16.0.105 | 2021-09-15 19:07:01 |
| 19656415 | 2 | 0.17 | 172.16.0.206 | 2021-09-15 19:07:01 |
| 19656523 | 2 | 0.33 | 172.16.0.99 | 2021-09-15 19:07:01 |
| 19656462 | 2 | 0.19 | 172.16.0.39 | 2021-09-15 19:07:01 |
| 19656443 | 2 | 0.17 | 172.16.0.84 | 2021-09-15 19:07:01 |
| 19656414 | 2 | 0.17 | 172.16.0.84 | 2021-09-15 19:07:01 |
| 19656477 | 2 | 0.2 | 172.16.0.195 | 2021-09-15 19:07:01 |
| 19656418 | 2 | 0.14 | 172.16.0.28 | 2021-09-15 19:07:01 |
| 19656445 | 2 | 0.24 | 172.16.0.104 | 2021-09-15 19:07:01 |
| 19656421 | 2 | 0.14 | 172.16.0.200 | 2021-09-15 19:07:01 |
| 19656478 | 2 | 0.14 | 172.16.0.168 | 2021-09-15 19:07:01 |
| 19656442 | 2 | 0.34 | 172.16.0.252 | 2021-09-15 19:07:01 |
| 19656405 | 2 | 0.32 | 172.16.0.101 | 2021-09-15 19:07:01 |
| 19656471 | 2 | 0.15 | 172.16.0.151 | 2021-09-15 19:07:01 |
| 19656456 | 2 | 0.14 | 172.16.0.181 | 2021-09-15 19:07:01 |
| 19656309 | 2 | 0.31 | 172.16.0.206 | 2021-09-15 19:06:01 |
| 19656306 | 2 | 0.28 | 172.16.0.99 | 2021-09-15 19:06:01 |
| 19656310 | 2 | 0.24 | 172.16.0.104 | 2021-09-15 19:06:01 |
| 19656312 | 2 | 0.25 | 172.16.0.101 | 2021-09-15 19:06:01 |
| 19656311 | 2 | 0.27 | 172.16.0.112 | 2021-09-15 19:06:01 |
| 19656313 | 2 | 0.21 | 172.16.0.84 | 2021-09-15 19:06:01 |
+--------------+-------------+------------+--------------+---------------------+

需要查询每台设备最新的CPU占用情况(对应的itemMonitor = 2)。

之前是for循环根据ip去查:

1
2
3
4
5
6
7
mysql> select * from monitor where itemMonitor=2  and ip = "172.16.0.168" order by collectTime desc limit 0,1;
+--------------+-------------+------------+--------------+---------------------+
| indexMonitor | itemMonitor | numMonitor | ip | collectTime |
+--------------+-------------+------------+--------------+---------------------+
| 19657001 | 2 | 0.14 | 172.16.0.168 | 2021-09-15 19:12:01 |
+--------------+-------------+------------+--------------+---------------------+
1 row in set (0.21 sec)

20台设备需要20*0.21=4.2s,造成接口很卡。需要改为批量查询。


解决方案

分组

先根据IP分组,每组返回一条数据,就是所有设备的CPU信息。

语句

1
select * from monitor where itemMonitor=2 group by ip;

结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
+--------------+-------------+------------+--------------+---------------------+
| indexMonitor | itemMonitor | numMonitor | ip | collectTime |
+--------------+-------------+------------+--------------+---------------------+
| 18990928 | 2 | 0.14 | 172.16.0.101 | 2021-09-12 00:00:01 |
| 18990946 | 2 | 0.29 | 172.16.0.104 | 2021-09-12 00:00:01 |
| 18990969 | 2 | 0.25 | 172.16.0.105 | 2021-09-12 00:00:01 |
| 18990954 | 2 | 0.13 | 172.16.0.11 | 2021-09-12 00:00:01 |
| 18990943 | 2 | 0.13 | 172.16.0.112 | 2021-09-12 00:00:01 |
| 18990998 | 2 | 0.13 | 172.16.0.123 | 2021-09-12 00:00:01 |
| 18991023 | 2 | 0.13 | 172.16.0.151 | 2021-09-12 00:00:01 |
| 18990953 | 2 | 0.14 | 172.16.0.168 | 2021-09-12 00:00:01 |
| 18991015 | 2 | 0.13 | 172.16.0.181 | 2021-09-12 00:00:01 |
| 18990978 | 2 | 0.25 | 172.16.0.195 | 2021-09-12 00:00:01 |
| 18990947 | 2 | 0.13 | 172.16.0.200 | 2021-09-12 00:00:01 |
| 18990959 | 2 | 0.14 | 172.16.0.206 | 2021-09-12 00:00:01 |
| 18991011 | 2 | 0.13 | 172.16.0.226 | 2021-09-12 00:00:01 |
| 18990965 | 2 | 0.33 | 172.16.0.236 | 2021-09-12 00:00:01 |
| 18990945 | 2 | 0.13 | 172.16.0.252 | 2021-09-12 00:00:01 |
| 18990996 | 2 | 0.23 | 172.16.0.28 | 2021-09-12 00:00:01 |
| 18991007 | 2 | 0.14 | 172.16.0.33 | 2021-09-12 00:00:01 |
| 18990944 | 2 | 0.14 | 172.16.0.39 | 2021-09-12 00:00:01 |
| 18990932 | 2 | 0.13 | 172.16.0.84 | 2021-09-12 00:00:01 |
| 18990952 | 2 | 0.31 | 172.16.0.99 | 2021-09-12 00:00:01 |
+--------------+-------------+------------+--------------+---------------------+
20 rows in set (0.29 sec)

但这样查询到的数据是最老的。

排序

采用子查询,在分组前先倒序排序

语句

1
select * from (select * from monitor where itemMonitor = 2 order by collectTime desc) temp_monitor group by temp_monitor.ip;

结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
+--------------+-------------+------------+--------------+---------------------+
| indexMonitor | itemMonitor | numMonitor | ip | collectTime |
+--------------+-------------+------------+--------------+---------------------+
| 18990928 | 2 | 0.14 | 172.16.0.101 | 2021-09-12 00:00:01 |
| 18990946 | 2 | 0.29 | 172.16.0.104 | 2021-09-12 00:00:01 |
| 18990969 | 2 | 0.25 | 172.16.0.105 | 2021-09-12 00:00:01 |
| 18990954 | 2 | 0.13 | 172.16.0.11 | 2021-09-12 00:00:01 |
| 18990943 | 2 | 0.13 | 172.16.0.112 | 2021-09-12 00:00:01 |
| 18990998 | 2 | 0.13 | 172.16.0.123 | 2021-09-12 00:00:01 |
| 18991023 | 2 | 0.13 | 172.16.0.151 | 2021-09-12 00:00:01 |
| 18990953 | 2 | 0.14 | 172.16.0.168 | 2021-09-12 00:00:01 |
| 18991015 | 2 | 0.13 | 172.16.0.181 | 2021-09-12 00:00:01 |
| 18990978 | 2 | 0.25 | 172.16.0.195 | 2021-09-12 00:00:01 |
| 18990947 | 2 | 0.13 | 172.16.0.200 | 2021-09-12 00:00:01 |
| 18990959 | 2 | 0.14 | 172.16.0.206 | 2021-09-12 00:00:01 |
| 18991011 | 2 | 0.13 | 172.16.0.226 | 2021-09-12 00:00:01 |
| 18990965 | 2 | 0.33 | 172.16.0.236 | 2021-09-12 00:00:01 |
| 18990945 | 2 | 0.13 | 172.16.0.252 | 2021-09-12 00:00:01 |
| 18990996 | 2 | 0.23 | 172.16.0.28 | 2021-09-12 00:00:01 |
| 18991007 | 2 | 0.14 | 172.16.0.33 | 2021-09-12 00:00:01 |
| 18990944 | 2 | 0.14 | 172.16.0.39 | 2021-09-12 00:00:01 |
| 18990932 | 2 | 0.13 | 172.16.0.84 | 2021-09-12 00:00:01 |
| 18990952 | 2 | 0.31 | 172.16.0.99 | 2021-09-12 00:00:01 |
+--------------+-------------+------------+--------------+---------------------+
20 rows in set (0.30 sec)

奇怪,怎么没有变化?原来

从mysql 5.7开始,就做了优化:子查询中的排序被认为无效,会自动忽略

而我们版本是5.7.32

查询部分

在子查询中加上limit,查询最新的部分数据,可以让优化失效

语句

1
select * from (select * from monitor where itemMonitor = 2 order by collectTime desc limit 0,9999) temp_monitor group by temp_monitor.ip;

结果

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
+--------------+-------------+------------+--------------+---------------------+
| indexMonitor | itemMonitor | numMonitor | ip | collectTime |
+--------------+-------------+------------+--------------+---------------------+
| 19658044 | 2 | 0.13 | 172.16.0.101 | 2021-09-15 19:21:01 |
| 19658056 | 2 | 0.13 | 172.16.0.104 | 2021-09-15 19:21:01 |
| 19658083 | 2 | 0.14 | 172.16.0.105 | 2021-09-15 19:21:01 |
| 19658055 | 2 | 0.13 | 172.16.0.11 | 2021-09-15 19:21:01 |
| 19658084 | 2 | 0.13 | 172.16.0.112 | 2021-09-15 19:21:01 |
| 19658077 | 2 | 0.13 | 172.16.0.123 | 2021-09-15 19:21:01 |
| 19658080 | 2 | 0.13 | 172.16.0.151 | 2021-09-15 19:21:01 |
| 19658091 | 2 | 0.15 | 172.16.0.168 | 2021-09-15 19:21:01 |
| 19658103 | 2 | 0.13 | 172.16.0.181 | 2021-09-15 19:21:01 |
| 19658107 | 2 | 0.14 | 172.16.0.195 | 2021-09-15 19:21:01 |
| 19658065 | 2 | 0.13 | 172.16.0.200 | 2021-09-15 19:21:01 |
| 19658062 | 2 | 0.14 | 172.16.0.206 | 2021-09-15 19:21:01 |
| 19658078 | 2 | 0.13 | 172.16.0.226 | 2021-09-15 19:21:01 |
| 19658076 | 2 | 0.14 | 172.16.0.236 | 2021-09-15 19:21:01 |
| 19658060 | 2 | 0.13 | 172.16.0.252 | 2021-09-15 19:21:01 |
| 19658079 | 2 | 0.14 | 172.16.0.28 | 2021-09-15 19:21:01 |
| 19658058 | 2 | 0.14 | 172.16.0.33 | 2021-09-15 19:21:01 |
| 19658061 | 2 | 0.14 | 172.16.0.39 | 2021-09-15 19:21:01 |
| 19658054 | 2 | 0.13 | 172.16.0.84 | 2021-09-15 19:21:01 |
| 19658059 | 2 | 0.14 | 172.16.0.99 | 2021-09-15 19:21:01 |
+--------------+-------------+------------+--------------+---------------------+
20 rows in set (0.24 sec)

注意

为了结果准确,limit后的行数最好要覆盖全表。但由于这张表的数据是按时间插入的,最新的n(n<=设备数)条数据ip必然不同,所以行数(这里设10000)大于设备数量就可以了。

文章作者: SongGT
文章链接: http://www.songguangtao.xyz/2022/07/28/1.分组后返回每组最新的一条数据/
版权声明: 本博客所有文章除特别声明外,均采用 CC BY-NC-SA 4.0 许可协议。转载请注明来自 SongGuangtao's Blog
大哥大嫂[微信打赏]
过年好[支付宝打赏]