- 关于 TiDB
- 快速上手
- 部署集群
- 数据迁移
- 数据迁移概述
- 从 MySQL 迁移至 TiDB
- 从 CSV 文件迁移至 TiDB
- 运维操作
- 监控与告警
- 故障诊断
- 性能调优
- 系统调优
- 软件调优
- SQL 性能调优
- 教程
- TiDB 生态工具
- TiDB 生态工具功能概览
- TiDB 生态工具适用场景
- TiDB 工具下载
- Backup & Restore (BR)
- TiDB Binlog
- TiDB Lightning
- TiCDC 简介
- Dumpling 使用文档
- sync-diff-inspector
- Loader 使用文档
- Mydumper 使用文档
- Syncer 使用文档
- TiSpark
- 参考指南
- 架构
- 监控指标
- 安全加固
- 权限
- SQL
- SQL 语言结构和语法
- 属性
- 字面值
- Schema 对象名
- 关键字
- 用户自定义变量
- 表达式语法
- 注释语法
- SQL 语句
- ADD COLUMN
- ADD INDEX
- ADMIN
- ALTER DATABASE
- ALTER INSTANCE
- ALTER TABLE
- ALTER USER
- ANALYZE
- BACKUP
- BEGIN
- CHANGE COLUMN
- CHANGE DRAINER
- CHANGE PUMP
- COMMIT
- CREATE [GLOBAL|SESSION] BINDING
- CREATE DATABASE
- CREATE INDEX
- CREATE ROLE
- CREATE SEQUENCE
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE USER
- CREATE VIEW
- DEALLOCATE
- DELETE
- DESC
- DESCRIBE
- DO
- DROP [GLOBAL|SESSION] BINDING
- DROP COLUMN
- DROP DATABASE
- DROP INDEX
- DROP ROLE
- DROP SEQUENCE
- DROP STATS
- DROP TABLE
- DROP USER
- DROP VIEW
- EXECUTE
- EXPLAIN ANALYZE
- EXPLAIN
- FLASHBACK TABLE
- FLUSH PRIVILEGES
- FLUSH STATUS
- FLUSH TABLES
- GRANT
- GRANT
- INSERT
- KILL [TIDB]
- LOAD STATS
- MODIFY COLUMN
- PREPARE
- RECOVER TABLE
- RENAME INDEX
- RENAME TABLE
- REPLACE
- RESTORE
- REVOKE
- REVOKE
- ROLLBACK
- SELECT
- SET DEFAULT ROLE
- SET [NAMES|CHARACTER SET]
- SET PASSWORD
- SET ROLE
- SET TRANSACTION
- SET [GLOBAL|SESSION]
- SHOW [BACKUPS|RESTORES]
- SHOW ANALYZE STATUS
- SHOW [GLOBAL|SESSION] BINDINGS
- SHOW BUILTINS
- SHOW CHARACTER SET
- SHOW COLLATION
- SHOW [FULL] COLUMNS FROM
- SHOW CONFIG
- SHOW CREATE SEQUENCE
- SHOW CREATE TABLE
- SHOW CREATE USER
- SHOW DATABASES
- SHOW DRAINER STATUS
- SHOW ENGINES
- SHOW ERRORS
- SHOW [FULL] FIELDS FROM
- SHOW GRANTS
- SHOW INDEX [FROM|IN]
- SHOW INDEXES [FROM|IN]
- SHOW KEYS [FROM|IN]
- SHOW MASTER STATUS
- SHOW PLUGINS
- SHOW PRIVILEGES
- SHOW [FULL] PROCESSLIST
- SHOW PROFILES
- SHOW PUMP STATUS
- SHOW SCHEMAS
- SHOW STATS_HEALTHY
- SHOW STATS_HISTOGRAMS
- SHOW STATS_META
- SHOW [GLOBAL|SESSION] STATUS
- SHOW TABLE NEXTROWID
- SHOW TABLE REGIONS
- SHOW TABLE STATUS
- SHOW [FULL] TABLES
- SHOW [GLOBAL|SESSION] VARIABLES
- SHOW WARNINGS
- SHUTDOWN
- Split Region 使用文档
- START TRANSACTION
- TRACE
- TRUNCATE
- UPDATE
- USE
- 数据类型
- 函数与操作符
- 约束
- 生成列
- SQL 模式
- 事务
- 垃圾回收 (GC)
- 视图
- 分区表
- 字符集和排序规则
- 系统表
- TiDB 系统表
- INFORMATION_SCHEMA
- TiDB 简介
- ANALYZE_STATUS
- CHARACTER_SETS
- CLUSTER_CONFIG
- CLUSTER_HARDWARE
- CLUSTER_INFO
- CLUSTER_LOAD
- CLUSTER_LOG
- CLUSTER_SYSTEMINFO
- COLLATIONS
- COLLATIONCHARACTERSET_APPLICABILITY
- COLUMNS
- DDL_JOBS
- ENGINES
- INSPECTION_RESULT
- INSPECTION_RULES
- INSPECTION_SUMMARY
- KEYCOLUMNUSAGE
- METRICS_SUMMARY
- METRICS_TABLES
- PARTITIONS
- PROCESSLIST
- SCHEMATA
- SEQUENCES
- SESSION_VARIABLES
- SLOW_QUERY
- STATISTICS
- TABLES
- TABLE_CONSTRAINTS
- TABLESTORAGESTATS
- TIDBHOTREGIONS
- TIDB_INDEXES
- TIDBSERVERSINFO
- TIFLASH_REPLICA
- TIKVREGIONPEERS
- TIKVREGIONSTATUS
- TIKVSTORESTATUS
- USER_PRIVILEGES
- VIEWS
- Metrics Schema
- SQL 语言结构和语法
- UI
- CLI
- 命令行参数
- 配置文件参数
- 系统变量
- 存储引擎
- TiUP
- 遥测
- 错误码与故障诊断
- TiCDC Open Protocol
- 通过拓扑 label 进行副本调度
- 常见问题解答 (FAQ)
- 术语表
Metrics Schema
METRICS_SCHEMA
是基于 Prometheus 中 TiDB 监控指标的一组视图。每个表的 PromQL(Prometheus 查询语言)的源均可在 INFORMATION_SCHEMA.METRICS_TABLES
表中找到。
use metrics_schema;
SELECT * FROM uptime;
SELECT * FROM information_schema.metrics_tables WHERE table_name='uptime'\G
+----------------------------+-----------------+------------+--------------------+
| time | instance | job | value |
+----------------------------+-----------------+------------+--------------------+
| 2020-07-06 15:26:26.203000 | 127.0.0.1:10080 | tidb | 123.60300016403198 |
| 2020-07-06 15:27:26.203000 | 127.0.0.1:10080 | tidb | 183.60300016403198 |
| 2020-07-06 15:26:26.203000 | 127.0.0.1:20180 | tikv | 123.60300016403198 |
| 2020-07-06 15:27:26.203000 | 127.0.0.1:20180 | tikv | 183.60300016403198 |
| 2020-07-06 15:26:26.203000 | 127.0.0.1:2379 | pd | 123.60300016403198 |
| 2020-07-06 15:27:26.203000 | 127.0.0.1:2379 | pd | 183.60300016403198 |
| 2020-07-06 15:26:26.203000 | 127.0.0.1:9090 | prometheus | 123.72300004959106 |
| 2020-07-06 15:27:26.203000 | 127.0.0.1:9090 | prometheus | 183.72300004959106 |
+----------------------------+-----------------+------------+--------------------+
8 rows in set (0.00 sec)
*************************** 1. row ***************************
TABLE_NAME: uptime
PROMQL: (time() - process_start_time_seconds{$LABEL_CONDITIONS})
LABELS: instance,job
QUANTILE: 0
COMMENT: TiDB uptime since last restart(second)
1 row in set (0.00 sec)
show tables;
+---------------------------------------------------+
| Tables_in_metrics_schema |
+---------------------------------------------------+
| abnormal_stores |
| etcd_disk_wal_fsync_rate |
| etcd_wal_fsync_duration |
| etcd_wal_fsync_total_count |
| etcd_wal_fsync_total_time |
| go_gc_count |
| go_gc_cpu_usage |
| go_gc_duration |
| go_heap_mem_usage |
| go_threads |
| goroutines_count |
| node_cpu_usage |
| node_disk_available_size |
| node_disk_io_util |
| node_disk_iops |
| node_disk_read_latency |
| node_disk_size |
..
| tikv_storage_async_request_total_time |
| tikv_storage_async_requests |
| tikv_storage_async_requests_total_count |
| tikv_storage_command_ops |
| tikv_store_size |
| tikv_thread_cpu |
| tikv_thread_nonvoluntary_context_switches |
| tikv_thread_voluntary_context_switches |
| tikv_threads_io |
| tikv_threads_state |
| tikv_total_keys |
| tikv_wal_sync_duration |
| tikv_wal_sync_max_duration |
| tikv_worker_handled_tasks |
| tikv_worker_handled_tasks_total_num |
| tikv_worker_pending_tasks |
| tikv_worker_pending_tasks_total_num |
| tikv_write_stall_avg_duration |
| tikv_write_stall_max_duration |
| tikv_write_stall_reason |
| up |
| uptime |
+---------------------------------------------------+
626 rows in set (0.00 sec)
METRICS_SCHEMA
是监控相关的 summary 表的数据源,例如 metrics_summary
、metrics_summary_by_label
和 inspection_summary
。
更多例子
下面以 metrics_schema
中的 tidb_query_duration
监控表为例,介绍监控表相关的使用和原理,其他的监控表原理均类似。
查询 information_schema.metrics_tables
中关于 tidb_query_duration
表相关的信息如下:
select * from information_schema.metrics_tables where table_name='tidb_query_duration';
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------------------------------+
| TABLE_NAME | PROMQL | LABELS | QUANTILE | COMMENT |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------------------------------+
| tidb_query_duration | histogram_quantile($QUANTILE, sum(rate(tidb_server_handle_query_duration_seconds_bucket{$LABEL_CONDITIONS}[$RANGE_DURATION])) by (le,sql_type,instance)) | instance,sql_type | 0.9 | The quantile of TiDB query durations(second) |
+---------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+----------+----------------------------------------------+
TABLE_NAME
:对应于metrics_schema
中的表名,这里表名是tidb_query_duration
。PROMQL
:因为监控表的原理是将 SQL 映射成PromQL
后向 Prometheus 请求数据,并将 Prometheus 返回的结果转换成 SQL 查询结果。该字段是PromQL
的表达式模板,查询监控表数据时使用查询条件改写模板中的变量,生成最终的查询表达式。LABELS
:监控项定义的 label,tidb_query_duration
有两个 label,分别是instance
和sql_type
。QUANTILE
:百分位。直方图类型的监控数据会指定一个默认百分位。如果值为0
,表示该监控表对应的监控不是直方图。tidb_query_duration
默认查询 0.9 ,也就是 P90 的监控值。COMMENT
:对这个监控表的解释。可以看出tidb_query_duration
表是用来查询 TiDB query 执行的百分位时间,如 P999/P99/P90 的查询耗时,单位是秒。
再来看 tidb_query_duration
的表结构:
show create table metrics_schema.tidb_query_duration;
+---------------------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------------------+--------------------------------------------------------------------------------------------------------------------+
| tidb_query_duration | CREATE TABLE `tidb_query_duration` ( |
| | `time` datetime unsigned DEFAULT CURRENT_TIMESTAMP, |
| | `instance` varchar(512) DEFAULT NULL, |
| | `sql_type` varchar(512) DEFAULT NULL, |
| | `quantile` double unsigned DEFAULT '0.9', |
| | `value` double unsigned DEFAULT NULL |
| | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='The quantile of TiDB query durations(second)' |
+---------------------+--------------------------------------------------------------------------------------------------------------------+
time
:监控项的时间。instance
和sql_type
:是tidb_query_duration
这个监控项的 label。instance
表示监控的地址,sql_type
表示执行 SQL 的类似。quantile
,百分位,直方图类型的监控都会有该列,表示查询的百分位时间,如quantile=0.9
就是查询 P90 的时间。value
:监控项的值。
下面是查询时间 [2020-03-25 23:40:00
, 2020-03-25 23:42:00
] 范围内的 P99 的 TiDB Query 耗时:
select * from metrics_schema.tidb_query_duration where value is not null and time>='2020-03-25 23:40:00' and time <= '2020-03-25 23:42:00' and quantile=0.99;
+---------------------+-------------------+----------+----------+----------------+
| time | instance | sql_type | quantile | value |
+---------------------+-------------------+----------+----------+----------------+
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.509929485256 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.494690793986 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.493460506934 |
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | Select | 0.99 | 0.152058493415 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | Select | 0.99 | 0.152193879678 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | Select | 0.99 | 0.140498483232 |
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | internal | 0.99 | 0.47104 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | internal | 0.99 | 0.11776 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | internal | 0.99 | 0.11776 |
+---------------------+-------------------+----------+----------+----------------+
以上查询结果的第一行意思是,在 2020-03-25 23:40:00
时,在 TiDB 实例 172.16.5.40:10089
上,Insert
类型的语句的 P99 执行时间是 0.509929485256 秒。其他各行的含义类似,sql_type
列的其他值含义如下:
Select
:表示执行的select
类型的语句。internal
:表示 TiDB 的内部 SQL 语句,一般是统计信息更新,获取全局变量相关的内部语句。
进一步再查看上面语句的执行计划如下:
desc select * from metrics_schema.tidb_query_duration where value is not null and time>='2020-03-25 23:40:00' and time <= '2020-03-25 23:42:00' and quantile=0.99;
+------------------+----------+------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------+----------+------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Selection_5 | 8000.00 | root | | not(isnull(Column#5)) |
| └─MemTableScan_6 | 10000.00 | root | table:tidb_query_duration | PromQL:histogram_quantile(0.99, sum(rate(tidb_server_handle_query_duration_seconds_bucket{}[60s])) by (le,sql_type,instance)), start_time:2020-03-25 23:40:00, end_time:2020-03-25 23:42:00, step:1m0s |
+------------------+----------+------+---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
可以发现执行计划中有一个 PromQL
, 以及查询监控的 start_time
和 end_time
,还有 step
值,在实际执行时,TiDB 会调用 Prometheus 的 query_range
HTTP API 接口来查询监控数据。
从以上结果可知,在 [2020-03-25 23:40:00
, 2020-03-25 23:42:00
] 时间范围内,每个 label 只有三个时间的值,间隔时间是 1 分钟,即执行计划中的 step
值。该间隔时间由以下两个 session 变量决定:
tidb_metric_query_step
:查询的分辨率步长。从 Prometheus 的query_range
接口查询数据时需要指定start_time
,end_time
和step
,其中step
会使用该变量的值。tidb_metric_query_range_duration
:查询监控时,会将PROMQL
中的$RANGE_DURATION
替换成该变量的值,默认值是 60 秒。
如果想要查看不同时间粒度的监控项的值,用户可以修改上面两个 session 变量后查询监控表,示例如下:
首先修改两个 session 变量的值,将时间粒度设置为 30 秒。
注意:
Prometheus 支持查询的最小粒度为 30 秒。
set @@tidb_metric_query_step=30;
set @@tidb_metric_query_range_duration=30;
再查询 tidb_query_duration
监控如下,可以发现在三分钟时间范围内,每个 label 有六个时间的值,每个值时间间隔是 30 秒。
select * from metrics_schema.tidb_query_duration where value is not null and time>='2020-03-25 23:40:00' and time <= '2020-03-25 23:42:00' and quantile=0.99;
+---------------------+-------------------+----------+----------+-----------------+
| time | instance | sql_type | quantile | value |
+---------------------+-------------------+----------+----------+-----------------+
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.483285651924 |
| 2020-03-25 23:40:30 | 172.16.5.40:10089 | Insert | 0.99 | 0.484151462113 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.504576 |
| 2020-03-25 23:41:30 | 172.16.5.40:10089 | Insert | 0.99 | 0.493577384561 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | Insert | 0.99 | 0.49482474311 |
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | Select | 0.99 | 0.189253402185 |
| 2020-03-25 23:40:30 | 172.16.5.40:10089 | Select | 0.99 | 0.184224951851 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | Select | 0.99 | 0.151673410553 |
| 2020-03-25 23:41:30 | 172.16.5.40:10089 | Select | 0.99 | 0.127953838989 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | Select | 0.99 | 0.127455434547 |
| 2020-03-25 23:40:00 | 172.16.5.40:10089 | internal | 0.99 | 0.0624 |
| 2020-03-25 23:40:30 | 172.16.5.40:10089 | internal | 0.99 | 0.12416 |
| 2020-03-25 23:41:00 | 172.16.5.40:10089 | internal | 0.99 | 0.0304 |
| 2020-03-25 23:41:30 | 172.16.5.40:10089 | internal | 0.99 | 0.06272 |
| 2020-03-25 23:42:00 | 172.16.5.40:10089 | internal | 0.99 | 0.0629333333333 |
+---------------------+-------------------+----------+----------+-----------------+
最后查看执行计划,也会发现执行计划中的 PromQL
以及 step
的值都已经变成了 30 秒。
desc select * from metrics_schema.tidb_query_duration where value is not null and time>='2020-03-25 23:40:00' and time <= '2020-03-25 23:42:00' and quantile=0.99;
+------------------+----------+------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------+----------+------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Selection_5 | 8000.00 | root | | not(isnull(Column#5)) |
| └─MemTableScan_6 | 10000.00 | root | table:tidb_query_duration | PromQL:histogram_quantile(0.99, sum(rate(tidb_server_handle_query_duration_seconds_bucket{}[30s])) by (le,sql_type,instance)), start_time:2020-03-25 23:40:00, end_time:2020-03-25 23:42:00, step:30s |
+------------------+----------+------+---------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论