下载
编译与部署
开始使用
操作手册
- 数据导入
- 表结构变更
- 物化视图
- HTTP API
- FE
- MANAGER
- Bootstrap Action
- Cancel Load Action
- Check Decommission Action
- Check Storage Type Action
- Config Action
- Connection Action
- Get DDL Statement Action
- Get Load Info Action
- Get Load State
- Get FE log file
- Get Small File
- HA Action
- Hardware Info Action
- Health Action
- Log Action
- Logout Action
- Meta Action
- Meta Action
- Meta Replay State Action
- Profile Action
- Query Detail Action
- Query Profile Action
- Row Count Action
- Session Action
- Set Config Action
- Show Data Action
- Show Meta Info Action
- Show Proc Action
- Show Runtime Info Action
- Statement Execution Action
- System Action
- Table Query Plan Action
- Table Row Count Action
- Table Schema Action
- Upload Action
- CANCEL LABEL
- Compaction Action
- CONNECTION
- getlogfile
- GET LABEL STATE
- GET TABLETS ON A PARTICULAR BE
- PROFILE
- QUERY DETAIL
- RESTORE TABLET
- SHOW DATA
- MIGRATE SINGLE TABLET TO A PARTICULAR DISK
- GET TABLETS DISTRIBUTION BETWEEN DIFFERENT DISKS
- FE
- 运维操作
- 配置文件
- 拦截规则
- 备份与恢复
- Broker
- Colocation Join
- Bucket Shuffle Join
- 动态分区
- 数据导出
- 导出查询结果集
- 分区缓存
- 权限管理
- LDAP
- 资源管理
- 查询执行的统计
- Runtime Filter
- Segment V2 升级手册
- 文件管理器
- SQL MODE
- 时区
- 变量
- 更新
- 多租户和资源划分
最佳实践
扩展功能
- 审计日志插件
- Doris On ES
- Doris output plugin
- ODBC External Table Of Doris
- Doris 插件框架
- Spark Doris Connector
- Flink Doris Connector
- DataX doriswriter
- UDF
设计文档
SQL 手册
- SQL 函数
- 日期函数
- convert_tz
- curdate
- current_timestamp
- curtime,current_time
- date_add
- date_format
- date_sub
- datediff
- day
- dayname
- dayofmonth
- dayofweek
- dayofyear
- from_days
- from_unixtime
- hour
- makedate
- minute
- month
- monthname
- now
- second
- strtodate
- time_round
- timediff
- timestampadd
- timestampdiff
- to_days
- unix_timestamp
- utc_timestamp
- week
- weekofyear
- year
- yearweek
- 地理位置函数
- 字符串函数
- 聚合函数
- bitmap 函数
- Hash函数
- Doris 窗口函数使用
- CAST
- DIGITAL_MASKING
- 日期函数
- 语法帮助
- 用户账户管理
- 集群管理
- ADMIN CANCEL REPAIR
- ADMIN CLEAN TRASH
- ADMIN CHECK TABLET
- ADMIN REPAIR
- ADMIN SET CONFIG
- ADMIN SET REPLICA STATUS
- ADMIN SHOW CONFIG
- ADMIN SHOW REPLICA DISTRIBUTION
- ADMIN SHOW REPLICA STATUS
- ADMIN SHOW DATA SKEW
- ALTER CLUSTER
- ALTER SYSTEM
- CANCEL DECOMMISSION
- CREATE CLUSTER
- CREATE FILE
- DROP CLUSTER
- DROP FILE
- ENTER
- INSTALL PLUGIN
- LINK DATABASE
- MIGRATE DATABASE
- SET LDAPADMINPASSWORD
- SHOW BACKENDS
- SHOW BROKER
- SHOW FILE
- SHOW FRONTENDS
- SHOW FULL COLUMNS
- SHOW INDEX
- SHOW MIGRATIONS
- SHOW PLUGINS
- SHOW TABLE STATUS
- SHOW TRASH
- UNINSTALL PLUGIN
- DDL
- ALTER DATABASE
- ALTER TABLE
- ALTER VIEW
- BACKUP
- CANCEL ALTER
- CANCEL BACKUP
- CANCEL RESTORE
- CREATE DATABASE
- CREATE ENCRYPTKEY
- CREATE INDEX
- CREATE MATERIALIZED VIEW
- CREATE REPOSITORY
- CREATE RESOURCE
- CREATE TABLE LIKE
- CREATE TABLE
- CREATE VIEW
- DROP DATABASE
- DROP ENCRYPTKEY
- DROP INDEX
- DROP MATERIALIZED VIEW
- DROP REPOSITORY
- DROP RESOURCE
- DROP TABLE
- DROP VIEW
- HLL
- RECOVER
- RESTORE
- SHOW ENCRYPTKEYS
- SHOW RESOURCES
- TRUNCATE TABLE
- CREATE FUNCTION
- DROP FUNCTION
- SHOW FUNCTIONS
- DML
- BROKER LOAD
- CANCEL LOAD
- DELETE
- EXPORT
- GROUP BY
- LOAD
- MINI LOAD
- MULTI LOAD
- PAUSE ROUTINE LOAD
- RESUME ROUTINE LOAD
- ROUTINE LOAD
- SHOW ALTER
- SHOW BACKUP
- SHOW CREATE FUNCTION
- SHOW CREATE ROUTINE LOAD
- SHOW DATA
- SHOW DATABASES
- SHOW DELETE
- SHOW DYNAMIC PARTITION TABLES
- SHOW EXPORT
- SHOW LOAD
- SHOW PARTITIONS
- SHOW PROPERTY
- SHOW REPOSITORIES
- SHOW RESTORE
- SHOW ROUTINE LOAD TASK
- SHOW ROUTINE LOAD
- SHOW SNAPSHOT
- SHOW TABLES
- SHOW TABLET
- SHOW TRANSACTION
- SPARK LOAD
- STOP ROUTINE LOAD
- STREAM LOAD
- ALTER ROUTINE LOAD
- INSERT
- UPDATE
- 数据类型
- 辅助命令
开发者手册
- 调试工具
- Doris BE存储层Benchmark工具
- 使用 Eclipse 搭建 FE 开发环境
- 使用 IntelliJ IDEA 搭建 FE 开发环境
- Apache Doris Be 开发调试
- Java 代码格式化
- C++ 代码格式化
Apache 社区
Doris 窗口函数使用
窗口函数介绍
分析函数是一类特殊的内置函数。和聚合函数类似,分析函数也是对于多个输入行做计算得到一个数据值。不同的是,分析函数是在一个特定的窗口内对输入数据做处理,而不是按照 group by 来分组计算。每个窗口内的数据可以用 over() 从句进行排序和分组。分析函数会对结果集的每一行计算出一个单独的值,而不是每个 group by 分组计算一个值。这种灵活的方式允许用户在 select 从句中增加额外的列,给用户提供了更多的机会来对结果集进行重新组织和过滤。分析函数只能出现在 select 列表和最外层的 order by 从句中。在查询过程中,分析函数会在最后生效,就是说,在执行完 join,where 和 group by 等操作之后再执行。分析函数在金融和科学计算领域经常被使用到,用来分析趋势、计算离群值以及对大量数据进行分桶分析等。
分析函数的语法:
function(args) OVER(partition_by_clause order_by_clause [window_clause])
partition_by_clause ::= PARTITION BY expr [, expr ...]
order_by_clause ::= ORDER BY expr [ASC | DESC] [, expr [ASC | DESC] ...]
Function
目前支持的 Function 包括 AVG(), COUNT(), DENSE_RANK(), FIRST_VALUE(), LAG(), LAST_VALUE(), LEAD(), MAX(), MIN(), RANK(), ROW_NUMBER() 和 SUM()。
PARTITION BY从句
Partition By 从句和 Group By 类似。它把输入行按照指定的一列或多列分组,相同值的行会被分到一组。
ORDER BY从句
Order By从句和外层的Order By基本一致。它定义了输入行的排列顺序,如果指定了 Partition By,则 Order By 定义了每个 Partition 分组内的顺序。与外层 Order By 的唯一不同点是,OVER 从句中的 Order By n(n是正整数)相当于不做任何操作,而外层的 Order By n表示按照第n列排序。
举例:
这个例子展示了在select列表中增加一个id列,它的值是1,2,3等等,顺序按照events表中的date_and_time列排序。
SELECT
row_number() OVER (ORDER BY date_and_time) AS id,
c1, c2, c3, c4
FROM events;
Window从句
Window 从句用来为分析函数指定一个运算范围,以当前行为准,前后若干行作为分析函数运算的对象。Window 从句支持的方法有:AVG(), COUNT(), FIRST_VALUE(), LAST_VALUE() 和 SUM()。对于 MAX() 和 MIN(), window 从句可以指定开始范围 UNBOUNDED PRECEDING
语法:
ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]
举例:
假设我们有如下的股票数据,股票代码是 JDR,closing price 是每天的收盘价。
create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);
...load some data...
select * from stock_ticker order by stock_symbol, closing_date
| stock_symbol | closing_price | closing_date |
|--------------|---------------|---------------------|
| JDR | 12.86 | 2014-10-02 00:00:00 |
| JDR | 12.89 | 2014-10-03 00:00:00 |
| JDR | 12.94 | 2014-10-04 00:00:00 |
| JDR | 12.55 | 2014-10-05 00:00:00 |
| JDR | 14.03 | 2014-10-06 00:00:00 |
| JDR | 14.75 | 2014-10-07 00:00:00 |
| JDR | 13.98 | 2014-10-08 00:00:00 |
这个查询使用分析函数产生 moving_average 这一列,它的值是3天的股票均价,即前一天、当前以及后一天三天的均价。第一天没有前一天的值,最后一天没有后一天的值,所以这两行只计算了两天的均值。这里 Partition By 没有起到作用,因为所有的数据都是 JDR 的数据,但如果还有其他股票信息,Partition By 会保证分析函数值作用在本 Partition 之内。
select stock_symbol, closing_date, closing_price,
avg(closing_price) over (partition by stock_symbol order by closing_date
rows between 1 preceding and 1 following) as moving_average
from stock_ticker;
| stock_symbol | closing_date | closing_price | moving_average |
|--------------|---------------------|---------------|----------------|
| JDR | 2014-10-02 00:00:00 | 12.86 | 12.87 |
| JDR | 2014-10-03 00:00:00 | 12.89 | 12.89 |
| JDR | 2014-10-04 00:00:00 | 12.94 | 12.79 |
| JDR | 2014-10-05 00:00:00 | 12.55 | 13.17 |
| JDR | 2014-10-06 00:00:00 | 14.03 | 13.77 |
| JDR | 2014-10-07 00:00:00 | 14.75 | 14.25 |
| JDR | 2014-10-08 00:00:00 | 13.98 | 14.36 |
Function使用举例
本节介绍 Doris 中可以用作分析函数的方法。
AVG()
语法:
AVG([DISTINCT | ALL] *expression*) [OVER (*analytic_clause*)]
举例:
计算当前行和它前后各一行数据的x平均值
select x, property,
avg(x) over
(
partition by property
order by x
rows between 1 preceding and 1 following
) as 'moving average'
from int_t where property in ('odd','even');
| x | property | moving average |
|----|----------|----------------|
| 2 | even | 3 |
| 4 | even | 4 |
| 6 | even | 6 |
| 8 | even | 8 |
| 10 | even | 9 |
| 1 | odd | 2 |
| 3 | odd | 3 |
| 5 | odd | 5 |
| 7 | odd | 7 |
| 9 | odd | 8 |
COUNT()
语法:
COUNT([DISTINCT | ALL] expression) [OVER (analytic_clause)]
举例:
计算从当前行到第一行x出现的次数。
select x, property,
count(x) over
(
partition by property
order by x
rows between unbounded preceding and current row
) as 'cumulative total'
from int_t where property in ('odd','even');
| x | property | cumulative count |
|----|----------|------------------|
| 2 | even | 1 |
| 4 | even | 2 |
| 6 | even | 3 |
| 8 | even | 4 |
| 10 | even | 5 |
| 1 | odd | 1 |
| 3 | odd | 2 |
| 5 | odd | 3 |
| 7 | odd | 4 |
| 9 | odd | 5 |
DENSE_RANK()
DENSE_RANK() 函数用来表示排名,与RANK()不同的是,DENSE_RANK() 不会出现空缺数字。比如,如果出现了两个并列的1,DENSE_RANK() 的第三个数仍然是2,而RANK()的第三个数是3。
语法:
DENSE_RANK() OVER(partition_by_clause order_by_clause)
举例:
下例展示了按照 property 列分组对x列排名:
select x, y, dense_rank() over(partition by x order by y) as rank from int_t;
| x | y | rank |
|----|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 1 |
| 3 | 2 | 2 |
FIRST_VALUE()
FIRST_VALUE() 返回窗口范围内的第一个值。
语法:
FIRST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
举例:
我们有如下数据
select name, country, greeting from mail_merge;
| name | country | greeting |
|---------|---------|--------------|
| Pete | USA | Hello |
| John | USA | Hi |
| Boris | Germany | Guten tag |
| Michael | Germany | Guten morgen |
| Bjorn | Sweden | Hej |
| Mats | Sweden | Tja |
使用 FIRST_VALUE(),根据 country 分组,返回每个分组中第一个 greeting 的值:
select country, name,
first_value(greeting)
over (partition by country order by name, greeting) as greeting from mail_merge;
| country | name | greeting |
|---------|---------|-----------|
| Germany | Boris | Guten tag |
| Germany | Michael | Guten tag |
| Sweden | Bjorn | Hej |
| Sweden | Mats | Hej |
| USA | John | Hi |
| USA | Pete | Hi |
LAG()
LAG() 方法用来计算当前行向前数若干行的值。
语法:
LAG (expr, offset, default) OVER (partition_by_clause order_by_clause)
举例:
计算前一天的收盘价
select stock_symbol, closing_date, closing_price,
lag(closing_price,1, 0) over (partition by stock_symbol order by closing_date) as "yesterday closing"
from stock_ticker
order by closing_date;
| stock_symbol | closing_date | closing_price | yesterday closing |
|--------------|---------------------|---------------|-------------------|
| JDR | 2014-09-13 00:00:00 | 12.86 | 0 |
| JDR | 2014-09-14 00:00:00 | 12.89 | 12.86 |
| JDR | 2014-09-15 00:00:00 | 12.94 | 12.89 |
| JDR | 2014-09-16 00:00:00 | 12.55 | 12.94 |
| JDR | 2014-09-17 00:00:00 | 14.03 | 12.55 |
| JDR | 2014-09-18 00:00:00 | 14.75 | 14.03 |
| JDR | 2014-09-19 00:00:00 | 13.98 | 14.75 |
LAST_VALUE()
LAST_VALUE() 返回窗口范围内的最后一个值。与 FIRST_VALUE() 相反。
语法:
LAST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
使用FIRST_VALUE()举例中的数据:
select country, name,
last_value(greeting)
over (partition by country order by name, greeting) as greeting
from mail_merge;
| country | name | greeting |
|---------|---------|--------------|
| Germany | Boris | Guten morgen |
| Germany | Michael | Guten morgen |
| Sweden | Bjorn | Tja |
| Sweden | Mats | Tja |
| USA | John | Hello |
| USA | Pete | Hello |
LEAD()
LEAD() 方法用来计算当前行向后数若干行的值。
语法:
LEAD (expr, offset, default]) OVER (partition_by_clause order_by_clause)
举例:
计算第二天的收盘价对比当天收盘价的走势,即第二天收盘价比当天高还是低。
select stock_symbol, closing_date, closing_price,
case
(lead(closing_price,1, 0)
over (partition by stock_symbol order by closing_date)-closing_price) > 0
when true then "higher"
when false then "flat or lower"
end as "trending"
from stock_ticker
order by closing_date;
| stock_symbol | closing_date | closing_price | trending |
|--------------|---------------------|---------------|---------------|
| JDR | 2014-09-13 00:00:00 | 12.86 | higher |
| JDR | 2014-09-14 00:00:00 | 12.89 | higher |
| JDR | 2014-09-15 00:00:00 | 12.94 | flat or lower |
| JDR | 2014-09-16 00:00:00 | 12.55 | higher |
| JDR | 2014-09-17 00:00:00 | 14.03 | higher |
| JDR | 2014-09-18 00:00:00 | 14.75 | flat or lower |
| JDR | 2014-09-19 00:00:00 | 13.98 | flat or lower |
MAX()
语法:
MAX([DISTINCT | ALL] expression) [OVER (analytic_clause)]
举例:
计算从第一行到当前行之后一行的最大值
select x, property,
max(x) over
(
order by property, x
rows between unbounded preceding and 1 following
) as 'local maximum'
from int_t where property in ('prime','square');
| x | property | local maximum |
|---|----------|---------------|
| 2 | prime | 3 |
| 3 | prime | 5 |
| 5 | prime | 7 |
| 7 | prime | 7 |
| 1 | square | 7 |
| 4 | square | 9 |
| 9 | square | 9 |
MIN()
语法:
MIN([DISTINCT | ALL] expression) [OVER (analytic_clause)]
举例:
计算从第一行到当前行之后一行的最小值
select x, property,
min(x) over
(
order by property, x desc
rows between unbounded preceding and 1 following
) as 'local minimum'
from int_t where property in ('prime','square');
| x | property | local minimum |
|---|----------|---------------|
| 7 | prime | 5 |
| 5 | prime | 3 |
| 3 | prime | 2 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 1 |
| 1 | square | 1 |
RANK()
RANK() 函数用来表示排名,与 DENSE_RANK() 不同的是,RANK() 会出现空缺数字。比如,如果出现了两个并列的1, RANK() 的第三个数就是3,而不是2。
语法:
RANK() OVER(partition_by_clause order_by_clause)
举例:
根据 x 进行排名
select x, y, rank() over(partition by x order by y) as rank from int_t;
| x | y | rank |
|----|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 1 |
| 3 | 2 | 3 |
ROW_NUMBER()
为每个 Partition 的每一行返回一个从1开始连续递增的整数。与 RANK() 和 DENSE_RANK() 不同的是,ROW_NUMBER() 返回的值不会重复也不会出现空缺,是连续递增的。
语法:
ROW_NUMBER() OVER(partition_by_clause order_by_clause)
举例:
select x, y, row_number() over(partition by x order by y) as rank from int_t;
| x | y | rank |
|---|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 2 |
| 3 | 2 | 3 |
SUM()
语法:
SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]
举例:
按照 property 进行分组,在组内计算当前行以及前后各一行的x列的和。
select x, property,
sum(x) over
(
partition by property
order by x
rows between 1 preceding and 1 following
) as 'moving total'
from int_t where property in ('odd','even');
| x | property | moving total |
|----|----------|--------------|
| 2 | even | 6 |
| 4 | even | 12 |
| 6 | even | 18 |
| 8 | even | 24 |
| 10 | even | 18 |
| 1 | odd | 4 |
| 3 | odd | 9 |
| 5 | odd | 15 |
| 7 | odd | 21 |
| 9 | odd | 16 |
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论