1. Basic syntax
Function (arg1,..., argn) OVER ([PARTITION BY <...>] [ORDER BY <...>]
[<window_expression>])
Function (arg1,..., argn) can be any of the following three kinds of function:
(1) Aggregate functions: e.g. sum(...), max(...), min(...), avg(...)
(2) Sort functions: functions that rank or number rows, e.g. rank(...), row_number(...)
(3) Analytics functions: statistical and comparison functions, e.g. lead(...), lag(...), first_value(...)
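To make the three categories concrete, here is a minimal sketch that runs one function of each kind through the same OVER clause. It uses Python's built-in sqlite3 module as a local stand-in for Hive (an assumption made only so the query can be tried without a cluster; window functions need SQLite 3.25 or later). The table is a three-column subset of the employee sample data, and the aliases dept_max, rnk and prev_sal are illustrative names.

```python
import sqlite3

# SQLite stands in for Hive here (assumption); the OVER clause is written the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept_num INT, salary INT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Lily", 1001, 5000), ("Jess", 1001, 6000), ("Mike", 1001, 6400),
     ("Wei", 1002, 7000), ("Yun", 1002, 5500), ("Richard", 1002, 8000)],
)

rows = conn.execute("""
    SELECT name,
           dept_num,
           salary,
           -- aggregate function: highest salary in the department
           max(salary) OVER (PARTITION BY dept_num) AS dept_max,
           -- sort function: position within the department, highest salary first
           rank() OVER (PARTITION BY dept_num ORDER BY salary DESC) AS rnk,
           -- analytics function: salary of the previous row in that ordering
           lag(salary) OVER (PARTITION BY dept_num ORDER BY salary DESC) AS prev_sal
    FROM employee
    ORDER BY dept_num, rnk
""").fetchall()

for row in rows:
    print(row)
```

Every input row survives in the output. The window function only adds a column computed over the rows that share the same dept_num, and lag returns NULL (None in Python) for the first row of each partition.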
2. Data preparation
(1) Sample data
[employee name|department number|employee ID|salary|job type|start date]
Michael|1000|100|5000|full|2014-01-29
Will|1000|101|4000|full|2013-10-02
Wendy|1000|101|4000|part|2014-10-02
Steven|1000|102|6400|part|2012-11-03
Lucy|1000|103|5500|full|2010-01-03
Lily|1001|104|5000|part|2014-11-29
Jess|1001|105|6000|part|2014-12-02
Mike|1001|106|6400|part|2013-11-03
Wei|1002|107|7000|part|2010-04-03
Yun|1002|108|5500|full|2014-01-29
Richard|1002|109|8000|full|2013-09-01
(2) Table creation statement:
CREATE TABLE IF NOT EXISTS employee (
  name string,
  dept_num int,
  employee_id int,
  salary int,
  type string,
  start_date date
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
(3) Load the data
load data local inpath '/opt/datas/data/employee_contract.txt' into table employee;
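3. Window aggregate functions
(1) Query each employee's name, department number and salary, together with the headcount of their department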
select
  name,
  dept_num as deptno,
  salary,
  count(*) over (partition by dept_num) as cnt
from employee;
Output:
name deptno salary cnt
Lucy 1000 5500 5
Steven 1000 6400 5
Wendy 1000 4000 5
Will 1000 4000 5
Michael 1000 5000 5
Mike 1001 6400 3
Jess 1001 6000 3
Lily 1001 5000 3
Richard 1002 8000 3
Yun 1002 5500 3
Wei 1002 7000 3
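One way to read count(*) over (partition by dept_num) is as a GROUP BY count joined back onto every row. The sketch below checks that equivalence on the sample data. It again assumes SQLite (3.25+) through Python's sqlite3 module as a stand-in for Hive, and the names DATA, window_rows and join_rows are illustrative.

```python
import sqlite3

# Sample rows in the same pipe-delimited layout as the data file.
DATA = """\
Michael|1000|100|5000|full|2014-01-29
Will|1000|101|4000|full|2013-10-02
Wendy|1000|101|4000|part|2014-10-02
Steven|1000|102|6400|part|2012-11-03
Lucy|1000|103|5500|full|2010-01-03
Lily|1001|104|5000|part|2014-11-29
Jess|1001|105|6000|part|2014-12-02
Mike|1001|106|6400|part|2013-11-03
Wei|1002|107|7000|part|2010-04-03
Yun|1002|108|5500|full|2014-01-29
Richard|1002|109|8000|full|2013-09-01
"""

conn = sqlite3.connect(":memory:")  # SQLite as a local stand-in for Hive (assumption)
conn.execute(
    "CREATE TABLE employee (name TEXT, dept_num INT, employee_id INT, "
    "salary INT, type TEXT, start_date TEXT)"
)
# Split each line on '|', mirroring FIELDS TERMINATED BY '|'.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)",
    [line.split("|") for line in DATA.splitlines()],
)

# Window version: one output row per employee, count repeated per department.
window_rows = conn.execute("""
    SELECT name, dept_num, salary,
           count(*) OVER (PARTITION BY dept_num) AS cnt
    FROM employee
    ORDER BY name
""").fetchall()

# Equivalent formulation: aggregate per department, then join back.
join_rows = conn.execute("""
    SELECT e.name, e.dept_num, e.salary, d.cnt
    FROM employee e
    JOIN (SELECT dept_num, count(*) AS cnt
          FROM employee
          GROUP BY dept_num) d
      ON e.dept_num = d.dept_num
    ORDER BY e.name
""").fetchall()

print(window_rows == join_rows)
```

The window form avoids the self-join and keeps the query in a single pass over employee, which is the main reason to prefer it when detail rows and a per-group aggregate are needed side by side.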
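(2) Query each employee's name, department number and salary, together with the total salary of their department, with rows sorted by department total in descending order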
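select
  name,
  dept_num as deptno,
  salary,
  -- ordering by the partition key makes all rows of a department peers,
  -- so the default frame covers the whole partition and the sum is the department total
  sum(salary) over (partition by dept_num order by dept_num) as sum_dept_salary
from employee
order by sum_dept_salary desc;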
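The ORDER BY inside OVER changes what sum computes, so it is worth comparing three variants side by side. The sketch below is hedged in the same way as any local experiment: it assumes SQLite (3.25+) via Python's sqlite3 module as a stand-in for Hive, relies on the default frame being RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY is present, and uses the illustrative aliases by_dept_key, no_order and running.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a local stand-in for Hive (assumption)
conn.execute("CREATE TABLE employee (name TEXT, dept_num INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Michael", 1000, 5000), ("Will", 1000, 4000), ("Wendy", 1000, 4000),
    ("Steven", 1000, 6400), ("Lucy", 1000, 5500),
    ("Lily", 1001, 5000), ("Jess", 1001, 6000), ("Mike", 1001, 6400),
    ("Wei", 1002, 7000), ("Yun", 1002, 5500), ("Richard", 1002, 8000),
])

rows = conn.execute("""
    SELECT name,
           dept_num,
           salary,
           -- ordered by the partition key: every row is a peer, so this is the total
           sum(salary) OVER (PARTITION BY dept_num ORDER BY dept_num) AS by_dept_key,
           -- no ORDER BY: the frame is the whole partition, also the total
           sum(salary) OVER (PARTITION BY dept_num) AS no_order,
           -- ordered by salary: a running total within the department
           sum(salary) OVER (PARTITION BY dept_num ORDER BY salary) AS running
    FROM employee
    ORDER BY by_dept_key DESC, salary, name
""").fetchall()

for row in rows:
    print(row)
```

On the sample data the department totals are 24900 for 1000, 20500 for 1002 and 17400 for 1001, and by_dept_key equals no_order on every row. In the running column, Will and Wendy tie at 4000, so under the RANGE frame both rows show 8000 rather than 4000 and 8000.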