If some of your data lives in HBase and you want to analyze it with SQL, integrating HBase with Hive is a good option. Essentially, Hive acts as an HBase client.
1. First, copy the HBase client jars into Hive's lib directory:
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-common-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-server-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-client-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-protocol-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-it-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
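The five cp commands above can be collapsed into a small loop. The helper below is a sketch: `copy_hbase_jars` is a hypothetical name, and the version suffix is passed as a parameter so the same loop works for other HBase releases.

```shell
# Sketch: copy the HBase jars Hive needs in one loop.
# copy_hbase_jars SRC_LIB DEST_LIB VERSION
copy_hbase_jars() {
    local src=$1 dest=$2 ver=$3
    # The five jars used in this walkthrough.
    for name in common server client protocol it; do
        cp "$src/hbase-$name-$ver.jar" "$dest/"
    done
}

# Usage with the paths from this walkthrough:
# copy_hbase_jars /apps/soft/hbase-2.1.1/lib /apps/soft/apache-hive-2.2.0-bin/lib 2.1.1
```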
2. Edit hive-site.xml under hive/conf and append the following property at the end:
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop</value>
</property>
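Here `hadoop` is the single ZooKeeper host in this setup. If ZooKeeper runs as a multi-node ensemble, the value is a comma-separated host list instead (zk1, zk2, zk3 below are placeholder hostnames):

```xml
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
</property>
```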
3. Start Hive. HBase and Hive can be integrated in two ways. The first is to create a Hive managed table, hbase_table_1, whose data is stored in an HBase table:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
In the mapping string, `:key` binds the first Hive column (key) to the HBase row key, and `cf1:val` binds value to the val column of column family cf1; `hbase.table.name` names the HBase table to create. Running the statement:
hive (default)> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 6.162 seconds
hive (default)>
>
Check in HBase that the xyz table was created:
hbase(main):001:0> list
TABLE
info
member
split01
test
user
xyz
6 row(s)
Took 0.9500 seconds
=> ["info", "member", "split01", "test", "user", "xyz"]
Insert some data into the Hive table hbase_table_1 (user_info_t1 is an existing Hive table):
hive (default)> insert overwrite table hbase_table_1 select id,name from user_info_t1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190120094858_2049cc20-0ffb-4f2c-9bf2-a5bf5ed22b7c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1547947280690_0002, Tracking URL = http://hadoop:8088/proxy/application_1547947280690_0002/
Kill Command = /apps/soft/hadoop-2.7.5/bin/hadoop job -kill job_1547947280690_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-01-20 09:49:29,477 Stage-3 map = 0%, reduce = 0%
2019-01-20 09:49:39,694 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 3.84 sec
MapReduce Total cumulative CPU time: 3 seconds 840 msec
Ended Job = job_1547947280690_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 3.84 sec HDFS Read: 10864 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 840 msec
OK
id name
Time taken: 43.845 seconds
Let's check that the rows landed in HBase:
hbase(main):001:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1547948978529, value=xiaoming
2 column=cf1:val, timestamp=1547948978529, value=lilei
3 column=cf1:val, timestamp=1547948978529, value=lihua
3 row(s)
Took 0.7973 seconds
Because this is a Hive managed table, dropping it in Hive also deletes the corresponding table in HBase.
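A quick sketch of how to verify this, should you want to (only do this if you can recreate the data):

```sql
-- Dropping the managed table removes both the Hive table
-- and the underlying HBase table.
DROP TABLE hbase_table_1;
-- Afterwards, `list` in the hbase shell should no longer show 'xyz'.
```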
4. The second way is to create an external table over a table that already exists in HBase. In HBase we have the following user table:
hbase(main):027:0> scan 'user'
ROW COLUMN+CELL
10001 column=info:age, timestamp=1547949708793, value=29
10001 column=info:name, timestamp=1546782143237, value=zhangsan
10003 column=info:age, timestamp=1546781517096, value=28
10003 column=info:name, timestamp=1546781533284, value=wangwu
2 row(s)
Took 0.0423 seconds
hbase(main):028:0>
So we can create a Hive external table mapped onto it:
create external table hbase_user(id int, name string, age int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:age")
tblproperties ("hbase.table.name" = "user");
Let's look at the result:
hive (default)> select * from hbase_user;
OK
hbase_user.id hbase_user.name hbase_user.age
10001 zhangsan 29
10003 wangwu 28
Time taken: 4.348 seconds, Fetched: 2 row(s)
hive (default)>
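Unlike the managed table in step 3, dropping this external table only removes the Hive metadata; the HBase table it maps to is left untouched:

```sql
-- Removes only Hive's metadata for the mapping.
DROP TABLE hbase_user;
-- The HBase 'user' table and its data survive:
-- scan 'user' in the hbase shell still returns both rows.
```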