HBase and Hive Integration

If some of your data lives in HBase and you want to analyze it with SQL, integrating HBase with Hive is a good option. Essentially, Hive acts as a client of HBase.

1. First, copy the HBase client jars into Hive's lib directory (a session-scoped alternative is sketched after these commands):

[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-common-2.1.1.jar  /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-server-2.1.1.jar  /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-client-2.1.1.jar  /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-protocol-2.1.1.jar  /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-it-2.1.1.jar  /apps/soft/apache-hive-2.2.0-bin/lib/
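
As an alternative to copying the jars into hive/lib, they can be registered per session from the Hive CLI with ADD JAR (a sketch, assuming the same paths as above):

-- Session-scoped alternative: register the HBase jars for the current Hive session only.
ADD JAR /apps/soft/hbase-2.1.1/lib/hbase-common-2.1.1.jar;
ADD JAR /apps/soft/hbase-2.1.1/lib/hbase-server-2.1.1.jar;
ADD JAR /apps/soft/hbase-2.1.1/lib/hbase-client-2.1.1.jar;
ADD JAR /apps/soft/hbase-2.1.1/lib/hbase-protocol-2.1.1.jar;
ADD JAR /apps/soft/hbase-2.1.1/lib/hbase-it-2.1.1.jar;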

2. Edit hive-site.xml under hive/conf and add the following property at the end:

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop</value>
</property>

3. Start Hive. HBase can be integrated with Hive in two ways. The first is to create a Hive-managed table, hbase_table_1, whose data is stored in an HBase table (the column mapping is explained after the output below):

CREATE TABLE hbase_table_1(key int, value string)   
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")  
TBLPROPERTIES ("hbase.table.name" = "xyz"); 

The result is as follows:

hive (default)> CREATE TABLE hbase_table_1(key int, value string)
              > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
              > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
              > TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 6.162 seconds
hive (default)>
              >
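
The hbase.columns.mapping string pairs the Hive columns, in declaration order, with HBase columns: key maps to the HBase row key (:key) and value maps to the val qualifier in the cf1 column family. A minimal sketch with more than one mapped data column (the table and column names here are made up for illustration):

-- Hypothetical example: each Hive column lines up, in order,
-- with one entry in hbase.columns.mapping.
CREATE TABLE hbase_table_2(key int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:age")
TBLPROPERTIES ("hbase.table.name" = "xyz2");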

Check in the HBase shell that the xyz table was created:

hbase(main):001:0> list
TABLE
info
member
split01
test
user
xyz
6 row(s)
Took 0.9500 seconds
=> ["info", "member", "split01", "test", "user", "xyz"]

Insert data into the Hive table hbase_table_1:

hive (default)> insert overwrite table hbase_table_1 select id,name from user_info_t1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190120094858_2049cc20-0ffb-4f2c-9bf2-a5bf5ed22b7c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1547947280690_0002, Tracking URL = http://hadoop:8088/proxy/application_1547947280690_0002/
Kill Command = /apps/soft/hadoop-2.7.5/bin/hadoop job  -kill job_1547947280690_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-01-20 09:49:29,477 Stage-3 map = 0%,  reduce = 0%
2019-01-20 09:49:39,694 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.84 sec
MapReduce Total cumulative CPU time: 3 seconds 840 msec
Ended Job = job_1547947280690_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 3.84 sec   HDFS Read: 10864 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 840 msec
OK
id      name
Time taken: 43.845 seconds
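
The rows can also be read back directly through Hive rather than the HBase shell (a sketch; output omitted):

-- Query the HBase-backed table through Hive itself.
SELECT key, value FROM hbase_table_1;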

Now check whether the data made it into HBase:

hbase(main):001:0> scan 'xyz'
ROW                                              COLUMN+CELL
 1                                               column=cf1:val, timestamp=1547948978529, value=xiaoming
 2                                               column=cf1:val, timestamp=1547948978529, value=lilei
 3                                               column=cf1:val, timestamp=1547948978529, value=lihua
3 row(s)
Took 0.7973 seconds

Because hbase_table_1 is a Hive-managed table, dropping it in Hive also drops the corresponding table in HBase.
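
A quick way to confirm this behavior (a sketch; run it only if the data is no longer needed):

-- Dropping the Hive-managed table also removes the backing HBase table 'xyz';
-- running `list` in the HBase shell afterwards should no longer show it.
DROP TABLE hbase_table_1;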

4. The second approach is to create an external Hive table over a table that already exists in HBase.

In HBase we already have a user table:

hbase(main):027:0> scan 'user'
ROW                                              COLUMN+CELL
 10001                                           column=info:age, timestamp=1547949708793, value=29
 10001                                           column=info:name, timestamp=1546782143237, value=zhangsan
 10003                                           column=info:age, timestamp=1546781517096, value=28
 10003                                           column=info:name, timestamp=1546781533284, value=wangwu
2 row(s)
Took 0.0423 seconds
hbase(main):028:0>

So we can create a Hive external table for it:

create external table hbase_user(id int, name string, age int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:age")
tblproperties ("hbase.table.name" = "user");

Let's look at the result:

hive (default)> select * from hbase_user;
OK
hbase_user.id   hbase_user.name hbase_user.age
10001   zhangsan        29
10003   wangwu  28
Time taken: 4.348 seconds, Fetched: 2 row(s)
hive (default)>
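
Unlike the managed table above, dropping this external table removes only the Hive metadata; the underlying HBase user table and its data are left untouched. A minimal sketch:

-- Removes only the Hive definition of hbase_user;
-- `scan 'user'` in the HBase shell still returns the original rows.
DROP TABLE hbase_user;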
