If some of your data lives in HBase and you want to analyze it with SQL, integrating HBase with Hive is a good option. Essentially, Hive acts as an HBase client.
1. First, copy the HBase client jars into Hive's lib directory:
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-common-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-server-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-client-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-protocol-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
[root@hadoop lib]# cp /apps/soft/hbase-2.1.1/lib/hbase-it-2.1.1.jar /apps/soft/apache-hive-2.2.0-bin/lib/
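The five cp commands above can be collapsed into a small loop. The helper below is a sketch: `copy_hbase_jars` is a hypothetical name, and the version suffix is passed as a parameter so the same loop works for other HBase releases.

```shell
# Sketch: copy the HBase jars Hive needs in one loop.
# copy_hbase_jars SRC_LIB DEST_LIB VERSION
copy_hbase_jars() {
    local src=$1 dest=$2 ver=$3
    # The five jars used in this walkthrough.
    for name in common server client protocol it; do
        cp "$src/hbase-$name-$ver.jar" "$dest/"
    done
}

# Usage with the paths from this walkthrough:
# copy_hbase_jars /apps/soft/hbase-2.1.1/lib /apps/soft/apache-hive-2.2.0-bin/lib 2.1.1
```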
2. Edit hive-site.xml under hive/conf and append the following property at the end:
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop</value>
</property>
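Here `hadoop` is the single ZooKeeper host in this setup. If ZooKeeper runs as a multi-node ensemble, the value is a comma-separated host list instead (zk1, zk2, zk3 below are placeholder hostnames):

```xml
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
</property>
```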
3. Start Hive. HBase and Hive can be integrated in two ways. The first is to create a Hive managed table, hbase_table_1, whose data is stored in an HBase table:
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
In the mapping string, `:key` binds the first Hive column (key) to the HBase row key, and `cf1:val` binds value to the val column of column family cf1; `hbase.table.name` names the HBase table to create. Running the statement:
hive (default)> CREATE TABLE hbase_table_1(key int, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 6.162 seconds
hive (default)>
>
Check in HBase that the xyz table was created:
hbase(main):001:0> list
TABLE
info
member
split01
test
user
xyz
6 row(s)
Took 0.9500 seconds
=> ["info", "member", "split01", "test", "user", "xyz"]
Insert some data into the Hive table hbase_table_1 (user_info_t1 is an existing Hive table):
hive (default)> insert overwrite table hbase_table_1 select id,name from user_info_t1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190120094858_2049cc20-0ffb-4f2c-9bf2-a5bf5ed22b7c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1547947280690_0002, Tracking URL = http://hadoop:8088/proxy/application_1547947280690_0002/
Kill Command = /apps/soft/hadoop-2.7.5/bin/hadoop job -kill job_1547947280690_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-01-20 09:49:29,477 Stage-3 map = 0%, reduce = 0%
2019-01-20 09:49:39,694 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 3.84 sec
MapReduce Total cumulative CPU time: 3 seconds 840 msec
Ended Job = job_1547947280690_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 3.84 sec HDFS Read: 10864 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 840 msec
OK
id name
Time taken: 43.845 seconds
Let's check that the rows landed in HBase:
hbase(main):001:0> scan 'xyz'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1547948978529, value=xiaoming
2 column=cf1:val, timestamp=1547948978529, value=lilei
3 column=cf1:val, timestamp=1547948978529, value=lihua
3 row(s)
Took 0.7973 seconds
Because this is a Hive managed table, dropping it in Hive also deletes the corresponding table in HBase.
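A quick sketch of how to verify this, should you want to (only do this if you can recreate the data):

```sql
-- Dropping the managed table removes both the Hive table
-- and the underlying HBase table.
DROP TABLE hbase_table_1;
-- Afterwards, `list` in the hbase shell should no longer show 'xyz'.
```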
4. The second way is to create an external table over a table that already exists in HBase. In HBase we have the following user table:
hbase(main):027:0> scan 'user'
ROW COLUMN+CELL
10001 column=info:age, timestamp=1547949708793, value=29
10001 column=info:name, timestamp=1546782143237, value=zhangsan
10003 column=info:age, timestamp=1546781517096, value=28
10003 column=info:name, timestamp=1546781533284, value=wangwu
2 row(s)
Took 0.0423 seconds
hbase(main):028:0>
So we can create a Hive external table mapped onto it:
create external table hbase_user(id int, name string, age int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping" = ":key,info:name,info:age")
tblproperties ("hbase.table.name" = "user");
Let's look at the result:
hive (default)> select * from hbase_user;
OK
hbase_user.id hbase_user.name hbase_user.age
10001 zhangsan 29
10003 wangwu 28
Time taken: 4.348 seconds, Fetched: 2 row(s)
hive (default)>
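Unlike the managed table in step 3, dropping this external table only removes the Hive metadata; the HBase table it maps to is left untouched:

```sql
-- Removes only Hive's metadata for the mapping.
DROP TABLE hbase_user;
-- The HBase 'user' table and its data survive:
-- scan 'user' in the hbase shell still returns both rows.
```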