Apache HBase
Overview
HBase is a distributed, scalable big data store built on top of Hadoop. Use HBase when you need real-time (low-latency), random (record-level) read/write access to massive datasets. HBase's design goal is to host very large tables: billions of rows × millions of columns, running on clusters of commodity hardware. HBase is an open-source, distributed, versioned, non-relational (NoSQL) database modeled after Google's BigTable.
The Difference Between HDFS and HBase
HBase is a database service built on top of HDFS. It lets users operate on HDFS indirectly through the HBase service, providing fine-grained CRUD (create, read, update, delete) operations on data stored in HDFS.
HBase Features (from the official documentation)
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables (automatic partitioning).
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy-to-use Java API for client access.
- Block cache and Bloom filters for real-time queries.
Column Storage
Common categories of NoSQL databases:
- Key-Value: Redis | SSDB
- Document: MongoDB | Elasticsearch | Solr
- Column storage: HBase
- Graph: Neo4j

and so on. Unlike relational databases, the different kinds of NoSQL products cannot be substituted for one another.
Row Storage Characteristics (RDBMS)
ID | name | password | age | sex | address |
---|---|---|---|---|---|
1 | zhangsan | 123456 | 18 | true | Beijing Shangdi |
2 | lisi | 123456 | 20 | null | Beijing Chaoyang |
3 | wangwu | 123456 | null | null | null |
4 | zhaoliu | 123456 | null | false | null |
Consider this query:
select id,name,password from t_user where name='zhangsan' and password='123456'
- Use the index on name and password to quickly locate the matching record.
- The storage engine loads id/name/password/age/sex/address from disk.
- A projection then filters the result down to id/name/password.
From this process it is easy to see that reading age/sex/address is wasted work; that portion of the I/O is pure overhead for the system (low I/O utilization). In addition, because relational databases do not support sparse storage (where null values are simply not stored), nulls still occupy disk space, which wastes storage (low disk utilization).
- One solution (the column co-occurrence problem): split the table vertically so that columns usually accessed together live in the same table.
t_user_base
id | name | password |
---|---|---|
1 | zhangsan | 123456 |
2 | lisi | 123456 |
3 | wangwu | 123456 |
4 | zhaoliu | 123456 |
t_user_detail
ID | age | sex | address |
---|---|---|---|
1 | 18 | true | Beijing Shangdi |
2 | 20 | null | Beijing Chaoyang |
4 | null | false | null |
Note that wangwu (ID 3) has no row in t_user_detail at all: every one of his detail columns is null, so the row can be omitted entirely.
- Column storage (HBase)
RowKey
: the equivalent of the primary key ID in a relational database.
Column family
: columns with similar I/O characteristics are grouped into one family; HBase indexes data on disk in column-family units.
Column
: composed of column family / column qualifier / cell value / timestamp.
Timestamp
: records the version of a value in HBase; the system normally assigns it automatically as the insertion time.
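To make the model concrete, here is a minimal HBase shell sketch (the table name t_user and the column families base/detail are hypothetical, mirroring the split above; VERSIONS => 3 keeps up to three timestamped versions per cell):
hbase(main):001:0> create 't_user', {NAME => 'base', VERSIONS => 3}, 'detail'
hbase(main):002:0> put 't_user', '1', 'base:name', 'zhangsan'       # rowkey '1', family 'base', qualifier 'name'
hbase(main):003:0> put 't_user', '1', 'base:name', 'zhangsan_new'   # same cell again: a newer timestamped version
hbase(main):004:0> get 't_user', '1', {COLUMN => 'base:name', VERSIONS => 3}  # returns both versions, newest first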
HBase Installation
Basic HDFS Environment (Storage)
1. Install the JDK and configure the JAVA_HOME environment variable
[root@CentOS ~]# rpm -ivh jdk-8u171-linux-x64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:jdk1.8-2000:1.8.0_171-fcs ################################# [100%]
Unpacking JAR files...
tools.jar...
plugin.jar...
javaws.jar...
deploy.jar...
rt.jar...
jsse.jar...
charsets.jar...
localedata.jar...
[root@CentOS ~]# vi .bashrc
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
[root@CentOS ~]# source .bashrc
[root@CentOS ~]# jps
1933 Jps
2. Disable the firewall
[root@CentOS ~]# systemctl stop firewalld # stop the service
[root@CentOS ~]# systemctl disable firewalld # disable start on boot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@CentOS ~]# firewall-cmd --state
not running
3. Configure the hostname-to-IP mapping
[root@CentOS ~]# cat /etc/hostname
CentOS
[root@CentOS ~]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.186.150 CentOS
4. Configure passwordless SSH login
[root@CentOS ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6yYiypvclJAZLU2WHvzakxv6uNpsqpwk8kzsjLv3yJA root@CentOS
The key's randomart image is:
+---[RSA 2048]----+
| .o. |
| =+ |
| o.oo |
| =. . |
| + o . S |
| o...= . |
|E.oo. + . |
|BXX+o.... |
|B#%O+o o. |
+----[SHA256]-----+
[root@CentOS ~]# ssh-copy-id CentOS
[root@CentOS ~]# ssh CentOS
Last failed login: Mon Jan 6 14:30:49 CST 2020 from centos on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Mon Jan 6 14:20:27 2020 from 192.168.186.1
5. Upload the Hadoop tarball and extract it into /usr
[root@CentOS ~]# tar -zxf hadoop-2.9.2.tar.gz -C /usr/
6. Configure the HADOOP_HOME environment variable
[root@CentOS ~]# vi .bashrc
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
[root@CentOS ~]# source .bashrc
[root@CentOS ~]# hadoop classpath # print Hadoop's classpath
/usr/hadoop-2.9.2/etc/hadoop:/usr/hadoop-2.9.2/share/hadoop/common/lib/*:/usr/hadoop-2.9.2/share/hadoop/common/*:/usr/hadoop-2.9.2/share/hadoop/hdfs:/usr/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/usr/hadoop-2.9.2/share/hadoop/hdfs/*:/usr/hadoop-2.9.2/share/hadoop/yarn:/usr/hadoop-2.9.2/share/hadoop/yarn/lib/*:/usr/hadoop-2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar
7. Edit core-site.xml
[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/core-site.xml
<!--NameNode access endpoint-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://CentOS:9000</value>
</property>
<!--HDFS working base directory-->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop-2.9.2/hadoop-${user.name}</value>
</property>
8. Edit hdfs-site.xml
[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/hdfs-site.xml
<!--block replication factor-->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!--host that runs the Secondary NameNode-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>CentOS:50090</value>
</property>
<!--maximum number of concurrent file transfers a DataNode will handle-->
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<!--DataNode handler thread count (parallelism)-->
<property>
<name>dfs.datanode.handler.count</name>
<value>6</value>
</property>
9. Edit the slaves file
[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/slaves
CentOS
10. Format the NameNode to generate the initial fsimage
[root@CentOS ~]# hdfs namenode -format
[root@CentOS ~]# yum install -y tree
[root@CentOS ~]# tree /usr/hadoop-2.9.2/hadoop-root/
/usr/hadoop-2.9.2/hadoop-root/
└── dfs
└── name
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
3 directories, 4 files
11. Start the HDFS service
[root@CentOS ~]# start-dfs.sh
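As a quick sanity check (a minimal sketch beyond the original steps), jps should now list NameNode, DataNode, and SecondaryNameNode, and HDFS should answer basic filesystem commands:
[root@CentOS ~]# jps            # expect NameNode, DataNode, SecondaryNameNode
[root@CentOS ~]# hdfs dfs -ls / # list the (initially empty) HDFS root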
ZooKeeper Installation (Coordination)
1. Upload the ZooKeeper tarball and extract it into /usr
[root@CentOS ~]# tar -zxf zookeeper-3.4.12.tar.gz -C /usr/
2. Configure ZooKeeper's zoo.cfg
[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@CentOS zookeeper-3.4.12]# vi conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/root/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
3. Create ZooKeeper's data directory (matching dataDir in zoo.cfg)
[root@CentOS ~]# mkdir /root/zkdata
4. Start the ZooKeeper service
[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh start zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh status zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: standalone
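Optionally, as an extra check (a sketch, not part of the original steps), the bundled client can confirm the server accepts connections; once HBase is running, it will also register its own /hbase znode here:
[root@CentOS zookeeper-3.4.12]# ./bin/zkCli.sh -server CentOS:2181
# inside the client, `ls /` lists the root znodes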
HBase Configuration and Installation (Database Service)
1. Upload the HBase tarball and extract it into /usr
[root@CentOS ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
2. Configure the HBASE_HOME environment variable
[root@CentOS ~]# vi .bashrc
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
[root@CentOS ~]# source .bashrc
[root@CentOS ~]# hbase classpath # check that HBase picks up the Hadoop classpath
/usr/hbase-1.2.4/conf:/usr/java/latest/lib/tools.jar:/usr/hbase-1.2.4:/usr/hbase-1.2.4/lib/activation-1.1.jar:/usr/hbase-1.2.4/lib/aopalliance-1.0.jar:/usr/hbase-1.2.4/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/api-asn1-api-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/api-util-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/asm-3.1.jar:/usr/hbase-1.2.4/lib/avro-
...
1.7.4.jar:/usr/hbase-1.2.4/lib/commons-beanutils-1.7.0.jar:/usr/hbase-1.2.4/lib/commons-
2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar
3. Configure hbase-site.xml
[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# vi conf/hbase-site.xml
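<!--HBase's root directory on HDFS-->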
<property>
<name>hbase.rootdir</name>
<value>hdfs://CentOS:9000/hbase</value>
</property>
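<!--run HBase in distributed mode-->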
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
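<!--ZooKeeper quorum host(s)-->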
<property>
<name>hbase.zookeeper.quorum</name>
<value>CentOS</value>
</property>
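<!--ZooKeeper client port-->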
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
4. Edit hbase-env.sh and change HBASE_MANAGES_ZK to false
[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh
# export HBASE_MANAGES_ZK=true
[root@CentOS hbase-1.2.4]# vi conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
This tells HBase to use the external ZooKeeper rather than the one it manages itself.
5. Start HBase
[root@CentOS hbase-1.2.4]# ./bin/start-hbase.sh
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-CentOS.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-1-regionserver-CentOS.out
[root@CentOS hbase-1.2.4]# jps
3090 NameNode
5027 HMaster
3188 DataNode
5158 HRegionServer
3354 SecondaryNameNode
5274 Jps
3949 QuorumPeerMain
6. Verify the HBase installation
- Web UI check: http://192.168.186.150:16010/
- HBase shell check (more reliable)
[root@CentOS hbase-1.2.4]# ./bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):002:0> version
1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
hbase(main):003:0>
- Check from the HDFS side
Web UI check: http://192.168.186.150:50070/ (HBase keeps its data under the configured hbase.rootdir, hdfs://CentOS:9000/hbase); a shell CRUD smoke test is sketched below.
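Finally, a short CRUD smoke test in the HBase shell exercises the read/write path end to end (a minimal sketch; the table name t_user and the families base/detail are hypothetical):
hbase(main):001:0> create 't_user', 'base', 'detail'
hbase(main):002:0> put 't_user', '1', 'base:name', 'zhangsan'   # create/update a cell
hbase(main):003:0> get 't_user', '1'                            # read one row
hbase(main):004:0> scan 't_user'                                # read the whole table
hbase(main):005:0> delete 't_user', '1', 'base:name'            # delete a cell
hbase(main):006:0> disable 't_user'
hbase(main):007:0> drop 't_user'                                # a table must be disabled before drop
After the create, the table's files should also appear on HDFS under /hbase/data/default/t_user (the on-disk layout used by HBase 1.x).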