I. Environment
CDH 5.15.2
JDK 1.8
Build machine: macOS (the steps differ little from Linux)
Maven: Aliyun mirror
Built against the HBase and Kafka already running in the CDH cluster, plus a single-node Elasticsearch
Note: Atlas only supports lineage for Hive 1.2.1 and above; you can either upgrade Hive on its own or upgrade the whole CDH release.
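Before building, it helps to confirm the toolchain is on the PATH; a minimal check (the tool list is assumed from the environment above):

```shell
# Check that the build toolchain is installed (tool names assumed from the list above).
for tool in java mvn node npm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```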
II. Build and Install
1. Download: http://atlas.apache.org/#/Downloads
2. Extract the source tarball and build
tar -xzvf apache-atlas-${project.version}-sources.tar.gz
cd apache-atlas-sources-${project.version}
export MAVEN_OPTS="-Xms2g -Xmx2g"
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
3. Fixing build errors
1. When the build reaches the UI module, `npm install` fails because Node.js/npm is not installed.
Fix: install Node.js from https://nodejs.org/en/
Verify: npm -v; node -v
2. Network/repository resolution failures, for example:
Failed to execute goal on project atlas-graphdb-janus: Could not resolve dependencies for project org.apache.atlas:atlas-graphdb-janus:jar:1.1.0: Could not find artifact com.sleepycat:je:jar:7.4.5 in nexus (http://maven.aliyun.com/nexus/content/groups/public/)
Fix: switch the Maven repository between Aliyun and Apache Central depending on the failure. Keep the Aliyun mirror configured in settings.xml for fast dependency downloads; when Aliyun reports an artifact as not found (as with com.sleepycat:je above), comment the Aliyun mirror out so Maven falls back to the default repository, then restore it afterwards.
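The toggle lives in ~/.m2/settings.xml; a sketch of the mirror entry (the id and name are arbitrary, the URL is the Aliyun address from the error above):

```xml
<!-- ~/.m2/settings.xml: comment this <mirror> out to fall back to Central -->
<mirrors>
  <mirror>
    <id>aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Aliyun public mirror</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </mirror>
</mirrors>
```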
3. npm errors, for example:
[ERROR] npm ERR! cb() never called!
[ERROR]
[ERROR] npm ERR! This is an error with npm itself. Please report this error at:
[ERROR] npm ERR! <https://github.com/npm/npm/issues>
[ERROR]
[ERROR] npm ERR! A complete log of this run can be found in:
[ERROR] npm ERR! /Users/xxx/.npm/_logs/2020-06-26T12_58_07_791Z-debug.log
Fix:
Clear the npm cache from an elevated shell (on macOS, prefix the npm commands with sudo):
npm cache clean -f
Then install the n version-management helper:
npm install -g n --force
Use n to install the latest stable Node.js:
n stable
After that, the build completes successfully.
III. First Use (including Hive hook configuration)
1. Back up and edit the configuration file
The file lives at:
apache-atlas-sources-${project.version}/distro/target/conf/atlas-application.properties
Back it up, then edit:
cd apache-atlas-sources-${project.version}/distro/target/conf/
cp atlas-application.properties atlas-application.properties.bak
vi atlas-application.properties
Settings to change:
# set to the HBase ZooKeeper quorum
atlas.graph.storage.hostname=zk1,zk2,zk3
# switch the index backend from solr to elasticsearch, and comment out all Solr-related settings
atlas.graph.index.search.backend=elasticsearch
# add these lines
atlas.graph.index.search.hostname=localhost
atlas.graph.index.search.elasticsearch.client-only=true
# disable the embedded Kafka
atlas.notification.embedded=false
# point Kafka at the cluster's ZooKeeper and broker addresses
atlas.kafka.zookeeper.connect=zk1:2181
atlas.kafka.bootstrap.servers=xxx:9092
# change the REST address to the server's IP
atlas.rest.address=http://ip:21000
#hive hook
# whether to run the hook synchronously. false recommended to avoid delays in Hive query completion. Default: false
atlas.hook.hive.synchronous=false
# number of retries for notification failure. Default: 3
atlas.hook.hive.numRetries=3
# queue size for the threadpool. Default: 10000
atlas.hook.hive.queueSize=10000
# clusterName to use in qualifiedName of entities. Default: primary
atlas.cluster.name=primary
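The edits above can be scripted; a rough sketch, demonstrated on a throwaway copy (zk1/zk2/zk3 and localhost are placeholders -- point CONF at the real file and use your own addresses):

```shell
# Demo on a temporary file; in practice set CONF=/path/to/atlas-application.properties.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
atlas.graph.storage.hostname=localhost
atlas.graph.index.search.backend=solr
atlas.notification.embedded=true
EOF

# Rewrite existing keys in place (-i.bak keeps a backup and works on both GNU and BSD sed).
sed -i.bak \
  -e 's|^atlas.graph.storage.hostname=.*|atlas.graph.storage.hostname=zk1,zk2,zk3|' \
  -e 's|^atlas.graph.index.search.backend=.*|atlas.graph.index.search.backend=elasticsearch|' \
  -e 's|^atlas.notification.embedded=.*|atlas.notification.embedded=false|' \
  "$CONF"

# Append keys that the stock file does not contain.
grep -q '^atlas.graph.index.search.hostname=' "$CONF" || \
  echo 'atlas.graph.index.search.hostname=localhost' >> "$CONF"

cat "$CONF"
```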
2.配置Hive Hook
通过Clouder Manager添加:集群——》Hive——》配置——》搜索hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
3. Copy hook/hive
Copy the hook/ and hook-bin/ directories from /distro/target/apache-atlas-${project.version}-hive-hook/apache-atlas-hive-hook-${project.version}/ in the source tree into /distro/target/apache-atlas-${project.version}-server/apache-atlas-${project.version}/.
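The copy can be scripted; a sketch on a throwaway tree (1.1.0 stands in for ${project.version} -- substitute the real build-output paths):

```shell
# Demo tree standing in for the build output; replace BASE-relative paths with the real ones.
BASE=$(mktemp -d)
SRC="$BASE/apache-atlas-hive-hook-1.1.0"   # .../distro/target/apache-atlas-1.1.0-hive-hook/...
DST="$BASE/apache-atlas-1.1.0"             # .../distro/target/apache-atlas-1.1.0-server/...
mkdir -p "$SRC/hook/hive" "$SRC/hook-bin" "$DST"

# The actual step: copy both directories into the server package.
cp -r "$SRC/hook" "$SRC/hook-bin" "$DST/"
ls "$DST"
```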
4. Add the missing jars
The following jars need to be added under /distro/target/apache-atlas-${project.version}-server/apache-atlas-${project.version}/hook/hive/atlas-hive-plugin-impl/:
jackson-module-jaxb-annotations-2.9.9.jar, from https://mvnrepository.com/artifact/com.fasterxml.jackson.module/jackson-module-jaxb-annotations/2.9.9
jackson-jaxrs-base-2.9.9.jar, from https://mvnrepository.com/artifact/com.fasterxml.jackson.jaxrs/jackson-jaxrs-base/2.9.9
jackson-jaxrs-json-provider-2.9.9.jar, from https://mvnrepository.com/artifact/com.fasterxml.jackson.jaxrs/jackson-jaxrs-json-provider/2.9.9
Then move /distro/target/apache-atlas-${project.version}-server/apache-atlas-${project.version} to /opt/atlas.
5. Add atlas-application.properties into /opt/atlas/hook/hive/atlas-plugin-classloader-1.2.0.jar
Reason: every time CDH starts Hive, it copies a conf directory into the process directory, and that copy is missing atlas-application.properties, so the hook fails at runtime. Adding the config file into the jar before distributing it to the Hive nodes guarantees it is always found.
zip -u atlas-plugin-classloader-1.2.0.jar atlas-application.properties
6. Add the dependencies on every machine in the cluster
cp /opt/atlas/hook/hive/atlas-hive-plugin-impl/* /opt/aux_path
cp /opt/atlas/hook/hive/*.jar /opt/aux_path
# scp to every other node
scp -r /opt/aux_path/* xxx:/opt/aux_path/
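Distributing to many nodes is easier with a loop; a dry-run sketch (HOSTS is a hypothetical node list -- uncomment the scp line for the real copy):

```shell
# Hypothetical node list; replace with your actual hostnames.
HOSTS="node1 node2 node3"
AUX=/opt/aux_path
for h in $HOSTS; do
  echo "would sync $AUX to $h"
  # scp -r "$AUX"/* "$h:$AUX/"   # uncomment to actually copy
done
```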
7. Set the HIVE_AUX_JARS_PATH environment variable
Through Cloudera Manager: Cluster → Hive → Configuration → search for HIVE_AUX_JARS_PATH
Add /opt/aux_path (the directory from step 6), then restart Hive.
IV. Start Apache Atlas
1. Start the server
export MANAGE_LOCAL_HBASE=false
export MANAGE_LOCAL_SOLR=false
bin/atlas_start.py
Enter username for atlas :-
Enter password for atlas :-
The default username/password is admin/admin.
2. Log in to the web UI
http://xxx:21000
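A quick reachability check against the REST API can save a trip to the browser; a sketch using the Atlas admin version endpoint (URL and admin/admin credentials are the defaults above):

```shell
# Probe the Atlas version endpoint; prints a status line either way.
check_atlas() {
  if curl -sf -u admin:admin "$1/api/atlas/admin/version" >/dev/null 2>&1; then
    echo "atlas is up at $1"
  else
    echo "atlas not reachable at $1"
  fi
}
check_atlas "http://localhost:21000"
```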
3. Import the Hive tables
Atlas is now running, but it holds no data yet; the metadata for the existing Hive databases and tables has to be imported.
# adjust HIVE_HOME to match your CDH layout
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
# copy the config first, otherwise the import fails with "org.apache.atlas.AtlasException: Failed to load application properties"
cp conf/atlas-application.properties /etc/hive/conf/
# import all databases and tables
/opt/atlas/hook-bin/import-hive.sh
# import a specific database and/or table
./import-hive.sh [-d <database regex> OR --database <database regex>] [-t <table regex> OR --table <table regex>]
# import multiple databases and tables from a file
./import-hive.sh [-f <filename>]
File Format:
database1:tbl1
database1:tbl2
database2:tbl1
Finally, check the /opt/atlas/logs/import-hive.log log to confirm the import completed without errors.