使用前提
oozie自带了各种案例,我们可以使用oozie自带的各种案例来作为模板,所以我们这里先把官方提供的各种案例给解压出来
cd /export/servers/oozie-4.1.0-cdh5.14.0
tar -zxf oozie-examples.tar.gz
1 使用oozie 调度shell 脚本
1.1在oozie的安装目录下面创建工作目录
cd /export/servers/oozie-4.1.0-cdh5.14.0
mkdir oozie_works
1.2把shell的任务模板拷贝到我们oozie的工作目录当中去
cd /export/servers/oozie-4.1.0-cdh5.14.0
cp -r examples/apps/shell/ oozie_works/
1.3随意准备一个shell脚本
cd /export/servers/oozie-4.1.0-cdh5.14.0
vim oozie_works/shell/hello.sh
#!/bin/bash
echo "hello world" >> /export/servers/hello_oozie.txt
第二部分
修改job.properties
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/shell
vim job.properties
nameNode=hdfs://**node01**:8020
jobTracker=**node01:8032**
queueName=default
examplesRoot=**oozie_works**
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/shell
EXEC=hello.sh
修改workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<!-- <argument>my_output=Hello Oozie</argument> -->
<file>/user/root/oozie_works/shell/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<decision name="check-output">
<switch>
<case to="end">
${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
</case>
<default to="fail-output"/>
</switch>
</decision>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<kill name="fail-output">
<message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
</kill>
<end name="end"/>
</workflow-app>
*
第三部分
上传的hdfs目录为/user/root
cd /export/servers/oozie-4.1.0-cdh5.14.0
hdfs dfs -put oozie_works/shell /user/root
通过oozie的命令来执行调度任务
cd /export/servers/oozie-4.1.0-cdh5.14.0
bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/shell/job.properties -run
2使用oozie调度hive
1 拷贝任务模板
2 修改job(把任务模板上传到hdfs上去)
3修改workflow
4上传
5提交任务
第一步:拷贝hive的案例模板
cd /export/servers/oozie-4.1.0-cdh5.14.0
cp -ra examples/apps/hive2/ oozie_works/
第二步:编辑hive模板
这里使用的是hiveserver2来进行提交任务,需要注意我们要将hiveserver2的服务给启动起来
hive --service hiveserver2 &
hive --service metastore &
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/hive2
修改job.properties
vim job.properties
nameNode=hdfs://node01:8020
jobTracker=node01:8032
queueName=default
jdbcURL=jdbc:hive2://node03:10000/default
examplesRoot=oozie_works
oozie.use.system.libpath=true
# 配置我们文件上传到hdfs的保存路径 实际上就是在hdfs 的/user/root/oozie_works/hive2这个路径下
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/hive2
修改workflow.xml
vim workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2-wf">
<start to="hive2-node"/>
<action name="hive2-node">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>
<mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script.q</script>
<param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
<param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive2</param>
</hive2>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
编辑hivesql文件
vim script.q
DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE default.test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
insert into test values(10);
insert into test values(20);
insert into test values(30);
上传工作文件到hdfs
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works
hdfs dfs -put hive2/ /user/root/oozie_works/
执行oozie的调度
cd /export/servers/oozie-4.1.0-cdh5.14.0
bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/hive2/job.properties -run
3使用oozie调度MR任务
准备以下数据上传到HDFS的/oozie/input路径下去
hdfs dfs -mkdir -p /oozie/input
vim wordcount.txt
hello world hadoop
spark hive hadoop
将数据上传到hdfs对应目录
hdfs dfs -put wordcount.txt /oozie/input
执行官方测试案例
hadoop jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output
拷贝MR的任务模板
cd /export/servers/oozie-4.1.0-cdh5.14.0
cp -r examples/apps/map-reduce/ oozie_works/
删掉MR任务模板lib目录下自带的jar包
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
rm -rf oozie-examples-4.1.0-cdh5.14.0.jar
拷贝的jar包到对应目录
/export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib这个目录下,所以我们把我们需要调度的jar包也放到这个路径下即可
cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/
修改配置文件
修改job.properties
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
vim job.properties
nameNode=hdfs://node01:8020
jobTracker=node01:8032
queueName=default
examplesRoot=oozie_works
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
outputDir=/oozie/output
inputdir=/oozie/input
修改workflow.xml
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
vim workflow.xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<!--
<property>
<name>mapred.mapper.class</name>
<value>org.apache.oozie.example.SampleMapper</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.oozie.example.SampleReducer</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
</property>
-->
<!-- 开启使用新的API来进行配置 -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- 指定MR的输出key的类型 -->
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<!-- 指定MR的输出的value的类型-->
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<!-- 指定输入路径 -->
<property>
<name>mapred.input.dir</name>
<value>${nameNode}/${inputdir}</value>
</property>
<!-- 指定输出路径 -->
<property>
<name>mapred.output.dir</name>
<value>${nameNode}/${outputDir}</value>
</property>
<!-- 指定执行的map类 -->
<property>
<name>mapreduce.job.map.class</name>
<value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
</property>
<!-- 指定执行的reduce类 -->
<property>
<name>mapreduce.job.reduce.class</name>
<value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
</property>
<!-- 配置map task的个数 -->
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
上传调度任务到hdfs对应目录
cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works
hdfs dfs -put map-reduce/ /user/root/oozie_works/
执行调度任务
cd /export/servers/oozie-4.1.0-cdh5.14.0
bin/oozie job -oozie http://bd001:11000/oozie -config oozie_works/map-reduce/job.properties -run
oozie工作流调度的设置总结
1 解压
cd /export/servers/oozie-4.1.0-cdh5.14.0
tar -zxf oozie-examples.tar.gz
2 拷贝该任务的模板
3 修改job.properties 文件
4 修改workfolw.xml的文件
5 上传已配置好的文件上传的hdfs上
6 执行任务