Hadoop 2.9.2 + Spark 2.4.8 Installation

This article walks through building a Hadoop and Spark cluster on four CentOS machines: environment preparation (SSH trust, disabling the firewall, installing the JDK and Scala), installing and configuring Hadoop (HDFS and YARN), installing and configuring Spark, and verifying that everything works. The full flow runs from downloading the software, through setting environment variables, to starting the services.


Versions:

Hadoop 2.9.2
Spark 2.4.8
Scala 2.11.12
Linux: CentOS 7.4

The hostnames of the four machines are set as follows:


ambari.master.hadoop
ambari.node1.hadoop
ambari.node2.hadoop
ambari.node3.hadoop
ambari.master.hadoop serves as the master node; the other three are worker (compute) nodes.

Environment Preparation

1. Set up passwordless SSH trust between the machines:


Create the directory (required on every machine): mkdir -p ~/.ssh

Generate the key pair (required on every machine; press Enter at every prompt): ssh-keygen -t rsa

On ambari.master.hadoop, run:
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ambari.node1.hadoop:~/.ssh/

On ambari.node1.hadoop, run:
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ambari.node2.hadoop:~/.ssh/

On ambari.node2.hadoop, run:
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ambari.node3.hadoop:~/.ssh/

On ambari.node3.hadoop, run:
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ambari.master.hadoop:~/.ssh/
scp ~/.ssh/authorized_keys ambari.node1.hadoop:~/.ssh/
scp ~/.ssh/authorized_keys ambari.node2.hadoop:~/.ssh/
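After the keys have been propagated, it is worth confirming that every node can reach every other node without a password prompt (the very first connection to each host may still ask to confirm the host key), for example:

# run from each node in turn; every ssh should log in without asking for a password
for h in ambari.master.hadoop ambari.node1.hadoop ambari.node2.hadoop ambari.node3.hadoop; do
  ssh $h hostname
done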

2. Disable the firewall:


systemctl stop firewalld
systemctl disable firewalld
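To confirm the firewall is actually stopped and will stay off after a reboot, a quick check on each node:

systemctl is-active firewalld    # expected: inactive
systemctl is-enabled firewalld   # expected: disabled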

3. Install the JDK (required on every node) and set the environment variables:

export JAVA_HOME=/usr/local/jdk1.8.0_221       
export JRE_HOME=/usr/local/jdk1.8.0_221/jre     
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
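The exports above are assumed to live in a profile script such as /etc/profile.d/java.sh (a hypothetical path); after sourcing it, verify the JDK on each node:

. /etc/profile.d/java.sh   # assumed location of the exports above
java -version              # should report java version "1.8.0_221"
echo $JAVA_HOME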

4. Install Scala (Spark depends on it; required on every node):

export SCALA_HOME=/usr/local/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
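Likewise, after extracting Scala to /usr/local/scala-2.11.12 and loading the exports, a quick check:

scala -version   # should report Scala code runner version 2.11.12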


I. Hadoop Installation

Download Hadoop:

https://hadoop.apache.org/releases.html

Extract and set permissions:

tar -xzvf hadoop-2.9.2.tar.gz
mv hadoop-2.9.2 /usr/local/server/hadoop-2.9.2
chmod -R 755 /usr/local/server/hadoop-2.9.2

Set the environment variables (HADOOP_HOME points at the /usr/local/server/hadoop symlink created in a later step):


vim /etc/profile.d/hadoop.sh

export HADOOP_HOME=/usr/local/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Edit the configuration files, all located under /usr/local/server/hadoop-2.9.2/etc/hadoop:

vim hadoop-env.sh

export JAVA_HOME=/usr/local/jdk1.8.0_221

vim core-site.xml  (HDFS core configuration)

<configuration>
    <!-- Address of the NameNode (the HDFS master) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ambari.master.hadoop:9000</value>
    </property>

    <!-- Directory where Hadoop stores files generated at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/server/hadoop/tmp</value>
    </property>
</configuration>

vim hdfs-site.xml  (HDFS addresses and storage paths)

<configuration>
    <!-- HTTP address of the NameNode -->
    <property>
       <name>dfs.namenode.http-address</name>
       <value>ambari.master.hadoop:50070</value>
    </property>
    <!-- HTTP address of the SecondaryNameNode -->
    <property>
       <name>dfs.namenode.secondary.http-address</name>
       <value>ambari.node1.hadoop:50090</value>
    </property>
    <!-- Directory where the NameNode stores its metadata -->
    <property>
       <name>dfs.namenode.name.dir</name>
       <value>/usr/local/server/hadoop/name</value>
    </property>
    <!-- Number of HDFS replicas -->
    <property>
       <name>dfs.replication</name>
       <value>2</value>
    </property>
    <!-- Directory where DataNodes store block data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/server/hadoop/data</value>
     </property>
</configuration>

vim mapred-site.xml

<configuration>
    <!-- Tell the framework that MapReduce should run on YARN -->
    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
    </property>
</configuration>

vim yarn-site.xml

<configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>ambari.master.hadoop:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>ambari.master.hadoop:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>ambari.master.hadoop:8035</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>ambari.master.hadoop:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>ambari.master.hadoop:8088</value>
        </property>
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
        <!-- Adjust the values below to match your environment; YARN does not detect resources automatically -->
        <property>
            <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>3036</value>
        </property>
        <property>
            <description>The minimum allocation for every container request at the RM,
                         in MBs. Memory requests lower than this won't take effect,
                         and the specified value will get allocated at minimum.</description>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>128</value>
        </property>
        <property>
            <description>The maximum allocation for every container request at the RM,
                         in MBs. Memory requests higher than this won't take effect,
                         and will get capped to this value.</description>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>2560</value>
        </property>
</configuration>
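As a worked example of how these three memory settings interact (assuming the default Capacity Scheduler behaviour): each NodeManager can hand out at most 3036 MB across all of its containers, every container request is rounded up to a multiple of the 128 MB minimum (so a 200 MB request becomes a 256 MB container), and no single container may exceed 2560 MB, which means at most one maximum-size container fits per node.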

vim masters 

ambari.master.hadoop

vim slaves

ambari.node1.hadoop
ambari.node2.hadoop
ambari.node3.hadoop

Copy the Hadoop directory and /etc/profile.d/hadoop.sh to every node (a minimal copy sketch follows below), then run the following on each node:

. /etc/profile.d/hadoop.sh
cd /usr/local/server/
ln -s /usr/local/server/hadoop-2.9.2 hadoop
mkdir -pv /usr/local/server/hadoop/{data,name,tmp}
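A minimal sketch of the copy step, assuming the same directory layout on every node and that the SSH trust set up earlier is in place:

# run on ambari.master.hadoop
for h in ambari.node1.hadoop ambari.node2.hadoop ambari.node3.hadoop; do
  scp -r /usr/local/server/hadoop-2.9.2 $h:/usr/local/server/
  scp /etc/profile.d/hadoop.sh $h:/etc/profile.d/
done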

On the master node, format the NameNode:

hdfs namenode -format

Start HDFS (run the script on the master node; it brings up the daemons on all nodes over SSH):
start-dfs.sh

Start YARN (likewise run on the master node, where the ResourceManager lives):
start-yarn.sh

Check the cluster status:
hdfs dfsadmin -report
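Besides dfsadmin -report, the jps command shows which daemons actually started; with this layout one would roughly expect:

# on ambari.master.hadoop
jps   # NameNode, ResourceManager
# on ambari.node1.hadoop
jps   # DataNode, NodeManager, SecondaryNameNode
# on ambari.node2.hadoop and ambari.node3.hadoop
jps   # DataNode, NodeManager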

The installation is now complete; check the YARN web UI at:
http://ambari.master.hadoop:8088/cluster


For reference, the commands to shut down the cluster (in this order):

stop-yarn.sh
stop-dfs.sh

II. Spark Installation

Download Spark:
http://spark.apache.org/downloads.html

Extract:
tar -xzvf spark-2.4.8-bin-without-hadoop.tgz
mv spark-2.4.8-bin-without-hadoop /usr/local/server/

Add the environment variables:
vim /etc/profile.d/hadoop.sh

export SPARK_HOME=/usr/local/server/spark
export PATH=$PATH:$SPARK_HOME/bin
export PATH=$PATH:$SPARK_HOME/sbin

Create the HDFS directories:

hadoop fs -mkdir -p /tmp/spark/lib_jars/
hadoop fs -mkdir -p /eventLogs
hadoop fs -mkdir -p /user/hive/warehouse

Edit the configuration files:
vim spark-env.sh

export LD_LIBRARY_PATH=/usr/local/server/hadoop-2.9.2/lib/native
export JAVA_HOME=/usr/local/jdk1.8.0_221
export HADOOP_CONF_DIR=/usr/local/server/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/server/hadoop/etc/hadoop
export SPARK_CONF_DIR=/usr/local/server/spark/conf
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)

vim spark-defaults.conf

spark.yarn.jars hdfs://ambari.master.hadoop:9000/tmp/spark/lib_jars/*.jar
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://ambari.master.hadoop:9000/eventLogs
spark.eventLog.compress          true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.master                    yarn 

To speed up Spark job startup, upload all the jars under the Spark installation directory to HDFS:

hadoop fs -put $SPARK_HOME/jars/* /tmp/spark/lib_jars
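A quick sanity check that the jars landed in HDFS (the exact file count depends on the Spark distribution):

hadoop fs -count /tmp/spark/lib_jars
hadoop fs -ls /tmp/spark/lib_jars | tail -n 3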

Copy the Spark directory to every node.


Distribute /etc/profile.d/hadoop.sh to every node.

Create the symlink on each node:
cd /usr/local/server/

ln -s /usr/local/server/spark-2.4.8-bin-without-hadoop spark

Verify the installation by starting Spark:
spark-shell
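Because spark.master is set to yarn in spark-defaults.conf, spark-shell registers an application with the ResourceManager (visible at http://ambari.master.hadoop:8088). Another simple smoke test is to submit the bundled SparkPi example; the examples jar name assumed here may differ slightly between builds:

spark-submit --master yarn --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.8.jar 10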


The setup above runs Spark on YARN. To deploy Spark in standalone mode instead, make the following changes.

Spark standalone mode:


vim spark-env.sh

export SPARK_MASTER_IP=ambari.master.hadoop

vim spark-defaults.conf  (comment out the YARN master line so the standalone master takes effect)

#spark.master yarn

vim slaves

ambari.node1.hadoop
ambari.node2.hadoop
ambari.node3.hadoop

Distribute the modified configuration files to every node.

Start Spark:

$SPARK_HOME/sbin/start-all.sh

Verify:


spark-shell --master spark://ambari.master.hadoop:7077
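In standalone mode the master also serves a web UI, on port 8080 by default, where the registered workers should match the slaves file. A quick check from the shell, assuming default ports:

# confirm the standalone master web UI (default port 8080) responds
curl -sf http://ambari.master.hadoop:8080 > /dev/null && echo "master UI is up"
# run the bundled SparkPi example against the standalone master
spark-submit --master spark://ambari.master.hadoop:7077 --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.8.jar 10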


 
