Windows 10 Installing Apache Spark

This text is about installing Apache Spark on Windows 10.

Uploaded by

ZamzamilKhoiro

Installing Spark on Windows 10.

Shantanu Sharma
Department of Computer Science, Ben-Gurion University, Israel.

1. Install Scala: Download Scala from the link: http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.msi

a. Set environment variables:
i. User variable:
Variable: SCALA_HOME
Value: C:\Program Files (x86)\scala
ii. System variable:
Variable: PATH
Value to append: C:\Program Files (x86)\scala\bin
b. Check it on cmd by running scala -version; it should report version 2.11.8.

2. Install Java 8: Download Java 8 from the link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
a. Set environment variables:
i. User variable:
Variable: JAVA_HOME
Value: C:\Program Files\Java\jdk1.8.0_91
ii. System variable:
Variable: PATH
Value to append: C:\Program Files\Java\jdk1.8.0_91\bin
b. Check it on cmd by running java -version; it should report version 1.8.0_91.
3. Install Eclipse Mars: Download it from the link: https://eclipse.org/downloads/ and extract it into the C drive.
a. Set environment variables:
i. User variable:
Variable: ECLIPSE_HOME
Value: C:\eclipse
ii. System variable:
Variable: PATH
Value to append: C:\eclipse\bin

4. Install Spark 1.6.1: Download it from the link: http://spark.apache.org/downloads.html and extract it into the D drive, such as D:\spark.
a. Set environment variables:
i. User variable:
Variable: SPARK_HOME
Value: D:\spark\spark-1.6.1-bin-hadoop2.6
ii. System variable:
Variable: PATH
Value to append: D:\spark\spark-1.6.1-bin-hadoop2.6\bin
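5. Download Windows Utilities (winutils.exe): Download it from the link: https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
and paste it in D:\spark\spark-1.6.1-bin-hadoop2.6\bin
If Spark still reports that it cannot locate winutils.exe, set HADOOP_HOME to D:\spark\spark-1.6.1-bin-hadoop2.6, since Hadoop looks for the file under %HADOOP_HOME%\bin.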
6. Execute Spark on cmd by running spark-shell; the Scala prompt (scala>) should appear.

7. Install Maven 3.3: Download Apache-Maven-3.3.9 from the link: http://apache.mivzakim.net/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.zip
and extract it into the D drive, such as D:\apache-maven-3.3.9
a. Set environment variables:
i. User variable:
Variable: MAVEN_HOME
Value: D:\apache-maven-3.3.9
ii. System variable:
Variable: PATH
Value to append: D:\apache-maven-3.3.9\bin
b. Check it on cmd by running mvn -version; it should report Apache Maven 3.3.9.
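
The variables set in steps 1, 2, 4 and 7 can also be checked in one go. The following is a minimal sketch and not part of the original instructions: the class name EnvCheck and the method findProblems are hypothetical, and it assumes that each tool's bin directory should appear on PATH as listed above. Eclipse is left out because this guide never runs it from cmd.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which of the guide's variables are missing.
final class EnvCheck {
    // Home variables for the four command-line tools set up in steps 1, 2, 4 and 7.
    static final String[] HOME_VARS = {"SCALA_HOME", "JAVA_HOME", "SPARK_HOME", "MAVEN_HOME"};

    /** Returns one message per problem; an empty list means every check passed. */
    public static List<String> findProblems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();

        // Windows variable names are case-insensitive (PATH vs Path), so match loosely.
        String path = "";
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().equalsIgnoreCase("PATH")) {
                path = e.getValue();
            }
        }
        String[] pathEntries = path.toLowerCase().split(";");

        for (String name : HOME_VARS) {
            String home = env.get(name);
            if (home == null || home.trim().isEmpty()) {
                problems.add(name + " is not set");
                continue;
            }
            String bin = (home.trim() + "\\bin").toLowerCase();
            boolean onPath = false;
            for (String entry : pathEntries) {
                String candidate = entry.trim();
                if (candidate.endsWith("\\")) {
                    // Tolerate a trailing backslash in the PATH entry.
                    candidate = candidate.substring(0, candidate.length() - 1);
                }
                if (candidate.equals(bin)) {
                    onPath = true;
                }
            }
            if (!onPath) {
                problems.add(home.trim() + "\\bin is not on PATH");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = findProblems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("All variables from steps 1, 2, 4 and 7 look fine.");
        } else {
            for (String problem : problems) {
                System.out.println(problem);
            }
        }
    }
}
```

Open a new cmd window before running it, because windows that were already open do not see newly set variables.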

8. Create the first WordCount project.
a. Open Eclipse and choose File > New > Project, then select Maven Project.
b. Enter the Group Id and Artifact Id, and click Finish.
c. Edit pom.xml. Paste the following code.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>sparkWCexample</groupId>
  <artifactId>spWCexample</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <!-- Spark core; the version should match the Spark release installed in step 4 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.6.1</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <!-- compile for Java 8, in line with step e -->
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

d. Write your code or just copy the given WordCount code (JavaWordCount.java) from D:\spark\spark-1.6.1-bin-hadoop2.6\examples\src\main\java\org\apache\spark\examples

e. Now, add the external jar from the location D:\spark\spark-1.6.1-bin-hadoop2.6\lib to the project's build path and set Java 8 for compilation.
f. Build the project: On cmd, go to the location where the project is stored, here D:\hadoop\examples\spWCexample, and run mvn package.
g. Execute the project: Go to the following location on cmd: D:\spark\spark-1.6.1-bin-hadoop2.6\bin
Write the following command on a single line, where the --class value is the fully qualified name of the main class (here its package is groupid.artifactid):
spark-submit --class groupid.artifactid.classname --master local[2] <path to the jar file created using Maven> <path to a demo test file> <path to output directory>
For example:
spark-submit --class sparkWCexample.spWCexample.WC --master local[2] /hadoop/examples/spWCexample/target/spWCexample-1.0-SNAPSHOT.jar /hadoop/examples/spWCexample/how.txt /hadoop/examples/spWCexample/anwer.txt
h. While the job is running, you can also check its progress at: http://localhost:4040/jobs/
i. Finally, get the answers from the output directory.
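
For readers wondering what the answers contain: WordCount counts how often each word occurs in the input file. The sketch below is plain Java with no Spark dependency, so it is not the code shipped in the examples directory; the class name WordCountSketch and the method countWords are hypothetical. It mirrors the same stages on an in-memory list: split each line into words, group identical words, and count each group. The Spark example expresses these stages with flatMap, mapToPair and reduceByKey. Dropping empty strings and sorting the result are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, Spark-free illustration of the word-count logic.
final class WordCountSketch {

    /** Counts occurrences of each space-separated word across all lines. */
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // One line becomes many words (the flatMap stage).
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Consecutive spaces produce empty strings; skip them.
                .filter(word -> !word.isEmpty())
                // Group identical words and count each group (the reduce-by-key stage).
                // A TreeMap keeps the result sorted by word for readable output.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        // Prints, one per line: be: 2, not: 1, or: 1, to: 2
        countWords(lines).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

On a real cluster the grouping step is what Spark distributes across workers; here everything runs in a single JVM.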
