Akmal Chaudhri, GridGain Systems
How to share state across multiple Spark jobs using Apache Ignite
#EUde9
Agenda
•  Introduction to Apache Ignite
•  Ignite for Spark
•  IgniteContext and IgniteRDD
•  Installation and Deployment
•  Demos
•  Q&A
Introduction to Apache Ignite
Apache Ignite in one slide
•  Memory-centric platform
–  that is strongly consistent
–  and highly available
–  with powerful SQL
–  key-value and processing APIs
•  Designed for
–  Performance
–  Scalability
Apache Ignite
•  Data source agnostic
•  Fully fledged compute engine and durable storage
•  OLAP and OLTP
•  Fully ACID transactions across memory and disk
•  In-memory SQL support
•  Early ML libraries
•  Growing community
Ignite for Spark
Why share state in Spark?
•  Long running applications
–  Passing state between jobs
•  Disk File System
–  Convert RDDs to disk files and back
•  Share RDDs in-memory
–  Native Spark API
–  Native Spark transformations
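For contrast, the disk-based approach named above can be sketched with Spark's own saveAsObjectFile and objectFile calls. This is a minimal sketch and not code from the talk: the object name, the roundTrip helper and the caller-supplied path are illustrative assumptions.

```scala
import org.apache.spark.SparkContext

// Illustrative only: the object and method names are not from the talk.
object FileBasedSharing {
  // Writes a pair RDD to `path` (which must not exist yet), then reads it back
  // the way a second job would, and returns the number of restored pairs.
  def roundTrip(sc: SparkContext, path: String): Long = {
    // "Job 1": serialize the RDD to files under `path`.
    sc.parallelize(1 to 100000, 10).map(i => (i, i)).saveAsObjectFile(path)
    // "Job 2": rebuild an RDD from those files.
    val restored = sc.objectFile[(Int, Int)](path)
    restored.count()
  }
}
```

Every hand-off in this style pays for serialization and file I/O in both directions, which is the cost that sharing RDDs in memory is meant to avoid.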
Ignite for Spark
•  Spark RDD abstraction
•  Shared in-memory view on data across different Spark jobs, workers or applications
•  Implemented as a view over a distributed Ignite cache
Ignite for Spark
•  Deployment modes
–  Share RDD across tasks on the host
–  Share RDD across tasks in the application
–  Share RDD globally
•  Shared state can be
–  Standalone mode (outlives Spark application)
–  Embedded mode (lifetime of Spark application)
Ignite In-Memory File System
•  Distributed in-memory file system
•  Implements HDFS API
•  Can be transparently plugged into Hadoop or Spark deployments
IgniteContext and IgniteRDD
IgniteContext
•  Main entry-point to Spark-Ignite integration
•  SparkContext plus either one of
–  IgniteConfiguration()
–  Path to XML configuration file
•  Optional Boolean client argument
–  true => Shared deployment
–  false => Embedded deployment
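The optional Boolean argument can be sketched as follows. This assumes the non-generic IgniteContext class used elsewhere in this deck and that the Boolean is the third positional constructor argument. The object and method names are hypothetical and are not part of Ignite.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.SparkContext

// Illustrative only: the object and method names are not from the talk.
object IgniteContextModes {
  // true => shared deployment: use Ignite nodes that run outside the Spark job.
  def shared(sc: SparkContext): IgniteContext =
    new IgniteContext(sc, () => new IgniteConfiguration(), true)

  // false => embedded deployment: Ignite nodes start inside the Spark job's processes.
  def embedded(sc: SparkContext): IgniteContext =
    new IgniteContext(sc, () => new IgniteConfiguration(), false)
}
```

Leaving the argument out corresponds to the shared case. In embedded mode, calling close(true) on the context before stopping the SparkContext is one way to shut down the Ignite nodes on the workers as well.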
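IgniteContext examples
// Option 1: pass a closure that builds an IgniteConfiguration
val igniteContext = new IgniteContext(sparkContext,
    () => new IgniteConfiguration())

// Option 2 (alternative to option 1): pass the path to an XML configuration file
val igniteContext = new IgniteContext(sparkContext,
    "examples/config/spark/example-shared-rdd.xml")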
IgniteRDD
•  Implementation of Spark RDD representing a live view of an Ignite cache
•  Mutable (unlike native RDDs)
–  All changes in Ignite cache will be visible to RDD users immediately
•  Provides partitioning information to Spark executor
•  Provides affinity information to Spark so that RDD computations can use data locality
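The "live view" behaviour can be illustrated with a small sketch that counts through the same IgniteRDD instance before and after a write. It uses only fromCache, savePairs and count. The object name, the helper and the cache name parameter are hypothetical, and the sketch assumes the cache does not already hold the keys 100001 to 100010.

```scala
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.SparkContext

// Illustrative only: the object and method names are not from the talk.
object LiveViewSketch {
  // Returns the element count seen through one IgniteRDD before and after a write.
  def countsAroundWrite(sc: SparkContext, ic: IgniteContext, cacheName: String): (Long, Long) = {
    val shared: IgniteRDD[Int, Int] = ic.fromCache(cacheName)
    val before = shared.count()
    // Store ten new pairs; the keys are assumed not to be in the cache yet.
    shared.savePairs(sc.parallelize(100001 to 100010).map(i => (i, i)))
    // No new RDD is created: the same instance reflects the updated cache.
    val after = shared.count()
    (before, after)
  }
}
```

With a native RDD the second count would equal the first, because native RDDs are immutable. Here the second count reflects the pairs just written.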
Write to Ignite
•  Ignite caches operate on key-value pairs
•  Spark tuple RDD for key-value pairs and savePairs method
–  Takes RDD partitioning into account and stores values in parallel where possible
•  Value-only RDD and saveValues method
–  IgniteRDD generates a unique affinity-local key for each value stored into the cache
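Write code example
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("SparkIgniteWriter")
val sc = new SparkContext(conf)
// Connect to Ignite using the example XML configuration
val ic = new IgniteContext(sc,
    "examples/config/spark/example-shared-rdd.xml")
// Live view of the Ignite cache named "sharedRDD"
val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
// Store the pairs (1, 1) to (100000, 100000), spread over 10 partitions
sharedRDD.savePairs(sc.parallelize(1 to 100000, 10)
    .map(i => (i, i)))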
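The saveValues path can be sketched in the same style. This is a hedged illustration, not code from the talk: the object name, the helper and the cache name parameter are hypothetical, and declaring the key type as IgniteUuid is an assumption about the keys that IgniteRDD generates.

```scala
import org.apache.ignite.lang.IgniteUuid
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.SparkContext

// Illustrative only: the object and method names are not from the talk.
object SaveValuesSketch {
  // Stores 1000 string values without supplying keys and returns the cache size
  // as seen through the IgniteRDD.
  def storeValues(sc: SparkContext, ic: IgniteContext, cacheName: String): Long = {
    // Assumption: the generated keys are IgniteUuid instances.
    val valuesRDD: IgniteRDD[IgniteUuid, String] = ic.fromCache(cacheName)
    // Only values are passed in; IgniteRDD picks a key for each one.
    valuesRDD.saveValues(sc.parallelize(1 to 1000, 10).map(i => "value-" + i))
    valuesRDD.count()
  }
}
```

Because the keys are generated, this style suits data that is only scanned or filtered later, not looked up by a known key.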
Read from Ignite
•  IgniteRDD is a live view of an Ignite cache
–  No need to explicitly load data to Spark application from Ignite
–  All RDD methods are available to use right away after an instance of IgniteRDD is created
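Read code example
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("SparkIgniteReader")
val sc = new SparkContext(conf)
// Connect to Ignite using the same XML configuration as the writer
val ic = new IgniteContext(sc,
    "examples/config/spark/example-shared-rdd.xml")
// Live view of the cache populated by the writer job
val sharedRDD: IgniteRDD[Int, Int] = ic.fromCache("sharedRDD")
// Keep only the pairs whose value exceeds 50000
val greaterThanFiftyThousand = sharedRDD.filter(_._2 > 50000)
println("The count is " + greaterThanFiftyThousand.count())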
Installation and Deployment
Installation and Deployment
•  Shared Deployment
•  Embedded Deployment
•  Maven
•  SBT
Shared Deployment
•  Standalone mode
•  Ignite nodes deployed with Spark worker nodes
•  Add the following lines to spark-env.sh
# Build a classpath from IGNITE_HOME/libs and its subdirectories, skipping libs/optional
IGNITE_LIBS="${IGNITE_HOME}/libs/*"
for file in "${IGNITE_HOME}"/libs/*
do
    if [ -d "${file}" ] && [ "${file}" != "${IGNITE_HOME}"/libs/optional ]; then
        IGNITE_LIBS=${IGNITE_LIBS}:${file}/*
    fi
done
export SPARK_CLASSPATH=$IGNITE_LIBS
Embedded Deployment
•  Ignite nodes are started inside Spark job processes and are stopped when job dies
•  Ignite code distributed to worker machines using Spark deployment mechanism
•  Ignite nodes will be started on all workers as a part of IgniteContext initialization
Maven
•  Ignite’s Spark artifact hosted in Maven Central
•  Scala 2.11 example
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-spark</artifactId>
    <version>${ignite.version}</version>
</dependency>
SBT
•  Ignite’s Spark artifact added to build.sbt
•  Scala 2.11 example
// Replace "ignite.version" with the Ignite release you are using
libraryDependencies += "org.apache.ignite" % "ignite-spark" % "ignite.version"
Demos
Resources
•  Ignite for Spark documentation
–  https://apacheignite-fs.readme.io/docs/ignite-for-spark
•  Spark Data Frames Support in Apache Ignite
–  https://issues.apache.org/jira/browse/IGNITE-3084
•  Code examples
–  https://github.com/apache/ignite/ => ScalarSharedRDDExample.scala
–  https://github.com/apache/ignite/ => SharedRDDExample.java
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org