This document is an introduction to Apache Spark, presented by Maxime Dumas of Cloudera. It covers Spark's advantages over MapReduce, such as using distributed memory for better performance and supporting iterative algorithms, and explains core Spark concepts including RDDs, transformations, and actions. The examples shown include word count, logistic regression, and Spark Streaming. The presentation concludes with a discussion of SQL on Spark and a demo.