This document summarizes Madhukara Phatak's journey working with machine learning and big data technologies. It covers his work with Hadoop, Mahout, JavaScript, Scala, Spark, and MLlib, and his experience building a rumor engine application. Key points include:
- He encountered challenges with Mahout's performance and complexity, which led him to explore Spark and build his own libraries.
- Spark provided better support for iterative algorithms and caching, and was more productive for machine learning than MapReduce.
- His rumor engine application classified blog posts using Naive Bayes with MLlib and achieved high performance on Spark.
- Finding skilled data scientists remains a challenge due to the unique combination of skills required.
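
To make the Naive Bayes classification step mentioned above more concrete, here is a minimal sketch of multinomial Naive Bayes text classification with add-one (Laplace) smoothing. It is written in plain Python so it runs without a cluster; it is not MLlib's API and not the rumor engine's actual code. The function names `train` and `predict`, the labels, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect per-label document and word counts from (label, text) pairs."""
    label_counts = Counter()             # number of documents per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    vocab = set()                        # every word seen during training
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (label, word) pairs above zero.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, not taken from the original application.
training = [
    ("tech", "spark makes iterative machine learning fast"),
    ("tech", "scala and spark for big data"),
    ("sports", "the team won the final match"),
    ("sports", "a great match and a great goal"),
]
model = train(training)
```

With this toy data, `predict(model, "spark machine learning")` returns `"tech"` and `predict(model, "the final goal")` returns `"sports"`. Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters once documents get longer than a few words.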