1. Spark installation and usage: http://dblab.xmu.edu.cn/blog/1689-2/
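# Count the lines of a text file that contain 'a' and the lines that contain 'b'
from pyspark import SparkContext
sc = SparkContext('local', 'test')  # master 'local' (one worker thread), app name 'test'
# Resolved against the default file system (HDFS if configured, otherwise local)
logFile = "/user/tmp/offline/jsy/test/1.txt"
# Read the file into an RDD with at least 2 partitions; cache() keeps it in
# memory because it is scanned by the two separate count() actions below
logData = sc.textFile(logFile, 2).cache()
numAs = logData.filter(lambda line: 'a' in line).count()
numBs = logData.filter(lambda line: 'b' in line).count()
print('Lines with a: %s, Lines with b: %s' % (numAs, numBs))
sc.stop()  # release the context's resources when done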
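To check the Spark result on a small sample, it helps to spell out what the two filter/count pairs in the PySpark snippet actually compute. The plain-Python sketch below, with no Spark involved, does the same counting. `count_lines_with` is a hypothetical helper, not part of PySpark. Each line is counted at most once per letter, and a line containing both letters counts toward both totals.

```python
# Plain-Python sketch (no Spark) of the line counts computed by the PySpark job.
# count_lines_with is a hypothetical helper for sanity-checking small samples.
from typing import Iterable, Tuple


def count_lines_with(lines: Iterable[str], a: str = "a", b: str = "b") -> Tuple[int, int]:
    """Return (number of lines containing a, number of lines containing b).

    Mirrors logData.filter(lambda line: 'a' in line).count(): a line is counted
    once no matter how many times the letter appears in it, and a line that
    contains both letters is counted in both totals.
    """
    num_a = num_b = 0
    for line in lines:
        if a in line:
            num_a += 1
        if b in line:
            num_b += 1
    return num_a, num_b


if __name__ == "__main__":
    sample = ["apple", "banana", "cherry", "bbb", "xyz"]
    num_as, num_bs = count_lines_with(sample)
    print('Lines with a: %s, Lines with b: %s' % (num_as, num_bs))
    # → Lines with a: 2, Lines with b: 2
```

For a real file, pass an open file object, e.g. `with open(path, encoding="utf-8") as f: count_lines_with(f)`. Trailing newlines do not change the result because only the letters are tested.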
2. Spark study index (table of contents): http://dblab.xmu.edu.cn/blog/1709-2/
3. Official Quick Start guide (Spark website)
4. How Spark works, explained in detail: https://www.jianshu.com/p/e9845cac935c
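A core idea in how Spark runs is that RDD transformations such as `map` and `filter` are lazy. They only record a lineage of steps. An action such as `count` or `collect` triggers the computation, and without `cache()` each action replays the lineage from the source. This is why the example in item 1 caches `logData` before running two `count()` calls. The toy model below is a minimal sketch of that behaviour under stated assumptions. It is NOT Spark's implementation: there are no partitions, DAG stages, shuffles or executors. `LazyDataset` and its methods are hypothetical names.

```python
# Toy model (NOT Spark's implementation) of lazy transformations + lineage replay.
# LazyDataset and its methods are hypothetical, for illustration only.
from typing import Any, Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Transformations are only recorded; actions compute the result."""

    def __init__(self, source: Sequence[Any], lineage: Tuple[Step, ...] = ()):
        self._source = source    # base data, never modified
        self.lineage = lineage   # recorded transformation steps

    # Transformations: return a new dataset with one more step, compute nothing.
    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self.lineage + (("map", f),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self.lineage + (("filter", pred),))

    # Actions: replay the whole lineage over the source every time they are called.
    def collect(self) -> List[Any]:
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self.lineage:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):  # kind == "filter"
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    calls = []

    def double(x):
        calls.append(x)  # record every time the map function actually runs
        return x * 2

    ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
    print(len(calls))    # → 0: building the chain ran nothing
    print(ds.collect())  # → [6, 8]
    print(ds.count())    # → 2
    print(len(calls))    # → 8: the second action replayed the lineage
```

In real Spark, `rdd.cache()` (or `persist()`) stores the computed partitions after the first action, so later actions read them instead of recomputing from the source.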
5. Spark configuration parameters explained: https://www.cnblogs.com/yangcx666/p/8723826.html
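Spark parameters are usually passed at submit time as `--conf key=value` pairs to `spark-submit`. The sketch below assembles such a command from a dict. The property names are standard Spark configuration keys. The values, the `yarn` master and the script name `my_job.py` are placeholder assumptions to tune per cluster. `build_submit_cmd` is a hypothetical helper.

```python
# Sketch: build a spark-submit argv from a dict of Spark properties.
# build_submit_cmd is hypothetical; values below are placeholders, not recommendations.
import shlex
from typing import Dict, List


def build_submit_cmd(app: str, master: str, conf: Dict[str, str]) -> List[str]:
    """Return ['spark-submit', '--master', master, '--conf', 'k=v', ..., app]."""
    cmd = ["spark-submit", "--master", master]
    for key in sorted(conf):  # sorted for a stable, readable order
        cmd += ["--conf", f"{key}={conf[key]}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    conf = {
        "spark.executor.memory": "4g",      # memory per executor process
        "spark.executor.cores": "2",        # cores (concurrent tasks) per executor
        "spark.executor.instances": "4",    # executor count with static allocation
        "spark.default.parallelism": "16",  # default partitions for e.g. reduceByKey/join
    }
    print(shlex.join(build_submit_cmd("my_job.py", "yarn", conf)))
```

The same keys can also be set in code with `SparkConf().set(key, value)` before creating the `SparkContext`. Passing them at submit time keeps the job script independent of cluster sizing.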