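1. Software environment
The operating system is Ubuntu 20, with Spark 3.0.1 and Python 3.8. Spark is installed on the local machine, and the pyspark module is installed for Python.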
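Before going further, it may help to confirm that the two Python modules the scripts rely on can be imported. The following is only a minimal sketch using the standard library; the helper name check_modules is made up for illustration, and the module names pyspark and osgeo are taken from the imports used later in this post.

```python
import importlib.util
import sys


def check_modules(names=("pyspark", "osgeo")):
    """Return {module_name: True/False} telling whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the interpreter version and whether each required module is available.
print("Python", ".".join(str(part) for part in sys.version_info[:3]))
for name, found in check_modules().items():
    print(name, "found" if found else "MISSING")
```

If either module is reported as missing, the scripts below will fail at their import lines.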
2. Vector data
The vector data is a downloaded shapefile (shp) of counties covering the whole country.
3. Code
1. First, use Python to read the shp file and write the WKT string of each geometry to a text file
from osgeo import ogr

strPath = "/home/linjx/Desktop/data/gis/vector/test/county.shp"

# Open the shapefile read-only (second argument 0 means no update) and take its first layer.
ds = ogr.Open(strPath, 0)
if ds is None:
    raise RuntimeError("Cannot open " + strPath)
layer = ds.GetLayer(0)

# Save the layer's spatial reference as WKT so the coordinate system is on record.
spRef = layer.GetSpatialRef()
with open("/home/linjx/Desktop/spwkt.txt", "w") as f:
    f.write(spRef.ExportToWkt())

# Write one geometry WKT string per line, printing a running feature count.
i = 0
with open("/home/linjx/Desktop/geowkts.txt", "w") as f:
    feature = layer.GetNextFeature()
    while feature is not None:
        geo = feature.GetGeometryRef()
        f.write(geo.ExportToWkt())
        f.write("\n")
        i += 1
        print("%d" % i)
        feature = layer.GetNextFeature()
print("Done")
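Since the Spark job reads geowkts.txt line by line, a quick look at what the file contains can catch problems early; a blank or malformed line would likely break the area calculation later. The sketch below uses only the standard library. The helper name summarize_wkt_lines and the sample lines are made up for illustration and are not taken from the real county data.

```python
from collections import Counter


def summarize_wkt_lines(lines):
    """Count geometry types and blank lines in an iterable of WKT lines."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            counts["<blank>"] += 1
            continue
        # The geometry type is the leading keyword before the first "(".
        geom_type = text.split("(", 1)[0].strip().upper()
        counts[geom_type] += 1
    return counts


# Made-up sample lines standing in for the contents of geowkts.txt.
sample = [
    "POLYGON ((0 0,1 0,1 1,0 1,0 0))\n",
    "MULTIPOLYGON (((0 0,1 0,1 1,0 0)),((2 2,3 2,3 3,2 2)))\n",
    "\n",
]
summary = summarize_wkt_lines(sample)
print(dict(summary))
```

To check the real output, an open file object can be passed instead of the sample list, for example inside a with open("/home/linjx/Desktop/geowkts.txt") as f: block.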
2. Write the Spark job script
from osgeo import ogr
from pyspark import SparkConf, SparkContext


def calcArea(strWkt: str) -> float:
    # Parse one WKT string and return its planar area in squared coordinate units.
    geo = ogr.CreateGeometryFromWkt(strWkt)
    return geo.Area()


conf = SparkConf().setMaster("local").setAppName("geo")
sc = SparkContext(conf=conf)

# Each line of the file is one WKT string written by the previous script.
wktRDD = sc.textFile("/home/linjx/Desktop/geowkts.txt")
areaRDD = wktRDD.map(calcArea)
sumArea = areaRDD.reduce(lambda x, y: x + y)

# Convert square metres to square kilometres; this assumes the coordinates are in metres.
sumArea = sumArea / 1000 / 1000
print("Area = %.9f" % sumArea)
sc.stop()
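To make the map and reduce steps easier to follow without a Spark installation, here is a plain-Python sketch of the same pattern. It is not GDAL's implementation: parse_simple_polygon and ring_area are hypothetical helpers that only handle a single-ring POLYGON and compute the planar area with the shoelace formula, and the two rectangles are made-up data with coordinates assumed to be in metres.

```python
from functools import reduce


def parse_simple_polygon(wkt):
    """Parse 'POLYGON ((x y, x y, ...))' with one ring and no holes."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is supported in this sketch")
    inner = text[text.index("((") + 2 : text.rindex("))")]
    if "(" in inner or ")" in inner:
        raise ValueError("polygons with holes are not supported in this sketch")
    return [tuple(float(v) for v in pair.split()[:2]) for pair in inner.split(",")]


def ring_area(points):
    """Planar area of a closed ring via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Made-up rectangles, coordinates assumed to be in metres.
wkts = [
    "POLYGON ((0 0, 1000 0, 1000 1000, 0 1000, 0 0))",
    "POLYGON ((0 0, 2000 0, 2000 500, 0 500, 0 0))",
]

# Same shape as the Spark job: map each WKT to an area, then reduce by summing.
areas = map(lambda w: ring_area(parse_simple_polygon(w)), wkts)
total_m2 = reduce(lambda x, y: x + y, areas)
total_km2 = total_m2 / 1000 / 1000
print("Area = %.9f" % total_km2)
```

The division by 1000 twice only yields square kilometres when the coordinates are in metres. If the shapefile stores longitude and latitude in degrees, the planar area comes out in square degrees, so the spatial reference saved in spwkt.txt is worth checking first.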
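4. Run the Spark script
1. Run the Python script directly with the Python interpreter (this works because the pyspark module is installed)
2. Submit it with spark-submit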
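The two ways of launching the job can be sketched as command lines built in Python. This is only an illustration: the script name geo_area.py and the helper names are hypothetical, and spark-submit is assumed to be on the PATH. The snippet prints the commands rather than running them.

```python
import shlex


def build_direct_command(script, python="python3"):
    """Command for running the job as an ordinary Python script."""
    return [python, script]


def build_submit_command(script, master="local"):
    """Command for handing the same script to spark-submit."""
    return ["spark-submit", "--master", master, script]


# "geo_area.py" is a hypothetical file name for the Spark job script.
direct_cmd = build_direct_command("geo_area.py")
submit_cmd = build_submit_command("geo_area.py")
print(shlex.join(direct_cmd))
print(shlex.join(submit_cmd))
```

Because the script itself calls setMaster("local") on its SparkConf, that value takes precedence over a --master flag given to spark-submit, so the flag here simply matches what the script already sets.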