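Problem overview
This post uses PySpark to analyze rental listing data for Wuhan. The same code can be applied to data scraped for other regions.
The code is as follows: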
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
import pandas as pd
from pyspark.ml.stat import Correlation
import matplotlib.pyplot as plt
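# Start a local Spark session for the analysis.
spark = SparkSession.builder.master("local").appName("rent_analyse").getOrCreate()
# Load the scraped listings; without a schema, every column is read as a string.
df1 = spark.read.csv("zh.csv", header=True, encoding="UTF-8")
# Cast 租金 (rent) and 面积 (floor area) to integers so they can be used numerically.
df1 = df1.withColumn("租金", df1["租金"].cast(IntegerType()))
df1 = df1.withColumn("面积", df1["面积"].cast(IntegerType()))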
# Collect the distinct values of 区划 (district) to the driver as Row objects.
area = df1.select("区划").distinct().collect()
place = []
for i in area:
    temp = i.asDict()
l=