PySpark Collaborative Filtering with ALS Matrix Factorization: Movie Data

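This article shows how to build a movie recommender with the Alternating Least Squares (ALS) algorithm in Apache Spark. It walks through data preprocessing, model training, evaluation, and generating recommendations to show how ALS is applied in practice.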
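To make the idea concrete, here is a minimal pure-Python sketch of alternating least squares for the special case of rank 1, with one latent factor per user and per movie. It is not Spark's implementation: the function name `als_rank1`, the toy ratings, and the plain L2 regularization are assumptions chosen for illustration. With the movie factors held fixed, each user factor has a closed-form least-squares solution, and vice versa, so the two steps simply alternate.

```python
def als_rank1(ratings, n_users, n_items, reg=0.01, iters=50):
    """Fit rating ~= x[user] * y[item] by alternating least squares.

    ratings: list of (user, item, rating) triples for the observed entries.
    Illustrative only; Spark's ALS uses factor vectors of length `rank`.
    """
    x = [1.0] * n_users   # one latent factor per user
    y = [1.0] * n_items   # one latent factor per item
    for _ in range(iters):
        # Hold y fixed: x[u] minimises sum_i (r_ui - x_u*y_i)^2 + reg*x_u^2
        for u in range(n_users):
            seen = [(i, r) for (uu, i, r) in ratings if uu == u]
            x[u] = sum(r * y[i] for i, r in seen) / (reg + sum(y[i] ** 2 for i, _ in seen))
        # Hold x fixed: the same closed form with the roles swapped
        for i in range(n_items):
            seen = [(u, r) for (u, ii, r) in ratings if ii == i]
            y[i] = sum(r * x[u] for u, r in seen) / (reg + sum(x[u] ** 2 for u, _ in seen))
    return x, y


# Toy data: 2 users, 3 movies, 4 observed ratings (two entries are unobserved)
toy_ratings = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 4.0), (1, 2, 3.0)]
x, y = als_rank1(toy_ratings, n_users=2, n_items=3)
predicted_unseen = x[0] * y[2]   # user 0 never rated movie 2
```

The product `x[u] * y[i]` plays the role of the predicted rating, including for pairs that never appeared in the input, which is what makes the factorization usable for recommendation.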


Data download:

https://github.com/apache/spark/tree/master/data/mllib/als

Reference code example:

https://github.com/apache/spark/blob/master/examples/src/main/python/ml/als_example.py

Note that long(p[3]) in the reference code must be changed to float(p[3]), because Python 3 has no long type:
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]), rating=float(p[2]), timestamp=float(p[3])))
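As a quick check of the field conversions outside Spark, the sketch below parses one line in the `userId::movieId::rating::timestamp` layout using plain Python. The sample line and the helper name `parse_rating_line` are illustrative assumptions. Since Python 3 integers have arbitrary precision, `int` would also work for the timestamp.

```python
def parse_rating_line(line):
    """Split one '::'-delimited ratings line into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return {
        "userId": int(user_id),
        "movieId": int(movie_id),
        "rating": float(rating),
        "timestamp": float(timestamp),  # float, as in the corrected line above
    }


record = parse_rating_line("0::2::3::1424380312")  # illustrative sample line
```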

Code:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row
import os
# Tell PySpark which Python interpreter to use (machine-specific path; adjust or remove)
os.environ["PYSPARK_PYTHON"]="/Users/lonng/opt/anaconda3/python.app/Contents/MacOS/python"


spark = SparkSession\
        .builder\
        .appName("ALSExample")\
        .getOrCreate()

# Load the ratings file; each line has the form userId::movieId::rating::timestamp
lines = spark.read.text("./sample_movielens_ratings.txt").rdd

parts = lines.map(lambda row: row.value.split("::"))
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                     rating=float(p[2]), timestamp=float(p[3])))
ratings = spark.createDataFrame(ratingsRDD)
(training, test) = ratings.randomSplit([0.8, 0.2])
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating",
              coldStartStrategy="drop")
model = als.fit(training)

# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))

# Generate top 10 movie recommendations for each user
userRecs = model.recommendForAllUsers(10)
userRecs.show()

# Generate top 10 movie recommendations for a specified set of users
users = ratings.select(als.getUserCol()).distinct().limit(3)
userSubsetRecs = model.recommendForUserSubset(users, 10)
userSubsetRecs.show()


# Generate top 10 user recommendations for each movie
movieRecs = model.recommendForAllItems(10)
movieRecs.show()


# Generate top 10 user recommendations for a specified set of movies
movies = ratings.select(als.getItemCol()).distinct().limit(3)
movieSubSetRecs = model.recommendForItemSubset(movies, 10)

movieSubSetRecs.show()



Model evaluation

# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
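For reference, RMSE is the square root of the mean squared difference between actual and predicted ratings. The pure-Python sketch below (the helper `rmse` and the sample pairs are assumptions, not part of Spark) also shows why `coldStartStrategy="drop"` matters. When a user or movie in the test set was absent from training, the model's prediction is NaN, and a single NaN makes the whole metric NaN unless such rows are dropped.

```python
import math


def rmse(pairs):
    """pairs: iterable of (actual_rating, predicted_rating)."""
    pairs = list(pairs)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


# The last row mimics a cold-start prediction
scored = [(4.0, 3.0), (2.0, 2.0), (5.0, float("nan"))]

with_nan = rmse(scored)  # the NaN propagates into the metric
dropped = rmse((a, p) for a, p in scored if not math.isnan(p))
```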


To recommend likely movies for a single user, or likely users for a single movie, see http://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html

Note: the first approach below is normally somewhat faster.

1. After computing userRecs = model.recommendForAllUsers(10) and movieRecs = model.recommendForAllItems(10):


userRecs.where(userRecs.userId == 28).select("recommendations.movieId", "recommendations.rating").collect()

movieRecs.where(movieRecs.movieId == 31)\
        .select("recommendations.userId", "recommendations.rating").collect()


2. Using model.recommendForUserSubset and model.recommendForItemSubset:

user_subset = ratings.where(ratings.userId == 28)
model.recommendForUserSubset(user_subset,10).select("recommendations.movieId", "recommendations.rating").first()


# The columns in this dataset are movieId and userId (not item_id / user_id)
item_subset = ratings.where(ratings.movieId == 2)
item_subset_recs = model.recommendForItemSubset(item_subset, 3)
item_subset_recs.select("recommendations.userId", "recommendations.rating").first()
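Conceptually, both approaches rank candidates by predicted score, which for a matrix-factorization model is the dot product of a user's factor vector and a movie's factor vector. The sketch below reproduces that ranking in plain Python for a single user. The factor values and the helper `top_n_for_user` are made up for illustration and do not come from a trained model.

```python
import heapq


def top_n_for_user(user_vec, item_factors, n):
    """Return the n (movieId, score) pairs with the highest dot-product score."""
    scores = (
        (movie_id, sum(u * v for u, v in zip(user_vec, vec)))
        for movie_id, vec in item_factors.items()
    )
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])


# Hypothetical rank-2 factors
user_vec = [1.0, 0.5]
item_factors = {31: [2.0, 0.0], 32: [0.5, 4.0], 33: [1.0, 1.0]}

top2 = top_n_for_user(user_vec, item_factors, 2)
```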



The full set of recommendations can be collected and read on the driver, or later stored in a database such as Redis.
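One way to read the full result on the driver is to collect the rows and flatten the nested `recommendations` array into one tuple per (user, movie) pair. The sketch below works on plain dictionaries shaped like the output of `row.asDict(recursive=True)` for each row of `userRecs.collect()`. The sample values and the helper `flatten_user_recs` are assumptions for illustration.

```python
def flatten_user_recs(rows):
    """Turn nested per-user recommendation rows into (userId, movieId, rating) tuples."""
    return [
        (row["userId"], rec["movieId"], rec["rating"])
        for row in rows
        for rec in row["recommendations"]
    ]


# Made-up rows in the shape produced by recommendForAllUsers
sample_rows = [
    {"userId": 28, "recommendations": [{"movieId": 92, "rating": 4.8},
                                       {"movieId": 81, "rating": 4.6}]},
    {"userId": 26, "recommendations": [{"movieId": 51, "rating": 5.1}]},
]
flat = flatten_user_recs(sample_rows)
```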


To save a pandas DataFrame to Redis: data is written to db 0 by default, and the port and db can be changed through the StrictRedis arguments.
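A possible layout for the stored data is one key per user, holding that user's recommendations as a JSON string. The sketch below writes through any object with a `set(key, value)` method, so a `redis.StrictRedis(host=..., port=..., db=...)` client could be passed in. Here a small in-memory stand-in is used so the example runs without a Redis server. The key prefix `rec:user:` and the record layout (for example from `DataFrame.to_dict("records")` in pandas) are assumptions.

```python
import json


class InMemoryStore:
    """Stand-in exposing the same set/get calls this sketch uses on a Redis client."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def save_recs(client, records, prefix="rec:user:"):
    """records: list of dicts with 'userId', 'movieId' and 'rating' keys."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec["userId"], []).append(
            {"movieId": rec["movieId"], "rating": rec["rating"]}
        )
    # One key per user, value is the JSON-encoded recommendation list
    for user_id, recs in grouped.items():
        client.set(f"{prefix}{user_id}", json.dumps(recs))


store = InMemoryStore()
save_recs(store, [
    {"userId": 28, "movieId": 92, "rating": 4.8},
    {"userId": 28, "movieId": 81, "rating": 4.6},
    {"userId": 26, "movieId": 51, "rating": 5.1},
])
loaded = json.loads(store.get("rec:user:28"))
```

Storing one JSON value per user keeps a lookup to a single `get` call, at the cost of rewriting the whole list whenever a user's recommendations change.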


Saving and loading the model

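import tempfile
from pyspark.ml.recommendation import ALS, ALSModel

# temp_path is an example location; any writable directory works
temp_path = tempfile.mkdtemp()
model_path = temp_path + "/als_model"
model.save(model_path)  # fails if model_path already exists
model2 = ALSModel.load(model_path)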