Big Data Mining Course: Mining Massive Datasets, 15-streams (46 pages).pdf
[Course outline] 01-MapReduce 02-Association rules 03-LSH Finding Similar Items: Locality Sensitive Hashing 04-LSH Theory of Locality Sensitive Hashing 05-Clustering 06-Dimensionality Reduction: SVD & CUR 07-Recommender Systems: Content-based Systems & Collaborative Filtering (recsys1) 08-Bilateral sequence recommendation (recsys2) Recommender Systems: Latent Factor Models 09-PageRank 10-WebSpam 11-Graph theory (graphs1) 12-Graph theory (graphs2) 13-Large Scale Machine Learning: SVMs 14-Decision Trees on MapReduce 15-streams 16-streams 17-advertising 18-bandits 19-submodular 20-review
CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu

[Course overview diagram: topics grouped by theme]
High dim. data: Locality sensitive hashing; Clustering; Dimensionality reduction
Graph data: PageRank, SimRank; Community Detection; Spam Detection
Infinite data: Filtering data streams; Queries on streams; Web advertising
Machine learning: SVM; Decision Trees; Perceptron, kNN
Apps: Recommender systems; Association Rules; Duplicate document detection
2/23/2015 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2

In many data mining situations, we do not know the entire data set in advance.
Stream management is important when the input rate is controlled externally:
  Google queries
  Twitter or Facebook status updates
We can think of the data as infinite and non-stationary (the distribution changes over time).

Input elements enter at a rapid rate, at one or more input ports (i.e., streams).
We call the elements of the stream tuples.
The system cannot store the entire stream accessibly.
Q: How do you make critical calculations about the stream using a limited amount of (secondary) memory?
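
As one small illustration of the question above, the sketch below keeps a running mean over a stream while storing only two numbers, no matter how many tuples arrive. It is not taken from the slides: the function name `running_mean` and the toy generator `sensor_readings` are hypothetical, and the mean is just one example of a summary that fits in constant memory.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each tuple, using O(1) memory."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        # Incremental update: the new mean is the old mean moved
        # toward x by a 1/count step. No past element is stored.
        mean += (x - mean) / count
        yield count, mean


def sensor_readings() -> Iterator[float]:
    """Hypothetical stand-in for an externally controlled input stream."""
    for value in [4.0, 8.0, 6.0, 2.0]:
        yield value


final_count, final_mean = 0, 0.0
for final_count, final_mean in running_mean(sensor_readings()):
    pass  # each tuple is seen once and then discarded

print(final_count, final_mean)  # 4 5.0
```

The same one-pass pattern, with a small fixed-size summary updated per tuple, is the shape that stream algorithms generally take.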

Stochastic Gradient Descent (SGD) is an example of a stream algorithm.
In machine learning this is called online learning.
  It allows modeling problems where we have a continuous stream of data.
  We want an algorithm to learn from the stream and slowly adapt to changes in the data.
Idea: Do slow updates to the model.
  SGD (SVM, Perceptron) makes small updates.
  So: First train the classifier on the training data.
  Then: For every example from the stream, update the model slightly (using a small learning rate).
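
The two-phase recipe above (train first, then nudge the model for every stream example) can be sketched as follows. This is a minimal illustration, assuming a linear SVM trained with SGD on the hinge loss. The function names, the toy data, and the specific learning rates are assumptions made for the example, not values from the slides.

```python
from typing import Iterable, List, Sequence, Tuple

# (features, label), with label in {-1, +1}; the first feature is a constant 1.0 bias term.
Example = Tuple[Sequence[float], int]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w: List[float], x: Sequence[float], y: int,
             lr: float, reg: float = 0.0) -> List[float]:
    """One SGD update on the L2-regularized hinge loss."""
    violated = y * dot(w, x) < 1  # example is inside the margin or misclassified
    new_w = []
    for wi, xi in zip(w, x):
        grad = reg * wi
        if violated:
            grad -= y * xi
        new_w.append(wi - lr * grad)
    return new_w


def train_then_stream(train: Sequence[Example], stream: Iterable[Example],
                      dim: int, epochs: int = 20,
                      lr: float = 0.1, stream_lr: float = 0.01) -> List[float]:
    w = [0.0] * dim
    # Phase 1: ordinary training passes over the stored training data.
    for _ in range(epochs):
        for x, y in train:
            w = sgd_step(w, x, y, lr)
    # Phase 2: one small update per stream example; nothing is stored.
    for x, y in stream:
        w = sgd_step(w, x, y, stream_lr)
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if dot(w, x) >= 0 else -1


# Toy data (hypothetical).
train_data = [([1.0, 2.0], 1), ([1.0, -2.0], -1)]
stream_data = iter([([1.0, 1.0], 1), ([1.0, -1.0], -1)])

w = train_then_stream(train_data, stream_data, dim=2)
print(predict(w, [1.0, 3.0]), predict(w, [1.0, -3.0]))  # 1 -1
```

Using a smaller learning rate in the streaming phase is one way to get the "slow updates" the slide asks for: each new example shifts the model slightly, so it can track gradual changes in the data without being thrown off by a single tuple.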