
Mining of Massive Datasets


The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.


Mining of Massive Datasets

Jure Leskovec, Stanford Univ.

Anand Rajaraman, Milliway Labs

Jeffrey D. Ullman, Stanford Univ.

 2010, 2011, 2012, 2013, 2014 Anand Rajaraman, Jure Leskovec,

and Jeﬀrey D. Ullman

Preface

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates. When Jure Leskovec joined the Stanford faculty, we reorganized the material considerably. He introduced a new course CS224W on network analysis and added material to CS345A, which was renumbered CS246. The three authors also introduced a large-scale data-mining project course, CS341. The book now contains material taught in all three courses.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
6. Algorithms for clustering very large, high-dimensional datasets.


7. Two key problems for Web applications: managing advertising and recommendation systems.
8. Algorithms for analyzing and mining the structure of very large graphs, especially social-network graphs.
9. Techniques for obtaining the important properties of a large dataset by dimensionality reduction, including singular-value decomposition and latent semantic indexing.
10. Machine-learning algorithms that can be applied to very large data, such as perceptrons, support-vector machines, and gradient descent.

Prerequisites

To appreciate fully the material in this book, we recommend the following prerequisites:

1. An introduction to database systems, covering SQL and related programming systems.
2. A sophomore-level course in data structures, algorithms, and discrete math.
3. A sophomore-level course in software systems, software engineering, and programming languages.

Exercises

The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point.

Support on the Web

Go to http://www.mmds.org for slides, homework assignments, project requirements, and exams from courses related to this book.

Gradiance Automated Homework

There are automated exercises based on this book, using the Gradiance root-question technology, available at www.gradiance.com/services. Students may enter a public class by creating an account at that site and entering the class with code 1EDD8A1D. Instructors may use the site by making an account there and then emailing support at gradiance dot com with their login name, the name of their school, and a request to use the MMDS materials.

Acknowledgements

Cover art is by Scott Ullman.

We would like to thank Foto Afrati, Arun Marathe, and Rok Sosic for critical readings of a draft of this manuscript.

Errors were also reported by Rajiv Abraham, Apoorv Agarwal, Aris Anagnostopoulos, Atilla Soner Balkir, Arnaud Belletoile, Robin Bennett, Susan Biancani, Amitabh Chaudhary, Leland Chen, Anastasios Gounaris, Shrey Gupta, Waleed Hameid, Saman Haratizadeh, Lachlan Kang, Ed Knorr, Haewoon Kwak, Ellis Lau, Greg Lee, Ethan Lozano, Yunan Luo, Michael Mahoney, Justin Meyer, Bryant Moscon, Brad Penoff, Philips Kokoh Prasetyo, Qi Ge, Rich Seiter, Hitesh Shetty, Angad Singh, Sandeep Sripada, Dennis Sidharta, Krzysztof Stencel, Mark Storus, Roshan Sumbaly, Zack Taylor, Tim Triche Jr., Wang Bin, Weng Zhen-Bin, Robert West, Oscar Wu, Xie Ke, Nicolas Zhao, and Zhou Jingbo. The remaining errors are ours, of course.

J. L.

A. R.

J. D. U.

Palo Alto, CA

March, 2014

