Introduction to Reinforcement Learning (Reinforcement Learning: An Introduction, Sutton)
This book, by Richard S. Sutton (with Andrew G. Barto), is the seminal work of the reinforcement learning field. Although it is no longer new, it laid down the basic theoretical framework of RL, which makes it very useful as an introductory text.
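For a concrete taste of that framework, the sketch below shows the agent-environment interaction loop with an ε-greedy action-value agent on an n-armed bandit, the setting of the book's Chapter 2. This is a minimal illustration written for this page; the class names, parameters, and structure are our own and are not taken from the book's pseudocode.

```python
import random

class Bandit:
    """A stationary n-armed bandit: each arm pays a noisy reward."""
    def __init__(self, n_arms):
        self.means = [random.gauss(0.0, 1.0) for _ in range(n_arms)]

    def pull(self, arm):
        return random.gauss(self.means[arm], 1.0)

class EpsilonGreedyAgent:
    """Tracks sample-average action values Q(a) incrementally."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.q = [0.0] * n_arms  # estimated value of each action
        self.n = [0] * n_arms    # times each action was taken

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))  # explore
        return max(range(len(self.q)), key=lambda a: self.q[a])  # exploit

    def update(self, arm, reward):
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        self.n[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.n[arm]

bandit = Bandit(n_arms=10)
agent = EpsilonGreedyAgent(n_arms=10)
total = 0.0
for step in range(1000):
    arm = agent.act()
    reward = bandit.pull(arm)
    agent.update(arm, reward)
    total += reward
print(f"average reward over 1000 steps: {total / 1000:.3f}")
```

The update rule is the incremental sample-average form Q ← Q + (R − Q)/N that the book covers under "Incremental Implementation" (Section 2.5); it avoids storing past rewards while converging to the same estimates.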
Reinforcement Learning:
An Introduction
Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
● Contents
❍ Preface
❍ Series Foreword
❍ Summary of Notation
● I. The Problem
❍ 1. Introduction
■ 1.1 Reinforcement Learning
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html (1 di 4)22/06/2005 9.04.27

Book
■ 1.2 Examples
■ 1.3 Elements of Reinforcement Learning
■ 1.4 An Extended Example: Tic-Tac-Toe
■ 1.5 Summary
■ 1.6 History of Reinforcement Learning
■ 1.7 Bibliographical Remarks
❍ 2. Evaluative Feedback
■ 2.1 An n-Armed Bandit Problem
■ 2.2 Action-Value Methods
■ 2.3 Softmax Action Selection
■ 2.4 Evaluation Versus Instruction
■ 2.5 Incremental Implementation
■ 2.6 Tracking a Nonstationary Problem
■ 2.7 Optimistic Initial Values
■ 2.8 Reinforcement Comparison
■ 2.9 Pursuit Methods
■ 2.10 Associative Search
■ 2.11 Conclusions
■ 2.12 Bibliographical and Historical Remarks
❍ 3. The Reinforcement Learning Problem
■ 3.1 The Agent-Environment Interface
■ 3.2 Goals and Rewards
■ 3.3 Returns
■ 3.4 Unified Notation for Episodic and Continuing Tasks
■ 3.5 The Markov Property
■ 3.6 Markov Decision Processes
■ 3.7 Value Functions
■ 3.8 Optimal Value Functions
■ 3.9 Optimality and Approximation
■ 3.10 Summary
■ 3.11 Bibliographical and Historical Remarks
● II. Elementary Solution Methods
❍ 4. Dynamic Programming
■ 4.1 Policy Evaluation
■ 4.2 Policy Improvement
■ 4.3 Policy Iteration
■ 4.4 Value Iteration
■ 4.5 Asynchronous Dynamic Programming
■ 4.6 Generalized Policy Iteration
■ 4.7 Efficiency of Dynamic Programming
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html (2 di 4)22/06/2005 9.04.27

Book
■ 4.8 Summary
■ 4.9 Bibliographical and Historical Remarks
❍ 5. Monte Carlo Methods
■ 5.1 Monte Carlo Policy Evaluation
■ 5.2 Monte Carlo Estimation of Action Values
■ 5.3 Monte Carlo Control
■ 5.4 On-Policy Monte Carlo Control
■ 5.5 Evaluating One Policy While Following Another
■ 5.6 Off-Policy Monte Carlo Control
■ 5.7 Incremental Implementation
■ 5.8 Summary
■ 5.9 Bibliographical and Historical Remarks
❍ 6. Temporal-Difference Learning
■ 6.1 TD Prediction
■ 6.2 Advantages of TD Prediction Methods
■ 6.3 Optimality of TD(0)
■ 6.4 Sarsa: On-Policy TD Control
■ 6.5 Q-Learning: Off-Policy TD Control
■ 6.6 Actor-Critic Methods
■ 6.7 R-Learning for Undiscounted Continuing Tasks
■ 6.8 Games, Afterstates, and Other Special Cases
■ 6.9 Summary
■ 6.10 Bibliographical and Historical Remarks
● III. A Unified View
❍ 7. Eligibility Traces
■ 7.1 n-Step TD Prediction
■ 7.2 The Forward View of TD(λ)
■ 7.3 The Backward View of TD(λ)
■ 7.4 Equivalence of Forward and Backward Views
■ 7.5 Sarsa(λ)
■ 7.6 Q(λ)
■ 7.7 Eligibility Traces for Actor-Critic Methods
■ 7.8 Replacing Traces
■ 7.9 Implementation Issues
■ 7.10 Variable λ
■ 7.11 Conclusions
■ 7.12 Bibliographical and Historical Remarks
❍ 8. Generalization and Function Approximation
■ 8.1 Value Prediction with Function Approximation
■ 8.2 Gradient-Descent Methods
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html (3 di 4)22/06/2005 9.04.27

Book
■ 8.3 Linear Methods
■ 8.3.1 Coarse Coding
■ 8.3.2 Tile Coding
■ 8.3.3 Radial Basis Functions
■ 8.3.4 Kanerva Coding
■ 8.4 Control with Function Approximation
■ 8.5 Off-Policy Bootstrapping
■ 8.6 Should We Bootstrap?
■ 8.7 Summary
■ 8.8 Bibliographical and Historical Remarks
❍ 9. Planning and Learning
■ 9.1 Models and Planning
■ 9.2 Integrating Planning, Acting, and Learning
■ 9.3 When the Model Is Wrong
■ 9.4 Prioritized Sweeping
■ 9.5 Full vs. Sample Backups
■ 9.6 Trajectory Sampling
■ 9.7 Heuristic Search
■ 9.8 Summary
■ 9.9 Bibliographical and Historical Remarks
❍ 10. Dimensions of Reinforcement Learning
■ 10.1 The Unified View
■ 10.2 Other Frontier Dimensions
❍ 11. Case Studies
■ 11.1 TD-Gammon
■ 11.2 Samuel's Checkers Player
■ 11.3 The Acrobot
■ 11.4 Elevator Dispatching
■ 11.5 Dynamic Channel Allocation
■ 11.6 Job-Shop Scheduling
● Bibliography
❍ Index