Reinforcement Learning For Adaptive Traffic Signal Control With Limited Information
Machine Learning (CS 229) Final Project, Fall 2015
Jeff Glick (jdglick@stanford.edu), M.S. Candidate, Department of Management Science and Engineering, Stanford University
Recent research has effectively applied reinforcement learning to
adaptive traffic signal control problems. Learning agents perform
best when they can observe the state of the environment accurately
enough to choose the right action for each state, and most studies
have therefore given the agent perfect state information. In a
real-life deployment, however, that level of observability would
require extensive physical instrumentation. This study explores
training a control agent that has access only to information from a
limited number of vehicles, obtained from cell phone geo-location
data, and compares its performance against a legacy fixed phase
timing policy and against regimes where the agent has access to
perfect information.
Algorithm & Key Parameters
[Figure: Reinforcement Learning Cycle]
• Q-learning develops a quality value Q(s, a) for each state-action pair, which is an estimate of the true value Q*(s, a)
• Continuous asynchronous updating for on-line learning; convergence assumes every state is visited infinitely often
• Q-learning update: Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') - Q(s, a) ]
• States: discretized state space; roughly 25,000 states for this problem
• Learning rate α: initially α = 1, which ignores previous experience; as α → 0, previous experience is weighted more heavily
• Discount factor γ: used to prevent myopic decision making
• Control policy: given a state s, try action a with probability proportional to exp(Q(s, a)/τ) (softmax distribution), where τ controls exploration; if τ is large, actions are chosen with nearly equal probability; as τ → 0, the policy becomes deterministic and chooses argmax_a Q(s, a) with probability 1
• Reward r: determined by the objective function (a short sketch of the update and policy follows this list)
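A minimal tabular sketch of the update rule and softmax policy described above, assuming a dictionary Q-table and two actions (0 = continue the current phase, 1 = transition, matching the controller's action definition); the parameter values are illustrative rather than the ones used in the project:

import math
import random
from collections import defaultdict

Q = defaultdict(float)             # Q[(state, action)] -> estimated value
alpha, gamma, tau = 1.0, 0.9, 1.0  # learning rate, discount factor, exploration temperature
ACTIONS = (0, 1)                   # 0 = continue current phase, 1 = transition

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def softmax_action(s):
    # Try action a with probability proportional to exp(Q(s,a) / tau)
    weights = [math.exp(Q[(s, a)] / tau) for a in ACTIONS]
    total = sum(weights)
    u, cumulative = random.random() * total, 0.0
    for a, w in zip(ACTIONS, weights):
        cumulative += w
        if u <= cumulative:
            return a
    return ACTIONS[-1]

As τ shrinks toward 0 the exponentials concentrate on the best action, recovering the deterministic argmax policy noted above.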
Simulation Build & Data Generation
Simulation Setup:
• OpenStreetMap data and the Java OpenStreetMap Editor
• Simulation of Urban Mobility (SUMO)
• Realistic, variable arrival and turn rates for a single 8-phase, 4-way intersection (arrivalGen.py)
Simulation Architecture & Learning Pipeline
map.osm
• Prepared in Java OSM Editor
in.net.xml
• Lanes, intersections, lights, speed limits
• Generated from map.osm with SUMO NETCONVERT
in.add.xml
• Induction Loops
• Misc. simulation inputs
in.rou.xml
• Vehicle routes & times
arrivalGen.py
• Fit polynomial arrival rate functions to synthetic data
• Generate random vehicle arrival schedule
• Tag selected vehicles if GEOLOCATION is ON (~30%)
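A hedged sketch of the kind of schedule arrivalGen.py produces: vehicle arrival times drawn from a non-homogeneous Poisson process with a polynomial rate (via thinning), with roughly 30% of vehicles tagged as geolocated. The coefficients, horizon, and field names below are illustrative, not the script's actual values:

import random

COEFFS = [0.05, 4e-5, -1e-8]   # illustrative polynomial rate lambda(t), vehicles/second
HORIZON = 3600                 # one simulated hour
GEO_SHARE = 0.30               # share of vehicles tagged when GEOLOCATION is ON

def rate(t):
    return max(sum(c * t ** k for k, c in enumerate(COEFFS)), 0.0)

lam_max = max(rate(t) for t in range(HORIZON + 1))

arrivals, t = [], 0.0
while True:
    # Thinning: propose candidate times at the peak rate, accept with prob rate(t)/lam_max
    t += random.expovariate(lam_max)
    if t >= HORIZON:
        break
    if random.random() < rate(t) / lam_max:
        arrivals.append({"depart": round(t, 1),
                         "geolocated": random.random() < GEO_SHARE})

print(f"{len(arrivals)} vehicles generated, "
      f"{sum(v['geolocated'] for v in arrivals)} geolocated")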
palm.sumocfg
• Simulation control file
• Run in SUMO GUI or Command Line
out.queue.xml
• Lane queue sizes at time t
out.fcd.xml
• Vehicle status at time t
out.full.xml
• Full simulation output (lane throughput, occupancy)
out.ind.xml
• Induction loop counts and status at time t
parseFull.py
• Validate simulation
• Analyze & visualize output
• Assess performance of the learning algorithm & adjust tuning parameters
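A hedged sketch of the kind of post-processing parseFull.py performs, here reading a SUMO queue-output file and reporting the mean queue length per lane; the attribute names (timestep, queueing_length) follow SUMO's queue export but may differ by version, so treat them as assumptions:

import xml.etree.ElementTree as ET
from collections import defaultdict

totals, counts = defaultdict(float), defaultdict(int)

# Stream out.queue.xml rather than loading the whole file into memory.
for _, elem in ET.iterparse("out.queue.xml", events=("end",)):
    if elem.tag == "lane":
        lane_id = elem.get("id")
        totals[lane_id] += float(elem.get("queueing_length", 0.0))
        counts[lane_id] += 1
        elem.clear()

for lane_id in sorted(totals):
    print(f"{lane_id}: mean queue length {totals[lane_id] / counts[lane_id]:.1f}")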
controller.py (CLIENT)
• Decide light phase changes
• Collect reward based on the objective function
• Learn the optimal policy via Q-learning
• Control the stop light through the SUMO server via the Traffic Control Interface (TraCI) API
detectState.py (CLIENT)
• Maximum likelihood estimates of non-homogeneous arrival rates, queue sizes & waiting times
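A hedged sketch of the state-estimation idea behind detectState.py: when only a known fraction of vehicles report geolocation (about 30% here), the maximum likelihood estimate of a piecewise-constant arrival rate is the observed count scaled up by the penetration rate and the interval length. The function name and numbers are illustrative:

def estimate_arrival_rates(observed_counts, interval_s=60.0, penetration=0.30):
    # MLE for a Poisson count c observed over interval_s seconds when only a
    # fraction `penetration` of vehicles is visible: lambda_hat = c / (penetration * interval_s)
    return [c / (penetration * interval_s) for c in observed_counts]

# Counts of geolocated vehicles seen on one approach in successive 60 s windows.
print(estimate_arrival_rates([3, 5, 8, 6]))   # about [0.17, 0.28, 0.44, 0.33] veh/s

Queue sizes and waiting times would be estimated in the same spirit, scaling up what the observed subset of vehicles reveals.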
in.set.xml
• SUMO GUI settings
Action: every s seconds, the controller either transitions to the next phase (1) or continues the current phase (0)
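A hedged sketch of the controller.py loop over the TraCI API. The calls shown (traci.start, simulationStep, lane.getLastStepHaltingNumber, trafficlight.getPhase/setPhase) exist in SUMO's Python client (older releases name the module traci.trafficlights), but the traffic-light ID, lane IDs, decision interval, and stand-in objective are assumptions, and a fixed heuristic stands in for the learned softmax policy so the sketch is self-contained:

import traci

DECISION_INTERVAL = 5                              # "every s seconds" in the action definition
TLS_ID = "center"                                  # hypothetical traffic-light ID
LANES = ["n_in_0", "s_in_0", "e_in_0", "w_in_0"]   # hypothetical approach lanes

traci.start(["sumo", "-c", "palm.sumocfg"])        # SUMO acts as the server
try:
    while traci.simulation.getMinExpectedNumber() > 0:
        for _ in range(DECISION_INTERVAL):
            traci.simulationStep()
        queues = [traci.lane.getLastStepHaltingNumber(l) for l in LANES]
        reward = -sum(queues)                      # stand-in objective: penalize queued vehicles
        # The learned policy (softmax over Q-values of the discretized state) would
        # choose the action here; a threshold heuristic stands in for illustration.
        action = 1 if max(queues) > 5 else 0       # 1 = transition, 0 = continue current phase
        if action == 1:
            current = traci.trafficlight.getPhase(TLS_ID)
            traci.trafficlight.setPhase(TLS_ID, (current + 1) % 8)   # 8-phase program
finally:
    traci.close()

In the limited-information setting, the raw lane queries above would presumably be replaced by the estimates produced by detectState.py.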
Initial Results
• Validated traffic dynamics; selected bucket thresholds for discretizing queue sizes and waiting times
• Queues blowing up
• Learning rate shrinking too quickly
• Crude discretization (most of the ~25,000 states never visited)
• Challenges with volatility
• Reward should equal the change in the objective function (reward improvement); see the sketch at the end of these results
• Throttled the learning rate; system still performing better during off-peak
• Some improvement from increasing bucket thresholds and delaying the progression of the learning rate
• Increased τ (important when rewards are negative)
• Still performance issues; the MDP assumption may not hold
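A hedged sketch of the two adjustments discussed in these results: coarser bucket thresholds for discretizing queue sizes and waiting times, and a reward defined as the change in the objective function rather than its raw value. The threshold values are illustrative, not the ones chosen for the project:

import bisect

QUEUE_BUCKETS = [2, 5, 10, 20]   # vehicles; illustrative thresholds
WAIT_BUCKETS = [15, 60, 180]     # seconds; illustrative thresholds

def bucketize(value, thresholds):
    # Map a continuous measurement to a small bucket index (0 .. len(thresholds))
    return bisect.bisect_right(thresholds, value)

class DeltaReward:
    # Reward = improvement in the objective since the previous decision point.
    def __init__(self):
        self.previous = None

    def __call__(self, objective):
        r = 0.0 if self.previous is None else objective - self.previous
        self.previous = objective
        return r

# Example: a queue of 7 vehicles falls in bucket 2, a 90 s wait in bucket 2.
print(bucketize(7, QUEUE_BUCKETS), bucketize(90, WAIT_BUCKETS))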
Next Steps
• Continue to experiment with the learning strategy, parameters, and objective function; improve the discretization
• Work on the state detection problem (limited information); learn arrival rates or include hour of day in the state space
• Change the arrival rate dynamics to test the robustness of the process
Acknowledgements: Michael Bennon (Stanford Global Projects Center), Allen Huang (CS229 Student), Jesiska Tandy (CS229 Student)