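Problem description
A walkthrough of episode 8 of the official Rasa course, based on RasaHQ/rasa-masterclass.
Goal: build a chatbot that finds healthcare facilities.
Features:
- Choose a facility type (hard-coded): hospital, nursing home, or home health agency
- Specify a city or zip code
- Offer three facilities to choose from (via an API call)
- Return the chosen facility's address and zip code (via an API call)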
Installation
pip install rasa-x -i https://pypi.rasa.com/simple
Note: the latest Rasa release depends on TensorFlow 2, so install with care.
Configuration: config.yml
NLU pipeline
Dialogue policies
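If no priorities are set explicitly, the order in which policies are listed does not matter. The default priorities are as follows (1 is the highest):
1. FormPolicy
2. FallbackPolicy, TwoStageFallbackPolicy
3. MemoizationPolicy, AugmentedMemoizationPolicy
4. MappingPolicy
5. TEDPolicy, EmbeddingPolicy, KerasPolicy, SklearnPolicy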
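To make the role of priority concrete, here is a minimal sketch, not Rasa's actual code. It assumes that the ensemble picks the prediction with the highest confidence and uses priority only to break ties. `Prediction` and `pick_action` are hypothetical names, and the priority numbers follow what I understand to be Rasa's internal convention, where a larger number means a higher priority (5 for FormPolicy down to 1 for the learned policies).

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    policy: str        # name of the policy that made the prediction
    action: str        # predicted next action
    confidence: float  # confidence reported by the policy
    priority: int      # assumed convention: larger number wins ties


def pick_action(predictions: List[Prediction]) -> Prediction:
    """Return the most confident prediction; priority breaks ties."""
    return max(predictions, key=lambda p: (p.confidence, p.priority))


predictions = [
    Prediction("MemoizationPolicy", "utter_address", 1.0, 3),
    Prediction("FormPolicy", "facility_form", 1.0, 5),
    Prediction("EmbeddingPolicy", "utter_greet", 0.62, 1),
]
winner = pick_action(predictions)
print(winner.policy, winner.action)
```

Here MemoizationPolicy and FormPolicy are equally confident, so the tie goes to FormPolicy because of its higher priority.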
config.yml
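language: en  # English
pipeline:
  - name: WhitespaceTokenizer        # splits text on whitespace
  - name: RegexFeaturizer            # creates features from regular expressions
  - name: CRFEntityExtractor         # extracts entities with a conditional random field
  - name: EntitySynonymMapper        # maps entity values to their synonyms
  - name: CountVectorsFeaturizer     # bag-of-words representation of user messages, intents, and responses
  - name: EmbeddingIntentClassifier  # embedding-based intent classifier
policies:
  - name: FormPolicy              # form policy, used to fill the required slots
  - name: TwoStageFallbackPolicy  # two-stage fallback: instead of falling back right away, asks the user to choose, to disambiguate the input
  - name: MemoizationPolicy       # memoization: if the conversation appears in the training data, predicts the next action with confidence 1.0
  - name: MappingPolicy           # mapping policy: maps an intent directly to an action
  - name: EmbeddingPolicy         # embedding policy: learns patterns from intents, entities, slots, and actions to predict the next action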
Training data
NLU data: nlu.md
Main intents:
- greet: say hello
- inform: provide information
- search_provider: search for a provider
- thanks: say thank you
- goodbye: say goodbye
Excerpt from nlu.md
## intent:affirm
- ok
- yes
## intent:deny
- no
- never
## intent:out_of_scope
- what?
- again?
- who is your favourite robot?
- can you help me to build a bot
- show me a picture of a chicken
## intent:goodbye
- bye
- see you
- goodbye
## intent:greet
- hi
- hey
- hello
- good morning
## intent:inform
- [hospital](facility_type)
- [nursing home](facility_type)
- [home health agency](facility_type)
- a [hospital](facility_type)
- a [nursing home](facility_type)
- a [home health agency](facility_type)
- [New York](location)
- [San Francisco](location)
## intent:search_provider
- [hospital](facility_type)
- [nursing home](facility_type)
- [home health agency](facility_type)
- i need a [hospital](facility_type)
- i need a [home health agency](facility_type)
- i need a [nursing home](facility_type) in [Katy](location)
- i need a [hospital](facility_type) my zip code is [77494](location)
- find me a nearby [hospital](facility_type)
## intent:thanks
- thanks
- thank you
## intent:mood_great
- great
- very good
## intent:mood_unhappy
- so sad
- unhappy
## regex:location
- [0-9]{5}
## synonym:xubh-q36u
- hospital
- hospitals
## synonym:b27b-2uc7
- nursing home
- nursing homes
## synonym:9wzi-peqs
- home health agency
- home health agencies
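The medicare.gov database is used to look up information on three types of healthcare facility, given a city name, zip code, or facility ID. The identifier for each facility type is assigned by the Medicare database, which is why the synonyms above map the entity values to these IDs:
- Hospital: xubh-q36u
- Nursing home: b27b-2uc7
- Home health agency: 9wzi-peqs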
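The effect of the regex and synonym entries in nlu.md can be approximated in plain Python. This is only a sketch of the idea, with hypothetical helper names (`looks_like_zip`, `normalize_facility_type`). In Rasa itself, the regex is only a feature for the entity extractor rather than a hard match, and the synonym lookup is performed by EntitySynonymMapper.

```python
import re
from typing import Optional

# Same pattern as the "## regex:location" entry in nlu.md.
ZIP_CODE = re.compile(r"[0-9]{5}")

# Same pairs as the "## synonym:..." entries: surface form -> Medicare resource ID.
SYNONYMS = {
    "hospital": "xubh-q36u",
    "hospitals": "xubh-q36u",
    "nursing home": "b27b-2uc7",
    "nursing homes": "b27b-2uc7",
    "home health agency": "9wzi-peqs",
    "home health agencies": "9wzi-peqs",
}


def looks_like_zip(text: str) -> bool:
    """True if the whole string is exactly five digits."""
    return ZIP_CODE.fullmatch(text.strip()) is not None


def normalize_facility_type(text: str) -> Optional[str]:
    """Map an extracted entity value to its resource ID, or None if unknown."""
    return SYNONYMS.get(text.strip().lower())


print(looks_like_zip("77494"))                    # True
print(looks_like_zip("Katy"))                     # False
print(normalize_facility_type("Nursing Homes"))   # b27b-2uc7
```

Mapping the entity value to the resource ID at the NLU stage means the facility_type slot already holds the value that the API calls in actions.py expect.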
Dialogue flow: stories.md
Main actions:
- find_facility_types: look up the facility types
- facility_form: form for choosing the facility type
- find_healthcare_address: look up the facility address
stories.md
## happy_path <!-- main path -->
* greet
  - find_facility_types <!-- look up the facility types -->
* inform{"facility_type": "xubh-q36u"}
  - facility_form <!-- form for choosing the facility type -->
  - form{"name": "facility_form"}
  - form{"name": null}
* inform{"facility_id": 4245}
  - find_healthcare_address <!-- look up the facility address -->
  - utter_address <!-- reply with the address -->
* thanks
  - utter_noworries
## happy_path_multi_requests <!-- main path with multiple requests -->
* greet
  - find_facility_types
* inform{"facility_type": "xubh-q36u"}
  - facility_form
  - form{"name": "facility_form"}
  - form{"name": null}
* inform{"facility_id": "747604"}
  - find_healthcare_address
  - utter_address
* search_provider{"facility_type": "xubh-q36u"} <!-- intent: search for a provider -->
  - facility_form
  - form{"name": "facility_form"}
  - form{"name": null}
* inform{"facility_id": 4245}
  - find_healthcare_address
  - utter_address
## happy_path2 <!-- main path 2 -->
* search_provider{"location": "Austin", "facility_type": "xubh-q36u"} <!-- intent: search for a provider, with a city given -->
  - facility_form
  - form{"name": "facility_form"}
  - form{"name": null}
* inform{"facility_id": "450871"}
  - find_healthcare_address
  - utter_address
* thanks
  - utter_noworries
## story_goodbye
* goodbye
  - utter_goodbye
## story_thankyou
* thanks
  - utter_noworries
Domain: domain.yml
The domain defines the universe the chatbot operates in: intents, entities, slots, actions, responses, forms, and so on.
domain.yml
intents:
  - affirm
  - deny
  - out_of_scope
  - goodbye
  - greet
  - inform
  - search_provider
  - thanks
  - mood_great
  - mood_unhappy
entities:
  - facility_type
  - facility_id
  - location
slots:
  facility_address:
    type: unfeaturized
  facility_id:
    type: unfeaturized
  facility_type:
    type: unfeaturized
  location:
    type: unfeaturized
responses:
  utter_greet:
    - text: Hi. What are you looking for?
    - text: 'Hey there! Please choose one of the healthcare facility options:'
    - text: Hello! What can I help you find today?
  utter_goodbye:
    - text: Talk to you later!
    - text: Have a good day.
    - text: Until next time!
  utter_noworries:
    - text: My pleasure.
    - text: You are welcome!
  utter_ask_facility_type:
    - text: 'Choose one of the following to search for: hospital, nursing home, or home health agency.'
  utter_ask_location:
    - text: Please provide your city name.
    - text: What is your current city?
    - text: Please provide your city name or zip code.
    - text: Please enter your zip code or city name to find local providers.
  utter_address:
    - text: The address is {facility_address}.
actions:
  - utter_noworries
  - utter_greet
  - utter_goodbye
  - utter_ask_location
  - utter_ask_facility_type
  - find_facility_types
  - find_healthcare_address
  - utter_address
forms:
  - facility_form
session_config:
  session_expiration_time: 0
  carry_over_slots_to_new_session: true
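The utter_address response contains a {facility_address} placeholder that is filled from the slot of the same name. The substitution can be pictured with ordinary Python string formatting. This is a simplified sketch, not Rasa's implementation; `fill_template` is a hypothetical helper and the address is a made-up placeholder value.

```python
from typing import Dict


def fill_template(template: str, slots: Dict[str, str]) -> str:
    """Substitute {slot_name} placeholders with slot values.

    Slots that the template does not mention are simply ignored.
    """
    return template.format(**slots)


# Made-up slot values, for illustration only.
slots = {
    "facility_type": "xubh-q36u",
    "location": "Austin",
    "facility_address": "123 Main St, Austin, TX 78701",
}
reply = fill_template("The address is {facility_address}.", slots)
print(reply)
```

This is why the custom action only needs to set the facility_address slot: the response text itself stays in domain.yml.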
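Actions: actions.py
Rasa has four kinds of actions:
- Utterance actions: start with utter_ and send a specific message
- Retrieval actions: start with respond_ and send a message selected by a retrieval model
- Custom actions: run arbitrary code and can send any number of messages
- Default actions: for example action_listen, action_restart, action_default_fallback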
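The naming conventions above can be summarised in a small helper. This is an illustrative sketch only: `action_kind` is a hypothetical function, `respond_faq` is a made-up action name, and `DEFAULT_ACTIONS` lists just the three default actions named above, whereas Rasa ships more.

```python
# Only the default actions named in the text; Rasa defines others as well.
DEFAULT_ACTIONS = {"action_listen", "action_restart", "action_default_fallback"}


def action_kind(name: str) -> str:
    """Classify an action by its name, following the conventions above."""
    if name in DEFAULT_ACTIONS:
        return "default"
    if name.startswith("utter_"):
        return "utterance"
    if name.startswith("respond_"):
        return "retrieval"
    # Anything else is assumed to be implemented in actions.py.
    return "custom"


for name in ["utter_address", "respond_faq", "find_healthcare_address", "action_listen"]:
    print(name, "->", action_kind(name))
```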
actions.py
import requests
from typing import Dict, Text, Any, List
from rasa_sdk import Action, Tracker
from rasa_sdk.events import SlotSet
from rasa_sdk.forms import FormAction
from rasa_sdk.executor import CollectingDispatcher
# The medicare.gov database is used to look up information on three types of
# healthcare facility, given a city name, zip code, or facility ID.
# The identifier for each facility type is assigned by the Medicare database:
#   hospital: xubh-q36u
#   nursing home: b27b-2uc7
#   home health agency: 9wzi-peqs
ENDPOINTS = {
    "base": "https://data.medicare.gov/resource/{}.json",
    "xubh-q36u": {
        "city_query": "?city={}",
        "zip_code_query": "?zip_code={}",
        "id_query": "?provider_id={}"
    },
    "b27b-2uc7": {
        "city_query": "?provider_city={}",
        "zip_code_query": "?provider_zip_code={}",
        "id_query": "?federal_provider_number={}"
    },
    "9wzi-peqs": {
        "city_query": "?city={}",
        "zip_code_query": "?zip={}",
        "id_query": "?provider_number={}"
    }
}
FACILITY_TYPES = {
    "hospital": {
        "name": "hospital",
        "resource": "xubh-q36u"
    },
    "nursing_home": {
        "name": "nursing home",
        "resource": "b27b-2uc7"
    },
    "home_health": {
        "name": "home health agency",
        "resource": "9wzi-peqs"
    }
}
def _create_path(base: Text, resource: Text, query: Text, values: Text) -> Text:
    """Build the URL for an API request.

    >>> _create_path(base=ENDPOINTS["base"], resource="xubh-q36u", query=ENDPOINTS["xubh-q36u"]["city_query"], values="New York".upper())
    'https://data.medicare.gov/resource/xubh-q36u.json?city=NEW YORK'
    """
    if isinstance(values, list):
        # A list of values is joined into a comma-separated list of quoted strings
        return (base + query).format(resource, ', '.join('"{0}"'.format(w) for w in values))
    else:
        return (base + query).format(resource, values)


def _find_facilities(location: Text, resource: Text) -> List[Dict]:
    """Call the API and return the matching facilities as JSON.

    #>>> _find_facilities(location="New York", resource="xubh-q36u")
    """
    if str.isdigit(location):
        # All digits: treat the location as a zip code
        full_path = _create_path(ENDPOINTS["base"], resource, ENDPOINTS[resource]["zip_code_query"], location)
    else:
        # Otherwise treat it as a city name (the queries here use upper case)
        full_path = _create_path(ENDPOINTS["base"], resource, ENDPOINTS[resource]["city_query"], location.upper())
    results = requests.get(full_path).json()
    return results


def _resolve_name(facility_types, resource) -> Text:
    """Resolve a resource ID to the facility type name.

    >>> _resolve_name(facility_types=FACILITY_TYPES, resource="xubh-q36u")
    'hospital'
    """
    for key, value in facility_types.items():
        if value.get("resource") == resource:
            return value.get("name")
    return ""
class FindFacilityTypes(Action):
    """Custom action that offers buttons for filling the facility_type slot."""

    def name(self) -> Text:
        """Name of the custom action."""
        return "find_facility_types"

    def run(self,
            dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List:
        """Show one button per facility type."""
        buttons = []
        for t in FACILITY_TYPES:
            facility_type = FACILITY_TYPES[t]
            # The payload is what is actually sent when the button is clicked
            payload = "/inform{\"facility_type\": \"" + facility_type.get("resource") + "\"}"
            buttons.append({"title": "{}".format(facility_type.get("name").title()), "payload": payload})
        dispatcher.utter_button_template("utter_greet", buttons, tracker)
        return []
class FindHealthCareAddress(Action):
    """Custom action that retrieves the address of the facility the user selected."""

    def name(self) -> Text:
        """Name of the custom action."""
        return "find_healthcare_address"

    def run(self,
            dispatcher: CollectingDispatcher,
            tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict]:
        """Look up the selected facility by ID and store its address in a slot."""
        facility_type = tracker.get_slot("facility_type")  # read the slot values
        facility_id = tracker.get_slot("facility_id")
        full_path = _create_path(ENDPOINTS["base"], facility_type, ENDPOINTS[facility_type]["id_query"], facility_id)
        results = requests.get(full_path).json()
        if results:
            selected = results[0]
            # The address field names differ per facility type
            if facility_type == FACILITY_TYPES["hospital"]["resource"]:
                address = "{}, {}, {} {}".format(selected["address"].title(),
                                                 selected["city"].title(),
                                                 selected["state"].upper(),
                                                 selected["zip_code"].title())
            elif facility_type == FACILITY_TYPES["nursing_home"]["resource"]:
                address = "{}, {}, {} {}".format(selected["provider_address"].title(),
                                                 selected["provider_city"].title(),
                                                 selected["provider_state"].upper(),
                                                 selected["provider_zip_code"].title())
            else:
                address = "{}, {}, {} {}".format(selected["address"].title(),
                                                 selected["city"].title(),
                                                 selected["state"].upper(),
                                                 selected["zip"].title())
            # Set the slot so that it can fill the response template
            return [SlotSet("facility_address", address)]
        else:
            print("No address found. Most likely this action was executed "
                  "before the user chose a healthcare facility from the "
                  "provided list. "
                  "If this is a common problem in your dialogue flow, "
                  "using a form instead for this action might be appropriate.")
            return [SlotSet("facility_address", "not found")]
class FacilityForm(FormAction):
    """Custom form action that fills all required slots."""

    def name(self) -> Text:
        """Name of the custom action."""
        return "facility_form"

    @staticmethod
    def required_slots(tracker: Tracker) -> List[Text]:
        """All slots the form has to fill."""
        return ["facility_type", "location"]

    def slot_mappings(self) -> Dict[Text, Any]:
        """Map each required slot to the entity and intents it is filled from."""
        return {
            "facility_type": self.from_entity(entity="facility_type", intent=["inform", "search_provider"]),
            "location": self.from_entity(entity="location", intent=["inform", "search_provider"])
        }

    def submit(self,
               dispatcher: CollectingDispatcher,
               tracker: Tracker,
               domain: Dict[Text, Any]
               ) -> List[Dict]:
        """Once all required slots are filled, show buttons for the matching facilities."""
        location = tracker.get_slot('location')  # read the slot values
        facility_type = tracker.get_slot('facility_type')
        results = _find_facilities(location, facility_type)  # call the API for matching facilities
        button_name = _resolve_name(FACILITY_TYPES, facility_type)  # resolve the facility type name
        if len(results) == 0:
            dispatcher.utter_message("Sorry, we could not find a {} in {}.".format(button_name, location.title()))
            return []
        buttons = []
        # Show at most three results
        for r in results[:3]:
            if facility_type == FACILITY_TYPES["hospital"]["resource"]:
                facility_id = r.get("provider_id")
                name = r["hospital_name"]
            elif facility_type == FACILITY_TYPES["nursing_home"]["resource"]:
                facility_id = r["federal_provider_number"]
                name = r["provider_name"]
            else:
                facility_id = r["provider_number"]
                name = r["provider_name"]
            payload = "/inform{\"facility_id\":\"" + facility_id + "\"}"
            buttons.append({"title": "{}".format(name.title()), "payload": payload})
        if len(buttons) == 1:
            message = "Here is a {} near you:".format(button_name)
        else:
            # "agency" has an irregular plural; the other names just take an "s"
            plural = "home health agencies" if button_name == "home health agency" else button_name + "s"
            message = "Here are {} {} near you:".format(len(buttons), plural)
        dispatcher.utter_button_message(message, buttons)
        return []
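The listing above builds both the request URL and the button payloads by string concatenation, so a city name containing a space ends up unescaped in the query string and the payload has to be quoted by hand. One possible alternative, sketched below with the standard library only, is to let `urlencode` and `json.dumps` do the escaping. `build_url` and `button_payload` are hypothetical helpers, not part of the masterclass code, and nothing here sends a request.

```python
import json
from urllib.parse import urlencode

# Base URL taken from ENDPOINTS["base"] in actions.py.
BASE = "https://data.medicare.gov/resource/{}.json"


def build_url(resource: str, field: str, value: str) -> str:
    """Build a query URL with the value percent-encoded."""
    return BASE.format(resource) + "?" + urlencode({field: value})


def button_payload(intent: str, **entities: str) -> str:
    """Build a '/intent{...}' payload with JSON-encoded entities."""
    return "/" + intent + json.dumps(entities)


print(build_url("xubh-q36u", "city", "NEW YORK"))
# https://data.medicare.gov/resource/xubh-q36u.json?city=NEW+YORK
print(button_payload("inform", facility_id="4245"))
# /inform{"facility_id": "4245"}
```

The payload produced this way has the same shape as the ones in stories.md, so the stories would not need to change.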
Running custom actions requires an action endpoint to be set in endpoints.yml
endpoints.yml
action_endpoint:
  url: http://localhost:5055/webhook
Start the action server with: rasa run actions