Contents
- 1. Directory structure
- 2. nlu.yml
- 3. config.yml
- 4. domain.yml
- 5. Practice
Study notes based on https://github.com/Chinese-NLP-book/rasa_chinese_book_code
1. Directory structure
2. nlu.yml
version: "3.0"
nlu:
  - intent: greet
    examples: |
      - 你好
      - hello
      - hi
      - 喂
      - 在么
  - intent: goodbye
    examples: |
      - 拜拜
      - 再见
      - 拜
      - 退出
      - 结束
  - intent: medicine
    examples: |
      - [感冒](disease)了该吃什么药
      - 我[便秘](disease)了,该吃什么药
      - 我[胃痛](disease),该吃什么药
      - 一直[打喷嚏](disease)吃什么药好
      - 父母都有[高血压](disease),我应该推荐他们吃什么药好呢
      - 头上烫烫的,感觉[发烧](disease)了,该吃什么药好
      - [减肥](disease)有什么好的药品推荐吗?
  - intent: medical_department
    examples: |
      - [感冒](disease)了该吃去哪个科室看病
      - 我[便秘](disease)了,该去挂哪个科室的号
      - 我[胃痛](disease),该去医院看哪个门诊啊
      - 一直[打喷嚏](disease)挂哪一个科室的号啊
      - [头疼](disease)该挂哪科
  - intent: medical_hospital
    examples: |
      - 我生病了,不知道去哪里看病
      - [减肥](disease)有什么好的医院或者健康中心推荐吗?
      - 想做个[体检](disease),有哪家医院或者哪里的诊所或者健康中心比较实惠啊?
      - 父母都有[高血压](disease),我应该推荐他们去哪家医院好呢
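This file defines the dialogue intents, together with example utterances a user might say for each intent. Entities are annotated inline as [text](entity_type), for example [感冒](disease).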
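The inline annotation syntax can be unpacked with a few lines of Python. This is a minimal sketch for illustration only, not Rasa's own parser: the function name `parse_annotated_example` is hypothetical, and it handles only the simple `[text](entity_type)` form used in this file.

```python
import re

# Matches the simple "[entity text](entity_type)" form used in nlu.yml.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated_example(example):
    """Return (plain_text, entities) for one annotated training example."""
    plain_parts = []
    entities = []
    cursor = 0      # current position in the annotated string
    plain_len = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        before = example[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        value, label = match.group(1), match.group(2)
        # Offsets refer to the plain text, with the markup removed.
        entities.append({
            "entity": label,
            "value": value,
            "start": plain_len,
            "end": plain_len + len(value),
        })
        plain_parts.append(value)
        plain_len += len(value)
        cursor = match.end()
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated_example("我[便秘](disease)了,该吃什么药")
print(text)
print(entities)
```

The `start` and `end` offsets are character positions in the plain text, the same convention that appears in the `entities` field of the parse results later in this post.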
3. config.yml
recipe: default.v1
language: zh
pipeline:
  - name: JiebaTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-chinese
  - name: "DIETClassifier"
    epochs: 100
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
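This file configures the language, the tokenizer, the featurizer and classifier models, and training parameters such as the number of epochs.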
4. domain.yml
version: "3.0"
intents:
  - greet
  - goodbye
  - medicine
  - medical_department
  - medical_hospital
This file lists all the intent classes.
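Because the intent names appear in both nlu.yml and domain.yml, the two lists can drift apart as the data grows. The sketch below illustrates a simple consistency check under the assumption that the intent names have already been read into plain Python lists. The helper `find_undeclared_intents` is hypothetical, and Rasa's own `rasa data validate` command is the tool to rely on in a real project.

```python
def find_undeclared_intents(nlu_intents, domain_intents):
    """Return intents used in the NLU data but missing from the domain, sorted."""
    return sorted(set(nlu_intents) - set(domain_intents))


# Intent names copied by hand from the nlu.yml and domain.yml shown above.
nlu_intents = ["greet", "goodbye", "medicine", "medical_department", "medical_hospital"]
domain_intents = ["greet", "goodbye", "medicine", "medical_department", "medical_hospital"]

# Both files agree, so nothing is reported.
print(find_undeclared_intents(nlu_intents, domain_intents))

# An intent that exists only in the NLU data is reported by name.
print(find_undeclared_intents(nlu_intents + ["praise"], domain_intents))
```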
5. Practice
pip install --no-deps -r full_requirements.txt
cd Chapter02/
rasa train nlu
- Train the model:
rasa train nlu
┌────────────────────────────────────────────────────────────────────────────────┐
│ Rasa Open Source reports anonymous usage telemetry to help improve the product │
│ for all its users. │
│ │
│ If you'd like to opt-out, you can use `rasa telemetry disable`. │
│ To learn more, check out https://rasa.com/docs/rasa/telemetry/telemetry. │
└────────────────────────────────────────────────────────────────────────────────┘
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-07 10:00:26 INFO transformers.file_utils - TensorFlow version 2.6.5 available.
2022-11-07 10:00:27 INFO rasa.engine.training.hooks - Starting to train component 'JiebaTokenizer'.
2022-11-07 10:00:27 INFO rasa.engine.training.hooks - Finished training component 'JiebaTokenizer'.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 110k/110k [00:00<00:00, 250kB/s]
2022-11-07 10:00:30 INFO transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
Downloading: 100%|██████████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 613kB/s]
2022-11-07 10:00:32 INFO transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-07 10:00:32 INFO transformers.configuration_utils - Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 21128
}
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 478M/478M [17:05<00:00, 466kB/s]
2022-11-07 10:17:41 INFO transformers.modeling_tf_utils - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-07 10:17:43 INFO transformers.modeling_tf_utils - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/shared/nlu/training_data/features.py:152: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
f_as_text = self.features.tostring()
2022-11-07 10:17:44 INFO rasa.engine.training.hooks - Starting to train component 'DIETClassifier'.
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
rasa.shared.utils.io.raise_warning(
Epochs: 100%|██████████████████████████████████████████████| 100/100 [00:31<00:00, 3.15it/s, t_loss=0.458, i_acc=1, e_f1=1]
2022-11-07 10:18:16 INFO rasa.engine.training.hooks - Finished training component 'DIETClassifier'.
Your Rasa model is trained and saved at 'models/nlu-20221107-100026-rainy-gazebo.tar.gz'.
The trained model has been saved to the models/ directory:
ll models/
total 39536
-rw-rw-r-- 1 web web 20238663 Nov 7 10:18 nlu-20221107-100026-rainy-gazebo.tar.gz
-rw-rw-r-- 1 web web 20238659 Nov 10 09:55 nlu-20221110-095458-green-trill.tar.gz
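Each training run adds another timestamped archive, and the shell log further down loads nlu-20221110-095458-green-trill.tar.gz, the newer of the two. How Rasa itself chooses the archive is not shown in this post. The sketch below assumes that selecting by modification time is good enough, and the helper name `latest_model` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def latest_model(models_dir):
    """Return the most recently modified .tar.gz in models_dir, or None."""
    archives = list(Path(models_dir).glob("*.tar.gz"))
    if not archives:
        return None
    return max(archives, key=lambda path: path.stat().st_mtime)


# Demo with empty placeholder files named like the two archives listed above.
with tempfile.TemporaryDirectory() as tmp:
    older = Path(tmp) / "nlu-20221107-100026-rainy-gazebo.tar.gz"
    newer = Path(tmp) / "nlu-20221110-095458-green-trill.tar.gz"
    older.touch()
    newer.touch()
    os.utime(older, (1_000_000, 1_000_000))  # force an older timestamp
    os.utime(newer, (2_000_000, 2_000_000))  # force a newer timestamp
    chosen = latest_model(tmp).name
    print(chosen)
```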
- Run a test with rasa shell nlu:
rasa shell nlu
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-10 09:57:48 INFO rasa.core.processor - Loading model models/nlu-20221110-095458-green-trill.tar.gz...
2022-11-10 09:57:50 INFO transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
2022-11-10 09:57:50 INFO transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-10 09:57:50 INFO transformers.configuration_utils - Model config BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 21128
}
2022-11-10 09:57:52 INFO transformers.modeling_tf_utils - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-10 09:57:57 INFO transformers.modeling_tf_utils - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
rasa.shared.utils.io.raise_warning(
NLU model loaded. Type a message and press enter to parse it.
Next message:
Test 1:
Next message:
我有点感冒,吃什么药好呢?
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.585 seconds.
Prefix dict has been built successfully.
{
  "text": "我有点感冒,吃什么药好呢?",
  "intent": {
    "name": "medicine",
    "confidence": 0.9998257756233215
  },
  "entities": [
    {
      "entity": "disease",
      "start": 3,
      "end": 5,
      "confidence_entity": 0.9954996705055237,
      "value": "感冒",
      "extractor": "DIETClassifier"
    }
  ],
  "text_tokens": [
    [0, 1],
    [1, 3],
    [3, 5],
    [5, 6],
    [6, 7],
    [7, 9],
    [9, 11],
    [11, 12],
    [12, 13]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.9998257756233215
    },
    {
      "name": "medical_department",
      "confidence": 0.0001144336347351782
    },
    {
      "name": "medical_hospital",
      "confidence": 2.84777233900968e-05
    },
    {
      "name": "goodbye",
      "confidence": 2.1356245269998908e-05
    },
    {
      "name": "greet",
      "confidence": 9.92826244328171e-06
    }
  ]
}
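The `text_tokens` and entity `start`/`end` values in this result are half-open character offsets into `text`. A small sketch, using values copied from the result above and a hypothetical helper `spans_to_strings`, recovers the token strings and checks that the entity span matches its `value`:

```python
def spans_to_strings(text, spans):
    """Slice text with [start, end) character offsets."""
    return [text[start:end] for start, end in spans]


# Values copied from the parse result above.
text = "我有点感冒,吃什么药好呢?"
text_tokens = [[0, 1], [1, 3], [3, 5], [5, 6], [6, 7],
               [7, 9], [9, 11], [11, 12], [12, 13]]
entity = {"entity": "disease", "start": 3, "end": 5, "value": "感冒"}

tokens = spans_to_strings(text, text_tokens)
print(tokens)

# The entity offsets select exactly the annotated value.
entity_text = text[entity["start"]:entity["end"]]
print(entity_text == entity["value"])
```

Here the entity span coincides with the third token produced by the tokenizer.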
Next message:
我有点晕,该看什么医生?
{
  "text": "我有点晕,该看什么医生?",
  "intent": {
    "name": "medicine",
    "confidence": 0.7516889572143555
  },
  "entities": [],
  "text_tokens": [
    [0, 1],
    [1, 3],
    [3, 4],
    [4, 5],
    [5, 6],
    [6, 7],
    [7, 9],
    [9, 11],
    [11, 12]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.7516889572143555
    },
    {
      "name": "medical_department",
      "confidence": 0.23077963292598724
    },
    {
      "name": "medical_hospital",
      "confidence": 0.014110525138676167
    },
    {
      "name": "goodbye",
      "confidence": 0.0021244173403829336
    },
    {
      "name": "greet",
      "confidence": 0.0012964475899934769
    }
  ]
}
Next message:
早上好
{
  "text": "早上好",
  "intent": {
    "name": "greet",
    "confidence": 0.9996402263641357
  },
  "entities": [],
  "text_tokens": [
    [0, 3]
  ],
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.9996402263641357
    },
    {
      "name": "medical_department",
      "confidence": 0.00014932868361938745
    },
    {
      "name": "goodbye",
      "confidence": 0.00014898570952937007
    },
    {
      "name": "medical_hospital",
      "confidence": 5.417354987002909e-05
    },
    {
      "name": "medicine",
      "confidence": 7.281645139300963e-06
    }
  ]
}
Next message:
人民医院在哪里
{
  "text": "人民医院在哪里",
  "intent": {
    "name": "medical_hospital",
    "confidence": 0.541263997554779
  },
  "entities": [],
  "text_tokens": [
    [0, 2],
    [2, 4],
    [4, 5],
    [5, 7]
  ],
  "intent_ranking": [
    {
      "name": "medical_hospital",
      "confidence": 0.541263997554779
    },
    {
      "name": "medical_department",
      "confidence": 0.2764747440814972
    },
    {
      "name": "greet",
      "confidence": 0.16937503218650818
    },
    {
      "name": "goodbye",
      "confidence": 0.011964843608438969
    },
    {
      "name": "medicine",
      "confidence": 0.0009213921148329973
    }
  ]
}
- Add a few examples of praise to nlu.yml:
version: "3.0"
nlu:
  - intent: praise
    examples: |
      - 你真有才华
      - 你真帅气
      - 你好棒啊
Retrain with rasa train nlu, then test with rasa shell nlu:
Next message:
你很优雅的完成了任务
{
  "text": "你很优雅的完成了任务",
  "intent": {
    "name": "praise",
    "confidence": 0.3122938573360443
  },
  "entities": [],
  "text_tokens": [
    [0, 1],
    [1, 2],
    [2, 4],
    [4, 5],
    [5, 7],
    [7, 8],
    [8, 10]
  ],
  "intent_ranking": [
    {
      "name": "praise",
      "confidence": 0.3122938573360443
    },
    {
      "name": "medical_hospital",
      "confidence": 0.24623937904834747
    },
    {
      "name": "goodbye",
      "confidence": 0.20737841725349426
    },
    {
      "name": "medicine",
      "confidence": 0.19506700336933136
    },
    {
      "name": "medical_department",
      "confidence": 0.020120976492762566
    },
    {
      "name": "greet",
      "confidence": 0.018900321796536446
    }
  ]
}
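In this last result praise wins with a confidence of only about 0.31, with medical_hospital close behind at about 0.25, so the top intent alone is weak evidence. One common way to handle such cases is to fall back when the top score is low or too close to the runner-up. The sketch below illustrates that idea in plain Python. The function `resolve_intent`, the `"fallback"` label, and the threshold values are illustrative assumptions, and this is not Rasa's own FallbackClassifier component.

```python
def resolve_intent(intent_ranking, threshold=0.6, ambiguity_margin=0.1,
                   fallback="fallback"):
    """Pick the top intent, or a fallback label when the result is unreliable."""
    ranked = sorted(intent_ranking, key=lambda item: item["confidence"], reverse=True)
    if not ranked:
        return fallback
    top = ranked[0]
    # Too little confidence in the best guess.
    if top["confidence"] < threshold:
        return fallback
    # Best guess is too close to the runner-up.
    if len(ranked) > 1 and top["confidence"] - ranked[1]["confidence"] < ambiguity_margin:
        return fallback
    return top["name"]


# Ranking copied from the praise result above.
praise_ranking = [
    {"name": "praise", "confidence": 0.3122938573360443},
    {"name": "medical_hospital", "confidence": 0.24623937904834747},
    {"name": "goodbye", "confidence": 0.20737841725349426},
    {"name": "medicine", "confidence": 0.19506700336933136},
    {"name": "medical_department", "confidence": 0.020120976492762566},
    {"name": "greet", "confidence": 0.018900321796536446},
]
# Top two entries only, copied from the earlier greeting result.
greet_ranking = [
    {"name": "greet", "confidence": 0.9996402263641357},
    {"name": "medical_department", "confidence": 0.00014932868361938745},
]

print(resolve_intent(praise_ranking))
print(resolve_intent(greet_ranking))
```

With only three praise examples in the training data, adding more varied examples is likely to help more than tuning the thresholds.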