Rasa NLU in Practice

2022-11-18 15:40:00

Contents
  • 1. Directory structure
  • 2. nlu.yml
  • 3. config.yml
  • 4. domain.yml
  • 5. Practice

Based on https://github.com/Chinese-NLP-book/rasa_chinese_book_code

1. Directory structure

2. nlu.yml

version: "3.0"
nlu:
  - intent: greet
    examples: |
      - 你好
      - hello
      - hi
      - 喂
      - 在么
  - intent: goodbye
    examples: |
      - 拜拜
      - 再见
      - 拜
      - 退出
      - 结束
  - intent: medicine
    examples: |
      - [感冒](disease)了该吃什么药
      - 我[便秘](disease)了,该吃什么药
      - 我[胃痛](disease),该吃什么药
      - 一直[打喷嚏](disease)吃什么药好
      - 父母都有[高血压](disease),我应该推荐他们吃什么药好呢
      - 头上烫烫的,感觉[发烧](disease)了,该吃什么药好
      - [减肥](disease)有什么好的药品推荐吗?
  - intent: medical_department
    examples: |
      - [感冒](disease)了该去哪个科室看病
      - 我[便秘](disease)了,该去挂哪个科室的号
      - 我[胃痛](disease),该去医院看哪个门诊啊
      - 一直[打喷嚏](disease)挂哪一个科室的号啊
      - [头疼](disease)该挂哪科
  - intent: medical_hospital
    examples: |
      - 我生病了,不知道去哪里看病
      - [减肥](disease)有什么好的医院或者健康中心推荐吗?
      - 想做个[体检](disease),有哪家医院或者哪里的诊所或者健康中心比较实惠啊?
      - 父母都有[高血压](disease),我应该推荐他们去哪家医院好呢

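This file defines the intents to recognize, together with example utterances for each intent. A span written as [text](disease) marks that text as a disease entity.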
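To make the inline annotation concrete, here is a minimal Python sketch that splits one annotated example into its plain text and entity spans. The helper name parse_example is hypothetical and this is not Rasa's own parser; it only handles the simple [value](entity) form used in this file.

```python
import re

# Matches the inline annotation used in the examples: [value](entity_type)
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_example(annotated):
    """Return (plain_text, entities) for one annotated training example."""
    entities = []
    plain_parts = []
    cursor = 0  # position in the annotated string
    offset = 0  # position in the plain text being built
    for match in ENTITY_PATTERN.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        value, entity_type = match.group(1), match.group(2)
        entities.append(
            {
                "entity": entity_type,
                "value": value,
                "start": offset,
                "end": offset + len(value),  # end offset is exclusive
            }
        )
        plain_parts.append(value)
        offset += len(value)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_example("我[便秘](disease)了,该吃什么药")
print(text)      # plain utterance without the markup
print(entities)  # one disease entity with character offsets
```

The start and end offsets follow the same convention as the entities in the parse results later in this post: character positions in the plain text, with an exclusive end.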

3. config.yml

recipe: default.v1

language: zh

pipeline:
  - name: JiebaTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-chinese
  - name: "DIETClassifier"
    epochs: 100

policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true

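This file configures the language, the tokenizer, the featurizer and classifier models, and training parameters such as the number of epochs.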

4. domain.yml

version: "3.0"

intents:
  - greet
  - goodbye
  - medicine
  - medical_department
  - medical_hospital

This file lists every intent class.
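Because the intent names appear in both nlu.yml and domain.yml, the two lists can drift apart when a new intent is added. The following Python sketch compares them using only the standard library. It is an illustration under stated assumptions: the helper names are hypothetical, the sample strings are shortened stand-ins for the real files, and the line-based matching only covers the simple layout shown in this post (it is not a general YAML parser).

```python
import re


def intents_in_nlu(nlu_text):
    """Collect intent names from lines such as '  - intent: greet'."""
    pattern = r"^\s*-\s*intent:\s*(\S+)\s*$"
    return set(re.findall(pattern, nlu_text, flags=re.MULTILINE))


def intents_in_domain(domain_text):
    """Collect the list items that follow the top-level 'intents:' key."""
    names = set()
    in_intents = False
    for line in domain_text.splitlines():
        if re.match(r"^intents:\s*$", line):
            in_intents = True
        elif in_intents and (item := re.match(r"^\s*-\s*(\S+)\s*$", line)):
            names.add(item.group(1))
        elif line.strip() and not line.startswith((" ", "#")):
            in_intents = False  # another top-level key ends the list
    return names


# Shortened, hypothetical stand-ins for nlu.yml and domain.yml.
NLU_SAMPLE = """
version: "3.0"
nlu:
  - intent: greet
    examples: |
      - 你好
  - intent: praise
    examples: |
      - 你真帅气
"""

DOMAIN_SAMPLE = """
version: "3.0"

intents:
  - greet
  - goodbye
"""

missing = intents_in_nlu(NLU_SAMPLE) - intents_in_domain(DOMAIN_SAMPLE)
print(sorted(missing))  # intents used in the NLU data but not declared
```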

5. Practice

pip install --no-deps -r full_requirements.txt
cd Chapter02/
  • Train the NLU model with rasa train nlu:
rasa train nlu
┌────────────────────────────────────────────────────────────────────────────────┐
│ Rasa Open Source reports anonymous usage telemetry to help improve the product │
│ for all its users.                                                             │
│                                                                                │
│ If you'd like to opt-out, you can use `rasa telemetry disable`.                │
│ To learn more, check out https://rasa.com/docs/rasa/telemetry/telemetry.       │
└────────────────────────────────────────────────────────────────────────────────┘
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-07 10:00:26 INFO     transformers.file_utils  - TensorFlow version 2.6.5 available.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Starting to train component 'JiebaTokenizer'.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Finished training component 'JiebaTokenizer'.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 110k/110k [00:00<00:00, 250kB/s]
2022-11-07 10:00:30 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
Downloading: 100%|██████████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 613kB/s]
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}

Downloading: 100%|████████████████████████████████████████████████████████████████████████| 478M/478M [17:05<00:00, 466kB/s]
2022-11-07 10:17:41 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-07 10:17:43 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/shared/nlu/training_data/features.py:152: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  f_as_text = self.features.tostring()
2022-11-07 10:17:44 INFO     rasa.engine.training.hooks  - Starting to train component 'DIETClassifier'.
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
Epochs: 100%|██████████████████████████████████████████████| 100/100 [00:31<00:00,  3.15it/s, t_loss=0.458, i_acc=1, e_f1=1]
2022-11-07 10:18:16 INFO     rasa.engine.training.hooks  - Finished training component 'DIETClassifier'.
Your Rasa model is trained and saved at 'models/nlu-20221107-100026-rainy-gazebo.tar.gz'.

The trained model is saved in the models/ directory:

ll models/
total 39536
-rw-rw-r-- 1 web web 20238663 Nov  7 10:18 nlu-20221107-100026-rainy-gazebo.tar.gz
-rw-rw-r-- 1 web web 20238659 Nov 10 09:55 nlu-20221110-095458-green-trill.tar.gz
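Each archive name embeds its training timestamp, and the shell log further down shows the more recent of the two archives being loaded. As a small illustration, the Python sketch below picks the newest archive from a list of file names by parsing that timestamp. The name pattern nlu-<date>-<time>-<suffix>.tar.gz is inferred from the listing above, the function newest_model is hypothetical, and this is not necessarily how Rasa itself selects a model.

```python
import re
from datetime import datetime

# Pattern inferred from the listing: nlu-YYYYMMDD-HHMMSS-<suffix>.tar.gz
MODEL_NAME = re.compile(r"^nlu-(\d{8}-\d{6})-.+\.tar\.gz$")


def newest_model(filenames):
    """Return the file name with the latest embedded timestamp, or None."""
    dated = []
    for name in filenames:
        match = MODEL_NAME.match(name)
        if match:
            trained_at = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
            dated.append((trained_at, name))
    return max(dated)[1] if dated else None


models = [
    "nlu-20221107-100026-rainy-gazebo.tar.gz",
    "nlu-20221110-095458-green-trill.tar.gz",
]
print(newest_model(models))
```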
  • Test the model interactively with rasa shell nlu:
rasa shell nlu
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-10 09:57:48 INFO     rasa.core.processor  - Loading model models/nlu-20221110-095458-green-trill.tar.gz...
2022-11-10 09:57:50 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}

2022-11-10 09:57:52 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-10 09:57:57 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']

/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
NLU model loaded. Type a message and press enter to parse it.
Next message:

Test 1:

Next message:
我有点感冒,吃什么药好呢?
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.585 seconds.
Prefix dict has been built successfully.
{
  "text": "我有点感冒,吃什么药好呢?",
  "intent": {
    "name": "medicine",
    "confidence": 0.9998257756233215
  },
  "entities": [
    {
      "entity": "disease",
      "start": 3,
      "end": 5,
      "confidence_entity": 0.9954996705055237,
      "value": "感冒",
      "extractor": "DIETClassifier"
    }
  ],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      3
    ],
    [
      3,
      5
    ],
    [
      5,
      6
    ],
    [
      6,
      7
    ],
    [
      7,
      9
    ],
    [
      9,
      11
    ],
    [
      11,
      12
    ],
    [
      12,
      13
    ]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.9998257756233215
    },
    {
      "name": "medical_department",
      "confidence": 0.0001144336347351782
    },
    {
      "name": "medical_hospital",
      "confidence": 2.84777233900968e-05
    },
    {
      "name": "goodbye",
      "confidence": 2.1356245269998908e-05
    },
    {
      "name": "greet",
      "confidence": 9.92826244328171e-06
    }
  ]
}
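The text_tokens and entity offsets in this result are character positions in the input text, with an exclusive end. The Python sketch below maps them back to strings. It uses an abbreviated copy of the result above (intent_ranking and some entity fields are left out), and the helper names are hypothetical.

```python
import json

# Abbreviated copy of the parse result above.
RESULT_JSON = """
{
  "text": "我有点感冒,吃什么药好呢?",
  "intent": {"name": "medicine", "confidence": 0.9998257756233215},
  "entities": [
    {"entity": "disease", "start": 3, "end": 5, "value": "感冒"}
  ],
  "text_tokens": [[0, 1], [1, 3], [3, 5], [5, 6], [6, 7],
                  [7, 9], [9, 11], [11, 12], [12, 13]]
}
"""


def token_strings(result):
    """Slice the input text with each [start, end) token offset."""
    text = result["text"]
    return [text[start:end] for start, end in result["text_tokens"]]


def entity_pairs(result):
    """Return (entity_type, surface_text) pairs recovered from the offsets."""
    text = result["text"]
    return [(e["entity"], text[e["start"]:e["end"]]) for e in result["entities"]]


result = json.loads(RESULT_JSON)
tokens = token_strings(result)
pairs = entity_pairs(result)
print(tokens)
print(pairs)
```

The recovered entity text matches the value field, and the token list shows how JiebaTokenizer segmented the sentence.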
Next message:
我有点晕,该看什么医生?
{
  "text": "我有点晕,该看什么医生?",
  "intent": {
    "name": "medicine",
    "confidence": 0.7516889572143555
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      3
    ],
    [
      3,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      6
    ],
    [
      6,
      7
    ],
    [
      7,
      9
    ],
    [
      9,
      11
    ],
    [
      11,
      12
    ]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.7516889572143555
    },
    {
      "name": "medical_department",
      "confidence": 0.23077963292598724
    },
    {
      "name": "medical_hospital",
      "confidence": 0.014110525138676167
    },
    {
      "name": "goodbye",
      "confidence": 0.0021244173403829336
    },
    {
      "name": "greet",
      "confidence": 0.0012964475899934769
    }
  ]
}
Next message:
早上好
{
  "text": "早上好",
  "intent": {
    "name": "greet",
    "confidence": 0.9996402263641357
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      3
    ]
  ],
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.9996402263641357
    },
    {
      "name": "medical_department",
      "confidence": 0.00014932868361938745
    },
    {
      "name": "goodbye",
      "confidence": 0.00014898570952937007
    },
    {
      "name": "medical_hospital",
      "confidence": 5.417354987002909e-05
    },
    {
      "name": "medicine",
      "confidence": 7.281645139300963e-06
    }
  ]
}
Next message:
人民医院在哪里
{
  "text": "人民医院在哪里",
  "intent": {
    "name": "medical_hospital",
    "confidence": 0.541263997554779
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      2
    ],
    [
      2,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      7
    ]
  ],
  "intent_ranking": [
    {
      "name": "medical_hospital",
      "confidence": 0.541263997554779
    },
    {
      "name": "medical_department",
      "confidence": 0.2764747440814972
    },
    {
      "name": "greet",
      "confidence": 0.16937503218650818
    },
    {
      "name": "goodbye",
      "confidence": 0.011964843608438969
    },
    {
      "name": "medicine",
      "confidence": 0.0009213921148329973
    }
  ]
}
  • Extend nlu.yml with a few examples for a new praise intent:
version: "3.0"
nlu:
  - intent: praise
    examples: |
      - 你真有才华
      - 你真帅气
      - 你好棒啊

Retrain with rasa train nlu, then test again with rasa shell nlu:

Next message:
你很优雅的完成了任务
{
  "text": "你很优雅的完成了任务",
  "intent": {
    "name": "praise",
    "confidence": 0.3122938573360443
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      2
    ],
    [
      2,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      7
    ],
    [
      7,
      8
    ],
    [
      8,
      10
    ]
  ],
  "intent_ranking": [
    {
      "name": "praise",
      "confidence": 0.3122938573360443
    },
    {
      "name": "medical_hospital",
      "confidence": 0.24623937904834747
    },
    {
      "name": "goodbye",
      "confidence": 0.20737841725349426
    },
    {
      "name": "medicine",
      "confidence": 0.19506700336933136
    },
    {
      "name": "medical_department",
      "confidence": 0.020120976492762566
    },
    {
      "name": "greet",
      "confidence": 0.018900321796536446
    }
  ]
}
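With only three training examples, praise wins here with a confidence of about 0.31, only slightly ahead of medical_hospital. One common way to handle such uncertain predictions is to fall back when the top confidence is low or too close to the runner-up; Rasa ships a FallbackClassifier component for this purpose. The Python sketch below illustrates the idea on the rounded intent_ranking values from the outputs above. The function resolve_intent and the two threshold values are illustrative assumptions, not Rasa's implementation.

```python
def resolve_intent(intent_ranking, threshold=0.3, ambiguity_threshold=0.1):
    """Return the top intent name, or 'nlu_fallback' if the result is uncertain.

    threshold and ambiguity_threshold are illustrative values chosen for this sketch.
    """
    ranked = sorted(intent_ranking, key=lambda item: item["confidence"], reverse=True)
    if not ranked:
        return "nlu_fallback"
    top = ranked[0]
    if top["confidence"] < threshold:
        return "nlu_fallback"  # the model is not confident in any intent
    if len(ranked) > 1 and top["confidence"] - ranked[1]["confidence"] < ambiguity_threshold:
        return "nlu_fallback"  # the two best intents are too close to call
    return top["name"]


# Rounded top-two confidences taken from the shell outputs above.
praise_ranking = [
    {"name": "praise", "confidence": 0.3123},
    {"name": "medical_hospital", "confidence": 0.2462},
]
greet_ranking = [
    {"name": "greet", "confidence": 0.9996},
    {"name": "medical_department", "confidence": 0.0001},
]
hospital_ranking = [
    {"name": "medical_hospital", "confidence": 0.5413},
    {"name": "medical_department", "confidence": 0.2765},
]

print(resolve_intent(praise_ranking))
print(resolve_intent(greet_ranking))
print(resolve_intent(hospital_ranking))
```

Under these assumed thresholds, the praise prediction is treated as ambiguous, while the greet and medical_hospital predictions are accepted.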
