Rasa NLU in Practice | 字节宝

Contents
1. Directory structure
2. nlu.yml
3. config.yml
4. domain.yml
5. Hands-on

Based on the code at https://github.com/Chinese-NLP-book/rasa_chinese_book_code

1. Directory structure

2. nlu.yml

version: "3.0"
nlu:
  - intent: greet
    examples: |
      - 你好
      - hello
      - hi
      - 喂
      - 在么
  - intent: goodbye
    examples: |
      - 拜拜
      - 再见
      - 拜
      - 退出
      - 结束
  - intent: medicine
    examples: |
      - [感冒](disease)了该吃什么药
      - 我[便秘](disease)了，该吃什么药
      - 我[胃痛](disease)，该吃什么药
      - 一直[打喷嚏](disease)吃什么药好
      - 父母都有[高血压](disease)，我应该推荐他们吃什么药好呢
      - 头上烫烫的，感觉[发烧](disease)了，该吃什么药好
      - [减肥](disease)有什么好的药品推荐吗？
  - intent: medical_department
    examples: |
      - [感冒](disease)了该吃去哪个科室看病
      - 我[便秘](disease)了，该去挂哪个科室的号
      - 我[胃痛](disease)，该去医院看哪个门诊啊
      - 一直[打喷嚏](disease)挂哪一个科室的号啊
      - [头疼](disease)该挂哪科
  - intent: medical_hospital
    examples: |
      - 我生病了，不知道去哪里看病
      - [减肥](disease)有什么好的医院或者健康中心推荐吗？
      - 想做个[体检](disease)，有哪家医院或者哪里的诊所或者健康中心比较实惠啊？
      - 父母都有[高血压](disease)，我应该推荐他们去哪家医院好呢

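This file defines the dialogue intents and, for each intent, example utterances a user might say. Entities are annotated inline as [text](entity_type); in this file every annotated entity has the type disease.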
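
Each [text](entity_type) annotation ends up as a character span in the plain sentence, the same kind of start/end offsets that appear in the parse results later on. The following is a minimal Python sketch of that conversion, for illustration only: parse_example and ENTITY_PATTERN are hypothetical names, not part of Rasa's API, and the sketch handles only the simple [value](entity) form used in this file (Rasa also accepts richer annotation syntaxes, which are not covered here).

```python
import re
from typing import List, Tuple

# Matches the simple inline annotation "[entity text](entity_type)"
ENTITY_PATTERN = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(example: str) -> Tuple[str, List[dict]]:
    """Strip [value](entity) markup; return plain text and entity spans.

    start/end are character offsets into the plain text (end is exclusive).
    """
    pieces: List[str] = []
    entities: List[dict] = []
    cursor = 0      # read position in the annotated string
    plain_len = 0   # length of the plain text built so far
    for match in ENTITY_PATTERN.finditer(example):
        before = example[cursor:match.start()]
        pieces.append(before)
        plain_len += len(before)
        value = match.group("value")
        entities.append({
            "entity": match.group("entity"),
            "start": plain_len,
            "end": plain_len + len(value),
            "value": value,
        })
        pieces.append(value)
        plain_len += len(value)
        cursor = match.end()
    pieces.append(example[cursor:])  # text after the last annotation
    return "".join(pieces), entities


if __name__ == "__main__":
    text, ents = parse_example("我[胃痛](disease)，该吃什么药")
    print(text)  # 我胃痛，该吃什么药
    print(ents)  # [{'entity': 'disease', 'start': 1, 'end': 3, 'value': '胃痛'}]
```

Offsets count characters, not bytes, so each Chinese character counts as one position.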

3. config.yml

recipe: default.v1

language: zh

pipeline:
  - name: JiebaTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-chinese
  - name: "DIETClassifier"
    epochs: 100

policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true

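This file configures the language (zh), the tokenizer (JiebaTokenizer), the featurizer (LanguageModelFeaturizer with the bert-base-chinese pretrained weights), the intent and entity model (DIETClassifier), and training parameters such as the number of epochs. The policies section is left commented out; policies are used for dialogue management and are not trained by rasa train nlu.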

4. domain.yml

version: "3.0"

intents:
  - greet
  - goodbye
  - medicine
  - medical_department
  - medical_hospital

This file lists all of the intent categories; every intent used in nlu.yml should also be declared here.
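
Because the intent list in domain.yml has to be kept in sync with nlu.yml by hand, a quick consistency check can help, for example when a new intent such as praise is added to nlu.yml. The sketch below does this with regular expressions over the simple layouts shown in this article; it is not a YAML parser, and nlu_intents, domain_intents and missing_from_domain are hypothetical helpers. For real projects, Rasa's rasa data validate command performs checks of this kind.

```python
import re
from textwrap import dedent
from typing import Set

# Lines such as "  - intent: greet" in nlu.yml
NLU_INTENT_LINE = re.compile(r"^[ \t]*-[ \t]*intent:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# List items such as "  - greet" in domain.yml
LIST_ITEM = re.compile(r"^[ \t]*-[ \t]*(\S+)[ \t]*$")


def nlu_intents(nlu_text: str) -> Set[str]:
    """Intent names declared with '- intent: <name>' in nlu.yml text."""
    return set(NLU_INTENT_LINE.findall(nlu_text))


def domain_intents(domain_text: str) -> Set[str]:
    """Plain '- <name>' items under the top-level 'intents:' key of domain.yml."""
    intents: Set[str] = set()
    in_intents = False
    for line in domain_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith((" ", "\t", "-")):
            # A new top-level key such as 'version:' or 'intents:'
            in_intents = stripped == "intents:"
            continue
        item = LIST_ITEM.match(line)
        if in_intents and item:
            intents.add(item.group(1))
    return intents


def missing_from_domain(nlu_text: str, domain_text: str) -> Set[str]:
    """Intents used in nlu.yml but not declared in domain.yml."""
    return nlu_intents(nlu_text) - domain_intents(domain_text)


if __name__ == "__main__":
    nlu = dedent("""\
        version: "3.0"
        nlu:
          - intent: greet
            examples: |
              - 你好
          - intent: praise
            examples: |
              - 你真帅气
        """)
    domain = dedent("""\
        version: "3.0"

        intents:
          - greet
          - goodbye
        """)
    print(sorted(missing_from_domain(nlu, domain)))  # ['praise']
```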

5. Hands-on

Install the dependencies from the root of the repository linked above, then change into the Chapter02 directory:

pip install --no-deps -r full_requirements.txt
cd Chapter02/

Train the NLU model with rasa train nlu:

rasa train nlu
┌────────────────────────────────────────────────────────────────────────────────┐
│ Rasa Open Source reports anonymous usage telemetry to help improve the product │
│ for all its users.                                                             │
│                                                                                │
│ If you'd like to opt-out, you can use `rasa telemetry disable`.                │
│ To learn more, check out https://rasa.com/docs/rasa/telemetry/telemetry.       │
└────────────────────────────────────────────────────────────────────────────────┘
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-07 10:00:26 INFO     transformers.file_utils  - TensorFlow version 2.6.5 available.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Starting to train component 'JiebaTokenizer'.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Finished training component 'JiebaTokenizer'.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 110k/110k [00:00<00:00, 250kB/s]
2022-11-07 10:00:30 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
Downloading: 100%|██████████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 613kB/s]
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}

Downloading: 100%|████████████████████████████████████████████████████████████████████████| 478M/478M [17:05<00:00, 466kB/s]
2022-11-07 10:17:41 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-07 10:17:43 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/shared/nlu/training_data/features.py:152: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  f_as_text = self.features.tostring()
2022-11-07 10:17:44 INFO     rasa.engine.training.hooks  - Starting to train component 'DIETClassifier'.
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
Epochs: 100%|██████████████████████████████████████████████| 100/100 [00:31<00:00,  3.15it/s, t_loss=0.458, i_acc=1, e_f1=1]
2022-11-07 10:18:16 INFO     rasa.engine.training.hooks  - Finished training component 'DIETClassifier'.
Your Rasa model is trained and saved at 'models/nlu-20221107-100026-rainy-gazebo.tar.gz'.

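The trained model is saved under models/ as a .tar.gz archive. (The first run also downloads the bert-base-chinese weights, about 478 MB, which took roughly 17 minutes here; later runs load them from the local cache.)
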
ll models/
total 39536
-rw-rw-r-- 1 web web 20238663 Nov  7 10:18 nlu-20221107-100026-rainy-gazebo.tar.gz
-rw-rw-r-- 1 web web 20238659 Nov 10 09:55 nlu-20221110-095458-green-trill.tar.gz
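
Each archive name embeds its training date and time (nlu-<date>-<time>-<random name>.tar.gz), and the rasa shell nlu log below shows the newer of the two archives being loaded. The sketch below picks the newest archive from a list of filenames by parsing that embedded timestamp; latest_model is a hypothetical helper, the naming scheme is an assumption based on the listing above, and this is not how Rasa itself selects the model. To load a specific archive instead, pass its path to rasa shell nlu with -m/--model.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Assumed naming scheme, based on the files listed above:
# optional "<prefix>-", then YYYYMMDD-HHMMSS, then "-<name>.tar.gz"
MODEL_NAME = re.compile(r"^(?:[a-z]+-)?(\d{8}-\d{6})-.+\.tar\.gz$")


def latest_model(filenames: Iterable[str]) -> Optional[str]:
    """Return the archive with the newest embedded timestamp, or None."""
    best_name: Optional[str] = None
    best_time: Optional[datetime] = None
    for name in filenames:
        match = MODEL_NAME.match(name)
        if not match:
            continue  # ignore files that do not follow the naming scheme
        stamp = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
        if best_time is None or stamp > best_time:
            best_name, best_time = name, stamp
    return best_name


if __name__ == "__main__":
    files = [
        "nlu-20221107-100026-rainy-gazebo.tar.gz",
        "nlu-20221110-095458-green-trill.tar.gz",
    ]
    print(latest_model(files))  # nlu-20221110-095458-green-trill.tar.gz
```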

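Run an interactive test with rasa shell nlu. By default it loads the most recent model in models/ (nlu-20221110-095458-green-trill.tar.gz in the log below):
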
rasa shell nlu
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-10 09:57:48 INFO     rasa.core.processor  - Loading model models/nlu-20221110-095458-green-trill.tar.gz...
2022-11-10 09:57:50 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}

2022-11-10 09:57:52 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-10 09:57:57 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']

/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
NLU model loaded. Type a message and press enter to parse it.
Next message:

Test 1: 我有点感冒，吃什么药好呢？ ("I have a bit of a cold; what medicine should I take?")

Next message:
我有点感冒，吃什么药好呢？
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.585 seconds.
Prefix dict has been built successfully.
{
  "text": "我有点感冒，吃什么药好呢？",
  "intent": {
    "name": "medicine",
    "confidence": 0.9998257756233215
  },
  "entities": [
    {
      "entity": "disease",
      "start": 3,
      "end": 5,
      "confidence_entity": 0.9954996705055237,
      "value": "感冒",
      "extractor": "DIETClassifier"
    }
  ],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      3
    ],
    [
      3,
      5
    ],
    [
      5,
      6
    ],
    [
      6,
      7
    ],
    [
      7,
      9
    ],
    [
      9,
      11
    ],
    [
      11,
      12
    ],
    [
      12,
      13
    ]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.9998257756233215
    },
    {
      "name": "medical_department",
      "confidence": 0.0001144336347351782
    },
    {
      "name": "medical_hospital",
      "confidence": 2.84777233900968e-05
    },
    {
      "name": "goodbye",
      "confidence": 2.1356245269998908e-05
    },
    {
      "name": "greet",
      "confidence": 9.92826244328171e-06
    }
  ]
}
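
In the parse result, text_tokens lists each token as a [start, end) pair of character offsets into text. In this pipeline they appear to be JiebaTokenizer's word boundaries, and the entity's start/end use the same offsets. The sketch below turns an abridged copy of the first test result above (我有点感冒，吃什么药好呢？) into readable tokens and re-slices the entity from the text; tokens and entity_texts are hypothetical helpers for illustration, not Rasa functions.

```python
import json

# Abridged copy of the first parse result above (intent fields omitted)
RESULT = json.loads("""
{
  "text": "我有点感冒，吃什么药好呢？",
  "entities": [{"entity": "disease", "start": 3, "end": 5, "value": "感冒"}],
  "text_tokens": [[0, 1], [1, 3], [3, 5], [5, 6], [6, 7], [7, 9], [9, 11], [11, 12], [12, 13]]
}
""")


def tokens(result: dict) -> list:
    """Turn the [start, end) character offsets in text_tokens into strings."""
    text = result["text"]
    return [text[start:end] for start, end in result["text_tokens"]]


def entity_texts(result: dict) -> list:
    """Re-slice each entity from the text to check its offsets."""
    text = result["text"]
    return [(e["entity"], text[e["start"]:e["end"]]) for e in result["entities"]]


if __name__ == "__main__":
    print(" | ".join(tokens(RESULT)))
    # 我 | 有点 | 感冒 | ， | 吃 | 什么 | 药好 | 呢 | ？
    print(entity_texts(RESULT))  # [('disease', '感冒')]
```

Reading the tokens this way makes segmentation choices visible, for example 药好 being kept as a single token.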

Test 2: 我有点晕，该看什么医生？ ("I feel a bit dizzy; what kind of doctor should I see?")

Next message:
我有点晕，该看什么医生？
{
  "text": "我有点晕，该看什么医生？",
  "intent": {
    "name": "medicine",
    "confidence": 0.7516889572143555
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      3
    ],
    [
      3,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      6
    ],
    [
      6,
      7
    ],
    [
      7,
      9
    ],
    [
      9,
      11
    ],
    [
      11,
      12
    ]
  ],
  "intent_ranking": [
    {
      "name": "medicine",
      "confidence": 0.7516889572143555
    },
    {
      "name": "medical_department",
      "confidence": 0.23077963292598724
    },
    {
      "name": "medical_hospital",
      "confidence": 0.014110525138676167
    },
    {
      "name": "goodbye",
      "confidence": 0.0021244173403829336
    },
    {
      "name": "greet",
      "confidence": 0.0012964475899934769
    }
  ]
}

Test 3: 早上好 ("Good morning"), which does not appear among the greet training examples:

Next message:
早上好
{
  "text": "早上好",
  "intent": {
    "name": "greet",
    "confidence": 0.9996402263641357
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      3
    ]
  ],
  "intent_ranking": [
    {
      "name": "greet",
      "confidence": 0.9996402263641357
    },
    {
      "name": "medical_department",
      "confidence": 0.00014932868361938745
    },
    {
      "name": "goodbye",
      "confidence": 0.00014898570952937007
    },
    {
      "name": "medical_hospital",
      "confidence": 5.417354987002909e-05
    },
    {
      "name": "medicine",
      "confidence": 7.281645139300963e-06
    }
  ]
}

Test 4: 人民医院在哪里 ("Where is the People's Hospital?")

Next message:
人民医院在哪里
{
  "text": "人民医院在哪里",
  "intent": {
    "name": "medical_hospital",
    "confidence": 0.541263997554779
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      2
    ],
    [
      2,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      7
    ]
  ],
  "intent_ranking": [
    {
      "name": "medical_hospital",
      "confidence": 0.541263997554779
    },
    {
      "name": "medical_department",
      "confidence": 0.2764747440814972
    },
    {
      "name": "greet",
      "confidence": 0.16937503218650818
    },
    {
      "name": "goodbye",
      "confidence": 0.011964843608438969
    },
    {
      "name": "medicine",
      "confidence": 0.0009213921148329973
    }
  ]
}

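Next, extend nlu.yml slightly with a new praise intent containing a few compliments (only the added entries are shown; the existing intents stay in the file):
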
version: "3.0"
nlu:
  - intent: praise
    examples: |
      - 你真有才华
      - 你真帅气
      - 你好棒啊

Retrain with rasa train nlu, then test again with rasa shell nlu. Input: 你很优雅的完成了任务 ("You completed the task very elegantly")

Next message:
你很优雅的完成了任务
{
  "text": "你很优雅的完成了任务",
  "intent": {
    "name": "praise",
    "confidence": 0.3122938573360443
  },
  "entities": [],
  "text_tokens": [
    [
      0,
      1
    ],
    [
      1,
      2
    ],
    [
      2,
      4
    ],
    [
      4,
      5
    ],
    [
      5,
      7
    ],
    [
      7,
      8
    ],
    [
      8,
      10
    ]
  ],
  "intent_ranking": [
    {
      "name": "praise",
      "confidence": 0.3122938573360443
    },
    {
      "name": "medical_hospital",
      "confidence": 0.24623937904834747
    },
    {
      "name": "goodbye",
      "confidence": 0.20737841725349426
    },
    {
      "name": "medicine",
      "confidence": 0.19506700336933136
    },
    {
      "name": "medical_department",
      "confidence": 0.020120976492762566
    },
    {
      "name": "greet",
      "confidence": 0.018900321796536446
    }
  ]
}
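
In the praise test the new intent wins with a confidence of only about 0.31, roughly 0.07 ahead of medical_hospital, presumably because praise has just three training examples. Low-margin results like this are what a confidence threshold is meant to catch: Rasa provides a FallbackClassifier component that predicts the intent nlu_fallback when the top confidence is below its threshold or the top two intents are closer than its ambiguity_threshold. The Python sketch below only mimics that decision rule on an intent_ranking list so it can be tried without Rasa; pick_intent is a hypothetical helper, and the values 0.3 and 0.1 are illustrative assumptions, not tuned settings.

```python
from typing import Dict, List


def pick_intent(ranking: List[Dict], threshold: float = 0.3,
                ambiguity_threshold: float = 0.1) -> str:
    """Return the top intent, or "nlu_fallback" if it is weak or ambiguous.

    Mimics the decision rule described for Rasa's FallbackClassifier;
    this is an illustrative re-implementation, not Rasa's code.
    """
    if not ranking:
        return "nlu_fallback"
    ordered = sorted(ranking, key=lambda r: r["confidence"], reverse=True)
    top = ordered[0]
    if top["confidence"] < threshold:
        return "nlu_fallback"  # top intent is not confident enough
    if len(ordered) > 1 and top["confidence"] - ordered[1]["confidence"] < ambiguity_threshold:
        return "nlu_fallback"  # top two intents are too close to call
    return top["name"]


if __name__ == "__main__":
    # Top entries of intent_ranking from the praise test above
    praise_test = [
        {"name": "praise", "confidence": 0.3122938573360443},
        {"name": "medical_hospital", "confidence": 0.24623937904834747},
        {"name": "goodbye", "confidence": 0.20737841725349426},
    ]
    # Top entries of intent_ranking from the "人民医院在哪里" test above
    hospital_test = [
        {"name": "medical_hospital", "confidence": 0.541263997554779},
        {"name": "medical_department", "confidence": 0.2764747440814972},
    ]
    print(pick_intent(praise_test))    # nlu_fallback (margin about 0.066)
    print(pick_intent(hospital_test))  # medical_hospital
```

A fallback intent lets a bot ask the user to rephrase instead of acting on a weak guess; adding more praise examples is the other obvious lever.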
