spaCy error gold.pyx in spacy.gold.GoldParse.__init__(): solution

2021-02-19 10:57:00

The following error occurred while using spaCy for NLP:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-164-8ef00790b0bb> in <module>
      2 opt = nlp.begin_training()
      3 for i in range(n):
----> 4     loss = train(nlp, train_data, opt)
      5     acc = evaluate(nlp, valid_text, valid_label)
      6     print(f"Loss: {loss['textcat']:.3f} \t Accuracy: {accuracy:.3f}")

<ipython-input-155-47db869d5b7c> in train(model, train, optimizer, batch_size)
      8     for batch in batches:
      9         text, label = zip(*batch)
---> 10         model.update(text, label, sgd=optimizer, losses=loss)
     11     return loss

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
    508             sgd = self._optimizer
    509         # Allow dict of args to GoldParse, instead of GoldParse objects.
--> 510         docs, golds = self._format_docs_and_golds(docs, golds)
    511         grads = {}
    512 

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in _format_docs_and_golds(self, docs, golds)
    480                     err = Errors.E151.format(unexp=unexpected, exp=expected_keys)
    481                     raise ValueError(err)
--> 482                 gold = GoldParse(doc, **gold)
    483             doc_objs.append(doc)
    484             gold_objs.append(gold)

gold.pyx in spacy.gold.GoldParse.__init__()

TypeError: object of type 'float' has no len()

Cause:

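The training data contains NaN values, which must be handled before training. pandas stores a missing cell as a float NaN, so a float reaches GoldParse where a text string is expected, which matches the "object of type 'float' has no len()" message.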
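
The message can be reproduced without spaCy. The following is a minimal sketch, assuming the missing text arrives as a float NaN; the list `texts` and its contents are made up for illustration and are not taken from the original data:

```python
import math

# Assumption: a missing text cell shows up as float NaN rather than a str.
texts = ["good movie", float("nan"), "bad movie"]

for text in texts:
    try:
        print(len(text))
    except TypeError as err:
        # Same message as the last line of the traceback above.
        print(f"TypeError: {err}")

# Find the positions of the NaN entries before passing data to training.
bad = [i for i, t in enumerate(texts) if isinstance(t, float) and math.isnan(t)]
print(bad)
```

Running a check like this over the text column before training shows which rows trigger the error.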

Solution:

  • Drop the affected rows: train = train.dropna()
  • Or replace the missing values with a blank string (a single space here): train = train.fillna(" ")
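
Both options can be sketched on a small frame. This assumes `train` is a pandas DataFrame; the column names "text" and "label" and the sample rows are hypothetical, chosen only to make the example runnable:

```python
import pandas as pd

# Hypothetical training frame; column names and rows are assumptions.
train = pd.DataFrame({
    "text": ["good movie", None, "bad movie"],
    "label": ["pos", "neg", "neg"],
})

# Count missing values per column before deciding how to handle them.
print(train.isna().sum())

# Option 1: drop every row that contains a missing value.
dropped = train.dropna()

# Option 2: keep all rows and replace missing values with a single space.
filled = train.fillna(" ")

print(len(dropped))
print(filled["text"].tolist())
```

Dropping is usually the safer choice for a text classifier, since a blank text paired with a label carries no signal; filling keeps the row count unchanged when that matters elsewhere in the pipeline. Note that both methods return a new frame, so the result has to be assigned back, as in the two bullets above.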
