The following error occurs when using spaCy for NLP:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-164-8ef00790b0bb> in <module>
2 opt = nlp.begin_training()
3 for i in range(n):
----> 4 loss = train(nlp, train_data, opt)
5 acc = evaluate(nlp, valid_text, valid_label)
6 print(f"Loss: {loss['textcat']:.3f} \t Accuracy: {accuracy:.3f}")
<ipython-input-155-47db869d5b7c> in train(model, train, optimizer, batch_size)
8 for batch in batches:
9 text, label = zip(*batch)
---> 10 model.update(text, label, sgd=optimizer, losses=loss)
11 return loss
~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
508 sgd = self._optimizer
509 # Allow dict of args to GoldParse, instead of GoldParse objects.
--> 510 docs, golds = self._format_docs_and_golds(docs, golds)
511 grads = {}
512
~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in _format_docs_and_golds(self, docs, golds)
480 err = Errors.E151.format(unexp=unexpected, exp=expected_keys)
481 raise ValueError(err)
--> 482 gold = GoldParse(doc, **gold)
483 doc_objs.append(doc)
484 gold_objs.append(gold)
gold.pyx in spacy.gold.GoldParse.__init__()
TypeError: object of type 'float' has no len()
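Cause:
The data contains NaN values, which must be handled before training. pandas stores a missing value as the float NaN, so a missing text or label reaches GoldParse as a float instead of a string, and calling len() on it raises the TypeError above.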
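To illustrate the cause, here is a minimal sketch that uses only the standard library. The sample list and the helper name `find_non_string_rows` are hypothetical and not part of the original code; the sketch only shows how a NaN entry can be located and why it produces this message.

```python
# Hypothetical sample: the second text is missing and shows up as float NaN.
texts = ["good movie", float("nan"), "bad movie"]


def find_non_string_rows(values):
    """Return the indices of entries that are not str (for example NaN)."""
    return [i for i, v in enumerate(values) if not isinstance(v, str)]


bad_rows = find_non_string_rows(texts)
print(bad_rows)

# Calling len() on the NaN entry reproduces the message from the traceback.
try:
    len(texts[1])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Running a check like this on the training texts and labels before calling `train` shows which rows need attention.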
Solution:
- Drop the rows that contain NaN:
train = train.dropna()
- Or replace NaN with a blank string (a single space here):
train = train.fillna(" ")
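The two options can be compared on a small example. This is a sketch under assumptions: it presumes `train` is a pandas DataFrame, and the column names `text` and `label` as well as the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical training frame; the column names "text" and "label" are assumptions.
train = pd.DataFrame({
    "text": ["good movie", float("nan"), "bad movie"],
    "label": ["pos", "neg", float("nan")],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = train.isna().sum()

# Option 1: drop every row that has a NaN in any column.
dropped = train.dropna()

# Option 2: replace every NaN with a single-space string, keeping all rows.
filled = train.fillna(" ")

print(missing_per_column)
print(len(dropped), len(filled))
```

Dropping is probably the safer choice when the label itself is missing, since a blank label carries no meaning for the classifier. To drop rows only when specific columns are missing, `dropna` accepts a `subset` argument, for example `train.dropna(subset=["text", "label"])`.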