General-Purpose Sentence Representations: GenSen Evaluation Experiments

2019-05-26 14:10:24

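The paper Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning (https://arxiv.org/abs/1804.00079) was published at ICLR 2018. It proposes a model whose sentence representations generalize across a wide variety of tasks, trained with a one-to-many multi-task learning framework. The paper states its main contribution as follows: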

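The primary contribution of our work is to combine the benefits of diverse sentence-representation learning objectives into a single multi-task framework.

The experiments also show that syntactic properties are learned better when a multilingual neural machine translation task is added, that sentence length and word order are learned through a parsing task, and that training on natural language inference encodes syntactic information.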

2. Experiments

(1) The reproduction ran under Python 3, so some of the GenSen code had to be modified, mainly in two places:

1) Part of glove2h5 was modified. Under Python 3 the float(val) conversion in the script can raise on some lines, so a try/except block is needed to handle them.

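import gc

import numpy as np

# glove_vectors is assumed to be the list of token lists (one per GloVe line)
# built earlier in the script.
print('change vectors')
w2v_vectors = []
for line in glove_vectors:
    try:
        # Append each row; plain assignment would keep only the last vector.
        w2v_vectors.append([float(val) for val in line[1:]])
    except ValueError:
        # Skipped rows must also be dropped from the vocab list to keep indices aligned.
        print("error on line", line)

# Free the raw token lists before building the array.
del glove_vectors
gc.collect()
vectors = np.array(w2v_vectors).astype(np.float32)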

Note: the conversion consumes a large amount of memory, so memory is reclaimed explicitly; otherwise the process can easily run out of memory.
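The try/except idea can be sketched on its own as follows. This is a minimal illustration, not the glove2h5 script itself: the function name `parse_glove_lines`, the `dim` parameter, and the sample rows are all hypothetical. It shows one way to skip malformed rows while keeping the vocabulary and the vectors aligned.

```python
def parse_glove_lines(lines, dim):
    """Parse space-separated GloVe rows, skipping rows that do not convert.

    Hypothetical helper: returns (vocab, rows) with matching indices.
    """
    vocab, rows = [], []
    for line in lines:
        parts = line.rstrip().split(' ')
        word, values = parts[0], parts[1:]
        try:
            vector = [float(val) for val in values]
        except ValueError:
            # e.g. the token itself contains spaces, so a non-number lands in values
            continue
        if len(vector) != dim:
            continue
        # Append word and vector together so their indices never drift apart.
        vocab.append(word)
        rows.append(vector)
    return vocab, rows


sample = [
    "the 0.1 0.2 0.3",
    ". . . 0.4 0.5 0.6",  # made-up malformed row: the token contains spaces
    "fox 0.7 0.8 0.9",
]
vocab, rows = parse_glove_lines(sample, dim=3)
print(vocab, len(rows))
```

The resulting `rows` list can then be turned into a float32 array with `np.array(rows).astype(np.float32)`, as in the modified script.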

2) Opening the vocab file under Python 3 fails because of an encoding mismatch (gensen.py).

def _load_params(self):
    """Load pretrained params."""
    # Read the vocab pickle file.
    filename = os.path.join(
        self.model_folder,
        '%s_vocab.pkl' % (self.filename_prefix)
    )
    print(filename)
    with open(filename, 'rb') as f:
        # encoding='latin1' lets Python 3 decode the 8-bit strings stored in the pickle.
        model_vocab = pickle.load(f, encoding='latin1')
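The encoding problem can be reproduced without the GenSen files. The sketch below uses a hand-written protocol-2 payload that stands in for a pickle written by Python 2; the payload and the helper name `load_py2_pickle` are illustrative assumptions, not part of gensen.py.

```python
import pickle

# Hand-written protocol-2 payload holding the 8-bit string 'caf\xc3\xa9'
# (the UTF-8 bytes of "café"), standing in for a pickle written by Python 2.
PY2_PAYLOAD = b'\x80\x02U\x05caf\xc3\xa9.'


def load_py2_pickle(data):
    """Unpickle 8-bit strings under Python 3 without a decode error."""
    return pickle.loads(data, encoding='latin1')


# With the default encoding (ASCII) the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_PAYLOAD)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

word = load_py2_pickle(PY2_PAYLOAD)
# latin1 maps every byte to one code point, so the original bytes are recoverable.
restored = word.encode('latin1').decode('utf-8')
print(default_ok, restored)
```

Because latin1 never fails on any byte value, it is a safe choice when the pickle mixes plain ASCII keys with arbitrary byte strings.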

(2) Evaluation

1) Test with the example bundled with GenSen

sentences = [
    'hello world .',
    'the quick brown fox jumped over the lazy dog .',
    'this is a sentence .'
]
vocab = [
    'the', 'quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dog',
    'hello', 'world', '.', 'this', 'is', 'a', 'sentence',
    '<s>', '</s>', '<pad>', '<unk>'
]
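To make the role of the special tokens in this vocab concrete, here is a minimal sketch of turning the sentences into padded index sequences. How GenSen does this internally is not shown in this post, so the wrapping with `<s>`/`</s>`, the `<unk>` fallback, and the `encode` helper are assumptions made for illustration only.

```python
sentences = [
    'hello world .',
    'the quick brown fox jumped over the lazy dog .',
    'this is a sentence .',
]
vocab = [
    'the', 'quick', 'brown', 'fox', 'jumped', 'over', 'lazy', 'dog',
    'hello', 'world', '.', 'this', 'is', 'a', 'sentence',
    '<s>', '</s>', '<pad>', '<unk>',
]

# Map every vocabulary entry to its position in the list.
word2id = {word: idx for idx, word in enumerate(vocab)}


def encode(sentence, max_len):
    """Hypothetical helper: wrap, look up, and right-pad one sentence."""
    tokens = ['<s>'] + sentence.split() + ['</s>']
    ids = [word2id.get(tok, word2id['<unk>']) for tok in tokens]
    return ids + [word2id['<pad>']] * (max_len - len(ids))


# Longest sentence plus the two boundary tokens.
max_len = max(len(s.split()) for s in sentences) + 2
batch = [encode(s, max_len) for s in sentences]
for row in batch:
    print(row)
```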

2) Test with the tasks integrated in SentEval

The results on the SentEval tasks are as follows (the output contains 16 transfer tasks and 10 probing tasks):

{'STS12': {
    'MSRpar': {'pearson': (0.4242749254520813, 3.973321856075198e-34), 'spearman': SpearmanrResult(correlation=0.43689783218545136, pvalue=2.623847109207459e-36), 'nsamples': 750},
    'MSRvid': {'pearson': (0.8431200046048173, 9.954996055278301e-204), 'spearman': SpearmanrResult(correlation=0.8434445060271232, pvalue=4.899452803862567e-204), 'nsamples': 750},
    'SMTeuroparl': {'pearson': (0.5085791335463655, 1.4543565654958856e-31), 'spearman': SpearmanrResult(correlation=0.5910758372570859, pvalue=1.3966783465806513e-44), 'nsamples': 459},
    'surprise.OnWN': {'pearson': (0.6924773496538905, 3.609130779999958e-108), 'spearman': SpearmanrResult(correlation=0.6831386989584722, pvalue=3.338887773358492e-104), 'nsamples': 750},
    'surprise.SMTnews': {'pearson': (0.5699883750430004, 9.450042347374515e-36), 'spearman': SpearmanrResult(correlation=0.4924898524588661, pvalue=9.093432952648339e-26), 'nsamples': 399},
    'all': {'pearson': {'mean': 0.607687957660031, 'wmean': 0.6212250301554153}, 'spearman': {'mean': 0.6094093453773997, 'wmean': 0.6243301281564914}}},
 'STS13': {
    'FNWN': {'pearson': (0.44812079835627416, 1.0069228531863214e-10), 'spearman': SpearmanrResult(correlation=0.46648892903400746, pvalue=1.3294443717066755e-11), 'nsamples': 189},
    'headlines': {'pearson': (0.7039260583060535, 3.0407269106065143e-113), 'spearman': SpearmanrResult(correlation=0.6938053503920689, pvalue=9.57272215561543e-109), 'nsamples': 750},
    'OnWN': {'pearson': (0.4673906945667383, 8.638108134571737e-32), 'spearman': SpearmanrResult(correlation=0.4912206669354746, pvalue=2.062109639692091e-35), 'nsamples': 561},
    'all': {'pearson': {'mean': 0.5398125170763554, 'wmean': 0.5832303695138774}, 'spearman': {'mean': 0.550504982120517, 'wmean': 0.5893968096881869}}},
 'STS14': {
    'deft-forum': {'pearson': (0.3569903021625723, 5.687789477438835e-15), 'spearman': SpearmanrResult(correlation=0.35030097553307676, pvalue=1.9451695508522155e-14), 'nsamples': 450},
    'deft-news': {'pearson': (0.6982006599148092, 3.6759088740801256e-45), 'spearman': SpearmanrResult(correlation=0.6737747591797518, pvalue=4.777495360379787e-41), 'nsamples': 300},
    'headlines': {'pearson': (0.6607384123277869, 2.8622323715880622e-95), 'spearman': SpearmanrResult(correlation=0.6358071122283795, pvalue=3.423245972757599e-86), 'nsamples': 750},
    'images': {'pearson': (0.8203867295657831, 9.425906365562756e-184), 'spearman': SpearmanrResult(correlation=0.7886886906152988, pvalue=3.4410673031736275e-160), 'nsamples': 750},
    'OnWN': {'pearson': (0.6293394184088853, 5.683036182764806e-84), 'spearman': SpearmanrResult(correlation=0.670056168045285, pvalue=6.855093979408371e-99), 'nsamples': 750},
    'tweet-news': {'pearson': (0.7373854337595948, 1.3901773295687528e-129), 'spearman': SpearmanrResult(correlation=0.6985061104075353, pvalue=8.2287377776831e-111), 'nsamples': 750},
    'all': {'pearson': {'mean': 0.6505068260232385, 'wmean': 0.6682648878651034}, 'spearman': {'mean': 0.6361889693348879, 'wmean': 0.6545497140576491}}},
 'STS15': {
    'answers-forums': {'pearson': (0.6554249972891065, 2.1271261100008417e-47), 'spearman': SpearmanrResult(correlation=0.6601587975606593, pvalue=2.726124243082021e-48), 'nsamples': 375},
    'answers-students': {'pearson': (0.7573938306575574, 1.316509038401903e-140), 'spearman': SpearmanrResult(correlation=0.7580757377086599, pvalue=5.307184802456452e-141), 'nsamples': 750},
    'belief': {'pearson': (0.6938360601407095, 3.874701942825776e-55), 'spearman': SpearmanrResult(correlation=0.70217697353048, pvalue=5.543214866718992e-57), 'nsamples': 375},
    'headlines': {'pearson': (0.7074291257643678, 7.606360102914599e-115), 'spearman': SpearmanrResult(correlation=0.7048600919337434, pvalue=1.1433940132491299e-113), 'nsamples': 750},
    'images': {'pearson': (0.8668311693111709, 2.796222690292546e-228), 'spearman': SpearmanrResult(correlation=0.8666790008905915, pvalue=4.158163553853086e-228), 'nsamples': 750},
    'all': {'pearson': {'mean': 0.7361830366325823, 'wmean': 0.751571163612001}, 'spearman': {'mean': 0.7383901203248268, 'wmean': 0.7526956790196411}}},
 'STS16': {
    'answer-answer': {'pearson': (0.6390744533171796, 1.4684840675437832e-30), 'spearman': SpearmanrResult(correlation=0.6344016536328181, pvalue=5.220136063860679e-30), 'nsamples': 254},
    'headlines': {'pearson': (0.7236185236906961, 1.1773489013100314e-41), 'spearman': SpearmanrResult(correlation=0.7177487166715745, pvalue=1.0437695211803904e-40), 'nsamples': 249},
    'plagiarism': {'pearson': (0.8068349082371395, 5.0035304651968556e-54), 'spearman': SpearmanrResult(correlation=0.8217365702171775, pvalue=1.328353052825811e-57), 'nsamples': 230},
    'postediting': {'pearson': (0.8460786512088675, 4.5804231431872625e-68), 'spearman': SpearmanrResult(correlation=0.8507624405425562, pvalue=1.4715854949019706e-69), 'nsamples': 244},
    'question-question': {'pearson': (0.3250647177568497, 1.5686673016894581e-06), 'spearman': SpearmanrResult(correlation=0.3367988573603013, pvalue=6.154958595925993e-07), 'nsamples': 209},
    'all': {'pearson': {'mean': 0.6681342508421466, 'wmean': 0.6766101765111587}, 'spearman': {'mean': 0.6722896476848855, 'wmean': 0.6802983628200636}}},
 'MR': {'devacc': 80.18, 'acc': 80.9, 'ndev': 10662, 'ntest': 10662},
 'CR': {'devacc': 85.11, 'acc': 86.54, 'ndev': 3775, 'ntest': 3775},
 'MPQA': {'devacc': 90.49, 'acc': 90.48, 'ndev': 10606, 'ntest': 10606},
 'SUBJ': {'devacc': 93.56, 'acc': 92.96, 'ndev': 10000, 'ntest': 10000},
 'SST2': {'devacc': 85.32, 'acc': 81.6, 'ndev': 872, 'ntest': 1821},
 'SST5': {'devacc': 45.14, 'acc': 44.16, 'ndev': 1101, 'ntest': 2210},
 'TREC': {'devacc': 87.93, 'acc': 92.2, 'ndev': 5452, 'ntest': 500},
 'MRPC': {'devacc': 77.48, 'acc': 77.86, 'f1': 83.56, 'ndev': 4076, 'ntest': 1725},
 'SICKEntailment': {'devacc': 84.8, 'acc': 87.21, 'ndev': 500, 'ntest': 4927},
 'SICKRelatedness': {'devpearson': 0.8888073586069731, 'pearson': 0.8871502442441512, 'spearman': 0.83463204343083, 'mse': 0.22029426612841826, 'yhat': array([3.22583999, 4.16475753, 1.29015625, ..., 2.99803383, 4.42068648, 4.87265243]), 'ndev': 500, 'ntest': 4927},
 'STSBenchmark': {'devpearson': 0.8086089078977258, 'pearson': 0.7825342758470275, 'spearman': 0.7858373058266386, 'mse': 1.1126025740886758, 'yhat': array([2.35551956, 2.33857733, 1.54570818, ..., 4.20836965, 4.17153144, 3.38133943]), 'ndev': 1500, 'ntest': 1379},
 'Length': {'devacc': 93.08, 'acc': 93.45, 'ndev': 9996, 'ntest': 9996},
 'WordContent': {'devacc': 93.96, 'acc': 93.95, 'ndev': 10000, 'ntest': 10000},
 'Depth': {'devacc': 40.69, 'acc': 40.85, 'ndev': 10000, 'ntest': 10000},
 'TopConstituents': {'devacc': 85.45, 'acc': 85.43, 'ndev': 10000, 'ntest': 10000},
 'BigramShift': {'devacc': 74.37, 'acc': 74.63, 'ndev': 10000, 'ntest': 10000},
 'Tense': {'devacc': 90.78, 'acc': 89.99, 'ndev': 10000, 'ntest': 10000},
 'SubjNumber': {'devacc': 91.32, 'acc': 89.65, 'ndev': 10000, 'ntest': 10000},
 'ObjNumber': {'devacc': 89.64, 'acc': 89.81, 'ndev': 10000, 'ntest': 10000},
 'OddManOut': {'devacc': 52.28, 'acc': 51.98, 'ndev': 10000, 'ntest': 10000},
 'CoordinationInversion': {'devacc': 68.9, 'acc': 67.96, 'ndev': 10002, 'ntest': 10002}}
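A results dictionary of this shape is easier to read once it is reduced to one headline number per task. The sketch below is one possible way to do that, not something SentEval provides: the `summarize` helper and the choice of headline metric (test accuracy, or Pearson correlation scaled to a percentage) are assumptions. The sample values are copied from the output above.

```python
def summarize(results):
    """Hypothetical helper: pick one headline score per task."""
    summary = {}
    for task, res in results.items():
        if 'acc' in res:
            # Classification and probing tasks report test accuracy.
            summary[task] = res['acc']
        elif 'all' in res:
            # STS12-16: weighted mean Pearson over the sub-datasets.
            summary[task] = round(res['all']['pearson']['wmean'] * 100, 2)
        elif 'pearson' in res:
            # SICKRelatedness / STSBenchmark: test-set Pearson.
            summary[task] = round(res['pearson'] * 100, 2)
    return summary


# Excerpt of the values reported above.
results = {
    'STS12': {'all': {'pearson': {'mean': 0.607687957660031, 'wmean': 0.6212250301554153},
                      'spearman': {'mean': 0.6094093453773997, 'wmean': 0.6243301281564914}}},
    'MR': {'devacc': 80.18, 'acc': 80.9, 'ndev': 10662, 'ntest': 10662},
    'SICKRelatedness': {'devpearson': 0.8888073586069731, 'pearson': 0.8871502442441512,
                        'spearman': 0.83463204343083, 'mse': 0.22029426612841826},
}
summary = summarize(results)
for task, score in summary.items():
    print(f'{task:16s} {score}')
```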
