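The most widely used evaluation method for text summarization is ROUGE (Recall-Oriented Understudy for Gisting Evaluation). ROUGE was inspired by BLEU, the automatic evaluation metric for machine translation. The difference is that ROUGE uses recall as its core measure, whereas BLEU is precision-based. The basic idea is to score a generated summary by the n-gram co-occurrence statistics between it and the reference summaries.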
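The n-gram co-occurrence idea behind ROUGE-N recall can be written as: ROUGE-N = (sum over reference summaries S, sum over n-grams g in S, of Count_match(g)) / (sum over S, sum over g in S, of Count(g)). Here Count_match(g) is the number of times g occurs in both the candidate and the reference, clipped at the smaller of the two counts. Below is a minimal Python sketch of that formula, assuming whitespace-tokenized input. It is an illustration only, not the Perl tool's code: it omits stemming, stopword removal and bootstrap confidence intervals, and the Perl tool's handling of multiple references may differ from the simple pooling used here.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count each contiguous n-gram (stored as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: List[str], references: List[List[str]], n: int) -> float:
    """ROUGE-N recall: clipped n-gram matches over the total n-grams in the references."""
    cand = ngram_counts(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngram_counts(ref, n)
        # Counter & Counter keeps min(count_ref, count_cand) for shared n-grams,
        # so a repeated n-gram cannot be matched more often than it occurs.
        matched += sum((ref_counts & cand).values())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_n_recall(cand, [ref], 1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(cand, [ref], 2))  # 3 of 5 reference bigrams match
```

Because the denominator counts reference n-grams, a longer candidate can only raise this score, which is why ROUGE output also reports precision and F-measure alongside recall.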
The implementation mainly used today is the original version written in Perl, available at:
(https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5)
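Setting up this tool is fairly involved, however, so I have recorded the whole process below:
(1) Install Perl; most Ubuntu environments already have it.
(2) Install the Perl libraries it depends on, chiefly the XML parser.
(3) Fix the data files, chiefly the WordNet data: the exception database shipped with the package may fail to open, producing the following error: (Cannot open exception db file for reading: data/WordNet-2.0.exc.db)
The fix is to delete that file and rebuild it: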
cd pythonrouge/RELEASE-1.5.5/data/
rm WordNet-2.0.exc.db
./WordNet-2.0-Exceptions/buildExeptionDB.pl ./WordNet-2.0-Exceptions ./smart_common_words.txt ./WordNet-2.0.exc.db
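Steps (1) to (3) can also be scripted, which helps when setting up several machines. The Python sketch below is a hypothetical helper, not part of ROUGE or pyrouge. check_perl_modules tests whether Perl can load each module, and rebuild_exception_db repeats the rm and buildExeptionDB.pl commands above. The module list (XML::DOM, DB_File) is an assumption based on what ROUGE-1.5.5.pl typically loads; check the use lines at the top of your copy. data_dir is assumed to be the RELEASE-1.5.5/data directory, and the build script is invoked through perl explicitly in case it lacks execute permission.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed Perl dependencies of ROUGE-1.5.5.pl (step 2); verify against the
# `use` lines at the top of your copy of the script.
ASSUMED_ROUGE_MODULES = ["XML::DOM", "DB_File"]


def check_perl_modules(modules: Iterable[str]) -> Dict[str, bool]:
    """Step 2: report whether `perl -M<module> -e 1` succeeds for each module."""
    modules = list(modules)
    perl = shutil.which("perl")
    if perl is None:  # step 1 not done: no Perl interpreter on PATH
        return {m: False for m in modules}
    return {
        m: subprocess.run([perl, f"-M{m}", "-e", "1"],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0
        for m in modules
    }


def rebuild_exception_db(data_dir: Path, dry_run: bool = False) -> List[str]:
    """Step 3: delete the stale WordNet exception DB and regenerate it.

    Mirrors the shell commands above; with dry_run=True it only returns
    the command it would run.
    """
    exc_dir = data_dir / "WordNet-2.0-Exceptions"
    db = data_dir / "WordNet-2.0.exc.db"
    cmd = ["perl", str(exc_dir / "buildExeptionDB.pl"), str(exc_dir),
           str(data_dir / "smart_common_words.txt"), str(db)]
    if not dry_run:
        if db.exists():
            db.unlink()  # same as `rm WordNet-2.0.exc.db`
        subprocess.run(cmd, check=True, cwd=data_dir)
    return cmd
```

For example, check_perl_modules(ASSUMED_ROUGE_MODULES) returns a dict mapping each module name to True or False. Missing modules can typically be installed from CPAN (cpan XML::DOM) or, on Ubuntu, through the libxml-dom-perl package.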
Then run a test:
./ROUGE-1.5.5.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a ROUGE-test.xml
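The flags in this command are explained in the script's usage text. The hypothetical Python helper below assembles the same command and annotates each flag with its meaning as summarized from that usage text; double-check the descriptions against your copy. byte_limit and stem add the -b and -m options that appear in the run shown further down.

```python
from pathlib import Path
from typing import List


def rouge_command(rouge_dir: Path, config_xml: str,
                  byte_limit: int = 0, stem: bool = False) -> List[str]:
    """Assemble a ROUGE-1.5.5.pl argv matching the test command (hypothetical helper)."""
    cmd = [
        "perl", str(rouge_dir / "ROUGE-1.5.5.pl"),
        "-e", str(rouge_dir / "data"),  # directory holding ROUGE's data files
        "-c", "95",     # confidence level (%) for the reported intervals
        "-2", "-1",     # skip-bigram max gap; a negative value means no limit
        "-U",           # report ROUGE-SU* (skip-bigram plus unigram) as well as ROUGE-S*
        "-r", "1000",   # number of bootstrap resampling points
        "-n", "4",      # compute ROUGE-1 through ROUGE-4
        "-w", "1.2",    # weight factor for ROUGE-W (weighted LCS)
    ]
    if byte_limit > 0:
        cmd += ["-b", str(byte_limit)]  # only use the first N bytes of each peer summary
    if stem:
        cmd.append("-m")  # Porter-stem summaries before matching
    cmd += ["-a", config_xml]  # evaluate all systems listed in the config file
    return cmd
```

Passing the result to subprocess.run(..., capture_output=True, text=True) returns the report on stdout; since -e is given as a full path, the working directory does not matter.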
The test configuration file can be downloaded here: ROUGE-test.xml (https://raw.githubusercontent.com/summanlp/evaluation/master/ROUGE-RELEASE-1.5.5/sample-test/ROUGE-test.xml)
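ROUGE-test.xml is the evaluation config: for each document it lists the peer (system) summaries and the model (reference) summaries. To evaluate your own outputs you need a file of the same shape. The sketch below writes one with Python's standard library. The element names follow the layout commonly used with ROUGE-1.5.5 (for example by pyrouge) and are an assumption to check against the downloaded ROUGE-test.xml. The default input format "SPL" (plain text, one sentence per line) is likewise assumed; pyrouge, for instance, uses "SEE" (HTML) files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rouge_config(evals: List[Dict], input_format: str = "SPL") -> str:
    """Build a ROUGE-1.5.5 evaluation config string (layout assumed, see above).

    Each item in `evals` is a dict with keys:
      id, peer_root, model_root,
      peers  ({peer_id: filename}), models ({model_id: filename})
    """
    root = ET.Element("ROUGE-EVAL", version="1.0")
    for ev in evals:
        node = ET.SubElement(root, "EVAL", ID=str(ev["id"]))
        ET.SubElement(node, "PEER-ROOT").text = ev["peer_root"]    # system summaries dir
        ET.SubElement(node, "MODEL-ROOT").text = ev["model_root"]  # reference summaries dir
        ET.SubElement(node, "INPUT-FORMAT", TYPE=input_format)
        peers = ET.SubElement(node, "PEERS")
        for pid, fname in ev["peers"].items():
            ET.SubElement(peers, "P", ID=str(pid)).text = fname
        models = ET.SubElement(node, "MODELS")
        for mid, fname in ev["models"].items():
            ET.SubElement(models, "M", ID=str(mid)).text = fname
    return ET.tostring(root, encoding="unicode")
```

Write the returned string to a file and pass that file in place of ROUGE-test.xml.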
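The test output is shown below. Note that this run also passed -b 75 (use only the first 75 bytes of each peer summary) and -m (apply Porter stemming):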
omnisky@omnisky:~/software/ROUGE-1.5.5$ ./ROUGE-1.5.5.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -b 75 -m -a ROUGE-test.xml
---------------------------------------------
11 ROUGE-1 Average_R: 0.22536 (95%-conf.int. 0.18124 - 0.27016)
11 ROUGE-1 Average_P: 0.20359 (95%-conf.int. 0.17004 - 0.23980)
11 ROUGE-1 Average_F: 0.21027 (95%-conf.int. 0.17278 - 0.24804)
---------------------------------------------
11 ROUGE-2 Average_R: 0.03522 (95%-conf.int. 0.01812 - 0.05479)
11 ROUGE-2 Average_P: 0.02964 (95%-conf.int. 0.01698 - 0.04433)
11 ROUGE-2 Average_F: 0.03109 (95%-conf.int. 0.01669 - 0.04702)
---------------------------------------------
11 ROUGE-3 Average_R: 0.00243 (95%-conf.int. 0.00000 - 0.00774)
11 ROUGE-3 Average_P: 0.00171 (95%-conf.int. 0.00000 - 0.00545)
11 ROUGE-3 Average_F: 0.00201 (95%-conf.int. 0.00000 - 0.00640)
---------------------------------------------
11 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
11 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
11 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
11 ROUGE-L Average_R: 0.17144 (95%-conf.int. 0.14160 - 0.19972)
11 ROUGE-L Average_P: 0.15459 (95%-conf.int. 0.13442 - 0.17638)
11 ROUGE-L Average_F: 0.15979 (95%-conf.int. 0.13524 - 0.18293)
---------------------------------------------
11 ROUGE-W-1.2 Average_R: 0.10366 (95%-conf.int. 0.08657 - 0.12032)
11 ROUGE-W-1.2 Average_P: 0.14874 (95%-conf.int. 0.13072 - 0.16891)
11 ROUGE-W-1.2 Average_F: 0.12004 (95%-conf.int. 0.10260 - 0.13592)
---------------------------------------------
11 ROUGE-S* Average_R: 0.02919 (95%-conf.int. 0.01857 - 0.04092)
11 ROUGE-S* Average_P: 0.02341 (95%-conf.int. 0.01557 - 0.03158)
11 ROUGE-S* Average_F: 0.02400 (95%-conf.int. 0.01604 - 0.03186)
---------------------------------------------
11 ROUGE-SU* Average_R: 0.06207 (95%-conf.int. 0.04583 - 0.07847)
11 ROUGE-SU* Average_P: 0.05303 (95%-conf.int. 0.04009 - 0.06763)
11 ROUGE-SU* Average_F: 0.05310 (95%-conf.int. 0.04099 - 0.06553)
---------------------------------------------
12 ROUGE-1 Average_R: 0.24238 (95%-conf.int. 0.18967 - 0.29513)
12 ROUGE-1 Average_P: 0.25533 (95%-conf.int. 0.19308 - 0.31825)
12 ROUGE-1 Average_F: 0.24485 (95%-conf.int. 0.19150 - 0.30122)
---------------------------------------------
12 ROUGE-2 Average_R: 0.05210 (95%-conf.int. 0.02453 - 0.08236)
12 ROUGE-2 Average_P: 0.05569 (95%-conf.int. 0.02581 - 0.08922)
12 ROUGE-2 Average_F: 0.05265 (95%-conf.int. 0.02501 - 0.08296)
---------------------------------------------
12 ROUGE-3 Average_R: 0.01023 (95%-conf.int. 0.00114 - 0.02271)
12 ROUGE-3 Average_P: 0.01027 (95%-conf.int. 0.00125 - 0.02146)
12 ROUGE-3 Average_F: 0.00995 (95%-conf.int. 0.00119 - 0.02145)
---------------------------------------------
12 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
12 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
12 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
12 ROUGE-L Average_R: 0.18008 (95%-conf.int. 0.13709 - 0.22455)
12 ROUGE-L Average_P: 0.18728 (95%-conf.int. 0.14248 - 0.23337)
12 ROUGE-L Average_F: 0.18062 (95%-conf.int. 0.13810 - 0.22318)
---------------------------------------------
12 ROUGE-W-1.2 Average_R: 0.10847 (95%-conf.int. 0.08398 - 0.13339)
12 ROUGE-W-1.2 Average_P: 0.17875 (95%-conf.int. 0.13756 - 0.22048)
12 ROUGE-W-1.2 Average_F: 0.13289 (95%-conf.int. 0.10403 - 0.16220)
---------------------------------------------
12 ROUGE-S* Average_R: 0.03833 (95%-conf.int. 0.02085 - 0.05926)
12 ROUGE-S* Average_P: 0.04319 (95%-conf.int. 0.02107 - 0.06921)
12 ROUGE-S* Average_F: 0.03788 (95%-conf.int. 0.01997 - 0.05816)
---------------------------------------------
12 ROUGE-SU* Average_R: 0.07160 (95%-conf.int. 0.04882 - 0.09699)
12 ROUGE-SU* Average_P: 0.08071 (95%-conf.int. 0.05108 - 0.11638)
12 ROUGE-SU* Average_F: 0.07160 (95%-conf.int. 0.04794 - 0.09681)
---------------------------------------------
13 ROUGE-1 Average_R: 0.20161 (95%-conf.int. 0.15184 - 0.25908)
13 ROUGE-1 Average_P: 0.19956 (95%-conf.int. 0.14511 - 0.25978)
13 ROUGE-1 Average_F: 0.20030 (95%-conf.int. 0.14833 - 0.25923)
---------------------------------------------
13 ROUGE-2 Average_R: 0.04886 (95%-conf.int. 0.02609 - 0.07824)
13 ROUGE-2 Average_P: 0.04829 (95%-conf.int. 0.02445 - 0.07861)
13 ROUGE-2 Average_F: 0.04846 (95%-conf.int. 0.02523 - 0.07828)
---------------------------------------------
13 ROUGE-3 Average_R: 0.00887 (95%-conf.int. 0.00250 - 0.01758)
13 ROUGE-3 Average_P: 0.00909 (95%-conf.int. 0.00250 - 0.01804)
13 ROUGE-3 Average_F: 0.00897 (95%-conf.int. 0.00250 - 0.01758)
---------------------------------------------
13 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
13 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
13 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
13 ROUGE-L Average_R: 0.17044 (95%-conf.int. 0.12873 - 0.21975)
13 ROUGE-L Average_P: 0.16849 (95%-conf.int. 0.12400 - 0.22144)
13 ROUGE-L Average_F: 0.16919 (95%-conf.int. 0.12604 - 0.21969)
---------------------------------------------
13 ROUGE-W-1.2 Average_R: 0.10327 (95%-conf.int. 0.08048 - 0.12969)
13 ROUGE-W-1.2 Average_P: 0.16067 (95%-conf.int. 0.12237 - 0.20421)
13 ROUGE-W-1.2 Average_F: 0.12550 (95%-conf.int. 0.09682 - 0.15816)
---------------------------------------------
13 ROUGE-S* Average_R: 0.03974 (95%-conf.int. 0.02107 - 0.06491)
13 ROUGE-S* Average_P: 0.04116 (95%-conf.int. 0.01983 - 0.07039)
13 ROUGE-S* Average_F: 0.04018 (95%-conf.int. 0.02016 - 0.06705)
---------------------------------------------
13 ROUGE-SU* Average_R: 0.06653 (95%-conf.int. 0.04305 - 0.09595)
13 ROUGE-SU* Average_P: 0.06719 (95%-conf.int. 0.04110 - 0.10081)
13 ROUGE-SU* Average_F: 0.06650 (95%-conf.int. 0.04165 - 0.09775)
---------------------------------------------
14 ROUGE-1 Average_R: 0.23816 (95%-conf.int. 0.18633 - 0.28642)
14 ROUGE-1 Average_P: 0.20187 (95%-conf.int. 0.15801 - 0.24672)
14 ROUGE-1 Average_F: 0.21741 (95%-conf.int. 0.16959 - 0.26309)
---------------------------------------------
14 ROUGE-2 Average_R: 0.04832 (95%-conf.int. 0.02575 - 0.07404)
14 ROUGE-2 Average_P: 0.04008 (95%-conf.int. 0.02100 - 0.06148)
14 ROUGE-2 Average_F: 0.04350 (95%-conf.int. 0.02320 - 0.06550)
---------------------------------------------
14 ROUGE-3 Average_R: 0.00626 (95%-conf.int. 0.00129 - 0.01275)
14 ROUGE-3 Average_P: 0.00551 (95%-conf.int. 0.00125 - 0.01106)
14 ROUGE-3 Average_F: 0.00583 (95%-conf.int. 0.00129 - 0.01172)
---------------------------------------------
14 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
14 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
14 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
14 ROUGE-L Average_R: 0.18917 (95%-conf.int. 0.15208 - 0.22554)
14 ROUGE-L Average_P: 0.16072 (95%-conf.int. 0.12931 - 0.19636)
14 ROUGE-L Average_F: 0.17285 (95%-conf.int. 0.13969 - 0.20706)
---------------------------------------------
14 ROUGE-W-1.2 Average_R: 0.11376 (95%-conf.int. 0.09309 - 0.13418)
14 ROUGE-W-1.2 Average_P: 0.15239 (95%-conf.int. 0.12539 - 0.18210)
14 ROUGE-W-1.2 Average_F: 0.12955 (95%-conf.int. 0.10691 - 0.15301)
---------------------------------------------
14 ROUGE-S* Average_R: 0.04052 (95%-conf.int. 0.02377 - 0.05965)
14 ROUGE-S* Average_P: 0.03102 (95%-conf.int. 0.01668 - 0.05095)
14 ROUGE-S* Average_F: 0.03427 (95%-conf.int. 0.01952 - 0.05277)
---------------------------------------------
14 ROUGE-SU* Average_R: 0.07426 (95%-conf.int. 0.05233 - 0.09732)
14 ROUGE-SU* Average_P: 0.05627 (95%-conf.int. 0.03818 - 0.07912)
14 ROUGE-SU* Average_F: 0.06277 (95%-conf.int. 0.04369 - 0.08486)
---------------------------------------------
21 ROUGE-1 Average_R: 0.12268 (95%-conf.int. 0.09798 - 0.14879)
21 ROUGE-1 Average_P: 0.15320 (95%-conf.int. 0.12216 - 0.18730)
21 ROUGE-1 Average_F: 0.13279 (95%-conf.int. 0.10711 - 0.15971)
---------------------------------------------
21 ROUGE-2 Average_R: 0.01529 (95%-conf.int. 0.00592 - 0.02711)
21 ROUGE-2 Average_P: 0.02223 (95%-conf.int. 0.00779 - 0.04143)
21 ROUGE-2 Average_F: 0.01766 (95%-conf.int. 0.00648 - 0.03171)
---------------------------------------------
21 ROUGE-3 Average_R: 0.00146 (95%-conf.int. 0.00000 - 0.00387)
21 ROUGE-3 Average_P: 0.00189 (95%-conf.int. 0.00000 - 0.00500)
21 ROUGE-3 Average_F: 0.00165 (95%-conf.int. 0.00000 - 0.00436)
---------------------------------------------
21 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
21 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
21 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
21 ROUGE-L Average_R: 0.11136 (95%-conf.int. 0.08935 - 0.13337)
21 ROUGE-L Average_P: 0.14091 (95%-conf.int. 0.11120 - 0.17367)
21 ROUGE-L Average_F: 0.12123 (95%-conf.int. 0.09858 - 0.14549)
---------------------------------------------
21 ROUGE-W-1.2 Average_R: 0.07130 (95%-conf.int. 0.05835 - 0.08458)
21 ROUGE-W-1.2 Average_P: 0.14244 (95%-conf.int. 0.11389 - 0.17316)
21 ROUGE-W-1.2 Average_F: 0.09280 (95%-conf.int. 0.07623 - 0.10966)
---------------------------------------------
21 ROUGE-S* Average_R: 0.00977 (95%-conf.int. 0.00502 - 0.01464)
21 ROUGE-S* Average_P: 0.01637 (95%-conf.int. 0.00837 - 0.02539)
21 ROUGE-S* Average_F: 0.01103 (95%-conf.int. 0.00590 - 0.01624)
---------------------------------------------
21 ROUGE-SU* Average_R: 0.03016 (95%-conf.int. 0.02244 - 0.03836)
21 ROUGE-SU* Average_P: 0.05010 (95%-conf.int. 0.03553 - 0.06724)
21 ROUGE-SU* Average_F: 0.03441 (95%-conf.int. 0.02570 - 0.04326)
---------------------------------------------
22 ROUGE-1 Average_R: 0.16619 (95%-conf.int. 0.13350 - 0.20500)
22 ROUGE-1 Average_P: 0.15684 (95%-conf.int. 0.12675 - 0.19079)
22 ROUGE-1 Average_F: 0.15540 (95%-conf.int. 0.12640 - 0.18731)
---------------------------------------------
22 ROUGE-2 Average_R: 0.01970 (95%-conf.int. 0.00940 - 0.03235)
22 ROUGE-2 Average_P: 0.02285 (95%-conf.int. 0.00885 - 0.04183)
22 ROUGE-2 Average_F: 0.01963 (95%-conf.int. 0.00867 - 0.03326)
---------------------------------------------
22 ROUGE-3 Average_R: 0.00267 (95%-conf.int. 0.00000 - 0.00645)
22 ROUGE-3 Average_P: 0.00179 (95%-conf.int. 0.00000 - 0.00439)
22 ROUGE-3 Average_F: 0.00214 (95%-conf.int. 0.00000 - 0.00523)
---------------------------------------------
22 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
22 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
22 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
22 ROUGE-L Average_R: 0.14274 (95%-conf.int. 0.11681 - 0.17154)
22 ROUGE-L Average_P: 0.13564 (95%-conf.int. 0.11091 - 0.16389)
22 ROUGE-L Average_F: 0.13356 (95%-conf.int. 0.11087 - 0.15692)
---------------------------------------------
22 ROUGE-W-1.2 Average_R: 0.08851 (95%-conf.int. 0.07349 - 0.10502)
22 ROUGE-W-1.2 Average_P: 0.13443 (95%-conf.int. 0.10985 - 0.16234)
22 ROUGE-W-1.2 Average_F: 0.10269 (95%-conf.int. 0.08630 - 0.11951)
---------------------------------------------
22 ROUGE-S* Average_R: 0.02048 (95%-conf.int. 0.01204 - 0.03044)
22 ROUGE-S* Average_P: 0.01755 (95%-conf.int. 0.00929 - 0.02714)
22 ROUGE-S* Average_F: 0.01595 (95%-conf.int. 0.00957 - 0.02308)
---------------------------------------------
22 ROUGE-SU* Average_R: 0.04477 (95%-conf.int. 0.03270 - 0.05895)
22 ROUGE-SU* Average_P: 0.04262 (95%-conf.int. 0.02754 - 0.05872)
22 ROUGE-SU* Average_F: 0.03765 (95%-conf.int. 0.02812 - 0.04741)
---------------------------------------------
23 ROUGE-1 Average_R: 0.12235 (95%-conf.int. 0.08927 - 0.15829)
23 ROUGE-1 Average_P: 0.11503 (95%-conf.int. 0.08510 - 0.14914)
23 ROUGE-1 Average_F: 0.11823 (95%-conf.int. 0.08752 - 0.15313)
---------------------------------------------
23 ROUGE-2 Average_R: 0.00681 (95%-conf.int. 0.00000 - 0.01641)
23 ROUGE-2 Average_P: 0.00607 (95%-conf.int. 0.00000 - 0.01473)
23 ROUGE-2 Average_F: 0.00641 (95%-conf.int. 0.00000 - 0.01550)
---------------------------------------------
23 ROUGE-3 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
23 ROUGE-3 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
23 ROUGE-3 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
23 ROUGE-4 Average_R: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
23 ROUGE-4 Average_P: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
23 ROUGE-4 Average_F: 0.00000 (95%-conf.int. 0.00000 - 0.00000)
---------------------------------------------
23 ROUGE-L Average_R: 0.10965 (95%-conf.int. 0.08185 - 0.14105)
23 ROUGE-L Average_P: 0.10383 (95%-conf.int. 0.07819 - 0.13277)
23 ROUGE-L Average_F: 0.10635 (95%-conf.int. 0.07990 - 0.13597)
---------------------------------------------
23 ROUGE-W-1.2 Average_R: 0.06674 (95%-conf.int. 0.05082 - 0.08413)
23 ROUGE-W-1.2 Average_P: 0.10003 (95%-conf.int. 0.07684 - 0.12576)
23 ROUGE-W-1.2 Average_F: 0.07981 (95%-conf.int. 0.06101 - 0.10034)
---------------------------------------------
23 ROUGE-S* Average_R: 0.01001 (95%-conf.int. 0.00430 - 0.01689)
23 ROUGE-S* Average_P: 0.00899 (95%-conf.int. 0.00360 - 0.01568)
23 ROUGE-S* Average_F: 0.00939 (95%-conf.int. 0.00387 - 0.01613)
---------------------------------------------
23 ROUGE-SU* Average_R: 0.02865 (95%-conf.int. 0.01887 - 0.04049)
23 ROUGE-SU* Average_P: 0.02590 (95%-conf.int. 0.01739 - 0.03617)
23 ROUGE-SU* Average_F: 0.02692 (95%-conf.int. 0.01793 - 0.03771)
---------------------------------------------
24 ROUGE-1 Average_R: 0.28540 (95%-conf.int. 0.23134 - 0.34089)
24 ROUGE-1 Average_P: 0.27811 (95%-conf.int. 0.20830 - 0.35585)
24 ROUGE-1 Average_F: 0.26995 (95%-conf.int. 0.21567 - 0.32693)
---------------------------------------------
24 ROUGE-2 Average_R: 0.09309 (95%-conf.int. 0.05629 - 0.13592)
24 ROUGE-2 Average_P: 0.11162 (95%-conf.int. 0.05591 - 0.18063)
24 ROUGE-2 Average_F: 0.09350 (95%-conf.int. 0.05490 - 0.13769)
---------------------------------------------
24 ROUGE-3 Average_R: 0.03075 (95%-conf.int. 0.01081 - 0.05737)
24 ROUGE-3 Average_P: 0.04022 (95%-conf.int. 0.00900 - 0.08575)
24 ROUGE-3 Average_F: 0.03122 (95%-conf.int. 0.00986 - 0.05847)
---------------------------------------------
24 ROUGE-4 Average_R: 0.00860 (95%-conf.int. 0.00000 - 0.02009)
24 ROUGE-4 Average_P: 0.00703 (95%-conf.int. 0.00000 - 0.01639)
24 ROUGE-4 Average_F: 0.00774 (95%-conf.int. 0.00000 - 0.01805)
---------------------------------------------
24 ROUGE-L Average_R: 0.24161 (95%-conf.int. 0.19599 - 0.29178)
24 ROUGE-L Average_P: 0.24108 (95%-conf.int. 0.17423 - 0.31614)
24 ROUGE-L Average_F: 0.23010 (95%-conf.int. 0.18351 - 0.28161)
---------------------------------------------
24 ROUGE-W-1.2 Average_R: 0.14412 (95%-conf.int. 0.11828 - 0.17208)
24 ROUGE-W-1.2 Average_P: 0.22825 (95%-conf.int. 0.16623 - 0.30239)
24 ROUGE-W-1.2 Average_F: 0.16917 (95%-conf.int. 0.13670 - 0.20511)
---------------------------------------------
24 ROUGE-S* Average_R: 0.06608 (95%-conf.int. 0.04152 - 0.09971)
24 ROUGE-S* Average_P: 0.08502 (95%-conf.int. 0.04014 - 0.13988)
24 ROUGE-S* Average_F: 0.05949 (95%-conf.int. 0.03719 - 0.08882)
---------------------------------------------
24 ROUGE-SU* Average_R: 0.10433 (95%-conf.int. 0.07587 - 0.14102)
24 ROUGE-SU* Average_P: 0.12555 (95%-conf.int. 0.06607 - 0.19873)
24 ROUGE-SU* Average_F: 0.09434 (95%-conf.int. 0.06742 - 0.12844)
omnisky@omnisky:~/software/ROUGE-1.5.5$
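To compare systems programmatically, the report can be parsed line by line. The Python sketch below is a hypothetical helper written against the line format shown above; it assumes the first column is the peer (system) ID from the config file and ignores separator lines.

```python
import re
from typing import Dict, Tuple

# Matches report lines such as:
# 11 ROUGE-1 Average_R: 0.22536 (95%-conf.int. 0.18124 - 0.27016)
LINE_RE = re.compile(
    r"^(?P<sys>\S+)\s+(?P<metric>ROUGE-\S+)\s+Average_(?P<kind>[RPF]):\s+"
    r"(?P<score>[\d.]+)\s+\(\d+%-conf\.int\.\s+(?P<lo>[\d.]+)\s+-\s+(?P<hi>[\d.]+)\)"
)


def parse_rouge_output(text: str) -> Dict[Tuple[str, str, str], Tuple[float, float, float]]:
    """Map (system_id, metric, 'R'/'P'/'F') -> (score, ci_low, ci_high)."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # separator and prompt lines simply do not match
            key = (m["sys"], m["metric"], m["kind"])
            results[key] = (float(m["score"]), float(m["lo"]), float(m["hi"]))
    return results
```

For instance, collecting the ("ROUGE-2", "F") entries from the run above and taking the maximum score picks out system 24 (0.09350).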