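SAS programmers sometimes need to remove duplicate values from a string. Doing this with the usual character functions such as SCAN and SUBSTR can be tedious, but a regular expression makes it simple. An example program follows: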
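data _null_;
  infile cards truncover;
  input STRING $32767.;
  /* Group 1: a labeled sentence such as "a. ... ."; group 3: a later repeat of it. */
  /* The substitution keeps groups 2 and 3, which drops the earlier copy. */
  REX1=prxparse('s/([a-z].+?\.\s+)(.*?)(\1+)/\2\3/i');
  /* Same pattern without substitution, used to test whether duplicates remain. */
  REX2=prxparse('/([a-z].+?\.\s+)(.*?)(\1+)/i');
  /* Repeat until no duplicate is left; 100 passes is a safety cap. */
  do i=1 to 100;
    STRING=prxchange(REX1, -1, compbl(STRING));
    if not prxmatch(REX2, compbl(STRING)) then leave;
  end;
  put STRING=;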
cards;
a. The cow jumps over the moon.
a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog. a. The cow jumps over the moon.
b. The chicken crossed the road. a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog.
a. The cow jumps over the moon. a. The cow jumps over the moon. b. The chicken crossed the road. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog. c. The quick brown fox jumped over the lazy dog.
a. The cows jump over the moon. a. The cows jump over the moon. b. The chickens crossed the road. b. The chickens crossed the road. c. The quick brown foxes jumped over the lazy dog. c. The quick brown foxes jumped over the lazy dog.
a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog. a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog.
;
run;
In the example above, each duplicate is an entire sentence. If the duplicates are individual words, the expression has to be changed:
data _null_;
  STRING='cow chicken fox cow chicken fox cows chickens foxes';
  /* \b marks a word boundary: group 1 is a whole word, group 3 a whole-word repeat. */
  REX1=prxparse('s/(\b\w+\b)(.*?)(\b\1+\b)/\2\3/i');
  REX2=prxparse('/(\b\w+\b)(.*?)(\b\1+\b)/i');
  do i=1 to 100;
    STRING=prxchange(REX1, -1, compbl(STRING));
    if not prxmatch(REX2, compbl(STRING)) then leave;
  end;
  put STRING=;
run;
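Note that the \b in the first pair of parentheses restricts the match to whole words rather than single letters. The \b in the third pair of parentheses enforces an exact match, that is, it matches only an identical word.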
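To see what the word-boundary anchors contribute, the following minimal sketch runs the word pattern with and without \b on a short test string. The variable names WITH_B and WITHOUT_B and the test value 'cow cows' are made up for illustration and are not part of the original program. It also passes the pattern text straight to PRXCHANGE instead of calling PRXPARSE first, which is only a shortcut for this demonstration.

```sas
data _null_;
  length STRING WITH_B WITHOUT_B $200;
  /* Hypothetical test value: "cow" and "cows" are different words. */
  STRING = 'cow cows';
  /* With \b, "cow" cannot match inside "cows", so nothing should be removed. */
  WITH_B = prxchange('s/(\b\w+\b)(.*?)(\b\1+\b)/\2\3/i', -1, STRING);
  /* Without \b, "cow" matches the first three letters of "cows", */
  /* so the first word is treated as a duplicate and dropped. */
  WITHOUT_B = prxchange('s/(\w+)(.*?)(\1+)/\2\3/i', -1, STRING);
  put WITH_B= / WITHOUT_B=;
run;
```

With the anchors in place, the two words should both survive. Without them, only "cows" should remain, which is the kind of partial-word match the \b anchors are there to prevent.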