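Scores for the new 学考 (academic proficiency exam) are about to be released, so I am updating the score-lookup tool I wrote earlier. Sometime in the past six months the site finally stopped storing the CAPTCHA in a cookie and moved it into the session. So let's take another look at this CAPTCHA: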
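The CAPTCHA is fairly simple: four digits, each in the range 0-8, drawn in random colors. Fortunately, the digit positions are fixed. There is some simple distortion, but judging from the border, the image appears to be rendered first and then warped as a whole. Opening it in Photoshop shows that the background noise is mostly small gray specks. Filtering this kind of noise is easy: just remove gray. Gray pixels are characterized by small differences among the R, G, and B components; so that black is not wiped out as well, the code additionally requires the red component to be greater than 128. On closer inspection there is also light-colored noise, such as a pale purplish gray. That is filtered by summing the three components and treating the pixel as background when the sum exceeds a threshold. The implementation: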
private static boolean isBackgroundColor(int colorInt) {
    Color color = new Color(colorInt); // java.awt.Color
    // Sum of the pairwise channel differences: small for gray-ish pixels.
    int inter = Math.abs(color.getRed() - color.getGreen())
            + Math.abs(color.getGreen() - color.getBlue())
            + Math.abs(color.getRed() - color.getBlue());
    // Requiring red > 128 keeps black and other dark pixels out of the background.
    return inter < 40 && color.getRed() > 128;
}

private static boolean isBackgroundColor2(int colorInt) {
    Color color = new Color(colorInt);
    // Light noise: the channel sum is close to white (maximum 765).
    return color.getRed() + color.getGreen() + color.getBlue() > 550;
}
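To make the two rules concrete, here is a small self-contained sketch that applies the same thresholds (40, 128, 550) to a few sample colors. The class name, the combined `isBackground` helper, and the sample RGB values are made up for illustration; they are not taken from real CAPTCHA images.

```java
import java.awt.Color;

final class NoiseFilterDemo {
    // Near-gray and not dark: channels differ little and red is above 128.
    static boolean isGrayNoise(int rgb) {
        Color c = new Color(rgb);
        int spread = Math.abs(c.getRed() - c.getGreen())
                + Math.abs(c.getGreen() - c.getBlue())
                + Math.abs(c.getRed() - c.getBlue());
        return spread < 40 && c.getRed() > 128;
    }

    // Very light pixel: the three channels sum to more than 550 (out of 765).
    static boolean isLightNoise(int rgb) {
        Color c = new Color(rgb);
        return c.getRed() + c.getGreen() + c.getBlue() > 550;
    }

    // Hypothetical convenience wrapper combining both rules.
    static boolean isBackground(int r, int g, int b) {
        int rgb = new Color(r, g, b).getRGB();
        return isGrayNoise(rgb) || isLightNoise(rgb);
    }

    public static void main(String[] args) {
        System.out.println(isBackground(180, 180, 180)); // mid-gray speck: true
        System.out.println(isBackground(0, 0, 0));       // black stroke: false
        System.out.println(isBackground(215, 200, 225)); // pale lilac, sum 640: true
        System.out.println(isBackground(200, 30, 30));   // saturated red digit: false
    }
}
```

One consequence of the sum rule is that a digit drawn in a very light color would also be erased; for instance (255, 255, 60) sums to 570 and counts as background. Whether the site ever picks such colors is not something this post establishes.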
Then binarize directly:
public static BufferedImage binaryzation(BufferedImage image)
        throws Exception {
    int width = image.getWidth();
    int height = image.getHeight();
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            // Background (either noise rule) becomes white, everything else black.
            if (isBackgroundColor(image.getRGB(x, y))) {
                image.setRGB(x, y, Color.WHITE.getRGB());
            } else if (isBackgroundColor2(image.getRGB(x, y))) {
                image.setRGB(x, y, Color.WHITE.getRGB());
            } else {
                image.setRGB(x, y, Color.BLACK.getRGB());
            }
        }
    }
    return image; // the input image is modified in place
}
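Running it once to check the result: not bad! Next comes splitting the digits. Because the amount of stretching differs from position to position, it is still best to split the image into four positions and recognize each one separately. Splitting: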
public static List<BufferedImage> splitImage(BufferedImage image) throws Exception {
    List<BufferedImage> digitImageList = new ArrayList<>();
    // getSubimage(x, y, width, height): fixed slices, one per digit position.
    digitImageList.add(image.getSubimage(0, 0, 16, 40));
    digitImageList.add(image.getSubimage(16, 0, 19, 40)); // covers x = 16..34; column 35 is in no slice
    digitImageList.add(image.getSubimage(36, 0, 22, 40));
    digitImageList.add(image.getSubimage(58, 0, 22, 40));
    return digitImageList;
}
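The post does not say how the fixed offsets were chosen. One way to sanity-check offsets like these, sketched here as an assumption rather than the method actually used, is a column projection: count the black pixels in each column of a binarized image and look at where the runs of non-empty columns begin and end. The class and method names below are hypothetical.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class ColumnProjection {
    // Count black pixels in each column of an already-binarized image.
    static int[] blackPerColumn(BufferedImage binary) {
        int black = Color.BLACK.getRGB();
        int[] counts = new int[binary.getWidth()];
        for (int x = 0; x < binary.getWidth(); x++) {
            for (int y = 0; y < binary.getHeight(); y++) {
                if (binary.getRGB(x, y) == black) counts[x]++;
            }
        }
        return counts;
    }

    // Return {start, endExclusive} for each run of columns that contain ink.
    static List<int[]> inkRuns(int[] counts) {
        List<int[]> runs = new ArrayList<>();
        int start = -1;
        for (int x = 0; x <= counts.length; x++) {
            boolean ink = x < counts.length && counts[x] > 0;
            if (ink && start < 0) start = x;      // a run begins
            if (!ink && start >= 0) {             // a run ends
                runs.add(new int[] {start, x});
                start = -1;
            }
        }
        return runs;
    }
}
```

Running this over a handful of binarized samples and comparing the runs would show whether the digits really stay inside the hard-coded slices. If leftover noise or touching digits merge two runs, the projection stops being useful and fixed offsets remain the simpler choice.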
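Splitting yields four single-digit images, one per position. With the split working, the next step is to collect a sample image of every digit at each of the four positions.
Then load them: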
// Assumes a field declared elsewhere in the class:
//     static List<List<BufferedImage>> model;   // model.get(position).get(digit)
static { // load the templates
    try {
        model = new ArrayList<>();
        List<BufferedImage> list;
        for (int i = 0; i <= 3; i++) {          // digit position
            list = new ArrayList<>();
            for (int ii = 0; ii <= 8; ii++) {   // digit value
                list.add(ImageIO.read(new File("captcha/" + i + "/" + ii + ".png")));
            }
            model.add(list);
        }
    } catch (Exception e) {
        System.out.println("Error occurred in reading captcha model: " + e + ", " + e.getLocalizedMessage());
    }
}
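The loader above reads one file per position and digit from `captcha/<position>/<digit>.png`. The post does not show how those files were produced; the following is a minimal sketch of one way to write them, assuming you have the four split images of a CAPTCHA and its answer typed in by hand. `TemplateCollector`, `save`, and the `answer` parameter are hypothetical names.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;

final class TemplateCollector {
    // Save each split digit image as <root>/<position>/<digit>.png.
    // `answer` is the CAPTCHA text as read by a human, e.g. "3071".
    static void save(File root, List<BufferedImage> digits, String answer) throws IOException {
        if (digits.size() != answer.length()) {
            throw new IllegalArgumentException("digit count and answer length differ");
        }
        for (int pos = 0; pos < digits.size(); pos++) {
            char d = answer.charAt(pos);
            if (d < '0' || d > '8') {
                throw new IllegalArgumentException("unexpected digit: " + d);
            }
            File dir = new File(root, String.valueOf(pos));
            if (!dir.isDirectory() && !dir.mkdirs()) {
                throw new IOException("cannot create " + dir);
            }
            File out = new File(dir, d + ".png");
            if (!out.exists()) { // keep the first sample seen for each slot
                ImageIO.write(digits.get(pos), "png", out);
            }
        }
    }
}
```

With 4 positions and 9 digits there are 36 slots to fill, so a few dozen hand-labeled CAPTCHAs are needed before every `captcha/<position>/<digit>.png` exists. PNG is lossless, which matters here because the later comparison checks pixels for exact equality.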
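Since the font never changes either, each digit is simply compared pixel by pixel against the templates: count the differing pixels and take the digit with the smallest count. Counting the differences:
private static int diff(BufferedImage img_a, BufferedImage img_b) {
    int diff = 0;
    // Iterates over img_a's dimensions; img_b is expected to be the same size.
    int width = img_a.getWidth();
    int height = img_a.getHeight();
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            if (img_a.getRGB(x, y) != img_b.getRGB(x, y)) diff++;
        }
    }
    return diff;
}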
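The count above walks over the first image's dimensions only, so a template smaller than the slice can trigger an out-of-bounds exception, and a larger one has its extra pixels silently ignored. Below is a sketch of the same count with an explicit size check, plus a tiny worked example. `PixelDiff` and the `canvas` helper are hypothetical names, and the 4x4 images are synthetic.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

final class PixelDiff {
    // Number of positions where the two images hold different pixel values.
    static int count(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have identical dimensions");
        }
        int diff = 0;
        for (int x = 0; x < a.getWidth(); x++) {
            for (int y = 0; y < a.getHeight(); y++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) diff++;
            }
        }
        return diff;
    }

    // Hypothetical helper: a white canvas with the listed {x, y} pixels painted black.
    static BufferedImage canvas(int w, int h, int[][] blackPixels) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                img.setRGB(x, y, Color.WHITE.getRGB());
            }
        }
        for (int[] p : blackPixels) {
            img.setRGB(p[0], p[1], Color.BLACK.getRGB());
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage a = canvas(4, 4, new int[][] {{0, 0}, {1, 1}});
        BufferedImage b = canvas(4, 4, new int[][] {{1, 1}, {2, 2}});
        // The images agree at (1,1) and differ at (0,0) and (2,2).
        System.out.println(count(a, b)); // prints 2
    }
}
```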
Finally comes the matching itself; together with binarization, splitting, and so on, it looks like this:
public static String read(BufferedImage image) throws Exception {
    Filtering.binaryzation(image); // binarizes the image in place
    List<BufferedImage> imgs = Infer.splitImage(image);
    BufferedImage cur;
    String result = "";
    int cur_diff, min_diff, min;
    for (int idx = 0; idx <= 3; idx++) {
        cur = imgs.get(idx);
        min_diff = 999; // start above any possible diff (the largest slice is 22 * 40 = 880 pixels)
        min = 0;
        for (int i = 0; i <= 8; i++) {
            cur_diff = diff(cur, model.get(idx).get(i));
            // System.out.println("Diff for image: " + idx + ", " + i + ", result: " + cur_diff);
            if (cur_diff < min_diff) {
                min_diff = cur_diff;
                min = i;
            }
        }
        result += min; // append the best-matching digit for this position
    }
    return result;
}
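The loop above keeps only the best match. If you also want to know how clear-cut a match was, one option (an extension, not part of the original code) is to track the runner-up distance as well and report the gap between the two. The sketch below works on an array of per-digit distances such as those returned by `diff`; `NearestTemplate`, `Match`, and `pick` are hypothetical names, and the sample distances are invented.

```java
final class NearestTemplate {
    // Outcome of matching one digit image against its candidate templates.
    static final class Match {
        final int digit;    // index of the closest template
        final int distance; // its differing-pixel count
        final int margin;   // runner-up distance minus best distance

        Match(int digit, int distance, int margin) {
            this.digit = digit;
            this.distance = distance;
            this.margin = margin;
        }
    }

    // distances[i] is the differing-pixel count against the template for digit i.
    static Match pick(int[] distances) {
        if (distances.length == 0) {
            throw new IllegalArgumentException("no templates to compare against");
        }
        int best = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[best]) best = i; // first minimum wins ties
        }
        int runnerUp = Integer.MAX_VALUE;
        for (int i = 0; i < distances.length; i++) {
            if (i != best && distances[i] < runnerUp) runnerUp = distances[i];
        }
        int margin = runnerUp == Integer.MAX_VALUE
                ? Integer.MAX_VALUE            // only one template: no runner-up
                : runnerUp - distances[best];
        return new Match(best, distances[best], margin);
    }

    public static void main(String[] args) {
        // Invented distances for digits 0..8 at one position.
        Match m = pick(new int[] {120, 35, 98, 40, 150, 133, 90, 77, 101});
        System.out.println(m.digit + " " + m.distance + " " + m.margin); // prints "1 35 5"
    }
}
```

A small margin means two templates fit almost equally well; a caller could treat that as a low-confidence read and fetch a fresh CAPTCHA instead of submitting a guess. Where to set such a cutoff would have to be determined from real samples.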
In testing, the recognition rate is essentially 100%. Of course, that is mainly because the CAPTCHA is so simple.