An English version of this README is available here.
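This project recognizes CAPTCHAs with simple image recognition algorithms. The plan is to work through every classification algorithm in machine learning, one at a time.
Image (image processing library)
numpy (numerical computing library)
ImageEnhance (image processing library)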
from PIL import ImageEnhance  # img below is an already opened PIL image
enhancer = ImageEnhance.Contrast(img)  # increase contrast
img = enhancer.enhance(2)
enhancer = ImageEnhance.Sharpness(img)  # sharpen
img = enhancer.enhance(2)
enhancer = ImageEnhance.Brightness(img)  # increase brightness
img = enhancer.enhance(2)
# kNN algorithm
import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # Indices of the training samples, nearest first
    sortedDistIndicies = distances.argsort()
    # Count the labels of the k nearest samples
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Return the label with the most votes (items() works on Python 2 and 3)
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
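To see the voting step of `classify0` in action, here is a small self-contained sketch. It is not the project's code: `knn_classify` is a hypothetical Python 3 rewrite of the same idea using NumPy broadcasting, and the four training points and their labels are made-up toy data rather than CAPTCHA features.

```python
from collections import Counter

import numpy as np


def knn_classify(in_x, data_set, labels, k):
    """Hypothetical variant of classify0: majority vote among the k nearest rows."""
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Indices of the k closest training samples.
    nearest = distances.argsort()[:k]
    # Count how often each label occurs among those neighbours.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (assumption): two clusters labelled "A" and "B".
train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_labels = ["A", "A", "B", "B"]

# The query point lies next to the "B" cluster, so two of its three
# nearest neighbours carry the label "B".
prediction = knn_classify(np.array([0.1, 0.2]), train, train_labels, 3)
print(prediction)
```

For real CAPTCHA data each row of the training matrix would presumably be one flattened character image, but that layout is an assumption and is not confirmed by the text here.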
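Given the nature of the algorithm, the task can be framed as a binary classification problem: telling the digit 1 apart from the digit 2 (any other pair of digits would work just as well).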
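One way to set up such a two-class problem is to keep only the samples of the two chosen digits and recode their labels. The sketch below assumes the samples are rows of a NumPy array with one digit label each. The helper name `make_binary_subset`, the toy feature values, and the +1/-1 label encoding are all illustrative assumptions, not something this README specifies.

```python
import numpy as np


def make_binary_subset(features, labels, positive=1, negative=2):
    """Keep only samples of two digits and map their labels to +1 / -1."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    # True for every sample that belongs to one of the two chosen digits.
    mask = (labels == positive) | (labels == negative)
    # Assumed encoding: the positive digit becomes +1, the negative digit -1.
    binary_labels = np.where(labels[mask] == positive, 1, -1)
    return features[mask], binary_labels


# Toy data (assumption): four samples, one of which is the digit 7.
features = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.9, 0.1]])
digits = np.array([1, 2, 7, 2])

X, y = make_binary_subset(features, digits)
print(X.shape, y)
```

Passing different `positive` and `negative` values selects any other pair of digits, which matches the remark that the choice of 1 and 2 is arbitrary.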
MIT