 Scikit-Learn
Use Scikit-learn SVC to Classify Characters in Natural Images: A Beginner Example

Classifying characters in natural images

Now let's try a more challenging problem. We will classify alphanumeric characters in natural images. The Chars74K dataset, collected by T. E. de Campos, B. R. Babu, and M. Varma for Character Recognition in Natural Images, contains more than 74,000 images of the digits zero through nine and the uppercase and lowercase letters of the English alphabet. The following are three examples of images of the lowercase

demos/chars74k/.

Several types of images comprise the collection. We will use 7,705 images of characters that were extracted from photographs of street scenes taken in Bangalore, India. In contrast to MNIST, the images in this portion of Chars74K depict the characters in a variety of fonts, colors, and perturbations. After expanding the archive, we will use the files in the English/Img/GoodImg/Bmp/ directory. First, we will import the necessary modules and classes:

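import os
import numpy as np
from sklearn.svm import SVC
# train_test_split lives in sklearn.model_selection; the older
# sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Pillow, the maintained fork of PIL, exposes the Image module as PIL.Image
from PIL import Image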

Next, we will define a function that resizes images using the Python Imaging Library:

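def resize_and_crop(image, size):
    # size is a (width, height) tuple; compare aspect ratios (width / height)
    img_ratio = image.size[0] / float(image.size[1])
    ratio = size[0] / float(size[1])
    # Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS
    if ratio > img_ratio:
        # The image is relatively taller than the target: match the width, then crop
        image = image.resize((size[0], size[0] * image.size[1] // image.size[0]),
                             Image.LANCZOS)
        image = image.crop((0, 0, size[0], size[1]))
    elif ratio < img_ratio:
        # The image is relatively wider than the target: match the height, then crop
        image = image.resize((size[1] * image.size[0] // image.size[1], size[1]),
                             Image.LANCZOS)
        image = image.crop((0, 0, size[0], size[1]))
    else:
        image = image.resize((size[0], size[1]), Image.LANCZOS)
    return image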
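The branch logic of resize_and_crop can be checked without any image files. The following sketch is not part of the original listing: intermediate_size is a hypothetical helper that only reproduces the arithmetic, assuming that size tuples are (width, height) as in PIL, and returns the dimensions an image would have after the resize step and before the crop.

```python
def intermediate_size(image_size, target_size):
    """Return the (width, height) produced by the resize step, before cropping.

    Hypothetical helper for illustration; it mirrors the aspect-ratio
    comparison in resize_and_crop and performs no image processing.
    """
    img_w, img_h = image_size
    tgt_w, tgt_h = target_size
    img_ratio = img_w / float(img_h)
    ratio = tgt_w / float(tgt_h)
    if ratio > img_ratio:
        # Source is relatively taller: match the target width.
        return (tgt_w, tgt_w * img_h // img_w)
    elif ratio < img_ratio:
        # Source is relatively wider: match the target height.
        return (tgt_h * img_w // img_h, tgt_h)
    # Same aspect ratio: the resize alone gives the target size.
    return (tgt_w, tgt_h)


print(intermediate_size((60, 120), (30, 30)))  # (30, 60)
print(intermediate_size((120, 60), (30, 30)))  # (60, 30)
print(intermediate_size((50, 50), (30, 30)))   # (30, 30)
```

In the first two cases the subsequent crop keeps only the top-left 30 by 30 region, so part of a strongly elongated character may be discarded.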

Then we will load the images for each of the 62 classes and convert them to grayscale. Unlike MNIST, the images of Chars74K do not have consistent dimensions, so we will resize them to 30 pixels on a side using the resize_and_crop function we defined. Finally, we will convert the processed images to a NumPy array:

X = []
y = []
for path, subdirs, files in os.walk('data/English/Img/GoodImg/Bmp/'):
    for filename in files:
        f = os.path.join(path, filename)
        img = Image.open(f).convert('L')  # convert to grayscale
        img_resized = resize_and_crop(img, (30, 30))
        # Flatten the 30 x 30 image into a column of 900 pixel intensities
        img_resized = np.asarray(img_resized.getdata(), dtype=np.float64).reshape(
            (img_resized.size[0] * img_resized.size[1], 1))
        # The class index is the part of the filename between 'img' and '-'
        target = filename[3:filename.index('-')]
        X.append(img_resized)
        y.append(target)
X = np.array(X)
# Drop the trailing axis so that X has shape (n_samples, n_features)
X = X.reshape(X.shape[:2])
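The targets collected in y are three-character strings such as '001', taken from each filename. To read them as characters, the following sketch assumes that filenames look like img011-00012.png and that classes 001 to 010 are the digits, 011 to 036 the uppercase letters, and 037 to 062 the lowercase letters. Both the filename pattern and the class order are assumptions about the dataset layout that should be checked against the downloaded archive, and the function names are hypothetical.

```python
import string

# Assumed class order: digits, then uppercase, then lowercase (62 symbols).
SYMBOLS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def class_id_from_filename(filename):
    """Extract the class index the same way the loading loop does."""
    return filename[3:filename.index('-')]


def class_id_to_char(class_id):
    """Map a 1-based class index such as '011' to its assumed character."""
    return SYMBOLS[int(class_id) - 1]


label = class_id_from_filename('img011-00012.png')
print(label, class_id_to_char(label))  # 011 A
```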

We will then train a support vector classifier with a polynomial kernel:

classifier = SVC(verbose=0, kernel='poly', degree=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
print(classification_report(y_test, predictions))
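For intuition about what kernel='poly' with degree=3 computes, the following sketch evaluates a polynomial kernel of the form (gamma * <a, b> + coef0) ** degree on two small vectors in plain Python. The function name and the example values of gamma and coef0 are chosen for illustration only; SVC applies its own defaults for these parameters, and the default for gamma has changed between scikit-learn versions.

```python
def polynomial_kernel(a, b, degree=3, gamma=1.0, coef0=0.0):
    """Return (gamma * <a, b> + coef0) ** degree for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return (gamma * dot + coef0) ** degree


# Two tiny "images" with two pixels each (made-up values).
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))  # (1*3 + 2*4) ** 3 = 1331.0
```

The kernel grows quickly with the magnitude of the inputs, which is one reason the scale of the pixel values can matter for this classifier.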

The preceding script produces the following output:

             precision    recall  f1-score   support

        001       0.24      0.22      0.23        23
        002       0.24      0.45      0.32        20
        ...
        061       0.33      0.15      0.21        13
        062       0.08      0.25      0.12         8

avg / total       0.41      0.34      0.36      1927
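Each row of the report summarizes one class. As a reminder of how the columns are derived, the following sketch computes precision, recall, F1, and support for a single class from lists of true and predicted labels. The labels are made up for illustration and are not taken from Chars74K; class_metrics is a hypothetical helper, not a scikit-learn function.

```python
import math


def class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    support = sum(1 for t in y_true if t == label)
    precision = tp / float(predicted) if predicted else 0.0
    recall = tp / float(support) if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


# Made-up labels: four true instances of '001' and two of '002'.
y_true = ['001', '001', '001', '001', '002', '002']
y_pred = ['001', '002', '002', '001', '001', '002']
print(class_metrics(y_true, y_pred, '001'))

# Holding out 25 percent of 7,705 images, rounded up, gives the test set size.
print(int(math.ceil(0.25 * 7705)))  # 1927
```

The support column counts the true instances of each class in the test set, so its total of 1927 is the size of the test set. This is consistent with train_test_split holding out 25 percent of the 7,705 images by default and rounding the count up.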

It is apparent that this is a more challenging task than classifying digits in MNIST. The appearances of the characters vary more widely, and the characters are perturbed more because the images were sampled from photographs rather than scanned documents. Furthermore, there are far fewer training instances for each class in Chars74K than there are in MNIST. The performance of the classifier could be improved by adding training data, preprocessing the images differently, or using more sophisticated feature representations.
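As one illustration of preprocessing the images differently, pixel intensities can be rescaled from the 0 to 255 range of 8-bit grayscale images to the range 0 to 1 before training, since kernel methods are often sensitive to feature scale. The sketch below is an assumption about a reasonable preprocessing step, not the method used in this example, and it makes no claim about how much the scores above would change. scale_pixels is a hypothetical helper.

```python
def scale_pixels(rows, max_value=255.0):
    """Rescale each pixel intensity from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in rows]


# Two flattened "images" of three pixels each (made-up values).
X_small = [[0, 255, 51], [102, 204, 153]]
print(scale_pixels(X_small))
```

With a NumPy array of 8-bit intensities, the equivalent operation would be dividing the array by 255.0.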