Approximating XOR with Multilayer Perceptron in Scikit-learn: A Beginner Example

Approximating XOR with multilayer perceptrons

Let's train a multilayer perceptron to approximate the XOR function. At the time of writing, multilayer perceptrons have been implemented as part of a 2014 Google Summer of Code project, but have not been merged or released. Subsequent versions of scikit-learn are likely to include this implementation of multilayer perceptrons without any changes to the API described in this section. In the interim, a fork of scikit-learn 0.15.1 that includes the multilayer perceptron implementation can be cloned from https://github.com/IssamLaradji/scikit-learn.git.

First, we will create a toy binary classification dataset that represents XOR and split it into training and testing sets:

>>> from sklearn.cross_validation import train_test_split
>>> from sklearn.neural_network import MultilayerPerceptronClassifier
>>> y = [0, 1, 1, 0] * 1000
>>> X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 1000
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

Next we instantiate MultilayerPerceptronClassifier. We specify the architecture of the network through the n_hidden keyword argument, which takes a list of the number of hidden units in each hidden layer. We create a hidden layer with two units that use the logistic activation function. The MultilayerPerceptronClassifier class automatically creates two input units and one output unit. In multi-class problems the classifier will create one output unit for each of the possible classes.
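The multi-class behaviour can be checked on a small toy problem. The following is only a sketch; the three-class dataset is hypothetical, not part of this section's example, and it assumes the fork's classifier behaves as described above:

>>> # Hypothetical toy problem: the label is the number of ones in the
>>> # input, which gives three classes (0, 1, and 2).
>>> X3 = [[0, 0], [0, 1], [1, 0], [1, 1]] * 1000
>>> y3 = [0, 1, 1, 2] * 1000
>>> clf3 = MultilayerPerceptronClassifier(n_hidden=[2],
...                                       activation='logistic',
...                                       algorithm='sgd',
...                                       random_state=3)
>>> clf3.fit(X3, y3)
>>> # One output unit per class, so n_outputs_ should be 3 here.
>>> print clf3.n_outputs_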

Selecting an architecture is challenging. There are some rules of thumb to choose the numbers of hidden units and layers, but these tend to be supported only by anecdotal evidence. The optimal number of hidden units depends on the number of training instances, the noise in the training data, the complexity of the function that is being approximated, the hidden units' activation function, the learning algorithm, and the regularization employed. In practice, architectures can only be evaluated by comparing their performances through cross-validation, as sketched below.
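The following is a minimal sketch of such a comparison using cross_val_score from the same fork; the candidate layer sizes are illustrative choices, not recommendations:

>>> from sklearn.cross_validation import cross_val_score
>>> # Score each candidate architecture with 3-fold cross-validation on
>>> # the training set and compare the mean accuracies.
>>> for n_hidden in ([2], [4], [8]):
...     candidate = MultilayerPerceptronClassifier(n_hidden=n_hidden,
...                                                activation='logistic',
...                                                algorithm='sgd',
...                                                random_state=3)
...     scores = cross_val_score(candidate, X_train, y_train, cv=3)
...     print 'n_hidden=%s, mean accuracy=%s' % (n_hidden, scores.mean())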

We train the network by calling the fit() method:

>>> clf = MultilayerPerceptronClassifier(n_hidden=[2],
...                                      activation='logistic',
...                                      algorithm='sgd',
...                                      random_state=3)
>>> clf.fit(X_train, y_train)

Finally, we print some predictions for manual inspection and evaluate the model's accuracy on the test set. The network perfectly approximates the XOR function on the test set:

>>> print 'Number of layers: %s. Number of outputs: %s' % (clf.n_layers_, clf.n_outputs_)
>>> predictions = clf.predict(X_test)
>>> print 'Accuracy:', clf.score(X_test, y_test)
>>> for i, p in enumerate(predictions[:10]):
...     print 'True: %s, Predicted: %s' % (y_test[i], p)

Number of layers: 3. Number of outputs: 1
Accuracy: 1.0
True: 1, Predicted: 1
True: 1, Predicted: 1
True: 1, Predicted: 1
True: 0, Predicted: 0
True: 1, Predicted: 1
True: 0, Predicted: 0
True: 0, Predicted: 0
True: 1, Predicted: 1
True: 0, Predicted: 0
True: 1, Predicted: 1
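For readers on a current scikit-learn release: the multilayer perceptron was eventually merged as MLPClassifier (available since scikit-learn 0.18), with slightly different parameter names than the fork used above. The following is a minimal sketch of the same XOR experiment with that class, assuming a modern installation; perfect accuracy is not guaranteed and may require a different random_state or more hidden units:

>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neural_network import MLPClassifier
>>> y = [0, 1, 1, 0] * 1000
>>> X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 1000
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)
>>> # hidden_layer_sizes replaces n_hidden, and solver replaces algorithm.
>>> clf = MLPClassifier(hidden_layer_sizes=(2,), activation='logistic',
...                     solver='lbfgs', max_iter=5000, random_state=3)
>>> clf.fit(X_train, y_train)
>>> print(clf.score(X_test, y_test))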
