# MNIST Handwritten Digit Classifier
An implementation of a multilayer neural network using Python's `numpy` library. The implementation is a modified version of Michael Nielsen's implementation from the *Neural Networks and Deep Learning* book.
## Why a modified implementation?
This book, along with Stanford's Machine Learning course by Prof. Andrew Ng, is recommended as a good resource for beginners. At times, it got confusing for me while referring to both resources:

- The Stanford course uses MATLAB, which has 1-indexed vectors and matrices.
- The book uses the `numpy` library of Python, which has 0-indexed vectors and arrays.

Furthermore, some parameters of a neural network are not defined for the input layer, which made the implementation in Python hard to get the hang of. For example, according to the book, the bias vector of the second layer of the neural network would be referred to as `biases[0]`, because the input layer (first layer) has no bias vector. So indexing got weird when switching between `numpy` and MATLAB.
## Brief Background
For total beginners who landed up here before reading anything about neural networks:

- Usually, neural networks are made up of building blocks known as **sigmoid neurons**. These are named so because their output follows the sigmoid function.
- x<sub>j</sub> are inputs, which are weighted by w<sub>j</sub> weights, and the neuron has an intrinsic bias b. The output of the neuron is known as its activation (a); a minimal sketch of one such neuron follows this list.
- A neural network is made up by stacking layers of neurons, and is defined by the weights of its connections and the biases of its neurons. Activations are a result dependent on a particular input.
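Here is a minimal numpy sketch of a single sigmoid neuron (my own illustration; the input values are made up):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])  # inputs x_j
w = np.array([0.8, 0.3])   # weights w_j
b = 0.1                    # intrinsic bias b

a = sigmoid(np.dot(w, x) + b)  # activation of the neuron
print(a)                       # ~0.53
```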
## Naming and Indexing Convention
I have followed a particular convention in indexing quantities. Dimensions of quantities are listed according to this figure (the example network used throughout, with `sizes = [2, 3, 1]`).
### Layers

- The input layer is the 0<sup>th</sup> layer, and the output layer is the L<sup>th</sup> layer. Number of layers: N<sub>L</sub> = L + 1.

```python
sizes = [2, 3, 1]
```
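For instance, with the example network above, these quantities could be computed as follows (variable names here are my own, not necessarily the repository's):

```python
sizes = [2, 3, 1]        # sizes[l] = number of neurons in the lth layer
L = len(sizes) - 1       # index of the output layer: L = 2
num_layers = len(sizes)  # N_L = L + 1 = 3
```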
### Weights

- Weights in this neural network implementation are a list of matrices (`numpy.ndarray`s). `weights[l]` is a matrix of weights entering the l<sup>th</sup> layer of the network (denoted as w<sup>l</sup>).
- An element of this matrix is denoted as w<sup>l</sup><sub>jk</sub>. It is part of the j<sup>th</sup> row, which is a collection of all weights entering the j<sup>th</sup> neuron, from all neurons (0 to k) of the (l-1)<sup>th</sup> layer.
- No weights enter the input layer, hence `weights[0]` is redundant; it follows that `weights[1]` is the collection of weights entering layer 1, and so on.
```python
weights = [
    [[]],         # weights[0]: redundant, no weights enter the input layer
    [[a, b],
     [c, d],
     [e, f]],     # weights[1]: 3 x 2, row j holds the weights entering neuron j of layer 1
    [[p, q, r]],  # weights[2]: 1 x 3, weights entering layer 2's single neuron
]
```
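Such a list could be initialized with random values like this (a sketch of the convention, not necessarily the repository's exact code):

```python
import numpy as np

sizes = [2, 3, 1]

# weights[0] is an empty placeholder; weights[l] has shape (sizes[l], sizes[l - 1]).
weights = [np.array([[]])] + [np.random.randn(sizes[l], sizes[l - 1])
                              for l in range(1, len(sizes))]

print(weights[1].shape)  # (3, 2) -- weights entering layer 1
print(weights[2].shape)  # (1, 3) -- weights entering layer 2
```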
### Biases

- Biases in this neural network implementation are a list of column vectors (`numpy.ndarray`s). `biases[l]` is a vector of biases of neurons in the l<sup>th</sup> layer of the network (denoted as b<sup>l</sup>).
- An element of this vector is denoted as b<sup>l</sup><sub>j</sub>. It is part of the j<sup>th</sup> row: the bias of the j<sup>th</sup> neuron in the layer.
- The input layer has no biases, hence `biases[0]` is redundant; it follows that `biases[1]` is the collection of biases of neurons of layer 1, and so on.
```python
biases = [
    [[],
     []],    # biases[0]: redundant, input layer neurons have no biases
    [[0],
     [1],
     [2]],   # biases[1]: biases of layer 1's three neurons
    [[0]],   # biases[2]: bias of layer 2's single neuron
]
```
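Analogously to the weights, the biases could be initialized as follows (again a sketch under the convention above):

```python
import numpy as np

sizes = [2, 3, 1]

# biases[0] is an empty placeholder; biases[l] is a column vector of shape (sizes[l], 1).
biases = [np.array([[]])] + [np.random.randn(sizes[l], 1)
                             for l in range(1, len(sizes))]

print(biases[1].shape)  # (3, 1)
print(biases[2].shape)  # (1, 1)
```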
### 'Z's

- For an input vector x to a layer l, z is defined as: z<sup>l</sup> = w<sup>l</sup> · x + b<sup>l</sup>.
- The input layer provides the x vector as input to layer 1, and itself has no input, weight or bias, hence `zs[0]` is redundant.
- Dimensions of `zs` will be the same as those of `biases`.
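A self-contained sketch of computing z for one layer under this convention (initialization repeated from the earlier sketches):

```python
import numpy as np

sizes = [2, 3, 1]
weights = [np.array([[]])] + [np.random.randn(sizes[l], sizes[l - 1])
                              for l in range(1, len(sizes))]
biases = [np.array([[]])] + [np.random.randn(sizes[l], 1)
                             for l in range(1, len(sizes))]

x = np.array([[0.5],
              [-1.2]])                  # input vector to layer 1, shape (2, 1)

z1 = np.dot(weights[1], x) + biases[1]  # z^1 = w^1 . x + b^1
print(z1.shape)                         # (3, 1) -- same shape as biases[1]
```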
### Activations

- Activations of the l<sup>th</sup> layer are the outputs from neurons of the l<sup>th</sup> layer, which serve as input to the (l+1)<sup>th</sup> layer. The dimensions of `biases`, `zs` and `activations` are similar.
- The input layer provides the x vector as input to layer 1, hence `activations[0]` can be related to x, the input training example.
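Putting these conventions together, a full feedforward pass could look like this (a sketch of the indexing scheme, not necessarily the repository's exact implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sizes = [2, 3, 1]
weights = [np.array([[]])] + [np.random.randn(sizes[l], sizes[l - 1])
                              for l in range(1, len(sizes))]
biases = [np.array([[]])] + [np.random.randn(sizes[l], 1)
                             for l in range(1, len(sizes))]

x = np.array([[0.5],
              [-1.2]])    # input training example, shape (2, 1)

zs = [np.array([[]])]     # zs[0] is redundant, like weights[0] and biases[0]
activations = [x]         # activations[0] is the input vector itself

for l in range(1, len(sizes)):
    z = np.dot(weights[l], activations[l - 1]) + biases[l]  # z^l = w^l . a^(l-1) + b^l
    zs.append(z)
    activations.append(sigmoid(z))                          # a^l serves as input to layer l+1

print(activations[-1].shape)  # (1, 1) -- output of the network
```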