Machine Learning Online Class - Exercise 4 Neural Network Learning

% Instructions
% ------------
%
% This file contains code that helps you get started on the
% neural network exercise. You will need to complete the following functions
% in this exercise:
%
% sigmoidGradient.m
% randInitializeWeights.m
% nnCostFunction.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%

Initialization

clear ; close all; clc;

Set up the parameters you will use for this exercise

input_layer_size = 400; % 20x20 Input Images of Digits
hidden_layer_size = 25; % 25 hidden units
num_labels = 10; % 10 labels, from 1 to 10
% (note that we have mapped "0" to label 10)

=========== Part 1: Loading and Visualizing Data =============

We start the exercise by first loading and visualizing the dataset.
You will be working with a dataset that contains handwritten digits.
% Load Training Data
fprintf('Loading and Visualizing Data ...\n')
Loading and Visualizing Data ...
load('ex4data1.mat');
m = size(X, 1);
% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);
displayData(X(sel, :));

================ Part 2: Loading Parameters ================

In this part of the exercise, we load some pre-initialized neural network parameters.
fprintf('\nLoading Saved Neural Network Parameters ...\n')
Loading Saved Neural Network Parameters ...
% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');
% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];

================ Part 3: Compute Cost (Feedforward) ================

For the neural network, you should first implement the feedforward pass,
which returns the cost only. You should complete the code in
nnCostFunction.m to return the cost. After implementing the feedforward
computation, you can verify that your implementation is correct by
checking that you get the same cost as we do for the fixed debugging
parameters.
We suggest implementing the feedforward cost *without* regularization
first so that it is easier to debug. Later, in Part 4, you will
implement the regularized cost.
fprintf('\nFeedforward Using Neural Network ...\n')
Feedforward Using Neural Network ...
% Weight regularization parameter (we set this to 0 here).
lambda = 0;
[J, ~] = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
'\n(this value should be about 0.287629)\n'], J);
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
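For reference, a minimal sketch of the unregularized feedforward cost inside nnCostFunction.m might look like the following (illustrative only, not the graded solution; sigmoid.m is provided with the exercise):
% Illustrative sketch of the unregularized feedforward cost.
m = size(X, 1);
a1 = [ones(m, 1) X];                    % input layer plus bias unit
z2 = a1 * Theta1';
a2 = [ones(m, 1) sigmoid(z2)];          % hidden layer plus bias unit
a3 = sigmoid(a2 * Theta2');             % output activations, m x num_labels
I = eye(num_labels);
Y = I(y, :);                            % one-hot encoding of the labels
J = (-1 / m) * sum(sum(Y .* log(a3) + (1 - Y) .* log(1 - a3)));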

=============== Part 4: Implement Regularization ===============

Once your cost function implementation is correct, you should continue
by adding regularization to the cost.
fprintf('\nChecking Cost Function (w/ Regularization) ... \n')
Checking Cost Function (w/ Regularization) ...
% Weight regularization parameter (we set this to 1 here).
lambda = 1;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
'\n(this value should be about 0.383770)\n'], J);
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
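A sketch of the regularization term, added to the unregularized cost J from the Part 3 sketch above (note that the bias columns, i.e. the first column of each Theta, are excluded):
% Illustrative sketch: add the regularization term to the cost.
reg = (lambda / (2 * m)) * ...
      (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + reg;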

================ Part 5: Sigmoid Gradient ================

Before you start implementing the neural network, you will first
implement the gradient for the sigmoid function. You should complete the
code in the sigmoidGradient.m file.
fprintf('\nEvaluating sigmoid gradient...\n')
Evaluating sigmoid gradient...
g = sigmoidGradient([-1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\n ');
Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:
fprintf('%f ', g);
0.196612 0.235004 0.250000 0.235004 0.196612
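Since g(z) = 1 ./ (1 + exp(-z)), its derivative is g'(z) = g(z) .* (1 - g(z)); at z = 0 this gives 0.25, matching the middle value above. A minimal sketch of sigmoidGradient.m:
function g = sigmoidGradient(z)
% Computes the gradient of the sigmoid function, element-wise on z.
s = 1.0 ./ (1.0 + exp(-z));    % sigmoid(z)
g = s .* (1 - s);              % g'(z) = g(z) * (1 - g(z))
end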

================ Part 6: Initializing Parameters ================

In this part of the exercise, you will start to implement a two-layer
neural network that classifies digits. You will begin by implementing a
function to initialize the weights of the neural network
(randInitializeWeights.m).
fprintf('\nInitializing Neural Network Parameters ...\n')
Initializing Neural Network Parameters ...
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
Here we choose eps =
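A minimal sketch of randInitializeWeights.m, assuming the common heuristic eps_init = sqrt(6) / sqrt(L_in + L_out) for the sampling range:
function W = randInitializeWeights(L_in, L_out)
% Break symmetry by sampling uniformly from [-epsilon_init, epsilon_init].
% The extra column accounts for the bias unit.
epsilon_init = sqrt(6) / sqrt(L_in + L_out);   % assumed heuristic for the range
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end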

=============== Part 7: Implement Backpropagation ===============

Once your cost matches up with ours, you should proceed to implement the
backpropagation algorithm for the neural network. You should add to the
code you've written in nnCostFunction.m to return the partial
derivatives of the parameters.
fprintf('\nChecking Backpropagation... \n');
Checking Backpropagation...
% Check gradients by running checkNNGradients
checkNNGradients;
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
-0.0002 -0.0002
0.0002 0.0002
-0.0003 -0.0003
0.0003 0.0003
-0.0004 -0.0004
-0.0001 -0.0001
0.0001 0.0001
-0.0001 -0.0001
0.0002 0.0002
-0.0002 -0.0002
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1641 0.1641
0.0576 0.0576
0.0505 0.0505
0.1646 0.1646
0.0578 0.0578
0.0508 0.0508
0.1583 0.1583
0.0559 0.0559
0.0492 0.0492
0.1511 0.1511
0.0537 0.0537
0.0471 0.0471
0.1496 0.1496
0.0532 0.0532
0.0466 0.0466
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 2.47239e-11
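A vectorized sketch of the backpropagation step inside nnCostFunction.m, reusing the feedforward quantities (a1, z2, a2, a3, Y) from the Part 3 sketch; again illustrative, not the graded solution:
% Illustrative sketch of vectorized backpropagation.
delta3 = a3 - Y;                                              % output-layer error, m x num_labels
delta2 = (delta3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);  % hidden-layer error, m x hidden
Theta1_grad = (delta2' * a1) / m;          % same size as Theta1
Theta2_grad = (delta3' * a2) / m;          % same size as Theta2
grad = [Theta1_grad(:); Theta2_grad(:)];   % unrolled, as fmincg expects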

=============== Part 8: Implement Regularization ===============

Once your backpropagation implementation is correct, you should continue
by adding regularization to the cost and gradient.
fprintf('\nChecking Backpropagation (w/ Regularization) ... \n')
Checking Backpropagation (w/ Regularization) ...
% Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0168 -0.0168
0.0394 0.0394
0.0593 0.0593
0.0248 0.0248
-0.0327 -0.0327
-0.0602 -0.0602
-0.0320 -0.0320
0.0249 0.0249
0.0598 0.0598
0.0386 0.0386
-0.0174 -0.0174
-0.0576 -0.0576
-0.0452 -0.0452
0.0091 0.0091
0.0546 0.0546
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1187 0.1187
0.0000 0.0000
0.0337 0.0337
0.2040 0.2040
0.1171 0.1171
0.0755 0.0755
0.1257 0.1257
-0.0041 -0.0041
0.0170 0.0170
0.1763 0.1763
0.1131 0.1131
0.0862 0.0862
0.1323 0.1323
-0.0045 -0.0045
0.0015 0.0015
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 2.40444e-11
% Also output the costFunction debugging values
[debug_J, ~] = nnCostFunction(nn_params, input_layer_size, ...
hidden_layer_size, num_labels, X, y, lambda);
fprintf(['\n\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' ...
'\n(for lambda = 3, this value should be about 0.576051)\n\n'], lambda, debug_J);
Cost at (fixed) debugging parameters (w/ lambda = 3.000000): 0.576051
(for lambda = 3, this value should be about 0.576051)
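A sketch of the regularized gradient, added after the unregularized gradients from the Part 7 sketch (the bias columns are again left unregularized):
% Illustrative sketch: regularize the gradients, skipping the bias column.
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);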

=================== Part 9: Training NN ===================

You have now implemented all the code necessary to train a neural
network. To train it, we will use "fmincg", a function that works
similarly to "fminunc". Recall that these advanced optimizers can
minimize our cost functions efficiently as long as we provide them with
the gradient computations.
fprintf('\nTraining Neural Network... \n')
Training Neural Network...
% After you have completed the assignment, change the MaxIter to a larger
% value to see how more training helps.
options = optimset('MaxIter', 50);
% You should also try different values of lambda
lambda = 1;
% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, X, y, lambda);
% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
Iteration 1 | Cost: 3.422939e+00
Iteration 2 | Cost: 3.237755e+00
Iteration 3 | Cost: 3.126608e+00
Iteration 4 | Cost: 2.967579e+00
Iteration 5 | Cost: 2.672590e+00
Iteration 6 | Cost: 2.311362e+00
Iteration 7 | Cost: 2.145287e+00
Iteration 8 | Cost: 1.796259e+00
Iteration 9 | Cost: 1.495754e+00
Iteration 10 | Cost: 1.352835e+00
Iteration 11 | Cost: 1.205700e+00
Iteration 12 | Cost: 1.066013e+00
Iteration 13 | Cost: 1.015302e+00
Iteration 14 | Cost: 9.406719e-01
Iteration 15 | Cost: 9.036282e-01
Iteration 16 | Cost: 8.577993e-01
Iteration 17 | Cost: 8.285109e-01
Iteration 18 | Cost: 7.865291e-01
Iteration 19 | Cost: 7.439947e-01
Iteration 20 | Cost: 7.254078e-01
Iteration 21 | Cost: 7.138169e-01
Iteration 22 | Cost: 6.777721e-01
Iteration 23 | Cost: 6.554769e-01
Iteration 24 | Cost: 6.421540e-01
Iteration 25 | Cost: 6.148070e-01
Iteration 26 | Cost: 5.954817e-01
Iteration 27 | Cost: 5.857210e-01
Iteration 28 | Cost: 5.728886e-01
Iteration 29 | Cost: 5.679464e-01
Iteration 30 | Cost: 5.642804e-01
Iteration 31 | Cost: 5.592956e-01
Iteration 32 | Cost: 5.524890e-01
Iteration 33 | Cost: 5.467153e-01
Iteration 34 | Cost: 5.440140e-01
Iteration 35 | Cost: 5.386008e-01
Iteration 36 | Cost: 5.309919e-01
Iteration 37 | Cost: 5.239311e-01
Iteration 38 | Cost: 5.168390e-01
Iteration 39 | Cost: 5.111457e-01
Iteration 40 | Cost: 5.058142e-01
Iteration 41 | Cost: 4.977455e-01
Iteration 42 | Cost: 4.902430e-01
Iteration 43 | Cost: 4.870090e-01
Iteration 44 | Cost: 4.865323e-01
Iteration 45 | Cost: 4.841710e-01
Iteration 46 | Cost: 4.826136e-01
Iteration 47 | Cost: 4.814241e-01
Iteration 48 | Cost: 4.794252e-01
Iteration 49 | Cost: 4.772574e-01
Iteration 50 | Cost: 4.749392e-01
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
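Because the cost function handle returns both the cost and the gradient, fminunc could in principle be used in place of the fmincg call above (assuming the Optimization Toolbox is available); a hypothetical, untested sketch:
% Hypothetical alternative: fminunc with the gradient supplied (Optimization Toolbox).
options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);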

================= Part 10: Visualize Weights =================

You can now "visualize" what the neural network is learning by
displaying the hidden units to see what features they are capturing in
the data.
fprintf('\nVisualizing Neural Network... \n')
Visualizing Neural Network...
displayData(Theta1(:, 2:end));
This figure can be understood as showing the mapping weights from the input layer to the hidden layer.

================= Part 11: Implement Predict =================

After training the neural network, we would like to use it to predict
the labels. You will now implement the "predict" function to use the
neural network to predict the labels of the training set. This lets
you compute the training set accuracy.
pred = predict(Theta1, Theta2, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
Training Set Accuracy: 95.360000
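For reference, a minimal sketch of what predict.m might look like (forward-propagate, then take the most probable label per row):
function p = predict(Theta1, Theta2, X)
% Illustrative sketch: predict the label of each example in X.
m = size(X, 1);
h1 = sigmoid([ones(m, 1) X] * Theta1');   % hidden-layer activations
h2 = sigmoid([ones(m, 1) h1] * Theta2');  % output-layer activations
[~, p] = max(h2, [], 2);                  % column index of the max = predicted label
end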