Machine Learning Online Class - Exercise 4 Neural Network Learning

% Instructions
% ------------
%
% This file contains code that helps you get started on the
% neural network exercise. You will need to complete the following functions
% in this exercise:
%
% sigmoidGradient.m
% randInitializeWeights.m
% nnCostFunction.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%

Initialization

clear ; close all; clc;

Set up the parameters you will use for this exercise

input_layer_size = 400; % 20x20 Input Images of Digits
hidden_layer_size = 25; % 25 hidden units
num_labels = 10; % 10 labels, from 1 to 10
% (note that we have mapped "0" to label 10)

=========== Part 1: Loading and Visualizing Data =============

We start the exercise by first loading and visualizing the dataset.
You will be working with a dataset that contains handwritten digits.
% Load Training Data
fprintf('Loading and Visualizing Data ...\n')
Loading and Visualizing Data ...
load('ex4data1.mat');
m = size(X, 1);
% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);
displayData(X(sel, :));

================ Part 2: Loading Parameters ================

In this part of the exercise, we load some pre-initialized neural network parameters.
fprintf('\nLoading Saved Neural Network Parameters ...\n')
Loading Saved Neural Network Parameters ...
% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');
% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];

================ Part 3: Compute Cost (Feedforward) ================

For the neural network, you should first implement the feedforward pass,
which returns the cost only. You should complete the code in
nnCostFunction.m to return the cost. After implementing the feedforward
computation, you can verify that your implementation is correct by
checking that you get the same cost as we do for the fixed debugging
parameters.
We suggest implementing the feedforward cost *without* regularization
first so that it is easier to debug. Later, in Part 4, you will
implement the regularized cost.
fprintf('\nFeedforward Using Neural Network ...\n')
Feedforward Using Neural Network ...
% Weight regularization parameter (we set this to 0 here).
lambda = 0;
[J, ~] = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
'\n(this value should be about 0.287629)\n'], J);
Cost at parameters (loaded from ex4weights): 0.287629
(this value should be about 0.287629)
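For reference, a minimal sketch of the unregularized feedforward cost inside nnCostFunction.m might look like the following (illustrative only, not the graded solution; sigmoid.m is provided with the exercise):
% Illustrative sketch of the unregularized feedforward cost.
m = size(X, 1);
a1 = [ones(m, 1) X];                    % input layer plus bias unit
z2 = a1 * Theta1';
a2 = [ones(m, 1) sigmoid(z2)];          % hidden layer plus bias unit
a3 = sigmoid(a2 * Theta2');             % output activations, m x num_labels
I = eye(num_labels);
Y = I(y, :);                            % one-hot encoding of the labels
J = (-1 / m) * sum(sum(Y .* log(a3) + (1 - Y) .* log(1 - a3)));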

=============== Part 4: Implement Regularization ===============

Once your cost function implementation is correct, you should continue
by adding regularization to the cost.
fprintf('\nChecking Cost Function (w/ Regularization) ... \n')
Checking Cost Function (w/ Regularization) ...
% Weight regularization parameter (we set this to 1 here).
lambda = 1;
J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
num_labels, X, y, lambda);
fprintf(['Cost at parameters (loaded from ex4weights): %f '...
'\n(this value should be about 0.383770)\n'], J);
Cost at parameters (loaded from ex4weights): 0.383770
(this value should be about 0.383770)
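A sketch of the regularization term, added to the unregularized cost J from the Part 3 sketch above (note that the bias columns, i.e. the first column of each Theta, are excluded):
% Illustrative sketch: add the regularization term to the cost.
reg = (lambda / (2 * m)) * ...
      (sum(sum(Theta1(:, 2:end) .^ 2)) + sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + reg;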

================ Part 5: Sigmoid Gradient ================

Before you start implementing the neural network, you will first
implement the gradient for the sigmoid function. You should complete the
code in the sigmoidGradient.m file.
fprintf('\nEvaluating sigmoid gradient...\n')
Evaluating sigmoid gradient...
g = sigmoidGradient([-1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\n ');
Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:
fprintf('%f ', g);
0.196612 0.235004 0.250000 0.235004 0.196612
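Since g(z) = 1 ./ (1 + exp(-z)), its derivative is g'(z) = g(z) .* (1 - g(z)); at z = 0 this gives 0.25, matching the middle value above. A minimal sketch of sigmoidGradient.m:
function g = sigmoidGradient(z)
% Computes the gradient of the sigmoid function, element-wise on z.
s = 1.0 ./ (1.0 + exp(-z));    % sigmoid(z)
g = s .* (1 - s);              % g'(z) = g(z) * (1 - g(z))
end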

================ Part 6: Initializing Parameters ================

In this part of the exercise, you will start to implement a two-layer
neural network that classifies digits. You will begin by implementing a
function to initialize the weights of the neural network
(randInitializeWeights.m).
fprintf('\nInitializing Neural Network Parameters ...\n')
Initializing Neural Network Parameters ...
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
Here we choose eps =
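A minimal sketch of randInitializeWeights.m, assuming the common heuristic eps_init = sqrt(6) / sqrt(L_in + L_out) for the sampling range:
function W = randInitializeWeights(L_in, L_out)
% Break symmetry by sampling uniformly from [-epsilon_init, epsilon_init].
% The extra column accounts for the bias unit.
epsilon_init = sqrt(6) / sqrt(L_in + L_out);   % assumed heuristic for the range
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end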

=============== Part 7: Implement Backpropagation ===============

Once your cost matches up with ours, you should proceed to implement the
backpropagation algorithm for the neural network. You should add to the
code you've written in nnCostFunction.m to return the partial
derivatives of the parameters.
fprintf('\nChecking Backpropagation... \n');
Checking Backpropagation...
% Check gradients by running checkNNGradients
checkNNGradients;
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
0.0000 0.0000
-0.0000 -0.0000
-0.0002 -0.0002
0.0002 0.0002
-0.0003 -0.0003
0.0003 0.0003
-0.0004 -0.0004
-0.0001 -0.0001
0.0001 0.0001
-0.0001 -0.0001
0.0002 0.0002
-0.0002 -0.0002
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1641 0.1641
0.0576 0.0576
0.0505 0.0505
0.1646 0.1646
0.0578 0.0578
0.0508 0.0508
0.1583 0.1583
0.0559 0.0559
0.0492 0.0492
0.1511 0.1511
0.0537 0.0537
0.0471 0.0471
0.1496 0.1496
0.0532 0.0532
0.0466 0.0466
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 2.47239e-11
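A vectorized sketch of the backpropagation step inside nnCostFunction.m, reusing the feedforward quantities (a1, z2, a2, a3, Y) from the Part 3 sketch; again illustrative, not the graded solution:
% Illustrative sketch of vectorized backpropagation.
delta3 = a3 - Y;                                              % output-layer error, m x num_labels
delta2 = (delta3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);  % hidden-layer error, m x hidden
Theta1_grad = (delta2' * a1) / m;          % same size as Theta1
Theta2_grad = (delta3' * a2) / m;          % same size as Theta2
grad = [Theta1_grad(:); Theta2_grad(:)];   % unrolled, as fmincg expects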

=============== Part 8: Implement Regularization ===============

Once your backpropagation implementation is correct, you should continue
by adding regularization to the cost and gradient.
fprintf('\nChecking Backpropagation (w/ Regularization) ... \n')
Checking Backpropagation (w/ Regularization) ...
% Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);
-0.0093 -0.0093
0.0089 0.0089
-0.0084 -0.0084
0.0076 0.0076
-0.0067 -0.0067
-0.0168 -0.0168
0.0394 0.0394
0.0593 0.0593
0.0248 0.0248
-0.0327 -0.0327
-0.0602 -0.0602
-0.0320 -0.0320
0.0249 0.0249
0.0598 0.0598
0.0386 0.0386
-0.0174 -0.0174
-0.0576 -0.0576
-0.0452 -0.0452
0.0091 0.0091
0.0546 0.0546
0.3145 0.3145
0.1111 0.1111
0.0974 0.0974
0.1187 0.1187
0.0000 0.0000
0.0337 0.0337
0.2040 0.2040
0.1171 0.1171
0.0755 0.0755
0.1257 0.1257
-0.0041 -0.0041
0.0170 0.0170
0.1763 0.1763
0.1131 0.1131
0.0862 0.0862
0.1323 0.1323
-0.0045 -0.0045
0.0015 0.0015
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 2.40444e-11
% Also output the costFunction debugging values
[debug_J, ~] = nnCostFunction(nn_params, input_layer_size, ...
hidden_layer_size, num_labels, X, y, lambda);
fprintf(['\n\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' ...
'\n(for lambda = 3, this value should be about 0.576051)\n\n'], lambda, debug_J);
Cost at (fixed) debugging parameters (w/ lambda = 3.000000): 0.576051
(for lambda = 3, this value should be about 0.576051)
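A sketch of the regularized gradient, added after the unregularized gradients from the Part 7 sketch (the bias columns are again left unregularized):
% Illustrative sketch: regularize the gradients, skipping the bias column.
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);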

=================== Part 9: Training NN ===================

You have now implemented all the code necessary to train a neural
network. To train it, we will use "fmincg", a function that works
similarly to "fminunc". Recall that these advanced optimizers can
minimize our cost functions efficiently as long as we provide them with
the gradient computations.
fprintf('\nTraining Neural Network... \n')
Training Neural Network...
% After you have completed the assignment, change the MaxIter to a larger
% value to see how more training helps.
options = optimset('MaxIter', 50);
% You should also try different values of lambda
lambda = 1;
% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
input_layer_size, ...
hidden_layer_size, ...
num_labels, X, y, lambda);
% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
Iteration 1 | Cost: 3.422939e+00
Iteration 2 | Cost: 3.237755e+00
Iteration 3 | Cost: 3.126608e+00
Iteration 4 | Cost: 2.967579e+00
Iteration 5 | Cost: 2.672590e+00
Iteration 6 | Cost: 2.311362e+00
Iteration 7 | Cost: 2.145287e+00
Iteration 8 | Cost: 1.796259e+00
Iteration 9 | Cost: 1.495754e+00
Iteration 10 | Cost: 1.352835e+00
Iteration 11 | Cost: 1.205700e+00
Iteration 12 | Cost: 1.066013e+00
Iteration 13 | Cost: 1.015302e+00
Iteration 14 | Cost: 9.406719e-01
Iteration 15 | Cost: 9.036282e-01
Iteration 16 | Cost: 8.577993e-01
Iteration 17 | Cost: 8.285109e-01
Iteration 18 | Cost: 7.865291e-01
Iteration 19 | Cost: 7.439947e-01
Iteration 20 | Cost: 7.254078e-01
Iteration 21 | Cost: 7.138169e-01
Iteration 22 | Cost: 6.777721e-01
Iteration 23 | Cost: 6.554769e-01
Iteration 24 | Cost: 6.421540e-01
Iteration 25 | Cost: 6.148070e-01
Iteration 26 | Cost: 5.954817e-01
Iteration 27 | Cost: 5.857210e-01
Iteration 28 | Cost: 5.728886e-01
Iteration 29 | Cost: 5.679464e-01
Iteration 30 | Cost: 5.642804e-01
Iteration 31 | Cost: 5.592956e-01
Iteration 32 | Cost: 5.524890e-01
Iteration 33 | Cost: 5.467153e-01
Iteration 34 | Cost: 5.440140e-01
Iteration 35 | Cost: 5.386008e-01
Iteration 36 | Cost: 5.309919e-01
Iteration 37 | Cost: 5.239311e-01
Iteration 38 | Cost: 5.168390e-01
Iteration 39 | Cost: 5.111457e-01
Iteration 40 | Cost: 5.058142e-01
Iteration 41 | Cost: 4.977455e-01
Iteration 42 | Cost: 4.902430e-01
Iteration 43 | Cost: 4.870090e-01
Iteration 44 | Cost: 4.865323e-01
Iteration 45 | Cost: 4.841710e-01
Iteration 46 | Cost: 4.826136e-01
Iteration 47 | Cost: 4.814241e-01
Iteration 48 | Cost: 4.794252e-01
Iteration 49 | Cost: 4.772574e-01
Iteration 50 | Cost: 4.749392e-01
% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
num_labels, (hidden_layer_size + 1));
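Because the cost function handle returns both the cost and the gradient, fminunc could in principle be used in place of the fmincg call above (assuming the Optimization Toolbox is available); a hypothetical, untested sketch:
% Hypothetical alternative: fminunc with the gradient supplied (Optimization Toolbox).
options = optimset('GradObj', 'on', 'MaxIter', 50);
[nn_params, cost] = fminunc(costFunction, initial_nn_params, options);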

================= Part 10: Visualize Weights =================

You can now "visualize" what the neural network is learning by
displaying the hidden units to see what features they are capturing in
the data.
fprintf('\nVisualizing Neural Network... \n')
Visualizing Neural Network...
displayData(Theta1(:, 2:end));
This figure can be understood as showing the mapping weights from the input layer to the hidden layer.

================= Part 11: Implement Predict =================

After training the neural network, we would like to use it to predict
the labels. You will now implement the "predict" function to use the
neural network to predict the labels of the training set. This lets
you compute the training set accuracy.
pred = predict(Theta1, Theta2, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
Training Set Accuracy: 95.360000
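For reference, a minimal sketch of what predict.m might look like (forward-propagate, then take the most probable label per row):
function p = predict(Theta1, Theta2, X)
% Illustrative sketch: predict the label of each example in X.
m = size(X, 1);
h1 = sigmoid([ones(m, 1) X] * Theta1');   % hidden-layer activations
h2 = sigmoid([ones(m, 1) h1] * Theta2');  % output-layer activations
[~, p] = max(h2, [], 2);                  % column index of the max = predicted label
end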