Machine Learning Online Class

Exercise 8 | Anomaly Detection and Collaborative Filtering
Instructions
------------
This file contains code that helps you get started on the
exercise. You will need to complete the following functions:
estimateGaussian.m
selectThreshold.m
cofiCostFunc.m
For this exercise, you will not need to change any code in this file,
or any other files other than those mentioned above.

=============== Part 1: Loading movie ratings dataset ================

You will start by loading the movie ratings dataset to understand the
structure of the data.
fprintf('Loading movie ratings dataset.\n\n');
Loading movie ratings dataset.
% Load data
load ('ex8_movies.mat');
% Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by
% 943 users
%
% R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
% rating to movie i
% From the matrix, we can compute statistics like average rating.
fprintf('Average rating for movie 1 (Toy Story): %f / 5\n\n', ...
mean(Y(1, R(1, :))));
Average rating for movie 1 (Toy Story): 3.878319 / 5
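The masked average above (mean over only the entries a user actually rated) can be sketched in NumPy with toy stand-in matrices, since the real ex8_movies.mat data is not reproduced here:

```python
import numpy as np

# Toy stand-ins for the ratings matrix Y and indicator matrix R
# (the real dataset is 1682 movies x 943 users).
Y = np.array([[5, 4, 0],
              [3, 0, 1]])
R = np.array([[1, 1, 0],
              [1, 0, 1]])

# Average rating for movie 1, counting only entries that were rated
avg = Y[0, R[0, :] == 1].mean()
print(avg)  # (5 + 4) / 2 = 4.5
```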
% We can "visualize" the ratings matrix by plotting it with imagesc
imagesc(Y);
ylabel('Movies');
xlabel('Users');

============ Part 2: Collaborative Filtering Cost Function ===========

You will now implement the cost function for collaborative filtering.
To help you debug your cost function, we have included a set of
pre-trained weights. Specifically, you should complete the code in
cofiCostFunc.m to return J.
% Load pre-trained weights (X, Theta, num_users, num_movies, num_features)
load ('ex8_movieParams.mat');
% Reduce the data set size so that this runs faster
num_users = 4;
num_movies = 5;
num_features = 3;
% X is the movie feature matrix; each row is one movie's feature vector
% Theta is the user parameter matrix; each row is one user's parameter vector
X = X(1 : num_movies, 1 : num_features);
Theta = Theta(1:num_users, 1:num_features);
Y = Y(1 : num_movies, 1 : num_users);
R = R(1 : num_movies, 1 : num_users);
% Evaluate cost function
J = cofiCostFunc([X(:) ; Theta(:)], Y, R, num_users, num_movies, num_features, 0);
fprintf(['Cost at loaded parameters: %f '...
'\n(this value should be about 22.22)\n'], J);
Cost at loaded parameters: 22.224604
(this value should be about 22.22)
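The unregularized cost sums squared prediction errors over rated entries only. A minimal NumPy sketch of that formula, using hypothetical toy matrices rather than the loaded parameters:

```python
import numpy as np

def cofi_cost(X, Theta, Y, R):
    """Unregularized collaborative-filtering cost:
    J = 1/2 * sum over rated (i,j) of (x_i . theta_j - y_ij)^2."""
    err = (X @ Theta.T - Y) * R   # zero out unrated entries
    return 0.5 * np.sum(err ** 2)

# Tiny example: 2 movies, 2 users, 1 feature
X = np.array([[1.0], [2.0]])
Theta = np.array([[1.0], [0.5]])
Y = np.array([[2.0, 0.0], [1.0, 1.0]])
R = np.array([[1, 0], [1, 1]])
print(cofi_cost(X, Theta, Y, R))  # 0.5 * ((-1)^2 + 1^2) = 1.0
```

Multiplying element-wise by R before squaring is what lets the whole sum stay vectorized instead of looping over rated pairs.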

============== Part 3: Collaborative Filtering Gradient ==============

Once your cost function matches up with ours, you should now implement
the collaborative filtering gradient function. Specifically, you should
complete the code in cofiCostFunc.m to return the grad argument.
fprintf('\nChecking Gradients (without regularization) ... \n');
Checking Gradients (without regularization) ...
% Check gradients by running checkCostFunction
checkCostFunction;
2.4734 2.4734
-3.3642 -3.3642
3.0220 3.0220
-0.0045 -0.0045
1.3398 1.3398
-3.0521 -3.0521
2.0962 2.0962
0.0222 0.0222
-5.1495 -5.1495
2.6312 2.6312
-1.3667 -1.3667
0.3324 0.3324
1.4416 1.4416
-0.5971 -0.5971
-1.8176 -1.8176
5.4534 5.4534
-4.0567 -4.0567
-0.4537 -0.4537
-0.2140 -0.2140
0.5173 0.5173
3.0944 3.0944
1.1219 1.1219
0.7806 0.7806
-0.2328 -0.2328
-1.3480 -1.3480
0.7513 0.7513
-2.6773 -2.6773
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your cost function implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 7.05038e-13
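The analytical gradients that the check above compares against can be written as two matrix products. A NumPy sketch under the same masking convention, verified here against a central finite difference (random toy data, not the exercise's):

```python
import numpy as np

def cofi_grad(X, Theta, Y, R):
    """Unregularized gradients of the collaborative-filtering cost:
    X_grad(i,k)     = sum over j with r(i,j)=1 of err(i,j) * Theta(j,k)
    Theta_grad(j,k) = sum over i with r(i,j)=1 of err(i,j) * X(i,k)."""
    err = (X @ Theta.T - Y) * R
    return err @ Theta, err.T @ X

# Compare one partial derivative against a numerical gradient
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
Theta = rng.standard_normal((5, 3))
Y = (X @ Theta.T).round(1)
R = (rng.random((4, 5)) > 0.5).astype(float)

def cost(X, Theta):
    return 0.5 * np.sum((((X @ Theta.T) - Y) * R) ** 2)

X_grad, _ = cofi_grad(X, Theta, Y, R)
eps = 1e-6
Xp = X.copy(); Xp[0, 0] += eps
Xm = X.copy(); Xm[0, 0] -= eps
numeric = (cost(Xp, Theta) - cost(Xm, Theta)) / (2 * eps)
print(abs(numeric - X_grad[0, 0]) < 1e-6)  # True
```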

========= Part 4: Collaborative Filtering Cost Regularization ========

Now, you should implement regularization for the cost function for
collaborative filtering. You can implement it by adding the cost of
regularization to the original cost computation.
% Evaluate cost function
J = cofiCostFunc([X(:) ; Theta(:)], Y, R, num_users, num_movies, ...
num_features, 1.5);
fprintf(['Cost at loaded parameters (lambda = 1.5): %f '...
'\n(this value should be about 31.34)\n'], J);
Cost at loaded parameters (lambda = 1.5): 31.344056
(this value should be about 31.34)
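Regularization only appends a penalty term to the cost: lambda/2 times the sum of all squared parameters in Theta and X. A NumPy sketch with the same toy matrices as before (hypothetical data, not the loaded weights):

```python
import numpy as np

def cofi_cost_reg(X, Theta, Y, R, lam):
    """Cost with L2 regularization over every entry of X and Theta:
    J = J_unreg + lam/2 * (sum(Theta^2) + sum(X^2))."""
    err = (X @ Theta.T - Y) * R
    penalty = 0.5 * lam * (np.sum(Theta ** 2) + np.sum(X ** 2))
    return 0.5 * np.sum(err ** 2) + penalty

X = np.array([[1.0], [2.0]])
Theta = np.array([[1.0], [0.5]])
Y = np.array([[2.0, 0.0], [1.0, 1.0]])
R = np.array([[1, 0], [1, 1]])
# Unregularized part is 1.0; the penalty adds 0.75 * (1.25 + 5.0)
print(cofi_cost_reg(X, Theta, Y, R, 1.5))  # 1.0 + 4.6875 = 5.6875
```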

======= Part 5: Collaborative Filtering Gradient Regularization ======

Once your cost matches up with ours, you should proceed to implement
regularization for the gradient.
fprintf('\nChecking Gradients (with regularization) ... \n');
Checking Gradients (with regularization) ...
% Check gradients by running checkCostFunction
checkCostFunction(1.5);
-0.5757 -0.5757
1.7256 1.7256
1.8087 1.8087
5.5917 5.5917
0.1996 0.1996
-7.6902 -7.6902
-4.3051 -4.3051
-10.4999 -10.4999
-0.0717 -0.0717
-3.6842 -3.6842
0.5828 0.5828
-2.0054 -2.0054
-1.7368 -1.7368
-1.9312 -1.9312
-1.4579 -1.4579
-2.4870 -2.4870
2.1670 2.1670
3.8779 3.8779
1.6977 1.6977
7.5505 7.5505
14.9200 14.9200
-1.5953 -1.5953
-0.7142 -0.7142
2.0173 2.0173
1.9999 1.9999
7.5847 7.5847
0.2629 0.2629
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your cost function implementation is correct, then
the relative difference will be small (less than 1e-9).

Relative Difference: 2.01165e-12
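On the gradient side, regularization simply adds lam times the parameter itself to each partial derivative. A NumPy sketch, again checked against a central finite difference on random toy data:

```python
import numpy as np

def cofi_grad_reg(X, Theta, Y, R, lam):
    """Regularized gradients: each parameter gains a lam * param term."""
    err = (X @ Theta.T - Y) * R
    X_grad = err @ Theta + lam * X
    Theta_grad = err.T @ X + lam * Theta
    return X_grad, Theta_grad

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 2))
Theta = rng.standard_normal((4, 2))
Y = rng.standard_normal((3, 4))
R = np.ones((3, 4))
lam = 1.5

def cost(X, Theta):
    err = (X @ Theta.T - Y) * R
    return 0.5 * np.sum(err ** 2) \
        + 0.5 * lam * (np.sum(Theta ** 2) + np.sum(X ** 2))

_, T_grad = cofi_grad_reg(X, Theta, Y, R, lam)
eps = 1e-6
Tp = Theta.copy(); Tp[0, 0] += eps
Tm = Theta.copy(); Tm[0, 0] -= eps
numeric = (cost(X, Tp) - cost(X, Tm)) / (2 * eps)
print(abs(numeric - T_grad[0, 0]) < 1e-6)  # True
```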

============== Part 6: Entering ratings for a new user ===============

Before training the collaborative filtering model, we will first add
ratings for a new user we just observed. This part of the code also
lets you enter your own ratings for the movies in our dataset!
movieList = loadMovieList();
% Initialize my ratings
my_ratings = zeros(1682, 1);
% Check the file movie_idx.txt for id of each movie in our dataset
% For example, Toy Story (1995) has ID 1, so to rate it "4", you can set
my_ratings(1) = 4;
% Or suppose you did not enjoy Silence of the Lambs (1991), you can set
my_ratings(98) = 2;
% We have selected a few movies we liked / did not like and the ratings we
% gave are as follows:
my_ratings(7) = 3;
my_ratings(12) = 5;
my_ratings(54) = 4;
my_ratings(64) = 5;
my_ratings(66) = 3;
my_ratings(69) = 5;
my_ratings(183) = 4;
my_ratings(226) = 5;
my_ratings(355) = 5;
fprintf('\n\nNew user ratings:\n');
New user ratings:
for i = 1 : length(my_ratings)
if my_ratings(i) > 0
fprintf('Rated %d for %s\n', my_ratings(i), ...
movieList{i});
end
end
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)

================== Part 7: Learning Movie Ratings ====================

Now, you will train the collaborative filtering model on a movie rating
dataset of 1682 movies and 943 users.
fprintf('\nTraining collaborative filtering...\n');
Training collaborative filtering...
% Load data
load('ex8_movies.mat');
% Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by
% 943 users
%
% R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
% rating to movie i
% Add our own ratings to the data matrix
% This is equivalent to adding one new user's data
Y = [my_ratings, Y];
R = [(my_ratings ~= 0), R];
% Normalize Ratings
[Ynorm, Ymean] = normalizeRatings(Y, R);
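Mean normalization subtracts each movie's average rating (computed over rated entries only) so that a user with no ratings is predicted the movie's mean rather than zero. A NumPy sketch of what normalizeRatings computes, on hypothetical toy matrices:

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, taken over rated entries only;
    unrated entries stay zero in Ynorm."""
    Ymean = np.zeros(Y.shape[0])
    Ynorm = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        idx = R[i, :] == 1
        if idx.any():
            Ymean[i] = Y[i, idx].mean()
            Ynorm[i, idx] = Y[i, idx] - Ymean[i]
    return Ynorm, Ymean

Y = np.array([[5, 3, 0],
              [0, 2, 4]])
R = np.array([[1, 1, 0],
              [0, 1, 1]])
Ynorm, Ymean = normalize_ratings(Y, R)
print(Ymean)   # [4. 3.]
print(Ynorm)   # [[ 1. -1.  0.]
               #  [ 0. -1.  1.]]
```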
% Useful Values
num_users = size(Y, 2);
num_movies = size(Y, 1);
num_features = 10;
% Set Initial Parameters (Theta, X)
% Random initialization drawn from a standard normal distribution
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);
initial_parameters = [X(:); Theta(:)];
% Set options for fmincg
options = optimset('GradObj', 'on', 'MaxIter', 100);
% Set Regularization
lambda = 10;
% Train X and Theta
theta = fmincg (@(t)(cofiCostFunc(t, Ynorm, R, num_users, num_movies, ...
num_features, lambda)), ...
initial_parameters, options);
Iteration 1 | Cost: 7.493336e+05
Iteration 2 | Cost: 4.880861e+05
Iteration 3 | Cost: 2.987959e+05
Iteration 4 | Cost: 2.322157e+05
Iteration 5 | Cost: 1.770114e+05
Iteration 6 | Cost: 1.465684e+05
Iteration 7 | Cost: 1.268106e+05
Iteration 8 | Cost: 1.162170e+05
Iteration 9 | Cost: 1.080068e+05
Iteration 10 | Cost: 1.027622e+05
Iteration 11 | Cost: 9.724652e+04
Iteration 12 | Cost: 9.367597e+04
Iteration 13 | Cost: 9.234529e+04
Iteration 14 | Cost: 8.890725e+04
Iteration 15 | Cost: 8.677051e+04
Iteration 16 | Cost: 8.556865e+04
Iteration 17 | Cost: 8.443566e+04
Iteration 18 | Cost: 8.131790e+04
Iteration 19 | Cost: 7.945750e+04
Iteration 20 | Cost: 7.858425e+04
Iteration 21 | Cost: 7.774777e+04
Iteration 22 | Cost: 7.665230e+04
Iteration 23 | Cost: 7.550082e+04
Iteration 24 | Cost: 7.492713e+04
Iteration 25 | Cost: 7.432257e+04
Iteration 26 | Cost: 7.351994e+04
Iteration 27 | Cost: 7.301211e+04
Iteration 28 | Cost: 7.271555e+04
Iteration 29 | Cost: 7.236754e+04
Iteration 30 | Cost: 7.217922e+04
Iteration 31 | Cost: 7.191707e+04
Iteration 32 | Cost: 7.169348e+04
Iteration 33 | Cost: 7.145787e+04
Iteration 34 | Cost: 7.119558e+04
Iteration 35 | Cost: 7.092016e+04
Iteration 36 | Cost: 7.070979e+04
Iteration 37 | Cost: 7.046319e+04
Iteration 38 | Cost: 7.020285e+04
Iteration 39 | Cost: 7.010710e+04
Iteration 40 | Cost: 7.005226e+04
Iteration 41 | Cost: 6.999912e+04
Iteration 42 | Cost: 6.991734e+04
Iteration 43 | Cost: 6.985513e+04
Iteration 44 | Cost: 6.981437e+04
Iteration 45 | Cost: 6.976908e+04
Iteration 46 | Cost: 6.960313e+04
Iteration 47 | Cost: 6.945370e+04
Iteration 48 | Cost: 6.935492e+04
Iteration 49 | Cost: 6.928855e+04
Iteration 50 | Cost: 6.920689e+04
Iteration 51 | Cost: 6.917248e+04
Iteration 52 | Cost: 6.915740e+04
Iteration 53 | Cost: 6.914421e+04
Iteration 54 | Cost: 6.912195e+04
Iteration 55 | Cost: 6.910236e+04
Iteration 56 | Cost: 6.909230e+04
Iteration 57 | Cost: 6.908018e+04
Iteration 58 | Cost: 6.906652e+04
Iteration 59 | Cost: 6.904157e+04
Iteration 60 | Cost: 6.901334e+04
Iteration 61 | Cost: 6.895874e+04
Iteration 62 | Cost: 6.889058e+04
Iteration 63 | Cost: 6.885511e+04
Iteration 64 | Cost: 6.884749e+04
Iteration 65 | Cost: 6.882642e+04
Iteration 66 | Cost: 6.882023e+04
Iteration 67 | Cost: 6.881252e+04
Iteration 68 | Cost: 6.879600e+04
Iteration 69 | Cost: 6.876019e+04
Iteration 70 | Cost: 6.871203e+04
Iteration 71 | Cost: 6.868484e+04
Iteration 72 | Cost: 6.867702e+04
Iteration 73 | Cost: 6.866686e+04
Iteration 74 | Cost: 6.866216e+04
Iteration 75 | Cost: 6.865971e+04
Iteration 76 | Cost: 6.865196e+04
Iteration 77 | Cost: 6.864890e+04
Iteration 78 | Cost: 6.864051e+04
Iteration 79 | Cost: 6.862114e+04
Iteration 80 | Cost: 6.861094e+04
Iteration 81 | Cost: 6.860446e+04
Iteration 82 | Cost: 6.859538e+04
Iteration 83 | Cost: 6.858785e+04
Iteration 84 | Cost: 6.857818e+04
Iteration 85 | Cost: 6.857391e+04
Iteration 86 | Cost: 6.857215e+04
Iteration 87 | Cost: 6.856463e+04
Iteration 88 | Cost: 6.856270e+04
Iteration 89 | Cost: 6.856155e+04
Iteration 90 | Cost: 6.855144e+04
Iteration 91 | Cost: 6.853764e+04
Iteration 92 | Cost: 6.853423e+04
Iteration 93 | Cost: 6.853220e+04
Iteration 94 | Cost: 6.852683e+04
Iteration 95 | Cost: 6.852093e+04
Iteration 96 | Cost: 6.851416e+04
Iteration 97 | Cost: 6.851059e+04
Iteration 98 | Cost: 6.850676e+04
Iteration 99 | Cost: 6.850161e+04
Iteration 100 | Cost: 6.849832e+04
% Unfold the returned theta back into X and Theta
X = reshape(theta(1 : num_movies * num_features), num_movies, num_features);
Theta = reshape(theta(num_movies * num_features + 1 : end), num_users, num_features);
fprintf('Recommender system learning completed.\n');
Recommender system learning completed.
Program paused. Press enter to continue.

================== Part 8: Recommendation for you ====================

After training the model, you can now make recommendations by computing
the predictions matrix.
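The whole prediction-and-ranking step can be sketched in NumPy: take inner products of movie features and user parameters, add back the per-movie mean removed during normalization, then sort descending. Toy parameters stand in for the trained ones:

```python
import numpy as np

# Toy learned parameters (the real run has 1682 movies x 944 users,
# 10 features); the values here are hypothetical.
rng = np.random.default_rng(2)
num_movies, num_users, num_features = 6, 3, 2
X = rng.standard_normal((num_movies, num_features))
Theta = rng.standard_normal((num_users, num_features))
Ymean = rng.uniform(2, 4, num_movies)

# Predictions: inner products, plus the per-movie mean added back
p = X @ Theta.T
my_predictions = p[:, 0] + Ymean   # column 0 is the newly added user

# Top-k recommendations: predicted ratings in descending order
ix = np.argsort(-my_predictions)
for j in ix[:3]:
    print(f"Predicting rating {my_predictions[j]:.1f} for movie {j}")
```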
% Predictions for the new user, like all users, come from the full
% prediction matrix; the per-movie mean is then added back
p = X * Theta';
% Add the mean rating back to the raw predictions
p = 1682×944 matrix (display truncated)
my_predictions = p(:, 1) + Ymean;
movieList = loadMovieList();
[r, ix] = sort(my_predictions, 'descend');
fprintf('\nTop recommendations for you:\n');
Top recommendations for you:
for i = 1 : 10
j = ix(i);
fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), movieList{j});
end
Predicting rating 4.5 for movie Star Wars (1977)
Predicting rating 4.3 for movie Titanic (1997)
Predicting rating 4.2 for movie Raiders of the Lost Ark (1981)
Predicting rating 4.1 for movie Return of the Jedi (1983)
Predicting rating 4.0 for movie Empire Strikes Back, The (1980)
Predicting rating 4.0 for movie Braveheart (1995)
Predicting rating 3.9 for movie Shawshank Redemption, The (1994)
Predicting rating 3.9 for movie Godfather, The (1972)
Predicting rating 3.8 for movie Schindler's List (1993)
Predicting rating 3.8 for movie Fugitive, The (1993)
fprintf('\n\nOriginal ratings provided:\n');
Original ratings provided:
for i = 1 : length(my_ratings)
if my_ratings(i) > 0
fprintf('Rated %d for %s\n', my_ratings(i), movieList{i});
end
end
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)