Mathematica SVD recommender system: assignment inside a loop problem
I'm translating the following SVD recommendation system, written in Ruby, to Mathematica:
require 'linalg'
users = { 1 => "Ben", 2 => "Tom", 3 => "John", 4 => "Fred" }
m = Linalg::DMatrix[
#Ben, Tom, John, Fred
[5,5,0,5], # season 1
[5,0,3,4], # season 2
[3,4,0,3], # season 3
[0,0,5,3], # season 4
[5,4,4,5], # season 5
[5,4,5,5] # season 6
]
# Compute the SVD Decomposition
u, s, vt = m.singular_value_decomposition
vt = vt.transpose
# Take the 2-rank approximation of the Matrix
# - Take first and second columns of u (6x2)
# - Take first and second columns of vt (4x2)
# - Take the first two eigen-values (2x2)
u2 = Linalg::DMatrix.join_columns [u.column(0), u.column(1)]
v2 = Linalg::DMatrix.join_columns [vt.column(0), vt.column(1)]
eig2 = Linalg::DMatrix.columns [s.column(0).to_a.flatten[0,2], s.column(1).to_a.flatten[0,2]]
# Here comes Bob, our new user
bob = Linalg::DMatrix[[5,5,0,0,0,5]]
bobEmbed = bob * u2 * eig2.inverse
# Compute the cosine similarity between Bob and every other User in our 2-D space
user_sim, count = {}, 1
v2.rows.each { |x|
user_sim[count] = (bobEmbed.transpose.dot(x.transpose)) / (x.norm * bobEmbed.norm)
count += 1
}
# Remove all users who fall below the 0.90 cosine similarity cutoff and sort by similarity
similar_users = user_sim.delete_if {|k,sim| sim < 0.9 }.sort {|a,b| b[1] <=> a[1] }
similar_users.each { |u| printf "%s (ID: %d, Similarity: %0.3f) \n", users[u[0]], u[0], u[1] }
# We'll use a simple strategy in this case:
# 1) Select the most similar user
# 2) Compare all items rated by this user against your own and select items that you have not yet rated
# 3) Return the ratings for items I have not yet seen, but the most similar user has rated
similarUsersItems = m.column(similar_users[0][0]-1).transpose.to_a.flatten
myItems = bob.transpose.to_a.flatten
not_seen_yet = {}
myItems.each_index { |i|
not_seen_yet[i+1] = similarUsersItems[i] if myItems[i] == 0 and similarUsersItems[i] != 0
}
printf "\n %s recommends: \n", users[similar_users[0][0]]
not_seen_yet.sort {|a,b| b[1] <=> a[1] }.each { |item|
printf "\tSeason %d .. I gave it a rating of %d \n", item[0], item[1]
}
print "We've seen all the same seasons, bugger!" if not_seen_yet.size == 0
Here is the corresponding Mathematica code:
Clear[s, u, v, s2, u2, v2, m, n, testdata, trainingdata, user, user2d];
find1nn[trainingdata_, user_] := {
{u , s, v} = SingularValueDecomposition[Transpose[trainingdata]];
(* Reduce to 2 dimensions. *)
u2 = u[[All, {1, 2}]];
s2 = s[[{1, 2}, {1, 2}]];
v2 = v[[All, {1, 2}]];
user2d = user.u2.Inverse[s2];
{m, n} = Dimensions[v2];
closest = -1;
index = -1;
For[a = 1, a < m, a++,
{distance = 1 - CosineDistance[v2[[a, {1, 2}]], user2d];,
If[distance > closest, {closest = distance, index = a}];}];
closestuserratings = trainingdata[[index]];
closestuserratings
}
rec[closest_, userx_] := {
d = Dimensions[closest];
For[b = 1, b <= d[[2]], b++,
If[userx[[b]] == 0., userx[[b]] = closest[[1, b]]]
]
userx
}
finalrec[td_, user_] := rec[find1nn[td, user], user]
(*Clear[s,u,v,s2,u2,v2,m,n,testdata,trainingdata,user,user2d]*)
testdata = {{5., 5., 3., 0., 5., 5.}, {5., 0., 4., 1., 4., 4.}, {0.,
3., 0., 5., 4., 5.}, {5., 4., 3., 3., 5., 5.}};
bob = {5., 0., 4., 0., 4., 5.};
(*recommend[testdata,bob]*)
find1nn[testdata, bob]
finalrec[testdata, bob]
For some reason it doesn't assign the indices of the user inside the function, but does outside. What might be causing this to happen?
Comments (4)
Please look up the variable localization tutorial in the Mathematica documentation. The problem is in your rec function: you cannot normally modify a function's input variables in Mathematica. (You may be able to if the function has one of the Hold attributes, so that the argument in question is passed to it unevaluated, but that is not the case here.)
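A minimal sketch of the point above. The function names (setFirstBad, fillZeros) are hypothetical and not from the original post, and the ratings are passed as flat lists rather than the nested list that rec receives in the question. It contrasts assigning to a part of a pattern argument with assigning to a local copy.

```mathematica
(* Hypothetical names (setFirstBad, fillZeros); not from the original post. *)

(* Fails: x is replaced by the list itself before the body evaluates, so the
   part assignment has no symbol to modify and issues a Set::setps message. *)
setFirstBad[x_] := (x[[1]] = 1.; x)

(* Works: copy the argument into a local variable and modify the copy. *)
fillZeros[closest_, userx_] := Module[{filled = userx},
  Do[
    If[filled[[b]] == 0., filled[[b]] = closest[[b]]],
    {b, Length[filled]}];
  filled]

fillZeros[{5., 0., 4., 1., 4., 4.}, {5., 0., 4., 0., 4., 5.}]
(* {5., 0., 4., 1., 4., 5.} *)
```

Module gives the function its own variable to assign to, which is usually simpler than adding a Hold attribute just to mutate the caller's list.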
Without trying to understand what you are trying to achieve, here is a more Mathematica-ish but (I hope) equivalent working version.
The explicit loops are gone, and many unneeded variables have been eliminated. All variables are now local, so there is no need to use Clear[].
I am sure it can still be optimized a lot.
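As an illustration of the approach described here (this is not the code from this answer), a loop-free version with localized variables might look roughly like the sketch below. The function names mirror the question; the bodies are an assumption. One deliberate difference: the question's For loop runs while a < m and so never examines the last user, whereas this sketch compares against every row.

```mathematica
(* Sketch only: names mirror the question, the bodies are an assumption. *)
find1nn[trainingdata_, user_] := Module[{u, s, v, user2d, sims},
  (* Keep only the two largest singular values. *)
  {u, s, v} = SingularValueDecomposition[Transpose[trainingdata], 2];
  (* Project the new user into the 2-D space. *)
  user2d = user . u . Inverse[s];
  (* Cosine similarity with every existing user (the rows of v). *)
  sims = Map[Function[row, 1 - CosineDistance[row, user2d]], v];
  (* Return the ratings of the most similar user as a flat list. *)
  trainingdata[[First[Ordering[sims, -1]]]]]

(* Fill the new user's unrated (zero) items from the closest user's ratings. *)
rec[closest_, userx_] := MapThread[If[#2 == 0., #1, #2] &, {closest, userx}]

finalrec[td_, user_] := rec[find1nn[td, user], user]
```

With testdata and bob defined as in the question, finalrec[testdata, bob] would be the call to try. Because every intermediate lives inside Module, nothing leaks into the global context between runs.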
Here is my shot at this based on belisarius' code, and with Sjoerd's improvements.
{{5. Null, 0. Null, 4. Null, 1. Null, 4. Null, 5. Null}}
Now it works, but why the Nulls?
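One plausible cause, assuming the function body still has the shape of rec in the question (a For loop followed by the return value on the next line, with no semicolon between them, all wrapped in braces): inside brackets a line break does not end a statement, so the two expressions are multiplied, and For returns Null. The names below (withNulls, withoutNulls) are hypothetical, chosen only to show the effect.

```mathematica
(* Hypothetical names (withNulls, withoutNulls) for illustration. *)

(* No semicolon after For[...]: the line break inside the braces is read as
   multiplication, so the body is For[...]*x, and For returns Null. *)
withNulls[x_] := {
  For[i = 1, i <= 1, i++, Null]
  x
}

withNulls[{5., 4.}]
(* {{5. Null, 4. Null}} *)

(* Separate the statements with ; and use parentheses instead of braces,
   so the function returns the plain list rather than a list wrapping it. *)
withoutNulls[x_] := (
  For[i = 1, i <= 1, i++, Null];
  x)

withoutNulls[{5., 4.}]
(* {5., 4.} *)
```

The extra outer braces in the output have the same origin: a function body written as { ... } returns a list containing the result, not the result itself.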