mahout - What's the difference between Collaborative Filtering Item-based recommendation and Content-based recommendation?
I am puzzled by item-based recommendation in 《Mahout in Action》. There is an algorithm in the book:
for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
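The loop above can be sketched in plain Python as follows. This is a minimal illustration of the pseudocode, not Mahout's own implementation; the `recommend` function, the toy items "a" to "d", and the `TOY_SIM` similarity table are hypothetical, and the sketch assumes similarities are non-negative so that the weighted average is well defined.

```python
from typing import Callable, Dict, List, Set, Tuple


def recommend(
    prefs: Dict[str, float],
    all_items: Set[str],
    similarity: Callable[[str, str], float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """For each item i that u has not rated, average u's preferences for the
    rated items j, weighted by similarity(i, j); return the best estimates."""
    estimates: Dict[str, float] = {}
    for i in all_items:
        if i in prefs:
            continue  # u already has a preference for i
        weighted_sum = 0.0  # running sum of s * preference(j)
        total_weight = 0.0  # running sum of s
        for j, pref_j in prefs.items():
            s = similarity(i, j)
            weighted_sum += s * pref_j
            total_weight += s
        if total_weight > 0:  # assumes non-negative similarities
            estimates[i] = weighted_sum / total_weight
    # Rank by the weighted average, highest first.
    return sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical toy data: user u rated item "a" with 5 and item "b" with 1.
prefs = {"a": 5.0, "b": 1.0}
TOY_SIM = {("c", "a"): 0.9, ("c", "b"): 0.1, ("d", "a"): 0.1, ("d", "b"): 0.9}

result = recommend(prefs, {"a", "b", "c", "d"}, lambda i, j: TOY_SIM.get((i, j), 0.0))
print(result)
```

Item "c" is most similar to the highly rated "a", so its estimate (about 4.6) ranks above "d" (about 1.4). Note that the pseudocode itself says nothing about where `similarity(i, j)` comes from; that is exactly the question below.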
What can we use to calculate the similarity between items? If we use the items' content, isn't that content-based recommendation?
Item-based collaborative filtering
The original item-based recommendation is based entirely on user-item ratings (e.g., a user rated a movie 3 stars, or a user "likes" a video). When you compute the similarity between items, you are not supposed to know anything other than all users' rating history. So the similarity between items is computed from the ratings, not from metadata about the item content.
Let me give an example. Suppose you only have access to rating data like the following:
user 1 likes: movie, cooking
user 2 likes: movie, biking, hiking
user 3 likes: biking, cooking
user 4 likes: hiking
Suppose we want to make recommendations for user 4.
First, we create an inverted index of the items and get:
movie: user 1, user 2
cooking: user 1, user 3
biking: user 2, user 3
hiking: user 2, user 4
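Building the inverted index is a single pass over the users' likes. A minimal Python sketch, assuming the likes are held as sets in a dict (the names `user_likes`, `invert` and `item_users` are illustrative):

```python
from collections import defaultdict

# Binary "likes" from the example above.
user_likes = {
    "user 1": {"movie", "cooking"},
    "user 2": {"movie", "biking", "hiking"},
    "user 3": {"biking", "cooking"},
    "user 4": {"hiking"},
}


def invert(likes):
    """Map each item to the set of users who like it."""
    index = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            index[item].add(user)
    return dict(index)


item_users = invert(user_likes)
for item in sorted(item_users):
    print(item, sorted(item_users[item]))
```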
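Since the ratings are binary (like or not), we can use a similarity measure such as Jaccard similarity to compute item similarity: for two items whose sets of users who like them are A and B, J(A, B) = |A ∩ B| / |A ∪ B|.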
$$\text{similarity}(\text{movie}, \text{cooking}) = \frac{|\{\text{user 1}\}|}{|\{\text{user 1}, \text{user 2}, \text{user 3}\}|} = \frac{1}{3}$$
In the numerator, user 1 is the only element that movie and cooking both have. In the denominator, the union of movie and cooking has 3 distinct users (user 1, 2, 3). Here |.| denotes the size of a set. So the similarity between movie and cooking is 1/3 in our case. You do the same thing for all possible item pairs (i, j).
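The pairwise computation can be sketched as follows, using exact fractions so the results read like the hand calculation. This is an illustrative sketch; the dict names are assumptions, and an empty union is treated as similarity 0 (a choice made here, not stated in the answer).

```python
from fractions import Fraction
from itertools import combinations

# Inverted index from the example: item -> users who like it.
item_users = {
    "movie": {"user 1", "user 2"},
    "cooking": {"user 1", "user 3"},
    "biking": {"user 2", "user 3"},
    "hiking": {"user 2", "user 4"},
}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return Fraction(0)
    return Fraction(len(a & b), len(union))


# Similarity is symmetric, so store each pair in both orders.
similarity = {}
for i, j in combinations(sorted(item_users), 2):
    s = jaccard(item_users[i], item_users[j])
    similarity[(i, j)] = similarity[(j, i)] = s
    print(f"similarity({i}, {j}) = {s}")
```

Every pair comes out at 1/3 except cooking/hiking, which share no users and get 0.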
After you are done with the similarity computation for all pairs, say you need to make a recommendation for user 4:
- Look at the similarity scores similarity(hiking, x), where x is any other tag you might have.
If you need to make a recommendation for user 3, you can aggregate the similarity scores of each item in their list. For example:
score(movie) = similarity(biking, movie) + similarity(cooking, movie)
score(hiking) = similarity(biking, hiking) + similarity(cooking, hiking)
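Both cases can be handled by one scoring rule: sum the similarity between a candidate item and every item the user already likes. For user 4, who only likes hiking, that sum reduces to similarity(hiking, x). A minimal sketch, assuming Jaccard similarity and plain summation as described above (the `score` and `rank` helpers are hypothetical names):

```python
from fractions import Fraction

item_users = {
    "movie": {"user 1", "user 2"},
    "cooking": {"user 1", "user 3"},
    "biking": {"user 2", "user 3"},
    "hiking": {"user 2", "user 4"},
}


def jaccard(a, b):
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def score(candidate, liked):
    """Sum of similarities between a candidate item and each item the user likes."""
    return sum((jaccard(item_users[j], item_users[candidate]) for j in liked), Fraction(0))


def rank(liked):
    """Score every item the user does not like yet, highest score first."""
    candidates = [x for x in item_users if x not in liked]
    scores = {x: score(x, liked) for x in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


user4 = rank({"hiking"})
user3 = rank({"biking", "cooking"})
print("user 4:", user4)
print("user 3:", user3)
```

With this data, user 4 gets movie and biking tied at 1/3 (cooking scores 0), and user 3 gets movie (2/3) ahead of hiking (1/3).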
Content-based recommendation
The point of content-based recommendation is that you have to know the content of both the user and the item. You construct a user profile and an item profile over a shared attribute space. For example, you can represent a movie by the movie stars in it and its genres (using binary coding, for example). For the user profile, you can do the same thing based on which movie stars, genres, etc. the user likes. The similarity of a user and an item can then be computed using, e.g., cosine similarity.
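To make "shared attribute space" concrete, here is a minimal sketch of binary encoding: users and items are both turned into 0/1 vectors over the same ordered attribute list. The attribute names ("star 0" to "star 4", "genre 0" to "genre 4") are hypothetical placeholders; the answer only numbers the columns.

```python
# Hypothetical shared attribute space: 5 movie stars followed by 5 genres.
ATTRIBUTES = [f"star {k}" for k in range(5)] + [f"genre {k}" for k in range(5)]


def encode(attributes):
    """Binary profile: 1 if the attribute is liked (user) or present (movie), else 0."""
    return [1 if a in attributes else 0 for a in ATTRIBUTES]


# A user who likes stars 3 and 4 and genres 0, 1 and 2 ...
user1 = encode({"star 3", "star 4", "genre 0", "genre 1", "genre 2"})
# ... and a movie featuring star 4 in genres 0 and 1.
movie1 = encode({"star 4", "genre 0", "genre 1"})
print(user1)
print(movie1)
```

Because both profiles use the same column order, the vectors can be compared directly with cosine similarity.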
Here is a concrete example:
Suppose this is our user profile (using binary encoding, 0 means not-like and 1 means like), which contains each user's preference for 5 movie stars and 5 movie genres:
        movie stars 0-4   movie genres
user 1: 0 0 0 1 1         1 1 1 0 0
user 2: 1 1 0 0 0         0 0 0 1 1
user 3: 0 0 0 1 1         1 1 1 1 0
Suppose this is our movie profile:
        movie stars 0-4   movie genres
movie1: 0 0 0 0 1         1 1 0 0 0
movie2: 1 1 1 0 0         0 0 1 0 1
movie3: 0 0 1 0 1         1 0 1 0 1
To calculate how well a movie matches a user, we use cosine similarity:
$$\text{similarity}(\text{user 1}, \text{movie1}) = \frac{\text{user1} \cdot \text{movie1}}{\lVert \text{user1} \rVert \times \lVert \text{movie1} \rVert} = \frac{0\times0 + 0\times0 + 0\times0 + 1\times0 + 1\times1 + 1\times1 + 1\times1 + 1\times0 + 0\times0 + 0\times0}{\sqrt{5} \times \sqrt{3}} = \frac{3}{\sqrt{5} \times \sqrt{3}} \approx 0.77460$$
Similarly:
$$\text{similarity}(\text{user 2}, \text{movie2}) = \frac{3}{\sqrt{4} \times \sqrt{5}} \approx 0.67082$$
$$\text{similarity}(\text{user 3}, \text{movie3}) = \frac{3}{\sqrt{6} \times \sqrt{5}} \approx 0.54772$$
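The three cosine similarities can be checked with a few lines of Python. A minimal sketch using the profiles from the tables above; the `cosine` helper is illustrative and returns 0 for an all-zero vector, an assumption made here to avoid division by zero.

```python
import math

users = {
    "user 1": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "user 2": [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "user 3": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
}
movies = {
    "movie1": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "movie2": [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "movie3": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}


def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


for user, movie in [("user 1", "movie1"), ("user 2", "movie2"), ("user 3", "movie3")]:
    print(f"similarity({user}, {movie}) = {cosine(users[user], movies[movie]):.5f}")
```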
If you want to give one recommendation to user i, pick the movie j that has the highest similarity(i, j).
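Picking the top movie means computing the similarity against every candidate and taking the maximum. A minimal sketch with the profiles above; `recommend_one` is a hypothetical helper, and it does not exclude movies the user has already seen, since the example has no viewing history.

```python
import math

users = {
    "user 1": [0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    "user 2": [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "user 3": [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
}
movies = {
    "movie1": [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
    "movie2": [1, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    "movie3": [0, 0, 1, 0, 1, 1, 0, 1, 0, 1],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_one(user_vec, candidates):
    """Return the name of the candidate with the highest cosine similarity."""
    return max(candidates, key=lambda name: cosine(user_vec, candidates[name]))


picks = {user: recommend_one(vec, movies) for user, vec in users.items()}
print(picks)
```

With this data, user 3's best match is movie1 (3 / (sqrt(6) × sqrt(3)) ≈ 0.70711), which beats the 0.54772 obtained for movie3, so the worked similarities above illustrate the computation rather than each user's top pick.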
Hope this helps.