Recommendation system
In this post, we survey the main approaches used in recommendation systems. For each approach, I will walk through the major points. This post is not an in-depth explanation, but a quick revision of the approaches followed in research and industry.
The content of this post is as follows:
- Content Based
- Collaborative filtering
- Hybrid Approach
- Matrix factorization
- Deep Learning Based recSys
- Graph Based inference model
- Content Based Method
- Based on user history
- Not helpful for the new-user cold-start problem, since a new user has no history to learn from
- Features include products the user liked, location, attributes of products the user bought, etc.
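As a rough illustration of the content based idea, here is a minimal sketch: build a user profile as the mean feature vector of the items the user liked, then rank the remaining items by cosine similarity to that profile. The item names, the three feature dimensions and the helper `recommend_content_based` are all made up for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_content_based(item_features, liked, top_n=2):
    """Rank unseen items by similarity to the mean feature vector of liked items."""
    if not liked:
        return []  # no history, so no profile can be built
    dims = len(next(iter(item_features.values())))
    profile = [sum(item_features[i][d] for i in liked) / len(liked) for d in range(dims)]
    scores = {item: cosine(profile, vec)
              for item, vec in item_features.items() if item not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical item features: [comedy, drama, sci-fi]
ITEM_FEATURES = {
    "sitcom_a": [1.0, 0.0, 0.0],
    "sitcom_b": [0.9, 0.1, 0.0],
    "space_opera": [0.0, 0.2, 1.0],
    "dramedy": [0.5, 0.5, 0.0],
}

print(recommend_content_based(ITEM_FEATURES, liked=["sitcom_a"]))
```

With an empty `liked` list the function returns nothing, which is the history dependence noted above.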
- Collaborative filtering Approach:
- Interaction Based feature
- user-user interaction (e.g. I like sitcom TV series; it finds the similarity between users and recommends the products/movies of users who have the same taste as me)
- item-item interaction (e.g. recommending similar products on Amazon)
- Struggles with the cold-start problem: a new user or item has no interactions from which to compute similarity
- `KNN` algorithm over a similarity measure to find the nearest users or items
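A small sketch of the user-user variant, assuming ratings are stored as nested dicts and that similarity is plain cosine over co-rated items (one choice among many). The `RATINGS` data and function names are hypothetical.

```python
import math

def cosine_on_common(r_a, r_b):
    """Cosine similarity over the items both users rated (0.0 if none in common)."""
    common = set(r_a) & set(r_b)
    if not common:
        return 0.0
    dot = sum(r_a[i] * r_b[i] for i in common)
    norm_a = math.sqrt(sum(r_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(r_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    neighbours = [(cosine_on_common(ratings[user], r), r[item])
                  for other, r in ratings.items()
                  if other != user and item in r]
    top = sorted(neighbours, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no overlapping neighbours to learn from
    return sum(sim * rating for sim, rating in top) / total

# Hypothetical user -> {item: rating} data
RATINGS = {
    "me": {"friends": 5, "seinfeld": 4},
    "u1": {"friends": 5, "seinfeld": 5, "the_office": 5},
    "u2": {"friends": 1, "seinfeld": 5, "the_office": 2},
    "u3": {"dune": 5, "the_office": 1},
}

print(predict_rating(RATINGS, "me", "the_office"))
```

The item-item variant is the same computation with the roles of users and items swapped.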
- Hybrid Approach:
- Uses user history, as in the content based method
- Uses user-user or item-item interactions, as in collaborative filtering
- Most companies use this approach
- Matrix factorization:
- We need to predict the missing entries in the user-item rating matrix. This can be done using `SVD` as `R = U S V'`, where `U` holds the `user-features`, `S` is the diagonal matrix of `singular values` and `V` holds the `item-features`
- It can be fit using an `alternating optimization approach` with the loss function `|r_{ij} - u_i S v_j|^2`, summed over the observed ratings
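The alternating optimization can be sketched as alternating least squares: fix the item factors and solve for each user factor in closed form, then swap. This sketch makes a few assumptions that are not in the post: `S` is absorbed into the two factor matrices (so `R ≈ U V'`), a small L2 penalty `lam` is added to keep the linear systems well conditioned, and the tiny `RATINGS` dict is invented.

```python
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss(ratings, U, V, lam):
    """Squared error on observed entries plus an L2 penalty on the factors."""
    err = sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in ratings.items())
    return err + lam * sum(x * x for row in U + V for x in row)

def update(obs_per_row, fixed, k, lam):
    """Closed-form ridge solution for every row, with the other side held fixed."""
    out = []
    for obs in obs_per_row:
        a = [[lam * (p == q) + sum(fixed[j][p] * fixed[j][q] for j, _ in obs)
              for q in range(k)] for p in range(k)]
        b = [sum(r * fixed[j][p] for j, r in obs) for p in range(k)]
        out.append(solve(a, b))
    return out

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=20, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    by_user = [[(j, r) for (i, j), r in ratings.items() if i == u] for u in range(n_users)]
    by_item = [[(i, r) for (i, j), r in ratings.items() if j == v] for v in range(n_items)]
    for _ in range(iters):
        U = update(by_user, V, k, lam)  # fix V, solve for U
        V = update(by_item, U, k, lam)  # fix U, solve for V
    return U, V

# Hypothetical observed ratings: (user, item) -> rating; other entries are missing
RATINGS = {(0, 0): 5, (0, 1): 4, (0, 3): 1, (1, 0): 4, (1, 1): 5, (1, 2): 1,
           (2, 1): 1, (2, 2): 5, (2, 3): 4, (3, 0): 1, (3, 2): 4, (3, 3): 5}

U, V = als(RATINGS, n_users=4, n_items=4)
print(round(dot(U[2], V[0]), 2))  # predicted rating for the missing entry (2, 0)
```

Each half-step minimizes the same objective exactly for one side, so the loss never goes up from one iteration to the next.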
- Probabilistic Matrix Factorization:
- We can also include known history/features of the `user` and `item` to learn a better `latent-representation`.
- Loss function is `|r_{mn} - u_n v_m|^2 + |u_n - W_u a_n|^2 + |v_m - W_v b_m|^2`, where `a_n` is the feature or history of the `nth user` and `b_m` is the history/feature of the `mth item`.
- We can even add regularization on `u_n` and `v_m`
- Works very well on the `Netflix-movies` data
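To make the three terms of that loss concrete, here is a small sketch that only evaluates it (no training). It assumes ratings are keyed as `(m, n)` for item `m` and user `n`, matching `r_{mn}`, and that all three terms are weighted equally as written above; the numbers are made up so the result can be checked by hand.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sq_norm(v):
    return sum(x * x for x in v)

def pmf_loss(ratings, U, V, W_u, W_v, A, B):
    """|r_mn - u_n v_m|^2 + |u_n - W_u a_n|^2 + |v_m - W_v b_m|^2, summed over the data."""
    fit = sum((r - sum(a * b for a, b in zip(U[n], V[m]))) ** 2
              for (m, n), r in ratings.items())
    user_prior = sum(sq_norm([u - p for u, p in zip(U[n], matvec(W_u, A[n]))])
                     for n in range(len(U)))
    item_prior = sum(sq_norm([v - p for v, p in zip(V[m], matvec(W_v, B[m]))])
                     for m in range(len(V)))
    return fit + user_prior + item_prior

# Hypothetical toy setup: one user, one item, two latent dimensions
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 2.0]]        # latent vector u_n
V = [[3.0, 0.5]]        # latent vector v_m
A = [[1.0, 1.0]]        # known user features a_n
B = [[3.0, 0.0]]        # known item features b_m
RATINGS = {(0, 0): 4.0}  # r_mn

# fit term is 0, user term is 1.0, item term is 0.25
print(pmf_loss(RATINGS, U, V, IDENTITY, IDENTITY, A, B))
```

The two extra terms pull each latent vector toward a linear map of the known features, which gives users or items with few ratings something to fall back on.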
- Deep Learning Based
- deep and shallow network approach
- The `deep network` uses `word embeddings` of the `product description`, and the `shallow network` uses the `user history` as features or `meta-data`
- implemented in the `YouTube recSys`
- There are many possible ways to build the network and use features; experiment with:
- `word embedding`
- time series content
- time distributed layer for parsing the document on each item
- tfidf features
- svd features
- We can even use `doc2vec` for each document. `Use gensim doc2vec for training`. **Main idea**: training follows the same approach as `word2vec`, except that `we add a document tag to it`, which maintains the context for each doc as well
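Of the input features listed above, tfidf is the easiest to show without a deep learning framework. This is a minimal sketch assuming whitespace tokenization and the plain `tf * log(N / df)` weighting (libraries usually apply smoothed variants); the product descriptions are invented.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

# Hypothetical product descriptions
DESCRIPTIONS = [
    "wireless noise cancelling headphones",
    "wireless gaming mouse",
    "noise cancelling earbuds",
]
VECTORS = tfidf(DESCRIPTIONS)
print(VECTORS[0])
```

Terms that appear in only one description, such as "headphones", get a higher weight than terms shared across descriptions, such as "wireless". The resulting sparse vectors could then be fed to the network as item features.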
- Another Deep Learning Approach:
- two networks, one for `users` and the other for `items`
- Compute the `user-item` interaction by `dot product or cosine similarity`
- Build more dense layers to get a more complex representation
- use `multiclass cross entropy loss` for `5-star rating`
- We can also use `sparse implicit feedback features, which are binary in nature` (`implicit feedback` is generated by the system from `click-based` signals, and `explicit feedback` is collected from `likes, reviews and purchasing history`)
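The forward pass of this setup can be sketched in a few lines. Everything here is an assumption made for illustration: the two vectors stand in for the outputs of the user and item networks, the interaction is their cosine similarity, a single dense layer maps that scalar to five logits, and the weights are hand-picked rather than trained.

```python
import math

def cosine(u, v):
    """Cosine similarity between the two tower outputs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rating_probs(user_vec, item_vec, weights, biases):
    """Dense layer from the scalar interaction to 5 logits, then softmax over 1..5 stars."""
    s = cosine(user_vec, item_vec)
    return softmax([w * s + b for w, b in zip(weights, biases)])

def cross_entropy(probs, star):
    """Multiclass cross-entropy for a true rating `star` in 1..5."""
    return -math.log(probs[star - 1])

# Hypothetical tower outputs and untrained, hand-picked layer parameters
USER_VEC = [0.2, 0.9, 0.1]
ITEM_VEC = [0.3, 0.8, 0.0]
WEIGHTS = [-2.0, -1.0, 0.0, 1.0, 2.0]
BIASES = [0.0] * 5

probs = rating_probs(USER_VEC, ITEM_VEC, WEIGHTS, BIASES)
print(probs, cross_entropy(probs, star=5))
```

In a real model the embeddings and the dense layers would be trained jointly by minimizing this loss over observed ratings.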
- Graph Based Network
- use graph embeddings to build the NN (`deep walk`, `random-walk`)
- use each feature as a node and each interaction as an edge. For example, for a movie recommendation system, `we have 3 users, 5 movies, 8 actors, 5-star ratings, 10 genres`; we can use each attribute as a node and then, `for each interaction, we create an edge`, as in `user1 likes actor1 and gave 4 stars`
- can use `node2vec`, where each node is represented by a `vector` which is trained on the same concept of `random-walk`
- Current state-of-the-art `recSys platforms` follow this approach.
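The random-walk step behind these embeddings can be sketched as follows. The edge list is a made-up miniature of the movie example, the walks are uniform (node2vec biases them with its return and in-out parameters), and the embedding training itself is left out: the walks would be fed as "sentences" to a word2vec-style model.

```python
import random
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency lists from (node, node) pairs."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    return graph

def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Uniform random walks started from every node."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical interactions: every attribute is a node, every interaction an edge
EDGES = [
    ("user1", "movie1"), ("user1", "actor1"), ("user1", "rating_4"),
    ("movie1", "actor1"), ("movie1", "genre_comedy"),
    ("user2", "movie1"), ("user2", "rating_5"),
]

GRAPH = build_graph(EDGES)
WALKS = random_walks(GRAPH)
print(WALKS[0])
```

Nodes that often co-occur in walks end up with similar vectors, which is how users get linked to items they have not interacted with yet.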
Some practical insights on recommender systems
- Netflix uses 100s of different base models; the final prediction is a weighted average of all of them. (Generally, non-linear blending is preferred.)
- `Weighted Hybrid`: Choose `10` items from the user's ratings via `collaborative filtering` as well as from `content based filtering`. Now make a list using a `60%` weight on the collaborative list and a `40%` weight on the content list. Finally `sort` the list.
- `Mix Hybrid`: Take `5` items from content, `5` from user history, `5` from trending and `5` from others
- `Switching: confidence`: If the user is logged in, switch to `collaborative filtering`, otherwise switch to `content filtering`.
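The three hybrid strategies above are simple enough to sketch directly. The function names and the sample scores are hypothetical; the sketch assumes each recommender returns either a `{item: score}` dict (for weighting) or a ranked list (for mixing and switching).

```python
def weighted_hybrid(collab_scores, content_scores, w_collab=0.6, w_content=0.4, top_n=10):
    """Blend two score dicts with fixed weights and return items sorted by blended score."""
    items = set(collab_scores) | set(content_scores)
    combined = {i: w_collab * collab_scores.get(i, 0.0) + w_content * content_scores.get(i, 0.0)
                for i in items}
    # Sort by descending score, breaking ties by item name for determinism
    return sorted(combined, key=lambda i: (-combined[i], i))[:top_n]

def mixed_hybrid(sources, per_source=5):
    """Concatenate the top `per_source` items of each ranked list, skipping duplicates."""
    seen, result = set(), []
    for ranked in sources:
        for item in ranked[:per_source]:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result

def switching_hybrid(logged_in, collab_list, content_list):
    """Use collaborative results for logged-in users, content results otherwise."""
    return collab_list if logged_in else content_list

# Hypothetical scores from the two recommenders
COLLAB = {"a": 0.9, "b": 0.5}
CONTENT = {"b": 1.0, "c": 0.8}

print(weighted_hybrid(COLLAB, CONTENT))
print(mixed_hybrid([["a", "b"], ["b", "c"]], per_source=2))
print(switching_hybrid(False, ["a", "b"], ["c"]))
```

An item that only one recommender scored is treated as having a score of 0 from the other, which is one possible convention rather than the only one.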