Recommender Systems
What should YouTube show you next? What movie should Netflix put on the homepage? Recommendation is its own discipline of machine learning — with techniques that long predate (and still complement) modern LLMs.
The five-bullet version
- Recommendation is the problem of predicting what a user will want next, given partial information.
- Collaborative filtering: use other users’ ratings of overlapping items.
- Content-based filtering: use features of the items themselves.
- Matrix factorization: factor the sparse user×item matrix into two small “taste” matrices.
- Modern systems are hybrid two-tower neural nets: one tower per user, one per item, optimized for retrieval at scale.
§ 00 · WHY RECOMMENDATION IS ITS OWN PROBLEM · Sparse data, lots of choices
A recommendation system has a particular shape. You have users and items. Some users have rated some items (or watched, or purchased). Most have not. You want to predict, for the missing cells: if this user saw this item, how much would they like it?
The data is wildly sparse. Netflix has ~250 million users and ~17,000 titles. The average user has watched a few hundred, so roughly 98% of the user×item matrix is empty. Standard ML doesn't love this: you're asked to predict in a regime where nearly all entries are missing.
§ 01 · COLLABORATIVE FILTERING · Wisdom of similar tastes
The original recommendation idea is collaborative filtering: predict a user's rating for an item from how similar users rated that item, or from how the same user rated similar items. It needs no features of users or items, only the rating matrix itself. If Alice and Bob both loved 20 of the same movies, and Alice loves a 21st movie Bob hasn't seen, Bob will probably love it too.
Two flavors:
- User-based. Find users similar to the target user. Recommend what they liked.
- Item-based. Find items similar to ones the user liked. Recommend those.
Both work surprisingly well with simple similarity metrics (cosine, Pearson). The downsides: cold start — new users or items have no overlap with anything — and scalability — comparing every user to every other user gets expensive past a few hundred thousand users.
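The user-based flavor can be sketched in a few lines. The ratings below are invented toy data; the prediction is a cosine-similarity-weighted average of other users' ratings for the target item.

```python
import numpy as np

# Toy rating matrix: rows = users, cols = items, 0 = unrated (invented data).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict_user_based(R, user, item):
    """Predict R[user, item] as a similarity-weighted average of
    ratings from other users who did rate the item."""
    num, den = 0.0, 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue                       # skip self and non-raters
        s = cosine(R[user], R[other])
        num += s * R[other, item]
        den += abs(s)
    return num / den if den else 0.0

print(round(predict_user_based(R, user=1, item=3), 2))  # → 1.55
```

User 1's tastes overlap heavily with user 0 (who rated this item 1) and barely with user 2 (who rated it 4), so the prediction lands near the low rating. Treating unrated cells as zeros inside the cosine is the naive baseline; real systems mean-center ratings first.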
§ 02 · CONTENT-BASED FILTERING · Recommending by features instead of overlap
The complementary idea: don’t look at other users at all. Just recommend things similar to the user’s past favorites, using features of the items themselves.
For movies: genre, director, year, runtime, embedded plot summary. Compute a user profile as the average of features for their liked movies. Recommend movies with high similarity to that profile.
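That recipe is short enough to sketch directly. The movies and their feature scores below are hypothetical stand-ins for real item features:

```python
import numpy as np

# Hypothetical item features: [action, art_house, comedy] per movie.
items = {
    "Heat":     np.array([0.9, 0.20, 0.0]),
    "Stalker":  np.array([0.1, 0.90, 0.0]),
    "Mirror":   np.array([0.0, 1.00, 0.1]),
    "Persona":  np.array([0.1, 0.95, 0.0]),
    "Hot Fuzz": np.array([0.7, 0.10, 0.8]),
}

liked = ["Stalker", "Mirror"]        # the user's past favorites

# User profile = average feature vector of liked items.
profile = np.mean([items[t] for t in liked], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Rank unseen movies by similarity to the profile.
unseen = [t for t in items if t not in liked]
ranked = sorted(unseen, key=lambda t: cosine(profile, items[t]), reverse=True)
print(ranked)  # → ['Persona', 'Heat', 'Hot Fuzz']
```

The art-house fan gets the unseen art-house film first, with no other users consulted, which is exactly why a brand-new movie with features poses no cold-start problem.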
Content-based filtering handles cold start gracefully (a brand-new movie still has features), but it can’t suggest anything truly new — by construction it’s recommending things like what you’ve already seen. The famous Netflix filter-bubble critique is largely about content-based filtering’s tendency to narrow rather than broaden taste.
§ 03 · MATRIX FACTORIZATION · Compressing taste into a few hidden dimensions
The dominant technique from ~2009 (Netflix Prize era) onward: matrix factorization. Take the user×item rating matrix R (mostly empty) and find two small low-rank matrices U (users × k) and I (items × k) whose product U·Iᵀ approximates R on the observed entries.
Each user gets a k-dimensional latent vector — interpret it as their “taste profile” in some k-axis space the model discovers (one axis might roughly mean “likes action,” another “likes art-house”). Each item gets a matching latent vector. The dot product is the predicted rating.
Figure: two views of the rating matrix. Observed view: dashes mark items a user hasn’t rated. Predicted view: the factorization fills the gaps with two small “taste” matrices whose product approximates the observed cells.
Two beautiful things about this:
- The unobserved cells are filled in by the factorization. Trained on the cells you have, the model predicts the cells you don’t.
- The latent dimensions emerge from the data. Nobody told the model to think in terms of “action vs art-house”; that structure was implicit in the ratings.
§ 04 · MODERN HYBRIDS AND TWO-TOWER MODELS · How YouTube and Spotify do it now
Production recommendation systems combine multiple signals:
- Collaborative (who else liked this?).
- Content-based (what features does this share with what I liked?).
- Context (time of day, device, what came before).
- Real-time behavior (what did I click in the last 30 seconds?).
The dominant modern architecture is the two-tower model: two neural networks running in parallel. One takes the user (history, demographics, context) and produces an embedding; the other takes an item (features, metadata) and produces an embedding of the same shape. The dot product of the two vectors predicts engagement, and recommendation becomes nearest-neighbor search in the shared embedding space.
Why two-tower wins at scale: you can precompute the embedding for every item once, store them in a vector index, and at query time just compute the user’s embedding and look up the nearest items. This is the same nearest-neighbor pattern used in modern RAG — and it’s not a coincidence. Recommendation and retrieval converge at scale.
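The retrieval pattern can be sketched with random stand-in embeddings — no real towers here, only the shapes and the lookup matter:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 10_000, 64

# Offline: precompute and unit-normalize an embedding for every item.
# (Random stand-ins for the item tower's outputs.)
item_emb = rng.normal(size=(n_items, d)).astype(np.float32)
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

# Online: run only the user tower, then a nearest-neighbor lookup.
user_emb = rng.normal(size=d).astype(np.float32)
user_emb /= np.linalg.norm(user_emb)

scores = item_emb @ user_emb                 # one matvec over all items
top10 = np.argpartition(-scores, 10)[:10]    # unordered top-10 candidates
top10 = top10[np.argsort(-scores[top10])]    # sorted best-first
```

In production the brute-force matvec is replaced by an approximate nearest-neighbor index, but the contract is identical: item embeddings are fixed at query time, and only the user embedding is computed fresh.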
§ 05 · TAKING THIS FORWARD · Where the field is moving
Three threads worth watching:
- Sequential / session-based — the next song depends on the last three. Transformers on user history work very well for this and are increasingly the production default.
- Multi-objective — engagement is one signal, but so are diversity, novelty, fairness, advertiser revenue. Modern recommenders optimize a multi-objective score, not pure click-through.
- LLM-augmented — using an LLM to summarize user intent, generate item descriptions, or directly recommend from natural-language queries. New surface, not a replacement for the underlying ranker.
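The multi-objective point above can be made concrete with a hand-weighted blend. The signal names and weights here are hypothetical; real systems tune or learn them per surface:

```python
# Hypothetical signal names and weights — illustrative only.
WEIGHTS = {"p_click": 0.5, "p_long_watch": 0.3, "novelty": 0.1, "diversity": 0.1}

def blended_score(signals: dict) -> float:
    """Weighted blend of per-item predictions instead of raw click-through."""
    return sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)

candidates = [
    {"id": "a", "p_click": 0.9, "p_long_watch": 0.1,
     "novelty": 0.0, "diversity": 0.0},   # clickbaity: clicks, little else
    {"id": "b", "p_click": 0.6, "p_long_watch": 0.8,
     "novelty": 0.5, "diversity": 0.4},   # satisfying across objectives
]
best = max(candidates, key=blended_score)
print(best["id"])  # → b
```

Under pure click-through, item "a" wins; the blend lets long-watch, novelty, and diversity outvote the clickbait.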
§ · GOING DEEPER · From matrix factorization to two-tower neural recommenders
The Netflix Prize (2006–2009) made matrix factorization (Koren et al. 2009) the dominant recommender architecture for a decade — factor the user-item interaction matrix into low-rank latent representations, recommend by dot product. Simple, fast, interpretable. The neural era replaced the dot product with a learned similarity function (Neural Collaborative Filtering, He et al. 2017) but kept the factorization structure.
Modern industrial systems (YouTube, TikTok, Spotify) almost all use a two-tower architecture (Covington et al. 2016, Yi et al. 2019): one neural network encodes the user, a separate one encodes the item, and recommendation is the dot product of their embeddings. Pre-compute item embeddings, do nearest-neighbor search at query time. Sequential variants (SASRec, Kang & McAuley 2018) model the user as a sequence of past interactions — directly inheriting transformer architecture for what was once a matrix-factorization problem.
§ · FURTHER READING · References & deeper sources
- Koren et al. (2009). Matrix Factorization Techniques for Recommender Systems · IEEE Computer
- He et al. (2017). Neural Collaborative Filtering · WWW
- Covington et al. (2016). Deep Neural Networks for YouTube Recommendations · RecSys
- Kang & McAuley (2018). Self-Attentive Sequential Recommendation (SASRec) · ICDM
- Yi et al. (2019). Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations · RecSys
Original figures live in the linked sources — open the papers for the canonical visuals in their full context.