In the image above, if you want "b" to be more similar to "a" than
"b" is to "c", which measure should you pick?

Dot product

Correct! The dot product is proportional to both the cosine of the angle
between the vectors and their lengths. So even though the cosine is higher
for "b" and "c", the greater length of "a" makes "a" and "b" more similar
than "b" and "c".

Cosine

The cosine depends only on the angle between vectors, and the smaller
angle \(\theta_{bc}\) makes \(\cos(\theta_{bc})\) larger than
\(\cos(\theta_{ab})\).

Euclidean distance

The distance \(|\vec{bc}|\) is smaller than \(|\vec{ab}|\), making "b"
more similar to "c" than to "a".
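
The three feedback notes above can be checked numerically. The sketch below is illustrative only: the image is not reproduced here, so the coordinates of `a`, `b`, and `c` are assumed values chosen to match the description (a long vector "a", and a small angle between "b" and "c").

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return dot(u, v) / (norm(u) * norm(v))

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Assumed coordinates (not taken from the image).
a = (3.0, 6.0)  # long vector
b = (2.0, 1.0)
c = (1.0, 0.0)  # short vector at a small angle to b

print("dot:      ", dot(a, b), dot(b, c))              # 12.0 vs 2.0
print("cosine:   ", cosine(a, b), cosine(b, c))        # about 0.80 vs about 0.89
print("euclidean:", euclidean(a, b), euclidean(b, c))  # about 5.10 vs about 1.41
```

With these values, only the dot product ranks "a" as more similar to "b" than "c" is. Cosine and Euclidean distance both favor "c".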

You are calculating similarity for music videos. The length of the
embedding vectors of music videos is proportional to their popularity. You
now choose dot product instead of cosine to calculate similarity. How does
similarity between music videos change?

Popular videos become **more similar** to all videos in general.

Since the dot product is affected by the lengths of both vectors, the
large vector length of popular videos will make them more similar to all
videos.

Popular videos become **more similar** only to other popular videos.

Recall that the dot product is calculated as \(|a||b|\cos(\theta)\).
Assuming "a" is a popular music video, we know its embedding length,
\(|a|\), is larger than that of unpopular videos. The larger length
increases the dot product irrespective of the value of \(|b|\). Hence,
popular videos become more similar to all other videos, not just other
popular videos.

Popular videos become **less similar** than less popular videos.

Since the dot product increases with vector length, and popular videos have
high vector length, the similarity measure will increase, not decrease.

No change.

The dot product is affected by vector length, so the high vector length of
popular videos will change the similarity measure.
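
The effect of embedding length on the dot product can be seen in a small sketch. The vectors and the popularity factor of 5 below are made-up values, and the example assumes the two videos point in a direction with a positive cosine to the comparison video. With a negative cosine, a longer vector would make the dot product more negative instead.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return dot(u, v) / (norm(u) * norm(v))

def scale(u, k):
    """Multiply every component of u by the scalar k."""
    return tuple(k * x for x in u)

# Two hypothetical videos with the same direction but different lengths.
direction = (0.6, 0.8)
niche = scale(direction, 1.0)    # unpopular: short embedding
popular = scale(direction, 5.0)  # popular: embedding five times longer

other = (1.0, 0.0)               # any other video, popular or not

dot_niche = dot(niche, other)
dot_popular = dot(popular, other)
cos_niche = cosine(niche, other)
cos_popular = cosine(popular, other)

print("dot:   ", dot_niche, dot_popular)  # the popular video scores higher
print("cosine:", cos_niche, cos_popular)  # both are about 0.6
```

The dot product grows with the popular video's length regardless of how long `other` is, while the cosine is the same for both videos.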

In the same scenario as the previous question, suppose you switch to
cosine from dot product. How does similarity between music videos change?

Popular videos become **less similar** than less popular videos.

Because cosine is not affected by vector length, the large vector
length of embeddings of popular videos does not contribute to similarity.
Thus, switching to cosine from dot product reduces the similarity for
popular videos.

Popular videos become **more similar** than less popular videos.

Cosine is not affected by vector length, so switching from the dot
product will cause the similarities for all popular videos to decrease.

No change.

Since cosine is not affected by vector length, switching from the dot
product to cosine will result in different similarities.
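
One way to see the switch from dot product to cosine is to rank the same candidates under both measures. In the sketch below, the video names and embedding values are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical embeddings: the popular video has a much longer vector.
query = (1.0, 1.0)
candidates = {
    "popular_hit": (4.0, 0.0),  # long vector, 45 degrees away from the query
    "niche_match": (1.0, 0.9),  # short vector, almost parallel to the query
    "niche_other": (0.2, 1.0),  # short vector, moderately aligned
}

def rank(similarity):
    """Candidate names sorted from most to least similar to the query."""
    return sorted(candidates,
                  key=lambda name: similarity(query, candidates[name]),
                  reverse=True)

rank_by_dot = rank(dot)
rank_by_cos = rank(cosine)

print("dot product:", rank_by_dot)
print("cosine:     ", rank_by_cos)
```

Under the dot product, the long `popular_hit` vector ranks first. Under cosine, its length no longer helps and it drops to last, behind both better-aligned niche videos.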