Word2Vec Misleadings

Distributional semantics is all the rage and is very cool and omg it’s a bit of a fraud!

Just to be clear: there is nothing wrong with the algorithm itself! It is conceptually very interesting and works very well for a lot of cases. Done right, it can give a decent representation of word similarity or meaning. But the “King – Man + Woman = Queen” example by far overstates what the algorithm actually is capable of.

Here are some reasons why I think we should stop using that classical example to introduce Word2Vec:

1) It turns out that for the example to work in the first place, you have to include some ‘cheating’. The actual result would namely be King – Man + Woman = King. So, the resulting vector would be more similar to King than to Queen. The widely known example only works because the implementation of the algorithm will exclude the original vector from the possible results! That means the word vector for King – Man + Woman is closest to the word vector for King. Second comes Queen, which is what the routine will then pick. Quite disappointing, isn’t it?

Why yes, yes it is. (I wonder how close ‘Queen’ is to ‘King’ without the subtraction. Or under random subtractions on ‘King’.)

Gah! This drives me nuts. Please don’t mislead in this way!