The idea behind word vectors is to represent natural-language words like "king" as vectors in \(\mathbb{R}^n\) in such a way that you can do meaningful arithmetic with them. For example,
$$\text{vec}(\text{king}) - \text{vec}(\text{man}) + \text{vec}(\text{woman}) \approx \text{vec}(\text{queen})$$
The Python library gensim implements this.
Requirements
pip install gensim --user
pip install nltk --user
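The Brown corpus used below is not bundled with NLTK; download it once with NLTK's data downloader:
python -m nltk.downloader brown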
Example
An easy-to-use example is
from gensim.models import Word2Vec
from nltk.corpus import brown

# Train word vectors on the sentences of the Brown corpus
model = Word2Vec(brown.sents())

# gensim >= 4 exposes the trained vectors via model.wv
model.wv.most_similar(positive=["woman", "king"], negative=["man"], topn=3)
which, since training is not deterministic, returns something like
[
("scored", 0.9442702531814575),
("calling", 0.9424217939376831),
("native", 0.9412217736244202),
]
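For context, most_similar combines the unit-normalized query vectors and ranks the whole vocabulary by cosine similarity to the result. Here is a minimal sketch of that computation, assuming gensim 4.x, where the vocabulary is exposed as wv.index_to_key:
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def analogy(wv, positive, negative, topn=3):
    # Sum the unit-normalized positive vectors and subtract the
    # negative ones (gensim averages instead; the ranking is the same).
    target = unit(sum(unit(wv[w]) for w in positive)
                  - sum(unit(wv[w]) for w in negative))
    # Rank the vocabulary by cosine similarity to the target,
    # skipping the query words, just like most_similar does.
    results = [(word, float(np.dot(target, unit(wv[word]))))
               for word in wv.index_to_key
               if word not in positive and word not in negative]
    return sorted(results, key=lambda pair: pair[1], reverse=True)[:topn]

analogy(model.wv, positive=["woman", "king"], negative=["man"])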
These results show that the Brown corpus (roughly one million words) is too small to learn good word vectors, but the approach works with a bigger corpus.
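For instance, gensim's downloader API ships pretrained Google News vectors (a roughly 1.6 GB download). Loaded as KeyedVectors, they answer the same query with queen as the top match:
import gensim.downloader as api

# Fetches the pretrained Google News vectors on first use (~1.6 GB)
wv = api.load("word2vec-google-news-300")
wv.most_similar(positive=["woman", "king"], negative=["man"], topn=3)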