Martin Thoma

Word Vectors

Contents

  • Requirements
  • Example
  • See also

The idea behind word vectors is to represent natural language words like "king" as points in $\mathbb{R}^n$ in such a way that you can do meaningful arithmetic with them. For example,

$$\text{vec}(\text{king}) - \text{vec}(\text{man}) + \text{vec}(\text{woman}) \approx \text{vec}(\text{queen})$$
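In practice the right-hand side is found by a nearest-neighbour search: compute the arithmetic result and pick the vocabulary word whose vector has the highest cosine similarity to it. A minimal sketch with invented 3-dimensional toy vectors (real embeddings typically have 100–300 dimensions, and these values are made up purely for illustration):

```python
import numpy as np

# Hypothetical toy vectors; the numbers are invented for this sketch.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.9, 0.0, 1.0]),
    "apple": np.array([0.0, 0.5, 0.5]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# vec(king) - vec(man) + vec(woman)
target = vectors["king"] - vectors["man"] + vectors["woman"]

# Nearest word by cosine similarity, excluding the query words
best = max(
    (w for w in vectors if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(target, vectors[w]),
)
print(best)  # → queen
```

With these toy values the arithmetic lands exactly on the "queen" vector; with real trained embeddings it only lands nearby, which is why the relation above is written with $\approx$.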

The Python library gensim implements this.

Requirements

pip install gensim --user
pip install nltk --user

Example

An easy-to-use example is

import nltk
nltk.download("brown")  # fetch the Brown corpus on first use

from gensim.models import Word2Vec
from nltk.corpus import brown

model = Word2Vec(brown.sents())
# Note: gensim >= 1.0 moved this method to model.wv.most_similar(...)
model.most_similar(positive=["woman", "king"], negative=["man"], topn=3)

which returns

[
    ("scored", 0.9442702531814575),
    ("calling", 0.9424217939376831),
    ("native", 0.9412217736244202),
]

So the Brown corpus (roughly one million words) is too small to produce good analogies, but the same code should give much better results on a bigger corpus.

See also

  • The amazing power of word vectors
  • Deep learning with word2vec and gensim

Published

Dec 28, 2016
by Martin Thoma

Category

Code

Tags

  • Machine Learning
  • Python
