Conversions between embedding formats & evaluations against industry standard datasets.
python2 evaluate.py -h
This script is used for various embedding evaluation tasks, such as:
- analogy task
- linear translation

Supported input formats:
- GloVe text and binary, with optional bias and context terms
- word2vec text and binary

The list can be expanded; you can easily write your own custom input format.
For GloVe binary vectors, the corresponding vocabulary file must also be provided. Both 32-bit and 64-bit floating point precision can be used.
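As a rough illustration of what such a loader looks like, here is a minimal sketch of reading GloVe binary vectors together with their vocabulary using numpy. It assumes each row stores the word vector followed by a single bias value and that only word vectors (no appended context vectors) are present; the function name and file names are hypothetical and not taken from evaluate.py.

```python
import numpy as np

def load_glove_binary(vector_path, vocab_path, dtype=np.float64):
    # use dtype=np.float32 for files written with 32-bit precision
    # vocabulary file: one "word count" pair per line
    with open(vocab_path) as f:
        words = [line.split()[0] for line in f]
    raw = np.fromfile(vector_path, dtype=dtype)
    # assume each row holds the word vector followed by one bias value
    dim = raw.size // len(words) - 1
    raw = raw.reshape(len(words), dim + 1)
    return words, raw[:, :dim], raw[:, dim]  # vocabulary, vectors, biases

# example (hypothetical file names):
# words, vectors, biases = load_glove_binary('vectors.bin', 'vocab.txt')
```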
The script reads questions from stdin and answers them line by line. The answers are written to stdout; additional debug info goes to stderr. A question can be any linear combination of input terms, such as:
- king -man +woman
- frog
- chinese + river
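Conceptually, answering such a question means combining the term vectors with their signs and ranking all other words by similarity. The sketch below shows this with plain cosine similarity over a numpy matrix; the function and variable names are illustrative, not the actual internals of evaluate.py.

```python
import numpy as np

def answer(query_terms, words, vectors, topn=5):
    # query_terms: list of (sign, word) pairs,
    # e.g. [(+1, 'king'), (-1, 'man'), (+1, 'woman')] for "king -man +woman"
    index = dict((w, i) for i, w in enumerate(words))
    target = np.zeros(vectors.shape[1])
    for sign, word in query_terms:
        target += sign * vectors[index[word]]
    # cosine similarity of the combined vector against every row of the matrix
    sims = np.dot(vectors, target) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(target))
    # exclude the query words themselves from the candidate answers
    exclude = set(index[w] for _, w in query_terms)
    ranked = [i for i in np.argsort(-sims) if i not in exclude]
    return [(words[i], sims[i]) for i in ranked[:topn]]
```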
The following metrics and similarities are supported:
- cos: standard cosine similarity
- cos_r: the answers are the same as with cosine similarity, but the true similarity of the outcomes is shown. If you do not care about the similarity scores, just the answers, you can use plain cos because it is slightly faster.
- eucl: square of the standard Euclidean metric
- eucl_r: the standard Euclidean metric
- eucl_norm: Euclidean metric, but the vectors are normalized first. This should give the same answers as cos or cos_r.
- cos_mul: the so-called cos-mul metric, used in analogy tasks (see the sketch below)
- cos_mul0: by default, cos-mul operates on (1+cos), since the cosine similarity varies from -1 to 1. With mul0, positive vectors are assumed.
- arccos: arc length distance on the unit sphere
- eucl_mul: multiplicative Euclidean
- angle: same as cosine similarity, but you can see the actual angle
The list can be expanded; you can easily write your own custom metric or similarity.
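For instance, the cos-mul scoring for an analogy a : b :: c : ? can be written in a few lines of numpy, following the description above (each cosine is shifted to (1+cos) so the values stay positive before multiplying and dividing). This is a hedged sketch, not the literal implementation in the script.

```python
import numpy as np

def cos_mul(vectors, a, b, c, eps=1e-3):
    # Score every row of `vectors` for the analogy a : b :: c : ?
    # a, b, c are the embedding vectors of the three given words.
    norms = np.linalg.norm(vectors, axis=1)
    def shifted_cos(v):
        # shift the cosine from [-1, 1] into [0, 2] so products stay positive
        return 1.0 + np.dot(vectors, v) / (norms * np.linalg.norm(v))
    return shifted_cos(b) * shifted_cos(c) / (shifted_cos(a) + eps)
```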
Using the translate.py script, you can generate a linear transformation between embeddings.
If you have two embeddings and a transformation between them, then you can query in one language and retrieve answers in the other.
Let's say you have an English source embedding, a German target embedding, and a linear transformation matrix between them; then:
king - man + woman = Königin
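The sketch below shows one common way such a transformation can be obtained and applied: a least-squares fit between aligned source and target vectors from a seed dictionary, followed by a nearest-neighbour lookup in the target space. It is an assumption-laden illustration, not the actual interface of translate.py.

```python
import numpy as np

def learn_translation(src_vecs, tgt_vecs):
    # src_vecs, tgt_vecs: aligned (n_pairs, dim) matrices built from a seed dictionary
    W, _, _, _ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W  # maps a source-space vector v to approximately np.dot(v, W)

def translate_query(query_vec, W, tgt_words, tgt_vecs, topn=1):
    # map the (already combined) source-language query vector into target space
    mapped = np.dot(query_vec, W)
    sims = np.dot(tgt_vecs, mapped) / (np.linalg.norm(tgt_vecs, axis=1) * np.linalg.norm(mapped))
    best = np.argsort(-sims)[:topn]
    return [(tgt_words[i], float(sims[i])) for i in best]
```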
Requirements:
- numpy
- scipy