Unofficial PyTorch implementation of the paper "Efficient Estimation of Word Representations in Vector Space", with code for training the model and for demonstrating the properties of the trained embeddings. The focus is on the Skip-gram model only.
Content
Key files:
word2vec.pth is a model pre-trained on the Amazon Fashion dataset with a 4000-word vocabulary,
inference.ipynb is a playground that demonstrates some properties of the model,
train.ipynb trains word2vec from scratch. Use it if you want to customize the training process,
extra/cloud.svg shows a t-SNE visualization of the most distinct word clusters.
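For readers new to Skip-gram, the kind of training data such a model learns from can be sketched as follows. This is a minimal illustration and not the code from train.ipynb: the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are all hypothetical. Each word (the center) is paired with every word within `window` positions of it (the context), and the model is trained to predict the context word from the center word.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for all tokens within `window` positions.

    Hypothetical helper for illustration; the repository's notebooks may
    build their training pairs differently.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the context window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["soft", "cotton", "shirt"], window=1)
print(pairs)
# [('soft', 'cotton'), ('cotton', 'soft'), ('cotton', 'shirt'), ('shirt', 'cotton')]
```

In a real pipeline the words would typically be mapped to integer indices in the vocabulary before being fed to an embedding layer.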
Installation
git clone https://github.com/tejpaper/word2vec.git
cd word2vec
pip install -r requirements.txt