Domain Classifier

Author

Qi Hu Email: qihuchn@gmail.com

Introduction

This is an NLP machine learning project.
The model classifies a single sentence into a domain. For example, "I'd like some Thai food" is classified into the RESTAURANT domain.
The dataset is not included in this project, so you need to prepare your own data. The data format should be as follows:
1. Training, testing, and validation data are stored separately in 3 files. Each file consists of multiple lines, and each line is one data sample.
2. Each sample has 3 parts separated by a TAB character ('\t'): domain (the label), sentence, and word_dictionary_feature (a pre-processed feature representing the high-level feature of each word in the sentence).
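
The line format above can be read with a few lines of Python. This is a minimal sketch, not the loader from the utils folder: the names `Sample`, `parse_line`, and `load_samples` are hypothetical, and word_dictionary_feature is kept as an opaque string because its internal encoding is not specified here.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    domain: str                   # label, e.g. "RESTAURANT"
    sentence: str                 # raw sentence text
    word_dictionary_feature: str  # left unparsed; its encoding is not documented


def parse_line(line: str) -> Sample:
    """Split one data line into its 3 TAB-separated parts."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 3:
        raise ValueError(f"expected 3 TAB-separated fields, got {len(parts)}")
    return Sample(*parts)


def load_samples(path: str) -> List[Sample]:
    """Read one sample per non-empty line from a data file."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]
```

Calling `load_samples` once per file (training, testing, validation) gives three lists of samples. A malformed line raises `ValueError` rather than being silently skipped, which makes formatting mistakes in hand-prepared data easier to spot.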

Algorithm

Use word embedding for each raw input word
Use a CNN to extract features from the sentence (after the embedding layer)
Combine the CNN features and word_dictionary_feature, then feed them into the fully-connected layer
Use Softmax for the final classification
I tried different features, parameters, and network structures (different positions of the dropout layer and fully-connected layer)
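
The forward pass described above can be sketched in plain Python to show how the pieces connect. This is an illustration under stated assumptions, not the code in the model folder: all function names, sizes, and the random weights are hypothetical, the dictionary feature is assumed to be already reduced to a fixed-length vector, the convolution uses ReLU with max-over-time pooling, and dropout and training are omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up one embedding vector per token; unknown tokens map to zeros."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def conv_max_pool(embedded: List[Vector], filters: List[Vector],
                  width: int, dim: int) -> Vector:
    """Apply each filter to every window of `width` words, then ReLU and max-pool."""
    # Pad with zero vectors so a sentence shorter than the window yields one window.
    padded = embedded + [[0.0] * dim] * max(0, width - len(embedded))
    windows = [
        [x for vec in padded[i:i + width] for x in vec]
        for i in range(len(padded) - width + 1)
    ]
    return [
        max(max(0.0, sum(w * x for w, x in zip(filt, win))) for win in windows)
        for filt in filters
    ]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully-connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens: List[str], dict_feature: Vector, table: Dict[str, Vector],
             filters: List[Vector], width: int, dim: int,
             fc_w: List[Vector], fc_b: Vector) -> Vector:
    """Embedding -> CNN feature -> concatenate dictionary feature -> FC -> softmax."""
    cnn_feature = conv_max_pool(embed(tokens, table, dim), filters, width, dim)
    combined = cnn_feature + dict_feature  # list concatenation
    return softmax(dense(combined, fc_w, fc_b))


# Toy setup with random, untrained weights (all sizes are arbitrary assumptions).
random.seed(0)
DIM, WIDTH, N_FILTERS, N_DICT, N_DOMAINS = 4, 2, 3, 2, 3
tokens = ["i'd", "like", "some", "thai", "food"]
table = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in tokens}
filters = [[random.uniform(-1, 1) for _ in range(DIM * WIDTH)] for _ in range(N_FILTERS)]
fc_w = [[random.uniform(-1, 1) for _ in range(N_FILTERS + N_DICT)] for _ in range(N_DOMAINS)]
fc_b = [0.0] * N_DOMAINS

probs = classify(tokens, [1.0, 0.0], table, filters, WIDTH, DIM, fc_w, fc_b)
print(probs)  # one probability per domain, summing to 1
```

Because the weights are random, the printed probabilities carry no meaning; the point is only the data flow. Variants such as those mentioned above (dropout placement, extra fully-connected layers) would change where layers sit between `conv_max_pool` and `softmax`.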

Usage

Train: train.py
Test: test.py
If you want to try different parameters, change them in the train.py/test.py scripts. If you want to try different models (network structures), change them in the corresponding script in the 'model' folder.

Files

model (folder): Different CNN models. Each file defines one model, and the differences lie in the features and network structure.
utils (folder): Some tool functions for data pre-processing, data loading, error analysis, raw data analysis
train.py: Train the basic model with no word_dictionary_feature
train_vd.py: Train the model with word_dictionary_feature
train_cmp2vd.py: Train the model with simplified word_dictionary_feature (choose only one most important feature for each word)

About

Sentence classification for dialog management
