zzhangusf/NLP-projects

NLP Projects

Project Description

  • TF-IDF for Reuters Articles in XML

    • Extracted titles and paragraphs from Reuters articles in XML format using ElementTree
    • Tokenized and stemmed the text with NLTK, and computed TF-IDF scores for the most common words using TfidfVectorizer
  • Sentiment Analysis with Naive Bayes using PySpark

    • Performed data cleaning and transformation, and estimated term frequencies (TF) using PySpark RDDs
    • Built a Naive Bayes model to perform sentiment analysis and achieved an accuracy of 82.5%
  • Sentiment Analysis of Tweets

    • Parsed and stemmed tweet text, and computed TF for the most common words
    • Classified tweet sentiment using regularized logistic regression, LDA, and KNN
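
As a rough illustration of the first project's pipeline (extract text from XML, tokenize, score terms), here is a minimal standard-library sketch. It is not the code from this repository: the XML layout, the `title` and `p` tag names, and the helper names `extract_text`, `tokenize`, and `tfidf` are all assumptions made for the example. It also omits stemming and computes a plain TF-IDF by hand rather than calling TfidfVectorizer, whose defaults (smoothed IDF and L2 normalization) produce different numbers.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical article layout; the real Reuters tag names may differ.
SAMPLE_XML = """<newsitem>
  <title>Oil prices rise</title>
  <text>
    <p>Oil prices rose on Monday.</p>
    <p>Traders expect prices to keep rising.</p>
  </text>
</newsitem>"""


def extract_text(xml_string):
    """Return the title and all <p> paragraphs joined into one string."""
    root = ET.fromstring(xml_string)
    title = root.findtext("title", default="")
    paragraphs = [p.text or "" for p in root.iter("p")]
    return " ".join([title] + paragraphs)


def tokenize(text):
    """Lowercase the text and keep runs of letters (no stemming in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Score each term per document as (count / doc length) * ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Two-document toy corpus: one parsed from XML, one plain string.
docs = [extract_text(SAMPLE_XML), "Stocks fell on Monday."]
scores = tfidf(docs)
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
print(top_terms)
```

With this unsmoothed formula, a term that appears in every document (here "monday" and "on") gets a score of zero, which is one reason library implementations smooth the IDF term.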
