snehamariamthomas/movie-review-sentiment-analysis

Sentiment Analysis on Movie Reviews

Image Source: The Eagle Way

Summary

This project applied sentiment analysis to classify movie reviews as positive or negative, using user-generated content from platforms such as IMDb. The goal was to better understand audience reception and to support decision-making by filmmakers, marketers, and streaming services.

A Naive Bayes classifier using unigrams served as the baseline for sentiment classification and achieved an accuracy of 79.5%. The analysis then added bigrams and trigrams as features and fit lasso- and ridge-regularized regression models. The ridge regression model with trigrams performed best, with an accuracy of 80.5%, a precision of 77.20%, and a recall of 85.12%. This is a gain of one percentage point in accuracy over the baseline, which suggests that n-gram context combined with regularization captures some sentiment that unigrams alone miss.
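The accuracy, precision, and recall figures above all derive from a binary confusion matrix. As a minimal sketch of how they are computed (written in Python purely for illustration; the project itself is in R and evaluates its models with caret), consider the following. The class name, function names, and the counts in `example` are hypothetical and are not the project's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'positive review' as the positive class."""
    tp: int  # positive reviews predicted positive
    fp: int  # negative reviews predicted positive
    tn: int  # negative reviews predicted negative
    fn: int  # positive reviews predicted negative


def accuracy(c: ConfusionCounts) -> float:
    """Share of all reviews that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Share of predicted-positive reviews that are truly positive."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of truly positive reviews that were predicted positive."""
    return c.tp / (c.tp + c.fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = ConfusionCounts(tp=80, fp=20, tn=70, fn=30)

print(f"accuracy  = {accuracy(example):.3f}")
print(f"precision = {precision(example):.3f}")
print(f"recall    = {recall(example):.3f}")
```

Applying `f1_score(0.7720, 0.8512)` to the reported precision and recall gives roughly 0.81. This F1 value is derived here from the stated figures and is not reported by the project.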

These results are relevant to practical applications such as predicting box office performance, refining recommendation systems, and tailoring marketing strategies. In each case, the contextual information captured by the ridge regression model can make sentiment estimates more reliable.

Technical Skills and Tools

Project Focus: Natural Language Processing (NLP), sentiment analysis

Programming Languages: R

Libraries and Packages:
- quanteda: text processing and feature extraction
- quanteda.textmodels: implementation of text classification models
- caret: model evaluation and performance metrics
- glmnet: ridge and lasso regression, regularization

Data Processing:
- Custom stopwords list: tailored stopwords that preserve sentiment-related terms
- Tokenization: tokenizing text data and removing punctuation, URLs, numbers, and symbols
- Word stemming: reducing words to their root forms

Modeling Techniques:
- Naive Bayes classifier: initial NLP classification model using unigrams
- N-grams (bigrams, trigrams): enhanced feature extraction for capturing contextual information
- Ridge regression: regularized model to improve classification performance and manage complexity
- Lasso regression: regularization technique for feature selection and model simplicity
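The data processing and modeling steps listed above can be sketched end to end on a toy example. The following is an illustrative Python sketch using only the standard library, not the repository's R code. The stopword list, the four training reviews, and all function and class names are hypothetical. Stemming and the ridge and lasso models are omitted to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: common function words are dropped, but negations
# such as "not" and "never" are deliberately kept because they carry sentiment.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to"}


def tokenize(text):
    """Lowercase, strip URLs, keep alphabetic tokens only, then drop stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, max_n=3):
    """Return all unigrams up to max_n-grams, joining the words with '_'."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend("_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.classes}
        for feats, label in zip(docs, labels):
            self.feat_counts[label].update(feats)
        self.vocab = {f for c in self.classes for f in self.feat_counts[c]}
        return self

    def predict(self, feats):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total = sum(self.feat_counts[c].values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            for f in feats:
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log(
                        (self.feat_counts[c][f] + 1) / (total + len(self.vocab))
                    )
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical training reviews, for illustration only.
reviews = [
    ("A moving story and great acting", "pos"),
    ("Great fun, I loved it", "pos"),
    ("This was not good, a boring mess", "neg"),
    ("Boring plot and terrible acting", "neg"),
]
model = NaiveBayes().fit(
    [ngrams(tokenize(text)) for text, _ in reviews],
    [label for _, label in reviews],
)
prediction = model.predict(ngrams(tokenize("not good and boring")))
print(prediction)
```

Keeping "not" out of the stopword list lets the bigram `not_good` survive as a feature, which is the kind of context that unigrams alone cannot represent.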
