Sentiment Analysis on Movie Reviews

Image Source: The Eagle Way

Summary

This project applies sentiment analysis to classify movie reviews as positive or negative, using user-generated reviews from platforms such as IMDb. The aim is to better understand audience reception and to support decisions by filmmakers, marketers, and streaming services.

A Naive Bayes classifier on unigram features was implemented first. It reached 79.5% accuracy and serves as the baseline for sentiment classification. The analysis then added bigram and trigram features and fitted models with lasso and ridge regularization. The ridge regression model with trigrams performed best, with 80.5% accuracy, 77.20% precision, and 85.12% recall, a gain of one percentage point in accuracy over the unigram baseline. The improvement suggests that n-gram context combined with regularization helps the model capture more nuanced sentiment and reduces misclassifications.
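As a reminder of how the three reported metrics relate to a confusion matrix, here is a minimal base R sketch. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the project. It also assumes that "positive" refers to the positive-review class.

```r
# Compute accuracy, precision, and recall from confusion-matrix counts.
# Assumption: the "positive" class is a positive review.
classification_metrics <- function(tp, fp, tn, fn) {
  list(
    accuracy  = (tp + tn) / (tp + fp + tn + fn),  # share of all reviews classified correctly
    precision = tp / (tp + fp),                   # share of predicted positives that are positive
    recall    = tp / (tp + fn)                    # share of actual positives that were found
  )
}

# Made-up counts for illustration only; these are NOT the project's results.
m <- classification_metrics(tp = 40, fp = 10, tn = 45, fn = 5)
print(round(unlist(m), 4))
```

A recall higher than precision, as reported for the ridge model, means the classifier misses few positive reviews but labels some negative reviews as positive.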

These results are relevant to practical applications such as predicting box office performance, refining recommendation systems, and tailoring marketing strategies. With its higher accuracy and use of word context, the ridge regression model is the stronger starting point for such work.

Technical Skills and Tools

Project Focus: Natural Language Processing (NLP), sentiment analysis

Programming Languages: R

Libraries and Packages:
- quanteda: Text processing and feature extraction
- quanteda.textmodels: Implementation of text classification models
- caret: Model evaluation and performance metrics
- glmnet: Ridge and lasso regression, regularization

Data Processing:
- Custom Stopwords List: Tailored stopwords that preserve sentiment-related terms
- Tokenization: Tokenizing text data and removing punctuation, URLs, numbers, and symbols
- Word Stemming: Reducing words to their root forms

Modeling Techniques:
- Naive Bayes Classifier: Initial NLP classification model using unigrams
- N-grams (Bigrams, Trigrams): Enhanced feature extraction for capturing contextual information
- Ridge Regression: Regularized model to improve classification performance and manage complexity
- Lasso Regression: Regularization technique for feature selection and model simplicity
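The processing and modeling steps listed above could be sketched roughly as follows. This is not the project's actual code (that lives in Movie_Sentiment_Analysis.Rmd). The toy reviews, the terms kept out of the stopword list, and the penalty value `s = 0.05` are all assumptions made for illustration. It also assumes that the ridge model is fitted as a penalized logistic regression with `alpha = 0`, and that the quanteda, quanteda.textmodels, and glmnet packages are installed.

```r
library(quanteda)
library(quanteda.textmodels)
library(glmnet)

# Toy corpus (hypothetical): 10 positive and 10 negative one-line reviews.
subjects <- c("acting", "story")
pos_adj  <- c("great", "wonderful", "brilliant", "charming", "moving")
neg_adj  <- c("dull", "awful", "boring", "clumsy", "terrible")
pos <- as.vector(outer(pos_adj, subjects,
  function(a, s) paste0("The ", s, " was ", a, " and I loved it!")))
neg <- as.vector(outer(neg_adj, subjects,
  function(a, s) paste0("The ", s, " was ", a, " and I did not enjoy it.")))
reviews <- c(pos, neg)
labels  <- factor(rep(c("pos", "neg"), each = 10), levels = c("neg", "pos"))

# Custom stopword list: keep negations, which carry sentiment (assumed choice).
custom_stop <- setdiff(stopwords("en"), c("not", "no", "never"))

# Tokenize, dropping punctuation, URLs, numbers, and symbols; then stem.
toks <- tokens(reviews, remove_punct = TRUE, remove_url = TRUE,
               remove_numbers = TRUE, remove_symbols = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = custom_stop)
toks <- tokens_wordstem(toks)

# Feature matrices: unigrams only, and unigrams + bigrams + trigrams.
dfm_uni <- dfm(toks)
dfm_tri <- dfm(tokens_ngrams(toks, n = 1:3))

# Baseline: Naive Bayes on unigram counts.
nb      <- textmodel_nb(dfm_uni, y = labels)
pred_nb <- predict(nb, newdata = dfm_uni)

# Ridge-penalized logistic regression (alpha = 0) on the n-gram features.
x_tri      <- as(dfm_tri, "dgCMatrix")
ridge      <- glmnet(x_tri, labels, family = "binomial", alpha = 0)
pred_ridge <- predict(ridge, newx = x_tri, s = 0.05, type = "class")

print(table(predicted = pred_nb, actual = labels))
```

Setting `alpha = 1` in `glmnet()` would give the lasso variant instead. On real data the penalty strength is normally chosen by cross-validation (for example with `cv.glmnet()`) and the models are evaluated on a held-out test set rather than on the training reviews as done here.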

snehamariamthomas/movie-review-sentiment-analysis
