golecalicja/thats-what-she-said

The Office NLP data analysis

Introduction

As a fan of the popular TV show The Office, I analyzed the show's script using natural language processing (NLP) techniques. In particular, I used a technique called TF-IDF to identify important words in the script and uncover the vocabulary that is distinctive to each character.

TF-IDF

TF-IDF stands for term frequency-inverse document frequency and is a weighting scheme used to identify important words in a collection of texts. It multiplies how often a word appears in one document (term frequency) by a factor that shrinks as the word appears in more of the documents (inverse document frequency). A word therefore scores highly when it is frequent in one document but rare in the others, which allows us to determine the distinctive words used by each character in the show.
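
To make the calculation concrete, here is a minimal sketch of TF-IDF in plain Python. It is not the code used for this analysis. It assumes that all of a character's lines are merged into one document, and the sample texts, the `tokenize` helper and the `tf_idf` function are made-up placeholders for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return {name: {word: score}} for a dict mapping name -> text.

    Assumption: each value in `docs` is one character's merged lines.
    tf  = count of the word in the document / total words in the document
    idf = log(number of documents / number of documents containing the word)
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in word_counts.items()
        }
    return scores


# Placeholder texts, not real lines from the script.
docs = {
    "dwight": "the beets and the farm and the beets",
    "kevin": "the chili and the cookies",
    "stanley": "the crossword and the pretzel",
}

scores = tf_idf(docs)
top_dwight = max(scores["dwight"], key=scores["dwight"].get)
```

Words that occur in every document, such as "the" and "and" here, get a score of zero because their inverse document frequency is log(1), while a word used by only one character rises to the top of that character's list. Libraries often use smoothed variants of this formula, so exact scores can differ between implementations.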

Word Cloud Visualization

To visualize the results of the analysis, I created a word cloud for each character. Each word cloud displays that character's most important words, and the size of a word reflects its importance in the script.
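
The idea of sizing words by importance can be sketched as follows. This is only an illustration of the scaling step, not the code used to produce the images. The `font_sizes` function, its size limits and the example scores are assumptions, and a real word cloud library would also handle layout and rendering.

```python
def font_sizes(word_scores, min_size=10, max_size=60, top_n=50):
    """Map each of the top_n words to a font size proportional to its score.

    The highest-scoring word gets max_size and the lowest-scoring word
    among the top_n gets min_size. All parameters are illustrative defaults.
    """
    top = sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not top:
        return {}

    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest

    sizes = {}
    for word, score in top:
        # If every score is equal, draw all words at the largest size.
        fraction = (score - lowest) / span if span else 1.0
        sizes[word] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Hypothetical scores for one character, for illustration only.
example_scores = {"beets": 0.27, "farm": 0.14, "the": 0.0}
sizes = font_sizes(example_scores)
```

Linear scaling is the simplest choice. When a few words have much higher scores than the rest, scaling by the logarithm or the rank of the score keeps the smaller words readable.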

Results

Here are some word clouds showing the most distinctive words for each character:

[Images: andy_merged, dwight_merged, imgonline-com-ua-twotoone-zblz5EmndQ7r, kevin_merged, ryan_merged, stanley_merged]