This repository contains most of the code that was used for the CS-398 Projet en informatique I course.
The most important files are tweet_stream.py, which samples random users together with their last 200 Tweets; feature_creation.py, which extracts the features listed in the fields.txt file; and cluster.ipynb, which was used to build and study the clusters.
To use tweet_stream.py:
- You will need a Twitter developer account, and the bearer token from your API keys must be placed inside a file named config.py at the root of the folder. The key is stored in a string variable named BEARER_TOKEN, for example BEARER_TOKEN = "ABCD123" (a minimal config.py sketch is given after this list);
- At line 9: change the string variable user to point to the JSONL file where you want the sampled users to be stored;
- At line 538: change the string variable ratios to point to the CSV file where you want the users' RT ratios to be stored. This file will be a CSV file with the header "author_id, rt_ratio, num_tweet" (see the output sketch after this list);
- If you want to extract features later, do not forget to note the date of the streaming.
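For reference, a minimal config.py could look like the snippet below. The fetch_recent_tweets helper is only a hypothetical illustration of how the bearer token can be attached to a Twitter v2 API request; it is not the actual request logic of tweet_stream.py.

```python
# config.py -- placed at the root of the folder; replace the placeholder with your real token.
BEARER_TOKEN = "ABCD123"
```

```python
# Hypothetical illustration (not the actual code of tweet_stream.py) of how the
# token from config.py can authenticate a request against the Twitter v2 API.
import requests

from config import BEARER_TOKEN

def fetch_recent_tweets(user_id: str, max_results: int = 100) -> dict:
    """Fetch one page of a user's recent Tweets from the Twitter v2 API."""
    url = f"https://api.twitter.com/2/users/{user_id}/tweets"
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {"max_results": max_results}  # the v2 API caps this at 100 per page
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    return response.json()
```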
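The expected output layout can be sketched as follows. Only the CSV header is taken from the description above; the file paths and the append_user helper are hypothetical.

```python
# Hypothetical sketch of the two output files written by the streaming step; only the
# CSV header comes from the description above, the paths and helper are illustrative.
import csv
import json

user = "users.jsonl"      # JSONL file: one JSON object per sampled user
ratios = "rt_ratios.csv"  # CSV file with header "author_id, rt_ratio, num_tweet"

def append_user(user_obj: dict, rt_ratio: float, num_tweet: int) -> None:
    """Append one sampled user to the JSONL file and its RT ratio to the CSV file."""
    with open(user, "a", encoding="utf-8") as f:
        f.write(json.dumps(user_obj) + "\n")
    with open(ratios, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([user_obj["id"], rt_ratio, num_tweet])
```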
To use feature_creation.py, you will need a few additional libraries: profanity-check (or profanity-check-alt if the former produces errors), emoji, and jellyfish; a short sketch of how these libraries might be used is given after the list below. Then:
- At line 537: change the string variable user to point to a JSONL file that contains the users with all fields corresponding to Twitter's v2 API;
- At line 538: change the string variable ratios to point to a CSV file that contains the RT ratio for each user, in the same order as the previous file. This file should be a CSV file with the header "author_id, rt_ratio, num_tweet";
- At line 539: change the variable date, which represents the date on which the users were streamed. You might need to add an extra day to avoid some errors if they appear (see the configuration sketch after this list).
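Concretely, the three variables described above would be set along these lines; the paths and the exact format of the date are placeholders, not values taken from the actual script.

```python
# Placeholder values for the three variables described above (lines 537-539); the
# paths, and the exact format of date, are assumptions to adjust to your own run.
user = "users.jsonl"       # JSONL file with the streamed users (Twitter v2 API fields)
ratios = "rt_ratios.csv"   # CSV file with header "author_id, rt_ratio, num_tweet"
date = "2022-05-01"        # day of the streaming; add one extra day if errors appear
```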
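The three extra libraries expose the kind of per-Tweet signals the feature extraction relies on. The snippet below is a rough, hypothetical illustration of how they can be called; the real feature set is the one listed in fields.txt.

```python
# Rough, hypothetical illustration of the three helper libraries; the actual features
# extracted by feature_creation.py are the ones listed in fields.txt.
import emoji
import jellyfish
from profanity_check import predict_prob

tweets = ["Hello world!", "I love pizza 🍕🍕"]

profanity_scores = predict_prob(tweets)                # probability that each text is profane
emoji_counts = [emoji.emoji_count(t) for t in tweets]  # number of emojis per Tweet
# String distance between two Tweets, e.g. to spot near-duplicate content.
edit_distance = jellyfish.levenshtein_distance(tweets[0], tweets[1])
```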
The file decompress.py decompresses a .jsonl.bz2 file (the list of Tweets of a user). Do not hesitate to change the value of the filepath variable, as sketched below.
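A minimal version of that decompression step could look like this; the filepath value is a placeholder to replace with your own file.

```python
# Minimal sketch of decompressing a user's .jsonl.bz2 Tweet dump with the standard
# library; the filepath value is a placeholder.
import bz2
import json

filepath = "user_tweets.jsonl.bz2"

with bz2.open(filepath, "rt", encoding="utf-8") as f:
    tweets = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(tweets)} Tweets")
```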
The figs folder contains all sorts of images and graphs used in the project.