Naive Bayes and Logistic Regression Text Classification

Input

There are separate train and test datasets. Each row has the format: id,text,label
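A minimal sketch of reading rows in this layout with Python's standard csv module. It assumes each file has a header row named exactly id,text,label and that labels are category names. The two sample rows are made up for illustration. csv.DictReader also copes with news text that contains commas, as long as those fields are quoted.

```python
import csv
import io

# Hypothetical sample in the id,text,label layout; the real files are in the repository.
SAMPLE = """id,text,label
1,Galatasaray dün akşam kazandı,spor
2,"Merkez Bankası, faiz kararını açıkladı",ekonomi
"""

def read_rows(handle):
    """Yield (id, text, label) tuples from a CSV file handle that has a header row."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield int(row["id"]), row["text"], row["label"]

rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[1])  # (2, 'Merkez Bankası, faiz kararını açıkladı', 'ekonomi')
```

With real files, replace `io.StringIO(SAMPLE)` with `open(path, encoding="utf-8", newline="")`.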

The dataset is obtained from the SuDer Turkish News Collections.

Preprocessing

Lowercase conversion
Category --> Integer
Tokenize
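The three preprocessing steps can be sketched in plain Python as follows. This is an illustration, not the notebook's code. The Turkish-aware lowercasing is an assumption added here: the plain str.lower() turns "I" into "i" (Turkish expects "ı") and "İ" into "i" plus a combining dot. The helper names turkish_lower, tokenize and encode_labels are hypothetical.

```python
import re

def turkish_lower(text):
    # Python's str.lower() maps "I" to "i" and "İ" to "i" + combining dot,
    # which is wrong for Turkish, so map those two letters explicitly first.
    return text.replace("I", "ı").replace("İ", "i").lower()

def tokenize(text):
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so Turkish letters such as ç, ğ, ı, ö, ş, ü stay inside tokens.
    return re.findall(r"\w+", text)

def encode_labels(labels):
    """Map each category name to an integer id; sorting keeps ids stable across runs."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

tokens = tokenize(turkish_lower("İSTANBUL'da Işık festivali"))
print(tokens)  # ['istanbul', 'da', 'ışık', 'festivali']
ids, mapping = encode_labels(["spor", "ekonomi", "spor"])
print(ids, mapping)  # [1, 0, 1] {'ekonomi': 0, 'spor': 1}
```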

TfidfVectorizer

Term Frequency - Inverse Document Frequency (TF-IDF) weights each word by how often it occurs in a document (term frequency) and how rare it is across the whole collection (inverse document frequency). TfidfVectorizer converts each document, not each word, into a numerical vector with one dimension per vocabulary word. The corpus therefore becomes a vector space in which documents can be compared and classified.
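A from-scratch sketch of the computation, assuming scikit-learn's default settings for TfidfVectorizer: raw counts as term frequency, smoothed idf = ln((1 + n) / (1 + df)) + 1, and L2 normalization of each document row. Tokenization is left to the caller here, so results can differ from the library's own token pattern.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns (vocab, rows), one L2-normalized row per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # document frequency: in how many documents each term appears at least once
    df = Counter(tok for doc in docs for tok in set(doc))
    # smoothed idf, matching scikit-learn's default smooth_idf=True formula
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)  # raw term frequency within this document
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        rows.append([v / norm if norm else 0.0 for v in vec])
    return vocab, rows

vocab, rows = tfidf([["borsa", "yükseldi"], ["borsa", "düştü"]])
print(vocab)  # ['borsa', 'düştü', 'yükseldi']
```

"borsa" appears in both documents, so it gets the minimum idf of 1. The rarer words get a larger weight in their document's row.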

GridSearchCV

GridSearchCV evaluates every combination of the given hyperparameter values with cross-validation and keeps the best-scoring one. It is used for both Naive Bayes and Logistic Regression.
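A sketch of how such a search can be set up with scikit-learn, putting TfidfVectorizer and the classifier in one Pipeline so that the vectorizer is refit inside every cross-validation fold. The parameter grids, the eight toy documents and cv=2 are assumptions for illustration. The notebook's actual grids are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy corpus: 0 = spor, 1 = ekonomi
texts = [
    "takım maçı kazandı", "forvet iki gol attı", "maç berabere bitti", "takım transfer yaptı",
    "borsa güne yükselişle başladı", "faiz kararı açıklandı", "dolar kuru düştü", "borsa faiz beklentisi",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def search(classifier, grid):
    # "clf__" prefixes route each grid key to the pipeline step named "clf"
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", classifier)])
    gs = GridSearchCV(pipe, grid, cv=2, scoring="accuracy")
    gs.fit(texts, labels)
    return gs

nb = search(MultinomialNB(), {"clf__alpha": [0.1, 0.5, 1.0]})
lr = search(LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]})
print(nb.best_params_, lr.best_params_)
```

After fitting, `best_estimator_` is refit on all training data with the winning parameters and can be used directly on the test set.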

Results

Results are measured on the test data. Naive Bayes reaches an accuracy of 0.702 and Logistic Regression an accuracy of 0.824.
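Accuracy is the fraction of test documents whose predicted category matches the true one, so 0.824 means about 82 of every 100 test articles are labeled correctly. Here is a minimal pure-Python version of the metric. scikit-learn's accuracy_score computes the same quantity. The label lists are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```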

sdakansu/Naive-Bayes-Text-Classification

Folders and files

Latest commit

History

Repository files navigation

Naive Bayes and Logistic Regression Text Classification

Input

Preprocessing

TFIDFVectorizer

GridSearchCV

Results

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages