This project contains some data analytics practices using common machine learning models and techniques written in Python.

NoSpringNoRain/Data-Analysis-in-Python

Data-Analysis-Practices-in-Python

This project contains a set of big data analysis practices in Python. They cover techniques and models commonly used in machine learning, from data preprocessing to clustering, classification, and recommendation systems.

Covered Topics

Data Preprocessing

Implementations of missing-data handling, standardization, normalization, encoding of non-numerical data, etc.
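As an illustration of these steps, here is a minimal, library-free sketch. The function names and the toy columns are hypothetical and are not taken from the repository, whose own code may well rely on libraries such as pandas or scikit-learn.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Z-score: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows (columns sorted by label)."""
    categories = sorted(set(labels))
    return [[1 if lab == c else 0 for c in categories] for lab in labels]

# Hypothetical toy columns, for illustration only.
ages = impute_mean([20, None, 40])        # [20, 30, 40]
scaled = min_max(ages)                    # [0.0, 0.5, 1.0]
zscores = standardize(ages)
colors = one_hot(["red", "blue", "red"])  # columns: blue, red
```

Both scaling helpers assume the column is not constant; a constant column would cause a division by zero and needs separate handling.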

Clustering and Visualization

Implementation and study of clustering models such as KMeans++, Hierarchical, and GMM, together with methods for determining the optimal number of clusters: the Elbow Method for KMeans++, dendrograms for Hierarchical, and the Silhouette Coefficient for GMM. Data visualization using heatmaps, Folium, scatter plots, seaborn, etc. Image manipulation and compression.
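The Elbow Method can be sketched as follows: run k-means with k-means++ seeding for several values of k and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. This is a library-free sketch on synthetic data; the helper names and the three toy blobs are assumptions for illustration, not the repository's code.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

# Synthetic data: three hypothetical Gaussian blobs of 30 points each.
data_rng = random.Random(42)
blobs = [(0, 0), (5, 5), (0, 5)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blobs
    for _ in range(30)
]

inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this, the printed inertia typically falls steeply up to the true number of blobs and flattens afterwards; that bend is the "elbow".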

Classification and Dimensionality Reduction

Implementation and study of classification models such as Logistic Regression and kNN. Implementation of PCA to reduce data dimensionality.
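To show how the two ideas can fit together, the sketch below reduces 2-D points to one dimension with the first principal component and then classifies a query with kNN on the projected values. It is a simplified, library-free illustration: the power-iteration approach, the helper names, and the toy data are assumptions, not the repository's implementation.

```python
from collections import Counter

def first_principal_component(rows, iters=200):
    """Power iteration on the covariance matrix; returns (mean, unit vector).
    Assumes the starting vector of ones is not orthogonal to the component."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v

def project(row, mu, v):
    """Coordinate of a row along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, mu, v))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest 1-D training values."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data lying close to the line y = 2x.
X = [(1, 2), (2, 4.1), (3, 5.9), (8, 16), (9, 18.2), (10, 19.9)]
y = ["low", "low", "low", "high", "high", "high"]

mu, v = first_principal_component(X)
reduced = [project(row, mu, v) for row in X]

pred_low = knn_predict(reduced, y, project((2.5, 5), mu, v))
pred_high = knn_predict(reduced, y, project((9, 18), mu, v))
print(pred_low, pred_high)
```

Power iteration recovers only the leading component; extracting several components would need deflation or a full eigendecomposition, which numerical libraries provide.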

Social Networks and Recommendation Systems

Use of graphs (networks) to build a recommendation system.
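One simple way a graph can drive recommendations is a common-neighbors score: candidates two hops away are ranked by how many neighbors they share with the target node. The sketch below assumes this particular scheme and a hypothetical toy graph; the method actually used in this folder may differ.

```python
from collections import Counter

def recommend(graph, user, top_n=2):
    """Rank nodes two hops away from `user` by the number of shared neighbors."""
    scores = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone already directly connected.
            if candidate != user and candidate not in graph[user]:
                scores[candidate] += 1
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _ in ranked[:top_n]]

# Hypothetical undirected graph stored as adjacency sets.
graph = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dan", "eve"},
    "cara": {"ann", "dan"},
    "dan": {"bob", "cara"},
    "eve": {"bob"},
}

suggestions = recommend(graph, "ann")
print(suggestions)  # ['dan', 'eve']
```

Here "dan" ranks first because he shares two neighbors with "ann", while "eve" shares only one.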

Amazon Movie Star Ranking Prediction

This is an independent project in which I developed a pipeline to predict movie star ratings from real Amazon movie data, including movie info and text reviews. I used TfidfVectorizer to convert the text reviews into a matrix, then implemented and compared Ridge Regression, Perceptron, Passive Aggressive, KNN, and Random Forest classification models for the prediction.
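The idea behind the pipeline can be sketched without any library: weight each term by TF-IDF, then train a linear classifier on the resulting rows. The sketch below uses the smoothed IDF formula and L2 row normalization that scikit-learn documents as TfidfVectorizer defaults, but with plain whitespace tokenization, and a binary perceptron instead of the multi-class models compared in the project. The reviews and labels are invented for illustration.

```python
import math
from collections import Counter

def tfidf_fit_transform(docs):
    """Smoothed TF-IDF with L2-normalized rows; returns (vocab, rows)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows

def score(w, b, x):
    """Linear decision value w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(rows, labels, epochs=20):
    """Binary perceptron; labels are +1 / -1."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            if y * score(w, b, x) <= 0:  # misclassified: nudge toward y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Invented toy reviews: +1 for a positive review, -1 for a negative one.
reviews = [
    "great movie loved it",
    "terrible boring movie",
    "loved the acting great fun",
    "boring and terrible plot",
]
labels = [1, -1, 1, -1]

vocab, rows = tfidf_fit_transform(reviews)
w, b = train_perceptron(rows, labels)
predictions = [1 if score(w, b, x) > 0 else -1 for x in rows]
print(predictions)
```

Predicting a 1 to 5 star rating would need a multi-class extension such as one-vs-rest, and a held-out split to compare models fairly.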

Files Description

The practices are organized by topic. Each folder contains a PDF file named “Problems.pdf” that describes the folder's topic and lists the problems to be solved. The code for the solutions is in the same folder, while summaries, charts, and results are in “Solutions.pdf”. To run the code, download the dataset from the links in “Problems.pdf”.
