Data-Analysis-Practices-in-Python

This project collects practice exercises in big data analysis with Python. They cover a range of techniques and models commonly used in machine learning.

Covered Topics

Data Preprocessing

Implementations of missing-data handling, standardization, normalization, encoding of non-numerical data, and related steps.
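The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the code from this repository; the toy table and its column names (`age`, `income`, `city`) are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical toy table; the columns are assumptions for illustration only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 58000.0],
    "city": ["Boston", "Austin", "Boston", "Denver"],
})
numeric_cols = ["age", "income"]

# Missing data: fill each numeric gap with the column mean.
imputed = SimpleImputer(strategy="mean").fit_transform(df[numeric_cols])

# Standardization: zero mean and unit variance per column.
standardized = StandardScaler().fit_transform(imputed)

# Normalization (min-max): rescale each column to the range [0, 1].
normalized = MinMaxScaler().fit_transform(imputed)

# Non-numerical data: one-hot encode the categorical column.
encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()
```

Mean imputation and one-hot encoding are only one reasonable choice each; median imputation or ordinal encoding may suit other datasets better.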

Clustering and Visualization

Implementation and study of clustering models such as KMeans++, Hierarchical clustering, and GMM, together with methods for determining the optimal number of clusters: the Elbow Method for KMeans++, dendrograms for Hierarchical clustering, and the Silhouette Coefficient for GMM. Implementation of data visualization using heatmaps, Folium, scatter plots, seaborn, etc. Implementation of image manipulation and compression.
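As a rough sketch of how a cluster count can be chosen, the snippet below records the KMeans++ inertia for an elbow plot and the Silhouette Coefficient of GMM labels for each candidate count. The synthetic blobs and the candidate range 2 to 6 are assumptions for illustration, not the datasets used in this repository.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated groups (assumed for the example).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.6,
    random_state=0,
)

inertias = {}      # elbow method: within-cluster sum of squares per k
silhouettes = {}   # silhouette coefficient of the GMM labels per k
for k in range(2, 7):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0)
    silhouettes[k] = silhouette_score(X, gmm.fit_predict(X))

# The silhouette criterion picks the k with the highest score; the elbow
# is read off a plot of `inertias`, where the curve stops dropping sharply.
best_k = max(silhouettes, key=silhouettes.get)
```

Plotting `inertias` against k (for example with matplotlib) gives the usual elbow curve; the silhouette score gives a single number per k that can be compared directly.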

Classification and Dimensionality Reduction

Implementation and study of classification models such as Logistic Regression and kNN. Implementation of PCA to reduce data dimensionality.
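One way to combine these pieces is a pipeline that standardizes the features, projects them onto a few principal components, and then fits each classifier. The sketch below uses the Iris dataset bundled with scikit-learn as a stand-in; the dataset, the two-component projection, and `n_neighbors=5` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    # Scale, reduce 4 features to 2 principal components, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=2), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # accuracy on held-out data
```

Keeping PCA inside the pipeline means the projection is learned from the training split only, which avoids leaking test data into the dimensionality reduction.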

Social Networks and Recommendation Systems

Use of graphs (networks) to build a recommendation system.
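A simple graph-based recommender scores candidates by the number of neighbors they share with a user. The sketch below shows that common-neighbors idea on a tiny hypothetical friendship graph; the names, the graph, and the `recommend` helper are all invented for illustration and may differ from the approach taken in this repository.

```python
from collections import Counter

# Hypothetical undirected graph stored as an adjacency dict.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def recommend(graph, user, top_n=3):
    """Rank non-neighbors of `user` by the number of shared neighbors."""
    scores = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone already connected to the user.
            if candidate != user and candidate not in graph[user]:
                scores[candidate] += 1
    # Sort by score (descending), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

print(recommend(graph, "alice"))  # ['dave', 'erin']
```

Here `dave` ranks first for `alice` because they share two neighbors (`bob` and `carol`), while `erin` shares only one.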

Amazon Movie Star Rating Prediction

This is an independent project in which I developed a pipeline to predict movie star ratings from real Amazon movie data, including movie info and text reviews. I used TfidfVectorizer to convert the text reviews into a matrix, then implemented and compared several classification models (Ridge Regression, Perceptron, Passive Aggressive, KNN, and Random Forest) to make the prediction.
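The general shape of such a pipeline can be sketched as below: TfidfVectorizer turns each review into a sparse feature vector, and each candidate classifier is fitted on the same features. The six toy reviews and their star labels are invented for illustration, the hyperparameters are assumptions, and the Passive Aggressive model is left out of this sketch for brevity.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron, RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical reviews and star ratings; not taken from the Amazon dataset.
reviews = [
    "a wonderful movie, I loved every minute",
    "brilliant acting and a wonderful story",
    "terrible plot, a complete waste of time",
    "boring and terrible, I walked out",
    "it was okay, nothing special",
    "an average film, okay for one watch",
]
stars = [5, 5, 1, 1, 3, 3]

candidates = {
    "ridge": RidgeClassifier(),
    "perceptron": Perceptron(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, clf in candidates.items():
    # Each pipeline vectorizes the raw text, then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(reviews, stars)
    predictions[name] = pipe.predict(["a wonderful movie, loved it"])[0]
```

On real data the models would be compared on a held-out split or with cross-validation rather than on a single hand-written review.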

Files Description

The practices are organized by topic, one folder per topic. Each folder contains a “Problems.pdf” file that describes the topic and lists the problems to be solved. The code for the solutions is in the same folder, while summaries, charts, and results can be found in “Solutions.pdf”. To run the code, download the datasets from the links given in “Problems.pdf”.

.idea
0-Data_Preprocessing
1-Clustering_and_Visualization
2-Classification_and_Dimensionality_Reduction
3-Social_Networks_and_Recommendation_Systems
4-Amazon_Movie_Star_Rating_Prediction
.DS_Store
.gitignore
README.md
