Analysis of a public online retail data set.


The Jupyter notebook stored in this repository is the output of a couple of days of exploratory data analysis of an online retail data set. The code isn't elegant, beautiful, or optimized; it's just what I hacked together in a short time for my own interest.

The purpose of this work was to see what value could be extracted from a fairly large and open-ended data set (as opposed to one of the more straightforward Kaggle data sets with a more obvious target vector, for example).

The analysis includes some basic feature engineering and machine learning. I also used the opportunity to test Microsoft's LightGBM algorithm. I didn't have access to high-performance computing resources, and the algorithm served me well in terms of speed and accuracy compared with XGBoost.
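As a rough illustration of what basic feature engineering on invoice-level retail data can look like, here is a minimal sketch that rolls transactions up into per-customer recency, frequency, and spend. It is not taken from the notebook: the row layout, the function name `customer_features`, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical invoice rows: (customer_id, invoice_date, quantity, unit_price).
# These values are made up for illustration, not drawn from the data set.
rows = [
    ("C1", date(2011, 1, 5), 2, 3.50),
    ("C1", date(2011, 3, 1), 1, 10.00),
    ("C2", date(2011, 2, 10), 5, 1.20),
]


def customer_features(rows, as_of):
    """Aggregate invoice rows into one feature dict per customer."""
    acc = defaultdict(lambda: {"count": 0, "spend": 0.0, "last": None})
    for customer_id, invoice_date, quantity, unit_price in rows:
        entry = acc[customer_id]
        entry["count"] += 1
        entry["spend"] += quantity * unit_price
        # Track the most recent purchase date seen for this customer.
        if entry["last"] is None or invoice_date > entry["last"]:
            entry["last"] = invoice_date

    return {
        customer_id: {
            "recency_days": (as_of - entry["last"]).days,
            "frequency": entry["count"],
            "monetary": entry["spend"],
        }
        for customer_id, entry in acc.items()
    }


features = customer_features(rows, as_of=date(2011, 4, 1))
for customer_id, feats in sorted(features.items()):
    print(customer_id, feats)
```

Features like these could then be fed to a gradient-boosting model such as LightGBM or XGBoost; which features actually help depends on the target you choose.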

Note: A few plots were made using Plotly and don't render in the repository view of the notebook. You will have to clone the repository and run the notebook locally to see them. I'll build Altair versions if I get time.

ToDo:

  • Complete the NLP section to build categories from the free-text item descriptions.
  • Build some better features and make some better predictions!