Merge pull request #1 from umstek/umstek-readme-1
Update Readme.md
# DengAI

Open the `.ipynb` files with Jupyter Notebook or an alternative such as JupyterLab.
Run `Preprocess.ipynb` to preprocess the source files and generate training-ready files.
In JupyterLab, `Run -> Run All` will do this.
You can tweak the notebook and try training models on the resulting files in the `generated` folder.
`ModelSelection.ipynb` is intended to select the best model by running a grid search with cross-validation (hyperparameter optimization) for each model. However, the results with sklearn look suboptimal (or we may not be using it correctly).
`DengAI.ipynb` is intended to contain feature selection, training, and result generation, but it has not been completed yet.
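
As a point of reference, a per-model grid search with cross-validation in sklearn might look like the minimal sketch below. It is not the code in `ModelSelection.ipynb`: the two candidate models, their parameter grids, and the synthetic data from `make_regression` are illustrative assumptions. Scoring uses `neg_mean_absolute_error` because the competition is scored by MAE; sklearn negates the score so that higher is better.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the preprocessed training data (hypothetical).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Candidate models and their hyperparameter grids (illustrative choices only).
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
}

# Plain, unshuffled 5-fold split; sklearn's TimeSeriesSplit is an alternative for weekly data.
cv = KFold(n_splits=5)

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="neg_mean_absolute_error", cv=cv)
    search.fit(X, y)
    # best_score_ is the negated MAE, so flip the sign back.
    results[name] = (search.best_params_, -search.best_score_)

# Pick the model with the lowest cross-validated MAE.
best_name = min(results, key=lambda name: results[name][1])
for name, (params, mae) in results.items():
    print(f"{name}: MAE={mae:.3f} params={params}")
print("best:", best_name)
```

Running one search per model keeps each grid small; the best model is then chosen by comparing the cross-validated MAE values across searches.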

You can also use other tools to make predictions.
Results:
+ Matlab Ensemble Boosted Trees with 5-fold cross-validation: error = 24.9663
  Settings: `Iq -> 7 100 0.09`, `Sj -> 7 100 0.1`
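
A comparable setup in Python could be sketched with sklearn's `GradientBoostingRegressor`. This is a different boosting implementation from Matlab's, so errors will not match exactly. The sketch assumes `Iq` and `Sj` refer to the two DengAI cities (`iq` for Iquitos, `sj` for San Juan) and that each triple is minimum leaf size, number of learners, and learning rate; that reading of the numbers is a guess, and the per-city data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Assumed reading of "leaf size, learners, learning rate" for each city.
settings = {
    "iq": {"min_samples_leaf": 7, "n_estimators": 100, "learning_rate": 0.09},
    "sj": {"min_samples_leaf": 7, "n_estimators": 100, "learning_rate": 0.1},
}

# Hypothetical per-city data; replace with the real per-city features and labels.
data = {
    city: make_regression(n_samples=150, n_features=6, noise=5.0, random_state=seed)
    for seed, city in enumerate(settings)
}

cv_mae = {}
for city, params in settings.items():
    X, y = data[city]
    model = GradientBoostingRegressor(random_state=0, **params)
    # This scorer returns negated MAE per fold; negate the mean back to a positive error.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[city] = -scores.mean()
    print(f"{city}: 5-fold CV MAE = {cv_mae[city]:.3f}")
```

Training one model per city mirrors the separate `Iq` and `Sj` settings above.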

Please do not push **changes** to these files (on master) unless the changes reduce the error.
Before pushing a notebook, please clear all outputs, e.g. with `Edit -> Clear All Outputs`.
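
Outputs can also be cleared from a script. This minimal sketch uses only the standard library and assumes the nbformat 4 JSON layout, where code cells carry `outputs` and `execution_count`; `clear_outputs` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def clear_outputs(path):
    """Strip outputs and execution counts from every code cell, in place."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # indent=1 with sorted keys approximates the layout nbformat writes.
    text = json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")


if __name__ == "__main__":
    import sys

    # Clear every notebook passed on the command line.
    for notebook_path in sys.argv[1:]:
        clear_outputs(notebook_path)
```

For example, saved as a hypothetical `clear_outputs.py`: `python clear_outputs.py Preprocess.ipynb ModelSelection.ipynb`.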

## Reports and Presentations

### [Presentation](https://github.com/umstek/DengAI/blob/master/DengAI.pdf) for CS4622 (Machine Learning)

### [Report](https://github.com/umstek/DengAI/blob/master/Machine%20Learning%20Report%20-%20Group%2030.pdf) for CS4622 (Machine Learning)

### [Report](https://github.com/umstek/DengAI/blob/master/Data%20Mining%20Report%20-%20Group%2030.pdf) for CS4642 (Data Mining and Information Retrieval)

## Results

Current best result: 19.3798 (MAE), rank 89 as of July 27, 2018.
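
MAE (mean absolute error) is the average of |actual - predicted| over all predicted weeks, so a score of 19.3798 means the predictions were off by about 19 cases per week on average. A small worked example follows; the helper name and the numbers are illustrative only, not taken from the submissions.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all weeks."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly case counts, not real DengAI data.
actual = [4, 5, 4, 3, 6]
predicted = [5, 5, 2, 3, 10]
# Absolute errors are 1, 0, 2, 0, 4, so MAE = 7 / 5.
print(mean_absolute_error(actual, predicted))  # 1.4
```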

## Directory contents

+ The `.` root directory contains the data files downloaded from _DrivenData_ and some milestone submissions.
+ The `deprecated` folder contains the first approaches to the problem with the _Matlab Regression Learner_ and _Orange3_ (with minimal preprocessing), along with the resulting `.csv` files.
+ The `Neural Networks` folder contains early approaches using deep neural networks built with _Keras_ and _TensorFlow_.
+ `Negative Binominal Regression` contains the DengAI benchmark model built with _Jupyter Notebook_, _sklearn_, _statsmodels_, etc.
+ `Interactive Python 1` contains approaches that perform general preprocessing with _Jupyter Notebook_, _pandas_, _sklearn_, _statsmodels_, and _seaborn_, and then use various models for prediction.
+ `Interactive Python 2` contains a pipeline that processes the files in several stages using _Jupyter Notebook_, _pandas_, _sklearn_, _statsmodels_, _seaborn_, and _R_'s STL (seasonal-trend decomposition using Loess), called through the _rpy2_ bridge. The pipeline covers preprocessing, visualization, analysis, automatic feature selection, and best-model selection. The best-performing model is a time series decomposition predictor combined with a linear regression model.
+ The `Orange` folder contains an Orange3 pipeline that compares the cross-validated errors of various learners with preprocessing, feature engineering, etc.