The Machine Cleaning project develops classification models that predict corrected USAC Form 471 line item field values for the line item fields most commonly edited by ESH Business Analysts. Past analyses determined which fields were the best "low-hanging fruit" for machine learning: Purpose and Connect Category.
Workflow/Code Architecture Illustration:
These modules were built to support the model-building workflow for this specific ESH application, and to make it easier for analysts with limited machine learning background to use. The preprocessing module is specific to ESH, while the modeling modules are wrappers around sklearn modules. They can and should be extended, and the first priority is to rewrite them to work in Python 3+!
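The sketch below gives a rough idea of what a thin wrapper around sklearn can look like. It bundles a classifier, a train/test split, and a held-out accuracy check behind two methods. The class name `FieldClassifier`, the toy column names, and the default `RandomForestClassifier` are hypothetical choices for illustration only. They are not the project's actual API. The sketch is written for Python 3.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class FieldClassifier:
    """Hypothetical wrapper: predicts one line item field (e.g. Purpose) from feature columns."""

    def __init__(self, target, model=None, test_size=0.25, random_state=0):
        self.target = target  # name of the column to predict (assumed)
        # Default estimator is an assumption; any sklearn classifier with fit/predict works
        if model is None:
            model = RandomForestClassifier(n_estimators=50, random_state=random_state)
        self.model = model
        self.test_size = test_size
        self.random_state = random_state
        self.features = None

    def fit(self, df):
        """Train on all non-target columns and return accuracy on a held-out split."""
        X = df.drop(columns=[self.target])
        y = df[self.target]
        self.features = list(X.columns)  # remember training column order
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=self.test_size, random_state=self.random_state
        )
        self.model.fit(X_train, y_train)
        return accuracy_score(y_test, self.model.predict(X_test))

    def predict(self, df):
        """Predict on new rows, reordering columns to match the training features."""
        return self.model.predict(df[self.features])


# Toy usage with made-up data: low bandwidth -> "Internet", high bandwidth -> "WAN"
toy = pd.DataFrame({
    "bandwidth_mbps": [10, 20, 30, 40, 1000, 2000, 3000, 4000] * 5,
    "num_lines": [1, 1, 2, 2, 5, 6, 7, 8] * 5,
    "purpose": (["Internet"] * 4 + ["WAN"] * 4) * 5,
})
clf = FieldClassifier(target="purpose")
acc = clf.fit(toy)
new_rows = pd.DataFrame({"num_lines": [1, 7], "bandwidth_mbps": [20, 3000]})
preds = clf.predict(new_rows).tolist()
print(acc, preds)
```

Keeping the estimator as a constructor argument lets analysts swap in another sklearn classifier without touching the split or scoring logic.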
Before running setup.sh, edit it for your system: the GITHUB and _FORKED variables must be set for your individual system. Then run . ./setup.sh && source mc_venv/bin/activate && . /.bash_profile_mc from the command line to set up and activate the environment.
Includes all data preprocessing functions, such as removing nulls and duplicates and converting data to numeric or dummy variables. Also includes a function to remove correlated columns. More detail is available on ReadtheDocs.
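The steps above can be sketched roughly in pandas. The code drops nulls and duplicates, one-hot encodes categorical columns into dummy variables, and then drops one column from each highly correlated pair. The function names `basic_clean` and `drop_correlated`, the 0.9 threshold, and the example columns are hypothetical. They are not the preprocessing module's real functions.

```python
import numpy as np
import pandas as pd


def basic_clean(df, categorical_cols):
    """Hypothetical helper: drop nulls and duplicates, then one-hot encode categoricals."""
    df = df.dropna().drop_duplicates()
    # dtype=int keeps dummy columns numeric so they can enter a correlation matrix
    return pd.get_dummies(df, columns=categorical_cols, dtype=int)


def drop_correlated(df, threshold=0.9):
    """Drop one column from each pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle (excluding the diagonal) so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Made-up example: one null row, one duplicate row, and monthly_cost that tracks bandwidth exactly
raw = pd.DataFrame({
    "bandwidth_mbps": [100, 100, 200, 300, None, 400],
    "monthly_cost": [1000, 1000, 2000, 3000, 500, 4000],
    "function": ["Fiber", "Fiber", "Copper", "Fiber", "Copper", "Wireless"],
})
cleaned = drop_correlated(basic_clean(raw, categorical_cols=["function"]))
print(cleaned.columns.tolist(), len(cleaned))
```

Keeping only the upper triangle means that when two columns are nearly redundant, the one that appears first is kept and the later one is dropped.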
Run this notebook to call the load_and_predict() function. This function loads a saved model and its features, then applies the model to new data to predict Purpose and Connect Category.
Output: /data/ml_mass_update.csv
Note: you must pass in the data frame to predict on (after minimal preprocessing) and a model ID (string).
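Below is a minimal sketch of how a function like load_and_predict() could work. It makes two assumptions that the project does not confirm: each model ID maps to a pickle file `<model_id>.pkl` in a model directory, and that file stores the fitted model together with its training feature list. The `save_model` helper, the `model_dir`, `target`, and `out_path` parameters, and the `predicted_<target>` column name are hypothetical. The project's real function may differ.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def save_model(model, features, model_id, model_dir):
    """Hypothetical storage layout: one pickle per model ID holding the model and its features."""
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, f"{model_id}.pkl"), "wb") as fh:
        pickle.dump({"model": model, "features": features}, fh)


def load_and_predict(df, model_id, model_dir, target="purpose", out_path=None):
    """Sketch: load a saved model and feature list by ID, then predict on new data."""
    with open(os.path.join(model_dir, f"{model_id}.pkl"), "rb") as fh:
        bundle = pickle.load(fh)
    # Align to the training features; any missing column (e.g. an unseen dummy level) becomes 0
    X = df.reindex(columns=bundle["features"], fill_value=0)
    result = df.copy()
    result[f"predicted_{target}"] = bundle["model"].predict(X)
    if out_path is not None:
        result.to_csv(out_path, index=False)
    return result


# Toy usage with made-up data, writing to a temporary directory instead of /data
train = pd.DataFrame({"bandwidth_mbps": [10, 20, 1000, 2000],
                      "purpose": ["Internet", "Internet", "WAN", "WAN"]})
clf = DecisionTreeClassifier(random_state=0).fit(train[["bandwidth_mbps"]], train["purpose"])
with tempfile.TemporaryDirectory() as tmp:
    save_model(clf, ["bandwidth_mbps"], "purpose_v1", tmp)
    new = pd.DataFrame({"line_item_id": [1, 2], "bandwidth_mbps": [15, 1500]})
    csv_path = os.path.join(tmp, "ml_mass_update.csv")
    preds = load_and_predict(new, "purpose_v1", tmp, out_path=csv_path)
    written = os.path.exists(csv_path)
print(preds["predicted_purpose"].tolist())
```

Saving the feature list alongside the model lets new data be reindexed to the exact training columns. This matters when dummy encoding of fresh data produces a different set of columns.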