🔥 This repository can be used to train models with tree-type, regression-based clustering and to make predictions with those models. The framework is tailored to ignition delay data, but with minor changes it works with any data that has a continuous dependent (output) variable. The algorithm uses an error-based technique: it divides the data into three clusters according to the relative error of the prediction and the sign of that error, so that accurate regression models can be obtained. Please see the manual for more information.
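The error-based split described above can be sketched for a single feature as follows. This is a minimal illustration, not the repository's implementation: the function names `fit_line` and `split_by_error`, the cluster labels, and the use of one threshold (analogous to the `-c` criterion) are assumptions, and only one level of the tree is shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def split_by_error(xs, ys, criterion=0.05):
    """Split points into three clusters by relative prediction error.

    Assumed rule (hypothetical): points whose |relative error| is within
    `criterion` stay together; the rest are separated by the error's sign.
    Requires every y to be non-zero.
    """
    intercept, slope = fit_line(xs, ys)
    clusters = {"within": [], "over": [], "under": []}
    for x, y in zip(xs, ys):
        rel_err = ((intercept + slope * x) - y) / y
        if abs(rel_err) <= criterion:
            label = "within"
        elif rel_err > 0:
            label = "over"   # model over-predicts this point
        else:
            label = "under"  # model under-predicts this point
        clusters[label].append((x, y))
    return clusters


# Illustrative data only (not from the repository).
xs = [0, 1, 2, 3, 4]
ys = [10, 10, 10, 10, 20]
clusters = split_by_error(xs, ys, criterion=0.05)
for label, points in clusters.items():
    print(label, points)
```

In a tree-type scheme, each resulting cluster would be fitted and split again until a stopping rule is met; the manual describes the criteria the repository actually uses.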
🔥 OS : Linux
🔥 Python 3.6+
🔥 Clone the repository into a suitable directory.
🔥 Open your ~/.bashrc file and add the lines given below at the bottom of the file.
Add the package to the command search path:
🔥 Copy the following commands into your ~/.bashrc file
##Package command Finder:
export IDCODE="${HOME}/PathToDir/.../Data_driven_Kinetics/"
export PATH=$PATH:$IDCODE
alias IDprediction="pwd>${HOME}/PathToDir/.../Data_driven_Kinetics/filelocation.txt && Run.sh"
Replace "/PathToDir/.../" with your directory location.
Example:
If the repository is cloned into your home directory, configure .bashrc with the following lines:
##Package command Finder:
export IDCODE="${HOME}/Data_driven_Kinetics/"
export PATH=$PATH:$IDCODE
alias IDprediction="pwd>${HOME}/Data_driven_Kinetics/filelocation.txt && Run.sh"
Source the changes:
🔥 (IMPORTANT) To apply the changes made to .bashrc, run the following commands in the terminal.
cd
source .bashrc
Install dependencies:
🔥 To install all the dependencies, use the INSTALL.sh file. Run the commands given below in the terminal.
chmod +x INSTALL.sh
./INSTALL.sh
Make the Run.sh file executable:
🔥 To make the run file executable, go to ./Data_driven_Kinetics and run the following command.
chmod +x Run.sh
All set!
Now open a terminal and type the following command to generate results.
IDprediction -flag file_name.csv
Input arguments to 'IDprediction' are specified below.
Take the data file to be 'file_name.csv'.
🔥 -a : 'Analyze' the data set by selecting certain parameters
IDprediction -a file_name.csv
🔥 -b : Find the types of 'bond' associated with a given fuel
IDprediction -b FuelSMILES
IDprediction -b CCC
IDprediction -b CCCCCC
🔥 -h : Generate 'histogram' plots of parameters for each fuel individually
IDprediction -h file_name.csv
🔥 -c : Define the 'criteria' for error-based clustering
🔥 -l : 'Limit' the number of reference points
🔥 -r : 'Remove' features by backward elimination
🔥 -s : Specify the 'significance' level
🔥 -m : Fit a 'multiple' linear regression to the data
IDprediction -c 0.05 -l 10 -r True -s 0.05 -m file_name.csv
🔥 -t : 'Tree'-type regression-based clustering algorithm
IDprediction -c 0.05 -r False -t file_name.csv
🔥 -e : 'External' data set used for prediction (complete the model generation above first)
IDprediction -e test_data.csv
🔥 -k : Run the code multiple ('k') times and store all test prediction results in a different directory
IDprediction -k testset.csv
🔥 -f : Probability density 'function' plot of the testing results after running the code 'k' times
IDprediction -f testset.csv
🔥 -p : 'Plot' and obtain the average value of each coefficient from the coefficient file (useful when the coefficients were obtained many times and vary between runs)
IDprediction -p coefficient_3.csv
🔥 -o : Run any 'other' data set than fuel data
IDprediction -c 0.05 -l 10 -o anyFile.csv
Don't forget to make changes in the 'feature selection.py' file.
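For the -p option listed above, the idea of averaging coefficients over repeated runs can be sketched as follows. The file layout assumed here (a header row, one row per run, one column per coefficient) and the name `summarize_coefficients` are hypothetical; check the actual coefficient file before adapting this.

```python
import csv
import io
from statistics import mean, stdev


def summarize_coefficients(csv_text):
    """Return {column: (mean, sample std dev)} for a CSV with one row per run."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in rows[0]:
        values = [float(row[name]) for row in rows]
        # The sample standard deviation needs at least two runs.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean(values), spread)
    return summary


# Illustrative values only (not results from the repository).
sample = "intercept,slope\n1.0,2.0\n1.2,2.2\n0.8,1.8\n"
for name, (average, spread) in summarize_coefficients(sample).items():
    print(f"{name}: mean={average:.3f} std={spread:.3f}")
```

A large spread relative to the mean suggests that the coefficient is not stable across runs.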
Example 1: Run the following commands to generate models and make predictions using ignition delay data:
cd TryYourself/nAlkaneIDT/
IDprediction -c 0.1 -t trainset.csv
IDprediction -e testset.csv
Example 2: Run the following commands to generate models and make predictions using wine quality data:
cd TryYourself/WineQuality/
IDprediction -c 0.1 -o trainset.csv
IDprediction -e testset.csv
Make appropriate changes in the 'feature selection.py' file so that the features match the data. (Check the manual.)
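The examples above use separate training and test files. For your own data, such a pair can be produced from a single CSV with a reproducible random split. The sketch below is not part of the repository: the helper names `split_rows` and `split_csv`, the 80/20 ratio, and the fixed seed are all assumptions.

```python
import csv
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_csv(source, train_path="trainset.csv", test_path="testset.csv", **kwargs):
    """Write train/test CSV files, repeating the header row in both."""
    with open(source, newline="") as handle:
        header, *rows = list(csv.reader(handle))
    train, test = split_rows(rows, **kwargs)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows([header, *part])
```

Calling `split_csv("anyFile.csv")` would write `trainset.csv` and `testset.csv` to the current directory, which can then be passed to the -o and -e options.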