This repository contains code and data for predicting band gaps of materials using machine learning. The project utilizes crystal and atomic features to develop predictive models for material properties.
- BG_Prediction.ipynb: Jupyter notebook containing the code for training and evaluating the band gap prediction model.
- Crystal_Features__Atomic_Features.ipynb: Jupyter notebook for feature extraction from crystal structures and atomic properties.
- dataset.csv: The dataset used for training and testing the machine learning models. This file contains the relevant features and target variable (band gap).
Ensure you have the following installed:
- Python 3.7+
- Jupyter Notebook
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- xgboost (needed for the XGBoost model reported in the results)
You can install the necessary packages using pip:
pip install pandas numpy scikit-learn matplotlib seaborn xgboost
- Feature Extraction: Open the Crystal_Features__Atomic_Features.ipynb notebook to extract features from crystal structures and atomic properties. This notebook preprocesses the raw data and generates the feature set required for training the model.
- Model Training and Evaluation: Open the BG_Prediction.ipynb notebook to train and evaluate the machine learning model for band gap prediction. This notebook includes data loading, preprocessing, model training, evaluation, and visualization of results.
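The training and evaluation flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in BG_Prediction.ipynb: the helper name `train_and_evaluate`, the 80/20 split, the hyperparameters, and the synthetic stand-in data are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, test_size=0.2, seed=0):
    """Fit a random forest and score it on a held-out split (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, scores


# Synthetic stand-in for the real feature matrix and band gap column.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = np.abs(2.0 * X_demo[:, 0] + X_demo[:, 1])  # non-negative, like a band gap

model, metrics = train_and_evaluate(X_demo, y_demo)
print(metrics)
```

Scoring on a held-out split rather than on the training rows is what makes the reported MAE and R² meaningful as estimates of performance on unseen materials.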
The dataset.csv file contains the dataset used for this project. Each row represents a material with its crystal and atomic features, along with the target variable (band gap).
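As a rough sketch of how a table with this layout might be separated into features and target: the column names below (including `band_gap`) and the three demo rows are invented for illustration and may not match the actual headers or values in dataset.csv.

```python
import io

import pandas as pd

TARGET_COLUMN = "band_gap"  # assumed name; check the real header in dataset.csv


def split_features_target(df, target=TARGET_COLUMN):
    """Return (numeric feature columns, target column) from a materials table."""
    if target not in df.columns:
        raise KeyError(f"expected a '{target}' column, found {list(df.columns)}")
    features = df.drop(columns=[target]).select_dtypes(include="number")
    return features, df[target]


# Made-up rows standing in for dataset.csv; one row per material.
demo_csv = io.StringIO(
    "material,mean_electronegativity,density,band_gap\n"
    "material_1,2.05,2.17,5.0\n"
    "material_2,1.90,2.33,1.1\n"
    "material_3,1.90,8.96,0.0\n"
)
df = pd.read_csv(demo_csv)
X, y = split_features_target(df)
print(X.shape, y.tolist())
```

With the real file, `pd.read_csv("dataset.csv")` would replace the in-memory demo table.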
To run the project, follow these steps:
- Clone the repository and move into it: git clone https://github.com/Deepayanbasu007/ML-Band-Gap-Prediction.git, then cd ML-Band-Gap-Prediction
- Open the Jupyter notebooks and run them step-by-step:
- Run Crystal_Features__Atomic_Features.ipynb to generate features.
- Run BG_Prediction.ipynb to train and evaluate the model.
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue for any bugs or feature requests.
For any questions or suggestions, feel free to reach out:
- Ph: 91-8100537113
- Email: deepayanbasu5@gmail.com
- Alternate email: basu.3@iitj.ac.in
The final results of the band gap prediction model are as follows:
- Models Used: Random Forest and XGBoost
- Training Data Size: [number of samples]
- Test Data Size: [number of samples]
- Evaluation Metrics:
- Random Forest MAE: 0.0022922195389678945
- Random Forest R²: 0.9999100159046438
- XGBoost MAE: 0.007091375303173743
- XGBoost R²: 0.9998081513297306
The model demonstrates the following performance on the test set:
- Mean Absolute Error (MAE): The average absolute difference between the predicted and actual band gap values is [value].
- Mean Squared Error (MSE): The average squared difference between the predicted and actual band gap values is [value].
- R² Score: The proportion of variance in the band gap that is predictable from the features is [value].
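For reference, the three metrics listed above can be computed by hand as in the sketch below. The function names and the four-point example are illustrative only; in practice `sklearn.metrics` provides equivalent functions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot: share of target variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy band gaps in eV, invented for the example.
actual = [0.0, 1.0, 2.0, 3.0]
predicted = [0.0, 1.5, 2.0, 2.5]

mae = mean_absolute_error(actual, predicted)  # 0.25
mse = mean_squared_error(actual, predicted)   # 0.125
r2 = r2_score(actual, predicted)              # 0.9
print(mae, mse, r2)
```

MAE is in the same units as the band gap (eV), MSE is in eV², and R² is unitless.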
Below are some visualizations of the model's performance:
- Actual vs. Predicted Band Gaps
- Feature Importance
This plot shows that almost half of our structures are metals (zero band gap). The band gaps around 7 eV could be outliers, but we can deal with those in a later step.
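One way to check these two observations numerically is sketched below. The tolerance for treating a gap as zero and the 6.5 eV outlier cutoff are assumptions chosen for the example, and the sample values are made up rather than taken from dataset.csv.

```python
def metal_fraction(band_gaps, tol=1e-6):
    """Fraction of materials whose band gap is zero within a small tolerance."""
    metals = sum(1 for gap in band_gaps if abs(gap) <= tol)
    return metals / len(band_gaps)


def flag_high_gaps(band_gaps, cutoff_ev=6.5):
    """Return the band gaps above a hypothetical outlier cutoff (in eV)."""
    return [gap for gap in band_gaps if gap > cutoff_ev]


# Made-up band gaps in eV for illustration.
sample_gaps = [0.0, 0.0, 1.2, 3.4, 7.1, 0.0, 2.2, 0.5]

fraction = metal_fraction(sample_gaps)    # 3 of 8 -> 0.375
candidates = flag_high_gaps(sample_gaps)  # [7.1]
print(fraction, candidates)
```

Flagged values are only candidates: wide-gap insulators do exist, so each one would need inspection before being dropped.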
- Feature Importance: The most significant features for predicting band gaps are [list of important features]. This indicates that [brief explanation of why these features might be important].
- Model Accuracy: The model achieves an R² score of [value] on the test set, suggesting that the features used are effective in predicting the band gap.
- Potential Improvements: