Prediction of the average price of the Spanish rail tickets data

lajobu/Renfe_pred_avg_price

Prediction of the average price of the Spanish rail tickets data

License: MIT

The purpose of this project is to create a machine learning model that predicts the average price of a Spanish railway ticket.

Different regression models are applied, and their performance is compared using the R-squared score on both the training sample and the test sample.

Technical details about the project:

📍 Programming language: Python

📍 Library: scikit-learn

📍 Applied algorithms: Decision tree, Bagging, Boosting, Random forest and XGBoost
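
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the data is a synthetic stand-in generated with `make_regression`, the hyperparameters are scikit-learn defaults, `GradientBoostingRegressor` is assumed as the "Boosting" model, and XGBoost is left out so that the sketch depends on scikit-learn only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): the real project uses the Renfe ticket data.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Default hyperparameters (assumption); the project's tuned values are not shown here.
models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "Boosting": GradientBoostingRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# For regressors, .score() returns the R-squared of the predictions.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (r2_train, r2_test) in scores.items():
    print(f"{name:<14} train R^2 = {r2_train:.3f}   test R^2 = {r2_test:.3f}")
```

Reporting both scores side by side makes it easy to spot a model that fits the training sample almost perfectly but scores noticeably lower on the test sample.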

Data sources:

Some figures:

  • Map of the railway city connections:

alt text

  • Cross validation, boosting model:

alt text

  • Actuals vs predicted, boosting model:

alt text

Results:

  • Application of fine tuning (after the green line):

alt text

Conclusion:

According to the results table above, the Boosting model appears to be the best: it has the highest R-squared score on the test sample, 85.92.

However, on the training sample the XGBoost model has the highest R-squared score. This is consistent with the earlier conclusion that this model has an overfitting problem, or at least overfits the data more than the other models do.
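
The overfitting argument can be made concrete by looking at the gap between the training and test R-squared scores. The sketch below is only an illustration: the helper names `r2_score` and `overfit_gap` are hypothetical, and the numbers are made up for the example, not taken from the project's results.

```python
def r2_score(actual, predicted):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def overfit_gap(r2_train, r2_test):
    """Hypothetical helper: a larger positive gap suggests more overfitting."""
    return r2_train - r2_test


# Made-up prices and predictions, for illustration only.
y_train = [10.0, 20.0, 30.0, 40.0]
pred_train = [10.0, 20.0, 30.0, 40.0]  # the model reproduces the training data exactly
y_test = [15.0, 25.0, 35.0, 45.0]
pred_test = [20.0, 20.0, 40.0, 40.0]   # but is less accurate on unseen data

r2_train = r2_score(y_train, pred_train)
r2_test = r2_score(y_test, pred_test)
gap = overfit_gap(r2_train, r2_test)

print(f"train R^2 = {r2_train:.2f}, test R^2 = {r2_test:.2f}, gap = {gap:.2f}")
```

A model with the highest training score but a lower test score than its competitors shows the largest gap, which is the pattern the conclusion attributes to XGBoost.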

Links to notebooks:

1) Before modelling:

2) Modelling: