DL FX Forecasting

Python project for forecasting changes in several Foreign Exchange (FX) pairs.


📖 Table of Contents

  1. About The Project
  2. Prerequisites
  3. Environment
  4. Data
  5. Visualizations
  6. Modelling
  7. Results
  8. Project Organization
  9. FAQ
  10. References
  11. Author

-----------------------------------------------------

📘 About the project

FX rate forecasting in an ultra-high-frequency setting, using deep learning techniques. The main focus of the research is to predict the increments over the next few seconds for a set of different FX pairs.

The project is explained in more detail in the documentation.

-----------------------------------------------------

📌 Prerequisites

-----------------------------------------------------

Environment

Execute the following command to start the container:

docker run -it --rm jpxkqx/dl-fx-forecasting:firsttry

If the data has already been processed on the host machine, the following command may be more appropriate.

docker run -it -v "/path/to/data:/app/data" --rm jpxkqx/dl-fx-forecasting:firsttry

The path /path/to/data refers to the directory containing the data, laid out as shown in the project organization below. If all the processed information is available, every script can be executed.
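If the raw TrueFX archives also live on the host, the same volume mount can be combined with the PATH_RAW_DATA environment variable described in the Data section below. This is only a sketch: the /app/data mount point comes from the command above, while the raw/ subfolder for the ZIP files is an assumption.

docker run -it --rm \
  -v "/path/to/data:/app/data" \
  -e PATH_RAW_DATA=/app/data/raw \
  jpxkqx/dl-fx-forecasting:firsttry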

-----------------------------------------------------

🔢 Data

Read, load, preprocess and save the data for the specified currency pair. To run this pipeline, the ZIP files have to be on the host machine, and the path to the folder containing them must be set in an environment variable called PATH_RAW_DATA. The following command processes the data available on the host machine for the currency pair EUR/USD.

generate_datasets eur usd
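For instance, assuming the TrueFX ZIP archives were downloaded to /path/to/raw_zips (an illustrative path), a full run could look like:

export PATH_RAW_DATA=/path/to/raw_zips
generate_datasets eur usd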

In this case, the historical data has been extracted from True FX; its first prices are shown below.

| FX pair | Timestamp             | Low     | High    |
|---------|-----------------------|---------|---------|
| EUR/USD | 20200401 00:00:00.094 | 1.10256 | 1.10269 |
| EUR/USD | 20200401 00:00:00.105 | 1.10257 | 1.1027  |
| EUR/USD | 20200401 00:00:00.193 | 1.10258 | 1.1027  |
| EUR/USD | 20200401 00:00:00.272 | 1.10256 | 1.1027  |
| EUR/USD | 20200401 00:00:00.406 | 1.10258 | 1.1027  |
| EUR/USD | 20200401 00:00:00.415 | 1.10256 | 1.1027  |
| EUR/USD | 20200401 00:00:00.473 | 1.10257 | 1.1027  |
| EUR/USD | 20200401 00:00:00.557 | 1.10255 | 1.10268 |

This data is processed by the command above, which computes the mid price and spread and filters out some erroneous data points. The processed information is stored using Apache Parquet in order to achieve faster reading times.
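As a rough sketch of what this preprocessing step amounts to (not the project's actual implementation; the column names, filter rules and file paths below are assumptions), the mid price and spread can be derived with pandas and written to Parquet:

import pandas as pd

# Illustrative preprocessing sketch; column names and paths are assumptions.
quotes = pd.read_csv(
    "EURUSD-2020-04.csv",
    names=["pair", "timestamp", "low", "high"],
)
quotes["timestamp"] = pd.to_datetime(quotes["timestamp"], format="%Y%m%d %H:%M:%S.%f")

# Mid price and spread derived from the two quoted prices.
quotes["mid"] = (quotes["low"] + quotes["high"]) / 2
quotes["spread"] = quotes["high"] - quotes["low"]

# Drop obviously erroneous points, e.g. negative spreads or non-positive prices.
quotes = quotes[(quotes["spread"] >= 0) & (quotes["low"] > 0)]

# Store as Apache Parquet for faster subsequent reads (requires pyarrow or fastparquet).
quotes.to_parquet("data/processed/eurusd_2020-04.parquet", index=False)

# Reading the processed data back is then a single call.
processed = pd.read_parquet("data/processed/eurusd_2020-04.parquet")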

-----------------------------------------------------

🎨 Visualizations

Then, plot the currency pair EUR/USD for the period from 25 May 2020 to 30 May 2020.

plot_currency_pair eur usd mid H T S --period 2020-05-25 2020-05-31

To get the following image,

Mid price plot

It is also possible to plot the cumulative distribution function using the following command,

plot_cdf eur usd increment --period 2020-04-01 2020-06-01

which gives the image shown below,

EUR/USD

To plot the distribution of the main daily statistics of the spread, the following command can be used.

plot_stats eur usd spread D --period 2020-04-01 2020-06-01

EUR/USD

In addition, the correlation between the different currency pairs, aggregated over any timeframe, can be plotted for any given period of time.

plot_pair_correlations increment --period 2020-04-01 2020-06-01 --agg_frame H

Correlations

Lastly, the autocorrelation of a currency pair's increments can be plotted as follows,

plot_pair_acf increment eur usd --agg_frame 'H' --period 2020-04-01 2020-06-01

Autocorrelation

-----------------------------------------------------

Modelling

-----------------------------------------------------

🏆 Results

-----------------------------------------------------

📂 Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default MkDocs project.
|   └── index.md
│
├── models             <- Trained and serialized models, model predictions, or model summaries
|   ├── configurations <- YAML files with model configurations
|   ├── features       <- Contains model selection results, test results and fitted models, under the path
|   |                     models/features/{ model }/{ fx_pair }/{ aux_pair }/{ variables concat with _ }.
|   |                     In particular, these models used EWMAs of a fixed number of past observations.
│   └── raw            <- Contains model selection results, test results and fitted models, under the path
|                         models/raw/{ model }/{ fx_pair }/{ aux_pair }/{ variables concat with _ }.
|                         In particular, these models used all the past observations.
│
├── notebooks          <- Jupyter notebooks containing the results for the training process of different models
|   ├── train...html   <- Output code to include in VC.
│   └── train...ipynb  <- Python notebooks considered. Not included in VC.
|
|
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   ├── figures        <- Generated graphics and figures to be used in reporting, README, and docs
│   ├── images         <- Generated graphics and figures of EDA. Not included in VC.
│   └── models         <- Generated graphics and figures of model results. Not included in VC
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   ├── __init__.py
│   │   ├── data_extract.py
│   │   ├── data_loader.py
│   │   ├── data_preprocess.py
│   │   ├── utils.py
│   │   └── constants.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   ├── __init__.py
│   │   ├── get_blocks.py
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── __init__.py
│   │   ├── neural_network.py
│   │   ├── model_selection.py
│   │   ├── model_utils.py
│   │   └── train_model.py
│   │
│   ├── scripts        <- Scripts to create CLI entrypoints
│   │   ├── __init__.py
│   │   ├── click_utils.py
│   │   ├── generate_datasets.py
│   │   ├── plot_currency_pair.py
│   │   └── plot_pair_correlations.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
|       ├── __init__.py
|       ├── line_plot.py
|       ├── plot_correlations.py
|       ├── plot_results.py
│       └── currency_pair.py
│  
├── tests
│   ├── data           <- Data needed to test the functionalities.
│   ├── mocks.py
│   ├── test_cli_scripts.py
│   ├── test_dataset_generation.py
│   └── test_visualization.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

-----------------------------------------------------

❓ FAQ

-----------------------------------------------------

📚 References

-----------------------------------------------------

👤 Author

👨 Mario Santa Cruz López
BSc in Mathematics at Universidad de Cantabria
MSc in Statistics at Imperial College London
GitHub: @jpxkqx
LinkedIn: @mariosanta-cruz
Software developer at Predictia Intelligent Data Solutions