🌍 This repository collects the concepts and explanations provided by Dr. Joshua in one place for quick reference. 🌍
Diving into CatBoost. CatBoost converts categorical predictors into continuous predictors instead of using one-hot encoding.
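The conversion above can be sketched as an "ordered target statistic": each categorical value is replaced with a running mean of the target computed only from the rows that came before it, so a row's own label never leaks into its own encoding. This is a minimal sketch assuming a binary target and a fixed prior of 0.5; real CatBoost also averages over multiple random permutations of the rows.

```python
from collections import defaultdict

def ordered_target_encode(categories, targets, prior=0.5):
    """Encode each categorical value as a smoothed running mean of the
    target over *preceding* rows only (an ordered target statistic).
    The prior handles categories we have not seen yet."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for cat, y in zip(categories, targets):
        # Only rows seen so far contribute — no label leakage.
        encoded.append((sums[cat] + prior) / (counts[cat] + 1))
        sums[cat] += y
        counts[cat] += 1
    return encoded

cats = ["red", "blue", "red", "red", "blue"]
ys = [1, 0, 1, 0, 1]
print(ordered_target_encode(cats, ys))  # → [0.5, 0.5, 0.75, 0.8333..., 0.25]
```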
CatBoost has a unique boosting strategy (called Ordered Boosting) that separates the residuals associated with a row of training data from the trees that were built with that row of training data.
CatBoost does not use normal Decision Trees. Instead it uses Oblivious Decision Trees (ODTs). These are weaker learners (and boosting is all about weak learners) and computationally very fast.
Although normal Decision Trees can handle relationships among features just fine, Oblivious Decision Trees cannot. However, CatBoost uses Feature Combinations to try to deal with that.
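What makes ODTs fast: every node at the same depth asks the same question, so the sequence of yes/no answers is a binary index straight into the leaf array, with no tree structure to walk. A minimal sketch (the `(feature, threshold)` splits and leaf values here are made-up illustration, not a real trained tree):

```python
def odt_predict(x, splits, leaf_values):
    """Predict with an oblivious decision tree: one shared split per
    depth level, so the answers form a bit string that indexes the
    leaves directly."""
    index = 0
    for feature, threshold in splits:
        index = (index << 1) | (1 if x[feature] > threshold else 0)
    return leaf_values[index]

# Hypothetical depth-2 tree: 2 shared splits, 4 leaves.
splits = [(0, 1.5), (1, 3.0)]   # (feature index, threshold) per level
leaves = [0.1, 0.2, 0.3, 0.4]
print(odt_predict([2.0, 5.0], splits, leaves))  # → 0.4 (both answers "yes")
```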
If you have a ton of data, building a tree with it all will take a long time. LightGBM reduces the amount of data used to build each tree using Gradient-based One-Side Sampling (GOSS) to speed things up!
Because small residuals are under-represented in training datasets, small residuals are amplified by a weight when calculating Gain.
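The two ideas above can be sketched together: keep every row with a large gradient, randomly sample the small-gradient rows, and give the sampled rows a compensating weight so the gradient statistics stay roughly unbiased. This is a simplified illustration with made-up rates, not LightGBM's actual implementation:

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Gradient-based One-Side Sampling (a sketch): keep the top
    top_rate fraction of rows by |gradient|, sample an other_rate
    fraction of the rest, and weight the sampled rows by
    (1 - top_rate) / other_rate to compensate."""
    rng = random.Random(seed)
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]
    sampled = rng.sample(order[n_top:], n_other)
    weight = (1 - top_rate) / other_rate
    # (row index, weight) pairs used to build the next tree
    return [(i, 1.0) for i in top] + [(i, weight) for i in sampled]
```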
The more features you have, the longer it takes to train a tree. To reduce the number of features, features not declared as categorical that have relatively little overlap are merged via Exclusive Feature Bundling.
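The bundling idea can be sketched for two sparse features that are (mostly) never non-zero in the same row: shift the second feature's bin values past the first feature's range, so one merged column — and one histogram — serves both. A toy illustration, not LightGBM's real bundling code:

```python
def bundle_features(col_a, col_b, a_bins):
    """Exclusive Feature Bundling (a sketch): merge two mutually
    exclusive binned features into one column by offsetting the
    second feature's bins past the first feature's range."""
    bundled = []
    for a, b in zip(col_a, col_b):
        if a != 0:
            bundled.append(a)           # feature A keeps bins 1..a_bins
        elif b != 0:
            bundled.append(b + a_bins)  # feature B shifted to a_bins+1..
        else:
            bundled.append(0)
    return bundled

print(bundle_features([1, 0, 2, 0], [0, 3, 0, 1], a_bins=2))  # → [1, 5, 2, 3]
```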
LightGBM builds trees "leaf-wise", which, given restrictions on how big the tree can be, generally results in a more accurate tree. This is a big contrast to CatBoost which intentionally builds weaker trees.
In contrast to both XGBoost and CatBoost, LightGBM has yet another way to deal with categorical features. I'm looking forward to doing a StatQuest video comparison of these three methods soon!
The Right to Explanation - the legal right to be given an explanation for the output of an algorithm. For example, if you are rejected for a loan, you can demand an explanation, and this requires explainable AI.
One step towards explaining machine learning results is calculating Shapley Values.
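A Shapley Value is a feature's average marginal contribution across all possible coalitions of the other features. For a handful of features it can be computed exactly by brute force (the coalition value function below is a made-up toy, not a real model), which also shows why the exponential cost makes SHAP's fast tree algorithm matter:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every coalition S:
    phi_i = sum over S of w(S) * (v(S ∪ {i}) - v(S)),
    with the classic weight w(S) = |S|! (n - |S| - 1)! / n!."""
    n = len(features)
    phi = {}
    for i in features:
        rest = [f for f in features if f != i]
        total = 0.0
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
        phi[i] = total
    return phi

# Hypothetical coalition payoffs for two features "a" and "b".
v = {frozenset(): 0.0, frozenset({"a"}): 10.0,
     frozenset({"b"}): 20.0, frozenset({"a", "b"}): 50.0}
print(shapley_values(["a", "b"], lambda S: v[frozenset(S)]))  # → {'a': 20.0, 'b': 30.0}
```

Note that the two values sum to 50.0, the payoff of the full coalition — the "efficiency" property that makes Shapley Values useful for explanation.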
Joshua naively thought that if he could calculate a Shapley Value for a 1-feature decision tree, he could do it with 2. Nope! However, this motivated the creation of SHAP values, which are used to explain machine learning models.
Joshua figured out how SHAP values are calculated for trees!!!
A summary of the Main Ideas in SHAP!!!
The Illustrated Word2vec Link
A bunch of stuff about RNNs, including a chapter from Neural Networks and Deep Learning by Aurélien Géron. Link
Recurrent Neural Networks (RNNs)
Long Short-Term Memory (LSTM) networks. Chris Olah @ch402 has a great article on LSTMs.
A Bidirectional Recurrent Neural Network
DBSCAN (a clustering algorithm) Link Link2
Feature Engineering Link
Entropy Link
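Shannon entropy in a few lines, for reference: H = −Σ p·log₂(p), measured in bits when the log is base 2.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H = -sum(p * log2(p)) over label frequencies."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy(["a", "a", "b", "b"]))  # → 1.0 (a fair coin's worth of surprise)
```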
Shannon's original manuscript describing Entropy Link Link2
Mutual Information
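Mutual information builds directly on entropy: I(X;Y) = Σ p(x,y)·log₂( p(x,y) / (p(x)·p(y)) ), the number of bits one variable tells you about the other. A small sketch using empirical frequencies:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information in bits:
    I(X;Y) = sum p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # → 1.0 (fully dependent)
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # → 0.0 (independent)
```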
Mixed Models Link
Mixed models visualization Link
A summary of t-SNE, LargeVis and UMAP
SMOTE
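The core SMOTE move — interpolating between a minority-class point and one of its k nearest minority neighbours instead of just duplicating rows — can be sketched as below. This is a bare-bones illustration (brute-force neighbour search, no majority class involved), not a production implementation like `imblearn`'s:

```python
import random

def smote_sample(minority, k=2, n_new=3, seed=0):
    """SMOTE (a sketch): for each synthetic point, pick a real minority
    point p, pick one of its k nearest minority neighbours q, and
    return a random point on the segment between p and q."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        # Brute-force k nearest neighbours of p within the minority class.
        neighbours = sorted((q for q in minority if q != p),
                            key=lambda q: sum((a - b) ** 2
                                              for a, b in zip(p, q)))[:k]
        q = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

print(smote_sample([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```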
Transformers