Gary's Notebook

How Exactly Does Masking in Transformer Work

Masking is one of those concepts that is easy to wave your hands at but quite important if you want to implement the Transformer from…





BERT: A Detailed Guide to Clear Up All Your Confusions

Pre-requisites: Understanding of the Transformer, especially how masking works. (I strongly recommend Jay Alammar's article The Illustrated…





What does __getitem__ mean? Functions with Double Underscores in Python

While reading the PyTorch source code, I started to notice that many classes have functions named __xxx__ such as __init__ , __len__, or…
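As a minimal sketch of the idea (the `Playlist` class and its contents are hypothetical and not taken from the PyTorch source), defining `__len__` and `__getitem__` is what lets built-ins such as `len()` and square-bracket indexing work on your own class:

```python
class Playlist:
    """Hypothetical container used only to illustrate double-underscore methods."""

    def __init__(self, songs):
        # __init__ runs when Playlist(...) is called.
        self._songs = list(songs)

    def __len__(self):
        # Called by len(playlist).
        return len(self._songs)

    def __getitem__(self, index):
        # Called by playlist[index].
        return self._songs[index]


playlist = Playlist(["intro", "verse", "outro"])
size = len(playlist)   # dispatches to Playlist.__len__
first = playlist[0]    # dispatches to Playlist.__getitem__
```

Python calls these methods for you; you rarely invoke `playlist.__len__()` directly.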





Byte-pair Encoding Algorithm

BPE (byte-pair encoding) is a good way (an alternative to schemes such as one-hot encoding and Word2Vec pre-trained embeddings) to encode words…





Masking in the Transformer Explained

The Transformer is a landmark breakthrough in NLP that is explained quite well by Jay Alammar's article The Illustrated Transformer. This…





PyTorch scatter_ Function Explained

In PyTorch, scatter_ is a function you can use to write the values in a source tensor into the target tensor. The best way I found to think about this function is…





Label Smoothing Explained

Label smoothing is a very straightforward regularization technique which is explained extremely well on this page. The basic idea is that…





Re Tutorial: A Quick Python Start

Re, or Regex, stands for regular expression, which means "a sequence of characters that define a search pattern."1 It is particularly useful…
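A quick sketch of what a search pattern looks like in practice, using Python's built-in `re` module. The sample sentence and the date pattern are made up for illustration:

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "Order 66 shipped on 2019-05-04; order 7 shipped on 2019-06-11."

# Four digits, a dash, two digits, a dash, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

dates = date_pattern.findall(text)        # every non-overlapping match, as strings
first_match = date_pattern.search(text)   # first match object, or None if no match
```

`findall` returns the matched strings directly, while `search` returns a match object whose `.group()` holds the matched text.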





ML Paper Notes: Progressive Neural Networks

Title: Progressive Neural Networks (2016) Rusu et al. Main Ideas: The novel progressive network proposed in the paper is a more…





ML Paper Notes: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Title: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (2016) Bengio et al. Main Ideas: For language generation…





ML Paper Notes: Matching Networks for One-Shot Learning

Title: Matching Networks for One-Shot Learning (2017), Vinyals et al. Main Ideas: One-shot learning means learning to classify a class from…





ML Paper Notes: Generative Adversarial Nets

Title: Generative Adversarial Nets (2014), Goodfellow et al. Main Ideas: You have two neural nets - a generative and a discriminative one…





ML Paper Notes: CNN Features off-the-shelf: an Astounding Baseline for Recognition

Title: CNN Features off-the-shelf: an Astounding Baseline for Recognition (2014) Razavian et al. Main Ideas: The paper shows that just by…





ML Paper Notes: Attention is All You Need

Title: Attention is All You Need (2017) Vaswani et al. Main Ideas: General Experiment Setup: First experiment trained on WMT 2014 English…





ML Paper Notes: Unsupervised Domain Adaptation by Backpropagation

Title: Unsupervised Domain Adaptation by Backpropagation (2015), Ganin et al. Main Ideas: If we want to train a classifier, we would usually…





ML Paper Notes: Distilling the Knowledge in a Neural Network

Title: Distilling the Knowledge in a Neural Network (2015), G. Hinton et al. Main Ideas: This classic paper by Hinton et al. describes a…





Statistics: P-value Explained

Informal Definition: The probability of observing results at least as extreme as the ones you got if the null hypothesis were true. Example: you have a web page, and…





DeepDream Explained Clearly

DeepDream is one of the coolest applications of machine learning - it started out at Google as an effort to gain more insights into the…





Adding Images to Gatsby Blog in Markdown

OK, I spent the past 3 hours trying to figure out this annoying little thing. Spoiler alert: You DON'T NEED any Gatsby plugins. Stop reading…





Making Sense of AI on Blockchain: Part 2

In the first article of this series, I introduced the lifecycle of AI (which consists of training and inference) and discussed blockchain…





Making Sense of AI on Blockchain: Part 1

Any project that involves both AI and blockchain is bound to raise some eyebrows. After all, putting together the two hottest buzzwords of the…





Binary Tree Level Order Traversal Python Solution

This problem is better solved iteratively than recursively. The essential idea is that you can get all the next level nodes from…
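One common iterative approach, sketched here with a queue, processes the tree one level at a time. This is a sketch under my own assumptions (the `TreeNode` class and `level_order` name are illustrative), and it may differ from the solution in the post:

```python
from collections import deque


class TreeNode:
    """Minimal binary tree node, defined here only for the example."""

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def level_order(root):
    """Return node values grouped by depth, top to bottom."""
    if root is None:
        return []
    levels = []
    queue = deque([root])
    while queue:
        level = []
        # The queue holds exactly one level at this point, so its current
        # length tells us how many nodes to pop before the next level starts.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels


tree = TreeNode(3, TreeNode(9), TreeNode(20, TreeNode(15), TreeNode(7)))
result = level_order(tree)
```

Each node is enqueued and dequeued once, so the traversal is linear in the number of nodes.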





Maximum Subarray - Kadane's Algorithm Explained

I was inspired to write this post after completing the Leetcode challenge and coming across Kadane's algorithm. It's a very interesting…





The Case for Investing in Cryptocurrencies: A More Rational Analysis

Two great investors, Charlie Munger and Warren Buffett, have completely dismissed the value of investing in cryptocurrencies. “I think the…





How to Run a Python Script in Your Node Backend

While JavaScript is awesome, you're not obligated to use it throughout your entire stack. There will come a time when it is more…





Add Syntax Highlighting to Your Gatsby Blog in 2 Minutes

This is a quick tutorial on how to add syntax highlighting for your Gatsby blog in 4 steps. First, run Second, go to your file, find the…





How to Use Google Translate in Your Node App in 3 Steps

The official documentation on this leaves quite a few gaps, and it took me a while to figure out, so I'm sharing a quick tutorial here. Note: If you…





How to Use Materialize with React: The Easier Way

Before you npm install any package or reach for Material-UI or react-materialize, let me tell you that there is a more straightforward way of…





How to Add External Javascript Script Tag in Gatsby

This is a tricky topic and there are few good resources out there that teach how to do it. After a few hours of flapping around I finally…





How to Fix Detached Head in Git

You did something and ended up in a detached HEAD state. You don’t know how you ended up here but you did. But you want to keep everything…





Git Cheatsheet: Commands That Actually Work

This is a constantly updated notebook on git commands that actually work for me. First, some vocabulary to help you read the docs and my…