19 posts tagged with "Machinelearning"

How Exactly Does Masking in Transformer Work

Masking is one of those concepts that is easy to wave your hands at but quite important if you want to implement the Transformer from…

BERT: A Detailed Guide to Clear Up All Your Confusions

Pre-requisites: an understanding of the Transformer, especially how masking works. (I strongly recommend Jay Alammar's article The Illustrated…

What does getitems mean? Functions with Double Underscores in Python

While reading the PyTorch source code, I started to notice that many classes have functions named __xxx__ such as __init__ , __len__, or…

Byte-pair Encoding Algorithm

BPE (byte-pair encoding) is a good way (alternative to schemes such as one-hot encoding and Word2Vec pre-trained embeddings) to encode words…

Masking in the Transformer Explained

The Transformer is a landmark breakthrough in NLP that is explained quite well by Jay Alammar's article The Illustrated Transformer. This…

PyTorch scatter_ Function Explained

In PyTorch, scatter_ is a function you can use to write the values in one tensor into another tensor. The best way I found to think about this function is…

Label Smoothing Explained

Label smoothing is a very straightforward regularization technique which is explained extremely well on this page. The basic idea is that…

ML Paper Notes: Progressive Neural Networks

Title: Progressive Neural Networks (2016) Rusu et al. Main Ideas: The novel progressive network proposed in the paper is a more…

ML Paper Notes: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Title: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (2016) Bengio et al. Main Ideas: For language generation…

ML Paper Notes: Matching Networks for One-Shot Learning

Title: Matching Networks for One-Shot Learning (2017), Vinyals et al. Main Ideas: One-shot learning means learning to classify a class from…

ML Paper Notes: Generative Adversarial Nets

Title: Generative Adversarial Nets (2014), Goodfellow et al. Main Ideas: You have two neural nets - a generative and a discriminative one…

ML Paper Notes: CNN Features off-the-shelf: an Astounding Baseline for Recognition

Title: CNN Features off-the-shelf: an Astounding Baseline for Recognition (2014) Razavian et al. Main Ideas: The paper shows that just by…

ML Paper Notes: Attention is All You Need

Title: Attention is All You Need (2017) Vaswani et al. Main Ideas: General Experiment Setup: First experiment trained on WMT 2014 English…

ML Paper Notes: Unsupervised Domain Adaptation by Backpropagation

Title: Unsupervised Domain Adaptation by Backpropagation (2015), Ganin et al. Main Ideas: If we want to train a classifier, we would usually…

ML Paper Notes: Distilling the Knowledge in a Neural Network

Title: Distilling the Knowledge in a Neural Network (2015), G. Hinton et al. Main Ideas: This classic paper by Hinton et al. describes a…

Statistics: P-value Explained

Informal Definition: The probability of getting a result at least as extreme as the one observed if the null hypothesis were true. Example: you have a web page, and…

DeepDream Explained Clearly

DeepDream is one of the coolest applications of machine learning - it started out at Google as an effort to gain more insights into the…

Making Sense of AI on Blockchain: Part 2

In the first article of this series, I introduced the lifecycle of AI (which consists of training and inference) and discussed blockchain…

Making Sense of AI on Blockchain: Part 1

Any project that involves both AI and Blockchain is bound to raise some eyebrows. After all, putting together the two hottest buzzwords of the…