Rachit Singh
/
Recent content on Rachit Singh
Generated with Hugo (gohugo.io), en-us
Last build: Fri, 26 Mar 2021 01:48:15 -0400

Deep learning model compression
/deep-learning-model-compression/
Fri, 26 Mar 2021 01:48:15 -0400
Topics: quantization, pruning, DeepSpeed & ZeRO-Offload, knowledge distillation. This post covers model inference optimization and compression, in breadth and hopefully in depth, as of March 2021. It includes engineering topics like model quantization and binarization, more research-oriented topics like knowledge distillation, as well as well-known hacks.
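To give a flavor of the simplest of these techniques, here is a minimal illustrative sketch of symmetric post-training int8 weight quantization in NumPy; this is an assumption-laden toy, not the API of any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # a float32 weight matrix

# Symmetric linear quantization: map [-max|w|, +max|w|] onto int8's [-127, 127].
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# At inference time the int8 weights are dequantized (or consumed by int8 kernels).
w_dequant = w_int8.astype(np.float32) * scale

# Storage drops 4x (float32 -> int8) at the cost of a small reconstruction error.
max_err = np.abs(w - w_dequant).max()
print(w.nbytes, w_int8.nbytes, max_err)
```

The reconstruction error is bounded by half the quantization step, which is why quantization works well when weight distributions are narrow.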
If anyone notices anything incorrect, please let me know; feel free to contact me at my email (my full name @ outlook.com). Each year, larger and larger models find new ways to extract signal from the noise in machine learning.

The Langevin Equation
/the-langevin-equation/
Sun, 22 Apr 2018 18:57:21 -0400
This post covers the Langevin equation, a stochastic differential equation that models the dynamics of particles undergoing Brownian motion. It follows the ideas in a reference due to Lennart Sjögren.
Langevin Equation

In 1905, Einstein published a paper that expressed a macroscopic quantity $D$, the diffusion constant, in terms of microscopic quantities:
$$D = \frac{k_BT}{6\pi\eta a}$$
where $k_B$ is Boltzmann's constant, $T$ is the temperature, $\eta$ is the viscosity of the liquid, and $a$ is the radius of the particle.

Persistence Length
/persistence-length/
Sun, 22 Apr 2018 10:52:54 -0400
In class we recently discussed the simplified elastic rod model for polymers, which assumes that a polymer can be modeled as an inextensible rod, i.e. that the length of the rod doesn't change, and that the twist of the polymer is ignorable (possibly because the polymer is joined by single bonds).

A few favorite papers of 2017
/a-few-favorite-papers-of-2017/
Tue, 09 Jan 2018 12:48:10 -0800
This isn't an exhaustive list, and I will inevitably forget some papers. I'll keep updating it as I remember more, and will probably expand some of the background/contribution sections as I have time, so that they're more accessible.
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model [link]
Background: Language models and NLP tasks almost always use a softmax to compute a distribution over the vocabulary, usually computed as \(\sigma(\mathbf{W}\mathbf{h})\), where \(\mathbf{h}\) is a \(d\)-dimensional context vector from a previous layer and \(\mathbf{W} \in \mathbb{R}^{M \times d}\) is a word embedding matrix, with \(M\) the vocabulary size.

PyTorch Internals, cuRAND, and numerical instability
/pytorch-internals-curand-and-numerical-instability/
Wed, 03 Jan 2018 18:44:27 -0800
Random sampling: I've been working lately on implementing random samplers for a number of distributions in PyTorch, on both CPU and CUDA. This is a topic near and dear to my heart, since it has caused me a lot of trouble multiple times. Once this PR is merged, I'll post an explanation/notebook of why this is important.
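For a flavor of what such a sampler computes, one standard construction of a Beta variate uses two Gamma draws; the sketch below uses NumPy purely for illustration (the actual PyTorch samplers live at the C++/CUDA level and are not reproduced here):

```python
import numpy as np

def sample_beta(a, b, size, rng):
    """Sample Beta(a, b) as X / (X + Y), where X ~ Gamma(a, 1) and Y ~ Gamma(b, 1)."""
    x = rng.gamma(a, size=size)
    y = rng.gamma(b, size=size)
    return x / (x + y)

rng = np.random.default_rng(0)
samples = sample_beta(2.0, 5.0, size=100_000, rng=rng)
print(samples.mean())  # should be close to a / (a + b) = 2/7
```

Note that for very small shape parameters both Gamma draws can underflow toward zero, making the ratio numerically unstable; that kind of failure is exactly the sort of issue the post's title alludes to.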
Here's a brief summary of the motivation:
We want to sample from distributions like \(\operatorname{Beta}(a, b)\).

ELBO Surgery
/elbo_surgery/
Sat, 23 Dec 2017 12:08:00 -0800
tl;dr: the ubiquitous isotropic Gaussian prior for generative models doesn't make sense / doesn't work, which motivates work on priors.
At NIPS, Dawen Liang mentioned Hoffman & Johnson's ELBO surgery paper offhand while talking about tuning KL divergences, and it's very interesting, so I thought I'd go over it.

Links
/links/
Thu, 14 Dec 2017 16:55:26 -0500
Here are some useful links I've found:
LaTeX
A tikz-cd graphical editor that I wish I'd had during 55… - http://tikzcd.yichuanshen.de/
For high-power Bayesian diagrams, I like tikz-bayesnet, but honestly it's often not worth the trouble vs. using tikz-cd and adding a circle macro.
ShareLaTeX is quite useful for collaborating, and open source. One day when I have time I'll make a PR…

Vim
My dotfiles are here: http://github.

NIPS 2017
/nips/
Sat, 09 Dec 2017 00:47:37 -0800
I'm starting this blog to share research ideas that I have, and some solutions to problems I find along the way. I've been helped immensely by other people's blogs in the past, and I want to do the same. It'll also give me a chance to communicate the way I approach problems, and hopefully people will give me alternative perspectives, either by email (rachitsingh@outlook.com) or in the comments, once I figure out how that works.

About
/about/
Thu, 07 Dec 2017 18:14:16 -0800
I'm Rachit Singh. While at Harvard, I did research in variational inference, Indian buffet processes, and language models. I worked with Alexander Rush and Finale Doshi-Velez, and was part of the Harvard NLP research group. I frequently worked with Jeffrey Ling.
These days my interests lie in a variety of fields but still include language modeling and NLP.