Web Resources
An Overview of Early Vision in InceptionV1
Wonderful Distill article inspecting and categorizing the feature detectors in the first five layers of a trained ConvNet. I wonder whether many of these convolutions could be refactored into one or more simple blurs followed by sparse multiplications; it would be nice to have efficient, standardized early vision burned into ASICs. It’s interesting how early certain important features appear, like faces, fur, and scales.
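A rough sketch of that refactoring idea (purely hypothetical, not from the article; the helper name, the Gaussian blur, and the sparsity level are my own assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_then_sparse_mix(feature_map, mix, sigma=1.0):
    """Blur each channel, then mix channels with a mostly-zero matrix.

    feature_map: (C, H, W) array; mix: (C_out, C) sparse mixing weights.
    """
    blurred = np.stack([gaussian_filter(ch, sigma=sigma) for ch in feature_map])
    # A 1x1 convolution is just a channel-mixing matmul applied at every pixel.
    return np.einsum('oc,chw->ohw', mix, blurred)

# Toy usage: 8 input channels, 4 output channels, ~90% of mixing weights zeroed.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 32))
mix = rng.normal(size=(4, 8)) * (rng.random(size=(4, 8)) < 0.1)
print(blur_then_sparse_mix(x, mix).shape)  # (4, 32, 32)
```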
Diffusion Models as a kind of VAE
Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
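As a quick reference for what the post covers, here is a minimal NumPy einsum sketch of multi-head self-attention (the shapes, names, and lack of masking are my own simplifications, not the post's code):

```python
import numpy as np

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, heads):
    """x: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // heads
    # Project and split into heads: (heads, seq, d_head)
    q = (x @ Wq).reshape(seq, heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product scores per head: (heads, seq, seq)
    scores = np.einsum('hqd,hkd->hqk', q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # Weighted sum of values, then merge heads and project out.
    out = np.einsum('hqk,hkd->hqd', weights, v)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=(10, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * d**-0.5 for _ in range(4))
print(multi_head_self_attention(x, Wq, Wk, Wv, Wo, heads=8).shape)  # (10, 64)
```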
Handwriting Generation Demo In Tensorflow
Provides a good expanded explanation of the handwriting generation part of Alex Graves’ RNN paper.
Both are multilayer LSTM models with Mixture Density Network output.
Otoro’s version uses 2 layers of 250 LSTM cells, trained on the IAM online handwriting dataset.
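A bare-bones sketch of the Mixture Density Network output head such models use (my own simplified parameterization following the general shape of Graves’ paper, not either implementation’s exact code):

```python
import numpy as np

def mdn_params(raw):
    """Split a raw network output into bivariate-Gaussian mixture parameters
    plus an end-of-stroke probability (roughly Graves' parameterization)."""
    eos = 1.0 / (1.0 + np.exp(-raw[0]))               # sigmoid: end-of-stroke prob
    pi, mu1, mu2, s1, s2, rho = np.split(raw[1:], 6)  # six blocks of n_mix values
    pi = np.exp(pi - pi.max()); pi /= pi.sum()        # softmax mixture weights
    s1, s2 = np.exp(s1), np.exp(s2)                   # positive standard deviations
    rho = np.tanh(rho)                                # correlation in (-1, 1)
    return eos, pi, mu1, mu2, s1, s2, rho

def sample_offset(raw, rng):
    """Draw one (dx, dy, end_of_stroke) sample from the mixture."""
    eos, pi, mu1, mu2, s1, s2, rho = mdn_params(raw)
    j = rng.choice(len(pi), p=pi)
    cov = [[s1[j] ** 2, rho[j] * s1[j] * s2[j]],
           [rho[j] * s1[j] * s2[j], s2[j] ** 2]]
    dx, dy = rng.multivariate_normal([mu1[j], mu2[j]], cov)
    return dx, dy, float(rng.random() < eos)

rng = np.random.default_rng(0)
raw = rng.normal(size=1 + 6 * 20)  # e.g. a 20-component mixture
print(sample_offset(raw, rng))
```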
Four Experiments in Handwriting with a Neural Network
A set of neat visualizations:
- plot a connected sequence of pen tip locations
- plot a connected sequence of generated continuations
- plot a connected sequence of randomly generated locations, and superimpose at each point other potential continuations drawn from the Gaussian mixture.
- plot the sequence of generated locations, along with the current activation of one LSTM cell.
- update and allow a choice of cell.
- plot the sequence of activations for each cell, in order.
- sort cells according to t-SNE, and display their activations in order.
- plot the activations of all cells at a selected point in the sequence.
Normalizing Flows in 100 Lines of JAX
Normalizing flows / RealNVP-style operations, explained succinctly.
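The core RealNVP operation is an affine coupling layer, which only takes a few lines; here is a minimal NumPy sketch (the lambda conditioners stand in for small MLPs and are my own placeholders, not the post’s JAX code):

```python
import numpy as np

def affine_coupling_forward(x, scale_net, shift_net):
    """One RealNVP-style coupling step: the first half of the dimensions
    conditions an affine transform of the second half."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    log_s, t = scale_net(x1), shift_net(x1)
    y2 = x2 * np.exp(log_s) + t
    log_det = log_s.sum(axis=-1)  # log |det Jacobian|, needed for the density
    return np.concatenate([x1, y2], axis=-1), log_det

def affine_coupling_inverse(y, scale_net, shift_net):
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    log_s, t = scale_net(y1), shift_net(y1)
    x2 = (y2 - t) * np.exp(-log_s)
    return np.concatenate([y1, x2], axis=-1)

# Toy conditioners standing in for small MLPs.
scale_net = lambda h: np.tanh(h)
shift_net = lambda h: 0.5 * h
x = np.random.default_rng(0).normal(size=(4, 6))
y, log_det = affine_coupling_forward(x, scale_net, shift_net)
print(np.allclose(x, affine_coupling_inverse(y, scale_net, shift_net)))  # True
```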
Just Ask for Generalization
Neat discussion of D-REX, which injects increasing amounts of noise into a flawed policy to generate automatically ranked trajectories, then learns a reward model from those rankings to optimize toward.
To Understand Language is to Understand Generalization
Further elaboration on Just Ask for Generalization; discusses different types of linguistic regularities.
Generative Modeling by Estimating Gradients of the Data Distribution
An excellent explanation of what Langevin dynamics means in the context of diffusion generative models.
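As a concrete reference, a minimal sketch of the Langevin update the post builds on, run on a toy Gaussian with a known score (my own example, not the article’s code):

```python
import numpy as np

def langevin_sample(score_fn, x0, step=1e-2, n_steps=1000, rng=None):
    """Unadjusted Langevin dynamics: walk uphill on log-density plus noise.
    score_fn(x) should approximate grad_x log p(x)."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + 0.5 * step * score_fn(x) + np.sqrt(step) * noise
    return x

# Toy example: the true score of a standard Gaussian is -x,
# so chains started far away should settle around the origin.
samples = np.stack([langevin_sample(lambda x: -x, np.full(2, 5.0)) for _ in range(200)])
print(samples.mean(axis=0), samples.std(axis=0))  # roughly [0, 0] and [1, 1]
```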
Jakub M. Tomczak’s Blog
A series of 14 posts about generative modeling.
Attention? Attention!
Overview of the attention mechanisms leading up to the Transformer architecture.
The Transformer Family
A great overview of basic Transformer variants with illustrations snipped from their papers.
What are Diffusion Models?
A neat overview of diffusion models, VAEs, and flow models.
Training Networks in Random Subspaces
You can gauge how difficult a problem is by optimizing its parameters in an alternative, shrunken coordinate system and measuring how small that subspace can be while still reaching good performance.
Thoughts:
- this is a lot like Relative Representations! Using a random or meaningful subspace as a coordinate system for training in this case is similar to using it for model stitching or one-shot inference in RelReps.
Further elaborated at https://www.uber.com/blog/intrinsic-dimension/
We construct the random subspace by sampling a set of random directions from the initial point; these random directions are then frozen for the duration of training. Optimization proceeds directly in the coordinate system of the subspace.
Later published in the Intrinsic Dimension paper
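A toy sketch of that setup, restricting gradient descent to a frozen random subspace of a quadratic problem (my own illustration; the papers use neural-network losses and more careful projection scalings):

```python
import numpy as np

def train_in_random_subspace(loss_grad, theta0, d_subspace, steps=500, lr=0.1, rng=None):
    """Optimize only d_subspace coordinates: theta = theta0 + P @ z,
    where P is a random projection sampled once and frozen."""
    rng = rng or np.random.default_rng(0)
    D = theta0.size
    P = rng.normal(size=(D, d_subspace)) / np.sqrt(d_subspace)  # frozen directions
    z = np.zeros(d_subspace)                                    # trainable low-dim params
    for _ in range(steps):
        theta = theta0 + P @ z
        z -= lr * (P.T @ loss_grad(theta))  # chain rule: dL/dz = P^T dL/dtheta
    return theta0 + P @ z

# Toy problem: quadratic loss 0.5 * ||theta - theta_star||^2 with known optimum.
rng = np.random.default_rng(1)
theta_star = rng.normal(size=100)
loss_grad = lambda th: th - theta_star
theta_hat = train_in_random_subspace(loss_grad, np.zeros(100), d_subspace=10)
print(np.linalg.norm(theta_hat - theta_star))  # improves on theta0, but limited by the subspace
```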
Scribe: Realistic Handwriting with Tensorflow
Another nicely illustrated reimplementation of Alex Graves’ RNN handwriting generation, focused on the attention mechanism.
How to Use t-SNE Effectively
I’m mostly interested in this as a good demo of data plotting using JS and D3.
Their JS assets are here on GitHub.
A survey of cross-lingual word embedding models
A great overview (as of 2016) of a thickly covered topic.
Autoencoding a Single Bit
Gaussian Mixture VAE: Lessons in Variational Inference, Generative Models, and Deep Nets
with more at https://github.com/ruishu/vae-clustering
Dilated Convolutions and Kronecker Factored Convolutions
Visualizing machine learning one concept at a time
Notably The Illustrated Transformer and Interfaces for Explaining Transformer Language Models
MinImagen - Build Your Own Imagen Text-to-Image Model
Multi-head Attention, deep dive
A set of good diagrams modeling the different matrices and interactions in self-attention heads.
MS-CoCo Dataset Explorer
COCO annotations are really clumsy polygonal regions, and large models can probably exceed human annotation quality at this point.