An Overview of Early Vision in InceptionV1

Wonderful Distill article inspecting and categorizing feature detectors in the first 5 layers of a trained ConvNet. I wonder whether many of these convolutions could be refactored into one or more simple blurs, followed by sparse multiplications. It would be nice to have efficient, standardized early vision burned into ASICs. It’s interesting how early certain important features appear, like faces, fur, and scales.
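Roughly the refactoring I have in mind, as a toy numpy sketch (the blur kernel, sparsity pattern, and shapes are my own illustration, not anything from the article):

```python
import numpy as np
from scipy.signal import convolve2d

# Fixed 3x3 Gaussian-ish blur, shared by every channel.
BLUR = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]], dtype=float) / 16.0

def blur_then_sparse_mix(x, mix):
    """x: (C_in, H, W) feature map; mix: sparse (C_out, C_in) channel-mixing matrix.
    Approximates a learned KxK convolution as a cheap fixed blur followed by a
    sparse 1x1 'convolution' across channels."""
    blurred = np.stack([convolve2d(c, BLUR, mode="same") for c in x])
    return np.einsum("oc,chw->ohw", mix, blurred)

x = np.random.rand(8, 32, 32)
mix = np.random.randn(16, 8) * (np.random.rand(16, 8) < 0.2)   # ~80% of weights zeroed
y = blur_then_sparse_mix(x, mix)                                # (16, 32, 32)
```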

Diffusion Models as a kind of VAE

Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
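The core of it, as a minimal numpy sketch of multi-head self-attention written with einsum (shapes and names are mine, not necessarily the article's):

```python
import numpy as np

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(W):
        # Project, then split into heads: (heads, seq, d_head).
        return np.einsum("sd,de->se", x, W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Wq), split(Wk), split(Wv)
    scores = np.einsum("hqd,hkd->hqk", q, k) / np.sqrt(d_head)  # scaled dot products
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                   # softmax over keys
    out = np.einsum("hqk,hkd->hqd", weights, v)                 # weighted sum of values
    out = out.transpose(1, 0, 2).reshape(seq, d_model)          # merge heads
    return out @ Wo

d, h = 64, 4
x = np.random.randn(10, d)
y = multi_head_self_attention(x, *[np.random.randn(d, d) * 0.1 for _ in range(4)], n_heads=h)
```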

Handwriting Generation Demo In Tensorflow

Provides a good, expanded explanation of the handwriting-generation part of Alex Graves’ RNN paper.
Both are multilayer LSTM models with a Mixture Density Network output.
Otoro’s version uses 2 layers of 250 LSTM cells, trained on the IAM online handwriting dataset.
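A minimal numpy sketch of what the Mixture Density Network output head looks like in this style of model: the LSTM output is mapped to mixture weights, means, standard deviations, and correlations for the next pen offset, plus an end-of-stroke probability (sizes and names here are my own, not Otoro's code):

```python
import numpy as np

def mdn_params(h, W, b, n_mix=20):
    """Map an LSTM output h (hidden,) to bivariate-Gaussian mixture parameters
    for the next pen offset, plus an end-of-stroke probability."""
    y = h @ W + b                                         # (6 * n_mix + 1,)
    eos = 1.0 / (1.0 + np.exp(-y[0]))                     # end-of-stroke probability
    pi_hat, mu1, mu2, s1_hat, s2_hat, rho_hat = np.split(y[1:], 6)
    pi = np.exp(pi_hat - pi_hat.max()); pi /= pi.sum()    # mixture weights (softmax)
    sigma1, sigma2 = np.exp(s1_hat), np.exp(s2_hat)       # positive std devs
    rho = np.tanh(rho_hat)                                # correlation in (-1, 1)
    return eos, pi, mu1, mu2, sigma1, sigma2, rho

def sample_offset(params, rng=np.random.default_rng()):
    """Draw the next (dx, dy, pen_lift) from the mixture."""
    eos, pi, mu1, mu2, s1, s2, rho = params
    k = rng.choice(len(pi), p=pi)
    cov = [[s1[k] ** 2, rho[k] * s1[k] * s2[k]],
           [rho[k] * s1[k] * s2[k], s2[k] ** 2]]
    dx, dy = rng.multivariate_normal([mu1[k], mu2[k]], cov)
    return dx, dy, float(rng.random() < eos)

hidden, n_mix = 400, 20
W, b = np.random.randn(hidden, 6 * n_mix + 1) * 0.01, np.zeros(6 * n_mix + 1)
print(sample_offset(mdn_params(np.random.randn(hidden), W, b, n_mix)))
```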

Four Experiments in Handwriting with a Neural Network

A set of neat visualizations:

  • plot a connected sequence of pen tip locations (see the plotting sketch after this list)
  • plot a connected sequence of generated continuations
  • plot a connected sequence of randomly generated locations, and superimpose, at each point, other potential continuations drawn from the Gaussian mixture.
  • plot the sequence of generated locations, along with the current activation of one LSTM cell.
  • update and allow a choice of cell.
  • plot the sequence of activations for each cell, in order.
  • sort cells according to t-SNE, and display their activations in order.
  • plot the activations of all cells at a selected point in the sequence.
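A minimal matplotlib sketch of that first visualization, assuming the usual (dx, dy, pen_lift) offset encoding for handwriting sequences:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_strokes(offsets):
    """offsets: (T, 3) array of (dx, dy, pen_lift); plot connected pen-tip
    locations, breaking the line wherever the pen lifts."""
    xy = np.cumsum(offsets[:, :2], axis=0)           # pen-tip locations
    breaks = np.where(offsets[:, 2] > 0.5)[0] + 1    # split strokes at pen lifts
    for stroke in np.split(xy, breaks):
        if len(stroke):
            plt.plot(stroke[:, 0], -stroke[:, 1], "k-")  # flip y for screen coords
    plt.axis("equal")
    plt.show()

# Dummy data; in practice, feed in offsets sampled from the handwriting model.
plot_strokes(np.column_stack([np.random.randn(200, 2).cumsum(0) * 0.1,
                              np.random.rand(200) < 0.05]))
```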

Normalizing Flows in 100 Lines of JAX

Normalizing flows / RealNVP-style operations explained succinctly.
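For reference, a numpy sketch of the basic building block, a single affine coupling layer (not the article's JAX code):

```python
import numpy as np

def affine_coupling(x, scale_net, shift_net, forward=True):
    """RealNVP-style affine coupling: the first half of the features parameterizes
    an elementwise affine transform of the second half. Returns the transformed
    vector and the log-determinant of the Jacobian."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = scale_net(x1), shift_net(x1)        # functions of the untouched half
    if forward:
        y2 = x2 * np.exp(s) + t
        logdet = s.sum(-1)
    else:
        y2 = (x2 - t) * np.exp(-s)
        logdet = -s.sum(-1)
    return np.concatenate([x1, y2], axis=-1), logdet

# Tiny fixed "networks" just to exercise the transform and check invertibility.
W_s, W_t = np.random.randn(2, 2) * 0.1, np.random.randn(2, 2) * 0.1
x = np.random.randn(5, 4)
y, logdet = affine_coupling(x, lambda h: h @ W_s, lambda h: h @ W_t)
x_back, _ = affine_coupling(y, lambda h: h @ W_s, lambda h: h @ W_t, forward=False)
assert np.allclose(x, x_back)
```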

Just Ask for Generalization

Neat discussion of D-REX, which perturbs a cloned, suboptimal policy with increasing amounts of noise to generate automatically ranked trajectories, then learns a reward function from those rankings that can be extrapolated beyond the original demonstrations.
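A rough sketch of that data-generation and ranking step as I understand it (the rollout and reward callables here are placeholders, not any particular implementation):

```python
import numpy as np

def drex_pairs(policy, env_rollout, noise_levels, per_level=5):
    """Roll out a cloned, suboptimal policy at increasing action-noise levels;
    trajectories generated with less noise are assumed better, giving
    automatically ranked pairs without human preference labels."""
    ranked = [[env_rollout(policy, eps) for _ in range(per_level)]
              for eps in noise_levels]                  # noise_levels sorted ascending
    pairs = []
    for lo in range(len(noise_levels)):
        for hi in range(lo + 1, len(noise_levels)):
            for worse in ranked[hi]:
                for better in ranked[lo]:
                    pairs.append((worse, better))       # (lower-ranked, higher-ranked)
    return pairs

def ranking_loss(reward_fn, worse, better):
    """T-REX-style pairwise loss: the learned reward should score the
    higher-ranked trajectory above the lower-ranked one."""
    r_w = sum(reward_fn(s) for s in worse)
    r_b = sum(reward_fn(s) for s in better)
    return np.logaddexp(0.0, r_w - r_b)   # = -log sigmoid(r_b - r_w)
```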

To Understand Language is to Understand Generalization

Further elaboration on Just Ask for Generalization; discusses different types of linguistic regularities.

Generative Modeling by Estimating Gradients of the Data Distribution

An excellent explanation of what Langevin dynamics means in the context of diffusion generative models.
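The sampler itself is short enough to write down; a toy numpy sketch of unadjusted Langevin dynamics, using the exact score of a standard normal in place of a learned score network:

```python
import numpy as np

def langevin_sample(score_fn, x0, step=1e-2, n_steps=1000, rng=np.random.default_rng(0)):
    """Unadjusted Langevin dynamics: repeatedly follow the score (the gradient of
    the log density) plus Gaussian noise. With a learned score network this is
    the sampler that score-based generative modeling builds on."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * step * score_fn(x) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# Toy target: standard normal, whose score is simply -x.
samples = langevin_sample(lambda x: -x, x0=np.full(10000, 5.0))
print(samples.mean(), samples.std())   # roughly 0 and 1
```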

Jakub M. Tomczak’s Blog

A series of 14 posts about generative modeling.

Attention? Attention!

Overview of attention mechanisms leading to transformer architecture.

The Transformer Family

A great overview of basic Transformer variants with illustrations snipped from their papers.

What are Diffusion Models?

A neat overview of diffusion models, VAEs and Flow models.

Training Networks in Random Subspaces

You can gauge how difficult a problem is by how small a randomly chosen, frozen coordinate system (subspace) you can shrink the parameter optimization into while still reaching good performance.

Thoughts:

  • this is a lot like Relative Representations! Using a random or meaningful subspace as a coordinate system for training in this case is similar to using it for model stitching or one-shot inference in RelReps.

Further elaborated at https://www.uber.com/blog/intrinsic-dimension/

We construct the random subspace by sampling a set of random directions from the initial point; these random directions are then frozen for the duration of training. Optimization proceeds directly in the coordinate system of the subspace.

Later published as the Intrinsic Dimension paper.
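A minimal numpy sketch of the construction described above, on a toy quadratic problem (the projection scaling, learning rate, and toy loss are my own choices):

```python
import numpy as np

def train_in_random_subspace(loss_grad, theta0, d_subspace, lr=0.1, steps=500,
                             rng=np.random.default_rng(0)):
    """Freeze a random projection P from a small coordinate vector z to the full
    parameter space, keep theta0 fixed, and optimize only z: theta = theta0 + P @ z."""
    D = theta0.size
    P = rng.standard_normal((D, d_subspace))
    P /= np.linalg.norm(P, axis=0)              # frozen random directions, unit norm
    z = np.zeros(d_subspace)                    # only these coordinates are trained
    for _ in range(steps):
        theta = theta0 + P @ z
        z -= lr * (P.T @ loss_grad(theta))      # chain rule: dL/dz = P^T dL/dtheta
    return theta0 + P @ z

# Toy problem: quadratic loss ||theta - theta_star||^2 in a 100-D parameter space.
rng = np.random.default_rng(1)
theta_star = rng.standard_normal(100)
grad = lambda th: 2 * (th - theta_star)
theta = train_in_random_subspace(grad, np.zeros(100), d_subspace=10, lr=0.3, steps=200)
# Only the component of theta_star lying in the frozen 10-D subspace can be
# recovered; sweeping d_subspace until the loss gets low is the
# intrinsic-dimension measurement.
print(np.sum((theta - theta_star) ** 2))
```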

Scribe: Realistic Handwriting with Tensorflow

Another nicely illustrated reimplementation of Alex Graves’ RNN handwriting generation, focused on the attention mechanism.
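The attention in question is Graves’ Gaussian “soft window” over the character sequence; a numpy sketch of one step of it (variable names are mine):

```python
import numpy as np

def soft_window(kappa_prev, alpha_hat, beta_hat, kappa_hat, char_onehots):
    """Graves-style soft window: a mixture of Gaussians over character positions
    whose centres (kappa) can only move forward along the text.
    char_onehots: (U, n_chars); the *_hat arguments are raw network outputs, (K,) each."""
    alpha, beta = np.exp(alpha_hat), np.exp(beta_hat)
    kappa = kappa_prev + np.exp(kappa_hat)             # monotonically advancing centres
    u = np.arange(char_onehots.shape[0])               # character positions 0..U-1
    phi = (alpha[:, None] * np.exp(-beta[:, None] * (kappa[:, None] - u) ** 2)).sum(0)
    w = phi @ char_onehots                             # (n_chars,) window vector
    return w, kappa

# Tiny example: 12 characters from a 30-symbol alphabet, 3 mixture components.
chars = np.eye(30)[np.random.randint(0, 30, size=12)]
raw = np.random.randn(3, 3) * 0.1                      # rows: alpha_hat, beta_hat, kappa_hat
w, kappa = soft_window(np.zeros(3), raw[0], raw[1], raw[2], chars)
```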

How to Use t-SNE Effectively

I’m mostly interested in this as a good demo of data plotting using JS and D3.
Their JS assets are here on GitHub.

A survey of cross-lingual word embedding models

A great overview (as of 2016) of a thickly covered topic.

Autoencoding a Single Bit

Gaussian Mixture VAE: Lessons in Variational Inference, Generative Models, and Deep Nets

More at https://github.com/ruishu/vae-clustering

Dilated Convolutions and Kronecker Factored Convolutions

Visualizing machine learning one concept at a time

Notably The Illustrated Transformer and Interfaces for Explaining Transformer Language Models.

MinImagen - Build Your Own Imagen Text-to-Image Model

Multi-head Attention, deep dive

A set of good diagrams modeling the different matrices and interactions in self-attention heads.

MS-COCO Dataset Explorer

COCO annotations are really clumsy polygonal regions, and large models can probably exceed human annotation quality at this point.