An Overview of Early Vision in InceptionV1

Wonderful Distill article inspecting and categorizing feature detectors in the first 5 layers of a trained ConvNet. I wonder whether many of these convolutions could be refactored into one or more simple blurs, followed by sparse multiplications. It would be nice to have efficient, standardized early vision burned into ASICs. It’s interesting how early certain important features appear, like faces, fur, and scales.
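Roughly the refactoring I have in mind, as a toy numpy sketch (the blur kernel, sparsity pattern, and shapes are my own illustration, not anything from the article):

```python
import numpy as np
from scipy.signal import convolve2d

# Fixed 3x3 Gaussian-ish blur, shared by every channel.
BLUR = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]], dtype=float) / 16.0

def blur_then_sparse_mix(x, mix):
    """x: (C_in, H, W) feature map; mix: sparse (C_out, C_in) channel-mixing matrix.
    Approximates a learned KxK convolution as a cheap fixed blur followed by a
    sparse 1x1 'convolution' across channels."""
    blurred = np.stack([convolve2d(c, BLUR, mode="same") for c in x])
    return np.einsum("oc,chw->ohw", mix, blurred)

x = np.random.rand(8, 32, 32)
mix = np.random.randn(16, 8) * (np.random.rand(16, 8) < 0.2)   # ~80% of weights zeroed
y = blur_then_sparse_mix(x, mix)                                # (16, 32, 32)
```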

Diffusion Models as a kind of VAE

Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
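The core of it, as a minimal numpy sketch of multi-head self-attention written with einsum (shapes and names are mine, not necessarily the article's):

```python
import numpy as np

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(W):
        # Project, then split into heads: (heads, seq, d_head).
        return np.einsum("sd,de->se", x, W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(Wq), split(Wk), split(Wv)
    scores = np.einsum("hqd,hkd->hqk", q, k) / np.sqrt(d_head)  # scaled dot products
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                   # softmax over keys
    out = np.einsum("hqk,hkd->hqd", weights, v)                 # weighted sum of values
    out = out.transpose(1, 0, 2).reshape(seq, d_model)          # merge heads
    return out @ Wo

d, h = 64, 4
x = np.random.randn(10, d)
y = multi_head_self_attention(x, *[np.random.randn(d, d) * 0.1 for _ in range(4)], n_heads=h)
```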

Handwriting Generation Demo In Tensorflow

Provides a good, expanded explanation of the handwriting-generation part of Alex Graves’ RNN paper.
Both are multilayer LSTM models with a Mixture Density Network output.
Otoro’s version uses 2 layers of 250 LSTM cells, trained on the IAM online handwriting dataset.
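A minimal numpy sketch of what the Mixture Density Network output head looks like in this style of model: the LSTM output is mapped to mixture weights, means, standard deviations, and correlations for the next pen offset, plus an end-of-stroke probability (sizes and names here are my own, not Otoro's code):

```python
import numpy as np

def mdn_params(h, W, b, n_mix=20):
    """Map an LSTM output h (hidden,) to bivariate-Gaussian mixture parameters
    for the next pen offset, plus an end-of-stroke probability."""
    y = h @ W + b                                         # (6 * n_mix + 1,)
    eos = 1.0 / (1.0 + np.exp(-y[0]))                     # end-of-stroke probability
    pi_hat, mu1, mu2, s1_hat, s2_hat, rho_hat = np.split(y[1:], 6)
    pi = np.exp(pi_hat - pi_hat.max()); pi /= pi.sum()    # mixture weights (softmax)
    sigma1, sigma2 = np.exp(s1_hat), np.exp(s2_hat)       # positive std devs
    rho = np.tanh(rho_hat)                                # correlation in (-1, 1)
    return eos, pi, mu1, mu2, sigma1, sigma2, rho

def sample_offset(params, rng=np.random.default_rng()):
    """Draw the next (dx, dy, pen_lift) from the mixture."""
    eos, pi, mu1, mu2, s1, s2, rho = params
    k = rng.choice(len(pi), p=pi)
    cov = [[s1[k] ** 2, rho[k] * s1[k] * s2[k]],
           [rho[k] * s1[k] * s2[k], s2[k] ** 2]]
    dx, dy = rng.multivariate_normal([mu1[k], mu2[k]], cov)
    return dx, dy, float(rng.random() < eos)

hidden, n_mix = 400, 20
W, b = np.random.randn(hidden, 6 * n_mix + 1) * 0.01, np.zeros(6 * n_mix + 1)
print(sample_offset(mdn_params(np.random.randn(hidden), W, b, n_mix)))
```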

Four Experiments in Handwriting with a Neural Network

A set of neat visualizations:

  • plot a connected sequence of pen tip locations (see the plotting sketch after this list)
  • plot a connected sequence of generated continuations
  • plot a connected sequence of randomly generated locations, and superimpose, at each point, other potential continuations drawn from the Gaussian mixture.
  • plot the sequence of generated locations, along with the current activation of one LSTM cell.
  • update and allow a choice of cell.
  • plot the sequence of activations for each cell, in order.
  • sort cells according to t-SNE, and display their activations in order.
  • plot the activations of all cells at a selected point in the sequence.
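A minimal matplotlib sketch of that first visualization, assuming the usual (dx, dy, pen_lift) offset encoding for handwriting sequences:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_strokes(offsets):
    """offsets: (T, 3) array of (dx, dy, pen_lift); plot connected pen-tip
    locations, breaking the line wherever the pen lifts."""
    xy = np.cumsum(offsets[:, :2], axis=0)           # pen-tip locations
    breaks = np.where(offsets[:, 2] > 0.5)[0] + 1    # split strokes at pen lifts
    for stroke in np.split(xy, breaks):
        if len(stroke):
            plt.plot(stroke[:, 0], -stroke[:, 1], "k-")  # flip y for screen coords
    plt.axis("equal")
    plt.show()

# Dummy data; in practice, feed in offsets sampled from the handwriting model.
plot_strokes(np.column_stack([np.random.randn(200, 2).cumsum(0) * 0.1,
                              np.random.rand(200) < 0.05]))
```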

Normalizing Flows in 100 Lines of JAX

Normalizing flows / RealNVP-style operations explained succinctly.
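For reference, a numpy sketch of the basic building block, a single affine coupling layer (not the article's JAX code):

```python
import numpy as np

def affine_coupling(x, scale_net, shift_net, forward=True):
    """RealNVP-style affine coupling: the first half of the features parameterizes
    an elementwise affine transform of the second half. Returns the transformed
    vector and the log-determinant of the Jacobian."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = scale_net(x1), shift_net(x1)        # functions of the untouched half
    if forward:
        y2 = x2 * np.exp(s) + t
        logdet = s.sum(-1)
    else:
        y2 = (x2 - t) * np.exp(-s)
        logdet = -s.sum(-1)
    return np.concatenate([x1, y2], axis=-1), logdet

# Tiny fixed "networks" just to exercise the transform and check invertibility.
W_s, W_t = np.random.randn(2, 2) * 0.1, np.random.randn(2, 2) * 0.1
x = np.random.randn(5, 4)
y, logdet = affine_coupling(x, lambda h: h @ W_s, lambda h: h @ W_t)
x_back, _ = affine_coupling(y, lambda h: h @ W_s, lambda h: h @ W_t, forward=False)
assert np.allclose(x, x_back)
```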

Just Ask for Generalization

Neat discussion of D-REX, which perturbs a cloned, suboptimal policy with increasing amounts of noise to generate automatically ranked trajectories, then learns a reward function from those rankings that can be extrapolated beyond the original demonstrations.
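A rough sketch of that data-generation and ranking step as I understand it (the rollout and reward callables here are placeholders, not any particular implementation):

```python
import numpy as np

def drex_pairs(policy, env_rollout, noise_levels, per_level=5):
    """Roll out a cloned, suboptimal policy at increasing action-noise levels;
    trajectories generated with less noise are assumed better, giving
    automatically ranked pairs without human preference labels."""
    ranked = [[env_rollout(policy, eps) for _ in range(per_level)]
              for eps in noise_levels]                  # noise_levels sorted ascending
    pairs = []
    for lo in range(len(noise_levels)):
        for hi in range(lo + 1, len(noise_levels)):
            for worse in ranked[hi]:
                for better in ranked[lo]:
                    pairs.append((worse, better))       # (lower-ranked, higher-ranked)
    return pairs

def ranking_loss(reward_fn, worse, better):
    """T-REX-style pairwise loss: the learned reward should score the
    higher-ranked trajectory above the lower-ranked one."""
    r_w = sum(reward_fn(s) for s in worse)
    r_b = sum(reward_fn(s) for s in better)
    return np.logaddexp(0.0, r_w - r_b)   # = -log sigmoid(r_b - r_w)
```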

To Understand Language is to Understand Generalization

Further elaboration on Just Ask for Generalization; discusses different types of linguistic regularities.

Generative Modeling by Estimating Gradients of the Data Distribution

An excellent explanation of what Langevin dynamics means in the context of diffusion generative models.
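The sampler itself is short enough to write down; a toy numpy sketch of unadjusted Langevin dynamics, using the exact score of a standard normal in place of a learned score network:

```python
import numpy as np

def langevin_sample(score_fn, x0, step=1e-2, n_steps=1000, rng=np.random.default_rng(0)):
    """Unadjusted Langevin dynamics: repeatedly follow the score (the gradient of
    the log density) plus Gaussian noise. With a learned score network this is
    the sampler that score-based generative modeling builds on."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * step * score_fn(x) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# Toy target: standard normal, whose score is simply -x.
samples = langevin_sample(lambda x: -x, x0=np.full(10000, 5.0))
print(samples.mean(), samples.std())   # roughly 0 and 1
```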

Jakub M. Tomczak’s Blog

A series of 14 posts about generative modeling.

Attention? Attention!

Overview of attention mechanisms leading to transformer architecture.

The Transformer Family

A great overview of basic Transformer variants with illustrations snipped from their papers.

What are Diffusion Models?

A neat overview of diffusion models, VAEs and Flow models.

Training Networks in Random Subspaces

You can gauge how difficult a problem is by how small a randomly chosen, frozen coordinate system (subspace) you can shrink the parameter optimization into while still reaching good performance.

Thoughts:

  • this is a lot like Relative Representations! Using a random or meaningful subspace as a coordinate system for training in this case is similar to using it for model stitching or one-shot inference in RelReps.

Further elaborated at https://www.uber.com/blog/intrinsic-dimension/

We construct the random subspace by sampling a set of random directions from the initial point; these random directions are then frozen for the duration of training. Optimization proceeds directly in the coordinate system of the subspace.

Later published as the Intrinsic Dimension paper.
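A minimal numpy sketch of the construction described above, on a toy quadratic problem (the projection scaling, learning rate, and toy loss are my own choices):

```python
import numpy as np

def train_in_random_subspace(loss_grad, theta0, d_subspace, lr=0.1, steps=500,
                             rng=np.random.default_rng(0)):
    """Freeze a random projection P from a small coordinate vector z to the full
    parameter space, keep theta0 fixed, and optimize only z: theta = theta0 + P @ z."""
    D = theta0.size
    P = rng.standard_normal((D, d_subspace))
    P /= np.linalg.norm(P, axis=0)              # frozen random directions, unit norm
    z = np.zeros(d_subspace)                    # only these coordinates are trained
    for _ in range(steps):
        theta = theta0 + P @ z
        z -= lr * (P.T @ loss_grad(theta))      # chain rule: dL/dz = P^T dL/dtheta
    return theta0 + P @ z

# Toy problem: quadratic loss ||theta - theta_star||^2 in a 100-D parameter space.
rng = np.random.default_rng(1)
theta_star = rng.standard_normal(100)
grad = lambda th: 2 * (th - theta_star)
theta = train_in_random_subspace(grad, np.zeros(100), d_subspace=10, lr=0.3, steps=200)
# Only the component of theta_star lying in the frozen 10-D subspace can be
# recovered; sweeping d_subspace until the loss gets low is the
# intrinsic-dimension measurement.
print(np.sum((theta - theta_star) ** 2))
```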

Scribe: Realistic Handwriting with Tensorflow

Another nicely illustrated reimplementation of Alex Graves’ RNN handwriting generation, focused on the attention mechanism.
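The attention in question is Graves’ Gaussian “soft window” over the character sequence; a numpy sketch of one step of it (variable names are mine):

```python
import numpy as np

def soft_window(kappa_prev, alpha_hat, beta_hat, kappa_hat, char_onehots):
    """Graves-style soft window: a mixture of Gaussians over character positions
    whose centres (kappa) can only move forward along the text.
    char_onehots: (U, n_chars); the *_hat arguments are raw network outputs, (K,) each."""
    alpha, beta = np.exp(alpha_hat), np.exp(beta_hat)
    kappa = kappa_prev + np.exp(kappa_hat)             # monotonically advancing centres
    u = np.arange(char_onehots.shape[0])               # character positions 0..U-1
    phi = (alpha[:, None] * np.exp(-beta[:, None] * (kappa[:, None] - u) ** 2)).sum(0)
    w = phi @ char_onehots                             # (n_chars,) window vector
    return w, kappa

# Tiny example: 12 characters from a 30-symbol alphabet, 3 mixture components.
chars = np.eye(30)[np.random.randint(0, 30, size=12)]
raw = np.random.randn(3, 3) * 0.1                      # rows: alpha_hat, beta_hat, kappa_hat
w, kappa = soft_window(np.zeros(3), raw[0], raw[1], raw[2], chars)
```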

How to Use t-SNE Effectively

I’m mostly interested in this as a good demo of data plotting using JS and D3.
Their JS assets are here on GitHub.

A survey of cross-lingual word embedding models

A great overview (as of 2016) of a thickly covered topic.

Autoencoding a Single Bit

Gaussian Mixture VAE: Lessons in Variational Inference, Generative Models, and Deep Nets

More at https://github.com/ruishu/vae-clustering

Dilated Convolutions and Kronecker Factored Convolutions

Visualizing machine learning one concept at a time

Notably The Illustrated Transformer and Interfaces for Explaining Transformer Language Models.

MinImagen - Build Your Own Imagen Text-to-Image Model

Multi-head Attention, deep dive

A set of good diagrams modeling the different matrices and interactions in self-attention heads.

MS-COCO Dataset Explorer

COCO annotations are really clumsy polygonal regions, and large models can probably exceed human annotation quality at this point.