NNCP v2: Lossless Data Compression with Transformer

Changes from the original NNCP:

  • learned relative positional encoding (T5-style?) instead of sinusoidal absolute positional encoding
  • GELU instead of ReLU
  • weight initialization scaled down in higher layers
  • periodic retraining on previously decompressed data (the decoder can retrain identically, since it has already reconstructed that data)
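For the relative positional encoding, a minimal sketch of a T5-style bucketed relative position bias for causal attention. Whether NNCP v2 uses exactly this bucketing is an assumption (the note itself marks "T5-style" with a question mark). The names `relative_position_bucket`, `attention_logits`, and the defaults `num_buckets=32`, `max_distance=128` are illustrative; `bias_table` holds the learned scalars (one per bucket, one head) that replace the sinusoidal position vectors.

```python
import math
from typing import List, Sequence


def relative_position_bucket(distance: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a causal distance (query_pos - key_pos) to a bucket index.

    Unidirectional T5-style scheme: one bucket per distance for small
    distances, logarithmically spaced buckets for larger ones, and a
    final bucket shared by everything at or beyond max_distance.
    """
    n = max(distance, 0)
    max_exact = num_buckets // 2
    if n < max_exact:
        return n
    # Logarithmic spacing between max_exact and max_distance.
    val = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return min(val, num_buckets - 1)


def attention_logits(q: Sequence[Sequence[float]], k: Sequence[Sequence[float]],
                     bias_table: Sequence[float], num_buckets: int = 32,
                     max_distance: int = 128) -> List[List[float]]:
    """Causal attention logits for one head:
    logits[i][j] = q_i . k_j / sqrt(d) + bias_table[bucket(i - j)] for j <= i,
    and -inf for j > i (future symbols are masked out).
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    logits = []
    for i in range(len(q)):
        row = []
        for j in range(len(k)):
            if j > i:
                row.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q[i], k[j]))
                bucket = relative_position_bucket(i - j, num_buckets, max_distance)
                row.append(dot * scale + bias_table[bucket])
        logits.append(row)
    return logits
```

Because the bias depends only on the distance `i - j`, the same learned table applies at every absolute position, and distant context shares a few coarse buckets instead of needing a distinct encoding per position.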
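For the activation change, a short sketch comparing ReLU with GELU, GELU(x) = x·Φ(x) where Φ is the standard normal CDF. Which GELU variant NNCP v2 uses (exact or the common tanh approximation) is not stated here, so both are shown; the function names are illustrative.

```python
import math


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi written via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, GELU is smooth and lets small negative values through: `relu(-1.0)` is `0.0`, while `gelu(-1.0)` is about `-0.159`.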
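For the scaled initialization, the note does not give the scaling rule, so the sketch below uses a hypothetical schedule in which the standard deviation shrinks as 1/sqrt(layer + 1); `init_std`, `init_weight`, and `base_std=0.02` are illustrative assumptions, not NNCP v2's actual values.

```python
import math
import random
from typing import List


def init_std(layer: int, base_std: float = 0.02) -> float:
    """HYPOTHETICAL schedule: std = base_std / sqrt(layer + 1),
    where layer is the 0-based index counted from the input side."""
    return base_std / math.sqrt(layer + 1)


def init_weight(rows: int, cols: int, layer: int, rng: random.Random) -> List[List[float]]:
    """Draw a rows x cols matrix from N(0, init_std(layer)^2)."""
    std = init_std(layer)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]
```

The general idea is that each layer adds its output to a residual stream, so starting higher layers with smaller weights keeps the sum from growing with depth early in training.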
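For periodic retraining, a toy sketch of why it stays lossless: every prediction depends only on symbols already coded, so the decoder, which has reconstructed those symbols, can run the same retraining and obtain the same probabilities. A decaying order-0 frequency model stands in for the Transformer here; `adaptive_probs`, `code_length_bits`, `retrain_interval`, `decay`, and `prior` are hypothetical, and the arithmetic coder is replaced by the ideal code length.

```python
import math
from typing import List, Sequence


def adaptive_probs(data: Sequence[int], alphabet_size: int = 256,
                   retrain_interval: int = 64, decay: float = 0.98,
                   prior: float = 0.1) -> List[float]:
    """Return p(data[i] | data[:i]) for each i.

    Toy stand-in for the Transformer: counts are updated online with decay
    after each symbol, and every retrain_interval symbols the model is rebuilt
    from scratch on the whole history data[:i]. Nothing here reads data[i:]
    before predicting data[i], so a decoder can replay it exactly.
    """
    counts = [prior] * alphabet_size
    probs = []
    for i, sym in enumerate(data):
        if i > 0 and i % retrain_interval == 0:
            # Periodic retraining on everything already coded (= already decoded).
            counts = [prior] * alphabet_size
            for s in data[:i]:
                counts[s] += 1.0
        # Prediction for data[i], made before the symbol is seen.
        probs.append(counts[sym] / sum(counts))
        # Online update; decay favours recent symbols.
        counts = [c * decay for c in counts]
        counts[sym] += 1.0
    return probs


def code_length_bits(probs: Sequence[float]) -> float:
    """Ideal arithmetic-coding cost: sum of -log2 p over all symbols."""
    return -sum(math.log2(p) for p in probs)
```

The property the decoder relies on can be checked directly: for `data = list(b"abracadabra" * 30)`, `adaptive_probs(data[:100])` equals `adaptive_probs(data)[:100]`, i.e. predictions for a prefix never depend on what follows it.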