NNCP v2: Lossless Data Compression with Transformer
Changes relative to the original NNCP:
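- learned relative positional encoding (T5-style?) instead of sinusoidal absolute positional encoding
- GELU activation instead of ReLU
- weight initialization scaled down in higher layers
- periodic retraining on already decompressed data (past data that the decoder also has, so both sides can retrain identically)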
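The first item can be illustrated with a minimal sketch of a T5-style relative position bias. This assumes the T5 scheme, which the note itself only marks with a question mark. Distances into the past are mapped to buckets (one bucket per distance for short distances, logarithmically spaced buckets for longer ones), and each bucket owns a learned scalar that is added to the attention logit. The names `relative_position_bucket` and `biased_causal_scores`, and the defaults `num_buckets=32` and `max_distance=128` (taken from T5), are illustrative assumptions, not NNCP's code or parameters. The sketch covers one attention head in pure Python.

```python
import math

def relative_position_bucket(query_pos, key_pos, num_buckets=32, max_distance=128):
    """Map a causal (past-only) distance to a bucket index, T5-style (assumed scheme)."""
    n = max(query_pos - key_pos, 0)  # distance into the past; future keys clamp to 0
    max_exact = num_buckets // 2     # distances below this get their own bucket
    if n < max_exact:
        return n
    # Larger distances share logarithmically spaced buckets up to max_distance.
    val = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact) * (num_buckets - max_exact)
    )
    return min(val, num_buckets - 1)

def biased_causal_scores(q, k, bias_table):
    """Attention logits q_i.k_j / sqrt(d) plus a learned bias per relative-position bucket.

    q, k: one vector per position; bias_table: one learned scalar per bucket (one head).
    """
    d = len(q[0])
    num_buckets = len(bias_table)
    scores = []
    for i, qi in enumerate(q):
        row = []
        for j in range(i + 1):  # causal mask: only the current and earlier keys
            dot = sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            row.append(dot + bias_table[relative_position_bucket(i, j, num_buckets)])
        scores.append(row)
    return scores
```

Because the bias depends only on the offset i - j, the same learned table applies at every position, whereas a sinusoidal encoding is indexed by absolute position.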
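The two activations in the second item can be compared directly. GELU weights its input by the standard normal CDF, x * Phi(x), so small negative inputs yield small negative outputs instead of ReLU's hard zero. The note does not say whether NNCP v2 uses the exact erf form or the common tanh approximation, so this sketch shows both.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

For large positive x both functions approach x. For negative x, GELU dips slightly below zero (about -0.16 at x = -1) rather than cutting off, and it is smooth at 0.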
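The note does not specify how the initialization in the third item shrinks with depth. This minimal sketch assumes a hypothetical schedule: the usual 1/sqrt(fan_in) standard deviation is further divided by sqrt(layer_index + 1), so later layers start with smaller weights. `init_std`, `init_layer_weights`, and the schedule itself are illustrative assumptions, not NNCP's actual rule.

```python
import math
import random

def init_std(layer_index: int, fan_in: int) -> float:
    # Hypothetical schedule: base std 1/sqrt(fan_in), shrunk by sqrt(layer_index + 1)
    # so that higher layers (larger layer_index) start with smaller weights.
    return 1.0 / math.sqrt(fan_in) / math.sqrt(layer_index + 1)

def init_layer_weights(layer_index: int, fan_in: int, fan_out: int, rng: random.Random):
    """Return a fan_out x fan_in weight matrix drawn from N(0, init_std**2)."""
    std = init_std(layer_index, fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
```

For example, with `fan_in=256`, layer 0 gets std 0.0625 and layer 3 gets 0.03125. Any other decreasing schedule could be substituted in `init_std`.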
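The last item works because retraining uses only data both sides already hold. The encoder retrains on symbols it has already encoded. After decoding the same symbols, the decoder can repeat the identical retraining step, so the two models stay in sync. The toy sketch below shows that loop. It assumes a Laplace-smoothed frequency counter as a stand-in for the neural model, and the names `train`, `adaptive_code_lengths`, and `retrain_interval` are hypothetical. Instead of running an arithmetic coder, it reports the ideal code length -log2 p of each symbol.

```python
import math
from collections import Counter

def train(history, alphabet_size):
    """Toy stand-in for the neural model: Laplace-smoothed symbol frequencies."""
    counts = Counter(history)
    total = len(history) + alphabet_size
    return [(counts[s] + 1) / total for s in range(alphabet_size)]

def adaptive_code_lengths(data, alphabet_size, retrain_interval):
    """Ideal code length (bits) per symbol under a model retrained every
    `retrain_interval` symbols on the data processed so far."""
    model = train([], alphabet_size)  # encoder and decoder start from the same model
    lengths = []
    for t, sym in enumerate(data):
        if t > 0 and t % retrain_interval == 0:
            # Retrain only on data[:t]: the decoder has exactly these symbols at step t.
            model = train(data[:t], alphabet_size)
        lengths.append(-math.log2(model[sym]))
    return lengths
```

The model applied to symbol t is built only from `data[:t]`. A decoder holding the first t decoded symbols can therefore rebuild exactly the same probabilities before decoding symbol t.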