I built an 11.5-million-parameter LLM from scratch and instrumented it so you can watch it learn to tell stories: every word, every probability, every training step.