AI Compute - Scaling Out
This blog post covers scale-out network architectures for AI compute.
This blog post covers context parallelism for LLM inference.
This blog post covers the anatomy of PyTorch Distributed, from NCCL to DTensor.
This blog post covers numerics in deep learning.
This blog post covers quantization in deep learning.
This blog post covers the math behind deep learning, part 1.
A deep dive into the transformer decoder architecture used in LLMs — covering math equations, FLOP calculations, and weight/activation memory analysis for each layer.