Blogs

What Makes a Good MLOps Platform for an AI Research Lab?

This blog post covers MLOps platform design principles for an AI research lab.

AI Compute - Scaling Out

This blog post covers network architectures for scaling out AI compute.

LLM Inference Context Parallel

This blog post covers context parallelism for LLM inference.

From NCCL to DTensor: The Anatomy of PyTorch Distributed

This blog post covers the anatomy of PyTorch Distributed, from NCCL to DTensor.

Numerics

This blog post covers numerics in deep learning.

Quantization

This blog post covers quantization in deep learning.

Math for Deep Learning - Part 1

This blog post covers the math behind deep learning (part 1).

Transformer Decoder Architecture

A deep dive into the transformer decoder architecture used in LLMs — covering math equations, FLOP calculations, and weight/activation memory analysis for each layer.