I am a Software Engineer specializing in AI infrastructure, with 10+ years of industry experience spanning large-scale distributed systems at AWS and Uber, full-stack MLOps platforms, and hands-on startup work building ML infrastructure from the ground up.

Current Role

As Tech Lead at an AI startup, I build and optimize the entire LLM infrastructure stack. On the training side, this includes distributed training with model parallelism, RDMA-accelerated data loading, and custom fused Triton kernels. On the inference side, I work on quantization, tensor parallelism, KV cache optimization, and continuous batching. I also build scalable RAG applications and develop tools for LLM performance profiling and capacity planning.

Previous Experience

At AWS, I led the development of Amazon Transcribe, a real-time speech recognition service, from scratch: multi-tenant cloud services, real-time bidirectional streaming (HTTP/2, gRPC), and full MLOps training and evaluation pipelines. I also led a team developing applications for specialized domains, including medical transcription and call center analytics. I hold a patent on streaming real-time automatic speech recognition (US10777186B1).

At Uber, I engineered a Kubernetes-based control plane for the internal ML platform, designed workflow systems to streamline the model lifecycle (training, evaluation, monitoring), and built orchestration frameworks to improve the reliability of training pipelines on Spark and Ray.

Background

I started my career in CPU physical design at Broadcom, working on ARM v16 cores — timing closure, place & route, and CAD tooling. This hardware-level understanding of compute architecture informs my systems work today.

I hold a Master’s in Electrical & Computer Engineering from the University of Florida and a Bachelor’s in Microelectronics from UESTC.