I am a Software Engineer specializing in AI infrastructure with 10+ years of industry experience spanning large-scale distributed systems at AWS, full-stack MLOps platforms at Uber, and hands-on startup experience building ML infrastructure from the ground up.
Current Role
As Tech Lead at an AI research startup, I built a unified ML platform from scratch. The platform is the foundation of the company's AI research: researchers use it daily for everything from notebooks and distributed training to large-scale batch inference, real-time inference, and agentic workflows.
In addition, I optimize the entire ML infrastructure stack. On the training side, this includes distributed training with model parallelism, faster data loading over RDMA, and custom kernel fusion. On the inference side, I work on quantization, context parallelism, KV cache optimization, and continuous batch scheduling. I also build AI agents to automate daily engineering and operations tasks.
Previous Experience
At AWS, I led the development of Amazon Transcribe, a real-time speech recognition service, from scratch: building multi-tenant cloud services, real-time bidirectional streaming (HTTP/2, gRPC), and full MLOps training and evaluation pipelines. I also led a team developing applications for specialized domains, including medical transcription and call center analytics.
At Uber (Michelangelo team), I engineered a Kubernetes-based control plane for the internal ML platform, designed workflow systems to streamline model lifecycles (training, evaluation, monitoring), and built orchestration frameworks to improve reliability of training pipelines on Spark and Ray.
Background
I started my career in CPU physical design at Broadcom, working on ARM v16 cores: timing closure, place and route, and CAD tooling. This hardware-level understanding of compute architecture informs my systems work today.
I hold a Master's in Electrical & Computer Engineering from the University of Florida and a Bachelor's in Microelectronics from UESTC.