Frontier Research Lab

Future Machine Learning & Systems Lab

We study the algorithms and systems that make large-scale machine learning more efficient — spanning low-bit quantization, KV-cache compression, attention architectures, and hardware-aware ML systems.

About

Who we are

Welcome to the Future Machine Learning & Systems (FutureMLS) Lab. We work at the intersection of machine learning and computer systems. Today's foundation models are remarkably capable but costly to train and serve. Our mission is to close the gap between rapidly growing model capability and the real-world cost of deploying these models.

We pursue algorithm–system co-design: quantization and compression methods that preserve accuracy, attention mechanisms that shrink memory footprints, and runtimes that make deployment practical. Our research spans the AI stack — from model compression and KV-cache design to GPU kernels and serving systems — and our work is open-source, reproducible, and built to be used.

Founder & PI: Zhongzhu Zhou
Advisors: Xiaoxia Wu, Shuaiwen Leon Song

Founder

About Zhongzhu Zhou

Zhongzhu Zhou (Charlie Zhou) is the founder and principal investigator of the Future Machine Learning & Systems Lab. He is a Senior Research Scientist on the Turbo Team at Together AI, and earned his Ph.D. at the School of Computer Science, University of Sydney, advised by Prof. Shuaiwen Leon Song.

His research spans efficient machine learning and systems, from pretraining quality to efficient algorithms and algorithm–system co-design. The aim is to bridge emerging ML/LLM methods and real-world applications, improving both productivity (usable, robust stacks) and performance (throughput, memory, and cost efficiency). He received his B.Eng. (Hons) from Sun Yat-sen University and has interned at Dolby, the DeepSpeed team at Microsoft, and Tencent.

He leads projects across the lab's four themes, including OSCAR (2-bit KV-cache quantization), CARE (covariance-aware Multi-Head Latent Attention), and RUD (an automatic run–evaluate–refine loop).

Research

What we work on

Four directions, one goal: efficient and capable AI at scale.

01

Efficient ML Algorithm

Algorithmic methods that cut compute and memory while preserving accuracy — sparsity, low-rank structure, and efficient training & inference for large models.

02

Efficient ML System

Systems that make efficient algorithms practical end-to-end — GPU kernels, distributed runtimes, KV-cache design, and high-throughput serving.

03

Quantization

Low-bit weight, activation, and KV-cache quantization, using spectral and covariance-aware methods that preserve accuracy at widths as low as 2 bits.

04

Modeling

Model and architecture design — attention mechanisms, latent representations, and architectures that trade memory for capability gracefully.

News

Recent updates

Team

People

A small group of researchers and advisors building in the open.

Advisors

Xiaoxia Wu

Advisor

Principal research scientist working on efficient ML and low-bit quantization, with extensive work across DeepSpeed and Together AI.

  • Efficient ML Algorithm
  • Efficient ML System
  • Quantization

Shuaiwen Leon Song

Advisor

Professor at the University of Sydney; high-performance computing and ML systems.

  • Efficient ML System

Members

Fengxiang “Bobbie” Bie

Student Researcher

Efficient ML algorithms and speculative decoding; contributor to CARE.

  • Efficient ML Algorithm
  • Speculator

Ziyan Chen

Student Researcher

Efficient ML algorithms and KV-cache compression; contributor to OSCAR and CARE.

  • Efficient ML Algorithm

Ryan Wang

Student Researcher

Efficient ML algorithms and speculative decoding for large-scale machine learning.

  • Efficient ML Algorithm
  • Speculator

Projects

Open-source research

Selected projects from the lab — open the preview page for details, papers, and code.

Contact

Get in touch

We welcome collaborators, prospective students, and contributors who care about efficient, open machine learning and systems.

Email the lab
View our GitHub