Future Open Source Research

Future Machine Learning & Systems Lab

We study how to make machine learning more capable, efficient, and practical at scale through four connected directions: algorithm and architecture innovation, systems optimization, quantization, and training-driven modeling.

Efficient ML Algorithm
Efficient ML System
Quantization
Modeling

About

Who we are

Welcome to the Future Machine Learning & Systems (FutureMLS) Lab. We work at the intersection of machine learning and computer systems. Today's foundation models are remarkably capable but costly to train and serve. Our mission is to close the gap between rapidly growing model capability and the real-world cost of deploying these models.

We pursue algorithm-system co-design across four connected themes: Efficient ML Algorithm for algorithm and architecture innovation, Efficient ML System for systems optimization, Quantization as a core research focus, and Modeling for improving models through training. Our work spans the AI stack, from methods and model design to kernels, runtimes, and serving systems, and is open-source, reproducible, and built to be used.

Founder & PI: Zhongzhu Zhou

Founder

About Zhongzhu Zhou

Zhongzhu Zhou (Charlie Zhou) is the founder and principal investigator of the Future Machine Learning & Systems Lab. He is a Senior Research Scientist on the Turbo Team at Together AI, and earned his Ph.D. at the School of Computer Science, University of Sydney.

His research spans efficient machine learning and systems — from pretraining quality to efficient algorithms and algorithm–system co-design that bridges emerging ML/LLM methods and real-world applications, improving both productivity (usable, robust stacks) and performance (throughput, memory, and cost-efficiency). He received his B.Eng. (Hons) from Sun Yat-sen University, and has interned at Dolby, the DeepSpeed team at Microsoft, and Tencent.

He leads projects across the lab's four themes, including OSCAR (2-bit KV-cache quantization) and CARE (covariance-aware Multi-Head Latent Attention).

Research

What we work on

Four directions, one goal: efficient and capable AI at scale.

Efficient ML Algorithm

Algorithm and architecture innovations that improve capability while reducing compute, memory, and deployment cost.

Efficient ML System

System-level optimizations that make efficient methods practical end-to-end, from kernels and runtimes to high-throughput serving.

Quantization

A core research focus on low-bit weight, activation, and KV-cache quantization that preserves accuracy while cutting memory and compute.

Modeling

Model improvement through training optimization, architecture design, and adaptation methods that make models stronger and easier to use.

News

Recent updates

May 26, 2026
OSCAR crosses 300 ★ on GitHub in its first week — thanks to the open-source community.
May 19, 2026
OSCAR released — 2-bit KV-cache serving at 2.28 effective bits/element with near-BF16 accuracy on Qwen3 and GLM-4.7.
Apr 2026
CARE presented at ICLR 2026: covariance-aware, rank-enhanced decomposition for Multi-Head Latent Attention.

Team

People

A small group of researchers and advisors building in the open.

Advisors

Xiaoxia Wu

Advisor

Principal Research Scientist working on efficient ML and low-bit quantization, with extensive work across DeepSpeed and Together AI.

Efficient ML Algorithm
Efficient ML System
Quantization

Scholar Homepage

Shuaiwen Leon Song

Advisor

Professor at the University of Sydney; high-performance computing and ML systems.

Efficient ML System

Scholar Homepage

Members

Fengxiang “Bobbie” Bie

Student Researcher

Efficient ML algorithms and speculative decoding; contributor to CARE.

Efficient ML Algorithm
Speculator

GitHub Scholar

Ziyan Chen

Student Researcher

Efficient ML algorithms and KV-cache compression; contributor to OSCAR and CARE.

Efficient ML Algorithm

GitHub Scholar

Ryan Wang

Student Researcher

Efficient ML algorithms and speculative decoding for large-scale machine learning.

Efficient ML Algorithm
Speculator

GitHub Scholar

Zhizhou Sha

Student Researcher

Quantization-aware training for efficient and accurate large language models.

Quantization
Quantization-Aware Training

GitHub Scholar Homepage

Projects

Open-source research

Selected projects from the lab — open the preview page for details, papers, and code.

OSCAR

Quantization

2-bit KV-cache quantization · 2026

Attention-aware offline rotations compress the KV cache to 2.28 bits/element — ~8× memory reduction and up to ~7× higher throughput with near-BF16 accuracy.
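The memory figures can be made concrete with a little arithmetic. The following is a minimal sketch, not OSCAR's implementation: the function names, the group size of 128, and the 32 bits of per-group metadata are hypothetical values chosen only for illustration. The 2.28 bits/element figure and the BF16 baseline are the only numbers taken from the description above.

```python
BF16_BITS = 16  # bits per element in the uncompressed BF16 KV cache


def effective_bits(payload_bits: int, group_size: int, metadata_bits_per_group: int) -> float:
    """Average stored bits per element for group-wise quantization.

    Each element stores `payload_bits`; each group of `group_size` elements
    additionally stores `metadata_bits_per_group` (for example scales).
    """
    return payload_bits + metadata_bits_per_group / group_size


def compression_ratio(baseline_bits: float, stored_bits: float) -> float:
    """How many times smaller the stored cache is than the baseline."""
    return baseline_bits / stored_bits


# Nominal 2-bit payload, ignoring any metadata.
nominal = compression_ratio(BF16_BITS, 2.0)

# The effective rate quoted above, which includes overhead.
reported = compression_ratio(BF16_BITS, 2.28)

# Hypothetical layout (an assumption, not OSCAR's actual format):
# 2-bit codes, groups of 128 elements, 32 bits of metadata per group.
hypothetical = effective_bits(payload_bits=2, group_size=128, metadata_bits_per_group=32)

print(f"nominal 2-bit ratio:        {nominal:.2f}x")
print(f"ratio at 2.28 bits/element: {reported:.2f}x")
print(f"hypothetical layout:        {hypothetical:.2f} bits/element")
```

Under these assumptions, the nominal 2-bit payload corresponds to an 8× reduction relative to BF16, while 2.28 effective bits/element corresponds to roughly 7×. The gap between the two is whatever per-group metadata the format stores.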

CARE

Efficient ML Algorithm

Multi-Head Latent Attention · ICLR 2026

Covariance-aware low-rank decomposition that converts pretrained GQA/MHA into MLA — up to 215× lower one-shot perplexity at matched KV budgets.

Contact

Get in touch

We welcome collaborators, prospective students, and contributors who care about efficient, open machine learning and systems.

Email the lab View our GitHub ↗
