Google Scholar
Semantic Scholar
I believe model development should be delightful to both researchers and performance engineers: it should be easy to write single-device code and have it scale reliably to tens of thousands of chips, with predictable behavior and hardware-saturating performance.
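As a minimal sketch of that idea, here is hypothetical JAX code in which the model function is written as if for one device and XLA's SPMD partitioner handles the distribution. The mesh layout, shapes, and function names are illustrative assumptions, not code from any particular project.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

def predict(w, x):
    # Plain single-device model code: no devices or collectives in sight.
    return jnp.dot(x, w)

# Hypothetical 1-D mesh over whatever devices exist (one CPU locally,
# thousands of chips in a real job).
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch along the mesh; replicate the weights.
x = jax.device_put(jnp.ones((32, 128)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((128, 64)), NamedSharding(mesh, P(None, None)))

# jit hands the sharded program to XLA's SPMD partitioner, which inserts
# the collectives needed to run the same code on every shard.
y = jax.jit(predict)(x, w)
print(y.sharding)  # batch dimension sharded along "data"
```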
Themes I'm interested in:
Training framework design: how to balance research velocity, numerical stability, performance-engineering ergonomics, and the fast-changing characteristics of hardware platforms, all at exascale.
Machine learning compilation in general, with a focus on SPMD partitioning strategies and the XLA ecosystem.
Pipelining at every scale: kernel-level optimization across the memory hierarchy, operator-level communication/computation overlap in ultra-sparse mixture-of-experts models, stage-level pipeline parallelism (a toy sketch follows this list), and various forms of CPU/GPU co-optimization.
Running models on mobile and edge devices: inference, retrofitting operating systems (Linux, AOSP, RTOS), and designing interactive primitives around them.
Automation and synthetic content: how they reshape the economics of digital attention, recommendation, search, and creative markets.
Agents as persistent, collaborative entities, from multi-agent coordination and curiosity-driven exploration to co-presence as an interaction primitive.
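To make the pipelining theme concrete, here is a toy, single-process sketch of a two-stage pipeline schedule over microbatches. The stage functions and the schedule are hypothetical simplifications; real schedules such as GPipe or 1F1B also interleave backward passes and overlap cross-device communication.

```python
import jax.numpy as jnp

def stage1(x):  # conceptually runs on device 0
    return x * 2.0

def stage2(x):  # conceptually runs on device 1
    return x + 1.0

def pipelined(batch, n_micro=4):
    micro = jnp.split(batch, n_micro)
    out, in_flight = [], None
    for i in range(n_micro + 1):           # one extra step drains the pipe
        if in_flight is not None:
            out.append(stage2(in_flight))  # stage 2 works on microbatch i-1
        # ...while stage 1 (conceptually in parallel) works on microbatch i
        in_flight = stage1(micro[i]) if i < n_micro else None
    return jnp.concatenate(out)

print(pipelined(jnp.arange(8.0)))  # [1. 3. 5. 7. 9. 11. 13. 15.]
```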
Services
- Artifact Evaluation: MLSys, CGO
- Journal Review: IEEE TPAMI, IEEE TKDE
- Conference Review: NeurIPS, LoG, AISTATS, ICML, ICLR, ACL
A small caveat
- I publish under both Liao Peiyuan and Peiyuan Liao, so I may appear in citations as either (Peiyuan, 2026) or (Liao, 2026).