MSCS Thesis Presentation - Shiqi Pan

— 4:30pm

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
SHIQI PAN, Master's Student, Computer Science Department, Carnegie Mellon University

Architectural Disaggregation for LLM Serving on Heterogeneous Device: An Analytical Framework and Serving System

Modern large language models contain operations with vastly different computational characteristics: projections and MLPs are compute-bound, while attention mechanisms are memory-bound. Hybrid architectures combining sliding window attention, linear attention, and Mixture of Experts further complicate this operational heterogeneity. Meanwhile, datacenters deploy heterogeneous GPUs with complementary profiles—H100s excel at compute-intensive workloads while H20s better serve memory-bound operations. This creates opportunities for operation-level disaggregation: matching different operations to specialized hardware.
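The compute-bound versus memory-bound distinction above can be illustrated with a simple roofline estimate. The sketch below is not taken from the thesis: the device figures are approximate public peak numbers used as assumptions, the tensor shapes are hypothetical, and the cost formulas are the usual first-order counts for a dense projection and for decode-time attention over a cached KV sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float  # FLOP/s (assumed approximate dense BF16 peak)
    mem_bw: float      # bytes/s (assumed approximate HBM bandwidth)

    @property
    def ridge(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which an op is compute-bound."""
        return self.peak_flops / self.mem_bw


# Assumed, approximate spec-sheet values; not measurements from the thesis.
H100 = Device("H100", peak_flops=989e12, mem_bw=3.35e12)
H20 = Device("H20", peak_flops=148e12, mem_bw=4.0e12)


def gemm_cost(tokens, d_in, d_out, dtype_bytes=2):
    """FLOPs and bytes moved for a [tokens, d_in] x [d_in, d_out] projection."""
    flops = 2 * tokens * d_in * d_out
    bytes_moved = dtype_bytes * (tokens * d_in + d_in * d_out + tokens * d_out)
    return flops, bytes_moved


def decode_attention_cost(batch, heads, ctx_len, d_head, dtype_bytes=2):
    """FLOPs and bytes moved for one decode step of standard multi-head attention."""
    # QK^T and PV each cost 2 * ctx_len * d_head FLOPs per head per sequence.
    flops = 4 * batch * heads * ctx_len * d_head
    # K and V caches are read once; each sequence has its own cache.
    kv_bytes = 2 * batch * heads * ctx_len * d_head * dtype_bytes
    # Query read plus output write.
    qo_bytes = 2 * batch * heads * d_head * dtype_bytes
    return flops, kv_bytes + qo_bytes


def roofline_time(flops, bytes_moved, dev):
    """Lower-bound execution time in seconds under the roofline model."""
    return max(flops / dev.peak_flops, bytes_moved / dev.mem_bw)


def bound(flops, bytes_moved, dev):
    """Classify an op as 'compute' or 'memory' bound on a device."""
    return "compute" if flops / bytes_moved > dev.ridge else "memory"


if __name__ == "__main__":
    ops = {
        "projection": gemm_cost(tokens=512, d_in=4096, d_out=4096),
        "decode attention": decode_attention_cost(
            batch=64, heads=32, ctx_len=4096, d_head=128
        ),
    }
    for op_name, (flops, nbytes) in ops.items():
        for dev in (H100, H20):
            t_us = roofline_time(flops, nbytes, dev) * 1e6
            print(f"{op_name:>16} on {dev.name}: "
                  f"{flops / nbytes:7.1f} FLOP/B, {bound(flops, nbytes, dev)}-bound, "
                  f"{t_us:8.1f} us")
```

Under these assumed shapes, the projection has an arithmetic intensity of roughly 410 FLOP/byte and the decode attention roughly 1 FLOP/byte, so the model favors the H100 for the projection and the H20 for the attention step. Real kernels fall short of these peaks, so the output is a bound, not a prediction.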

However, two critical gaps prevent realizing these opportunities. First, no framework systematically characterizes how hybrid LLM operations perform on heterogeneous hardware. Second, current serving systems use rigid layer-granularity pipeline parallelism, preventing specialized placement of individual operations.

This thesis addresses both gaps. We develop quantitative performance models that characterize operation-level costs, arithmetic intensity, and bottlenecks for attention variants, MLP, and MoE operations, and use them to motivate architecturally disaggregated placement. We also design and implement a flexible system extending vLLM that supports arbitrary operation-level stage definitions and non-contiguous patterns through multi-visit execution, metadata caching, zero-copy tensor transmission, and tensor reordering for FlashAttention compatibility.

This work provides the analytical foundation and system infrastructure for operation-aware heterogeneous LLM serving, enabling future research in automated configuration and deployment optimization.

Thesis Committee
Rashmi K. Vinayak (Chair)
Zhihao Jia

For More Information:
amalloy@cs.cmu.edu

