Xintong Li

Ph.D. Candidate at IIIS, Tsinghua University | lixt21@mails.tsinghua.edu.cn

I am Xintong Li, a Ph.D. candidate at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Professor Mingyu Gao. I received my Bachelor’s degree in Computer Science from Peking University, where I was advised by Professor Guangyu Sun.

My primary research interest lies in hardware-software co-optimization for irregular computation patterns, particularly those involving sparsity and dynamism. My work spans accelerating sparse computation kernels such as SpMSpM (sparse matrix-sparse matrix multiplication), designing solutions for Near-Data Processing (NDP) / Processing-in-Memory (PIM) hardware, and optimizing Large Language Model (LLM) inference serving, with a particular focus on exploiting sparsity and tackling load-balancing challenges.

Research Highlights

My main projects focus on building efficient accelerator and memory systems for sparse computations; a brief sketch of a representative sparse kernel follows the list below.

  • [ISCA 2025] HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches
    • We define a complete design space for sparse tiling and analyze how different sparsity features affect performance. HYTE combines a static offline scheduler, which finds a near-optimal initial tiling plan, with a dynamic runtime that efficiently manages metadata and adjusts tile shapes to boost cache utilization. HYTE achieves a 3.9–7.4x speedup over state-of-the-art sparse accelerators.
  • [MICRO 2025] SeaCache: Efficient and Adaptive Caching for Sparse Accelerators
    • To handle the variable access patterns of sparse workloads, we propose a two-level address mapping scheme that reduces cache line waste. SeaCache also features a low-cost prefetcher that uses one matrix’s access pattern to predict another’s, and explores the optimal cache allocation among data, metadata, and prefetched content.
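Both projects target irregular kernels like SpMSpM. For readers outside the area, here is a minimal Python sketch of a Gustavson-style (row-wise) SpMSpM; the dict-of-rows representation and the `spmspm` helper are simplifications for illustration, not the formats or implementations used in HYTE or SeaCache. The data-dependent accesses to B’s rows are exactly what makes tiling and caching hard for these workloads.

```python
def spmspm(A, B):
    """Gustavson-style SpMSpM. A, B: {row: {col: value}} sparse matrices.
    Returns A @ B in the same dict-of-rows format."""
    C = {}
    for i, a_row in A.items():
        c_row = {}
        # Each nonzero A[i][k] scales row k of B and accumulates into C[i].
        # Which rows of B get touched depends on A's sparsity pattern, so
        # the access stream is irregular and hard to predict statically.
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                c_row[j] = c_row.get(j, 0.0) + a_ik * b_kj
        if c_row:
            C[i] = c_row
    return C

# Example: a 2x2 times 2x2 sparse product.
A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}}
print(spmspm(A, B))  # {0: {1: 4.0, 0: 10.0}, 1: {0: 15.0}}
```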

Experience

  • ByteDance, AML Heterogeneous Computing Group (Apr 2025 – Present)
    • Research Intern. My work involves deploying LLM inference services on novel hardware such as AiMx (an in-memory computing chip) and evaluating the performance of emerging technologies (PIM, RRAM) for LLM applications. I have also contributed to the design and optimization of AutoNDP, a system for automatically mapping and scheduling LLM inference on near-data processing platforms.