Xintong Li

Ph.D. Candidate at IIIS, Tsinghua University | lixt21@mails.tsinghua.edu.cn

I am Xintong Li, a Ph.D. candidate at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Professor Mingyu Gao. I received my Bachelor’s degree in Computer Science from Peking University, where I was advised by Professor Guangyu Sun.

My primary research interest lies in hardware-software co-optimization for irregular computation patterns, particularly those involving sparsity and dynamism. My work spans accelerating sparse computation kernels such as SpMSpM (sparse matrix-sparse matrix multiplication), designing solutions for Near-Data Processing (NDP) / Processing-in-Memory (PIM) hardware, and optimizing Large Language Model (LLM) inference serving, with a particular focus on exploiting sparsity and tackling load-balancing challenges.

Research Highlights

My main projects focus on building efficient accelerator and memory systems for sparse computations; a brief sketch of a representative sparse kernel follows the list below.

  • [ISCA 2025] HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches
    • We define a complete design space for sparse tiling and analyze how different sparsity features affect performance. HYTE combines a static offline scheduler, which finds a near-optimal initial tiling plan, with a dynamic runtime that efficiently manages metadata and adjusts tile shapes to boost cache utilization. HYTE achieves a 3.9–7.4x speedup over state-of-the-art sparse accelerators.
  • [MICRO 2025] SeaCache: Efficient and Adaptive Caching for Sparse Accelerators
    • To handle the variable access patterns of sparse workloads, we propose a two-level address mapping scheme that reduces cache line waste. SeaCache also features a low-cost prefetcher that uses one matrix’s access pattern to predict another’s, and explores the optimal cache allocation among data, metadata, and prefetched content.
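Both projects target irregular kernels like SpMSpM. For readers outside the area, here is a minimal Python sketch of a Gustavson-style (row-wise) SpMSpM; the dict-of-rows representation and the `spmspm` helper are simplifications for illustration, not the formats or implementations used in HYTE or SeaCache. The data-dependent accesses to B’s rows are exactly what makes tiling and caching hard for these workloads.

```python
def spmspm(A, B):
    """Gustavson-style SpMSpM. A, B: {row: {col: value}} sparse matrices.
    Returns A @ B in the same dict-of-rows format."""
    C = {}
    for i, a_row in A.items():
        c_row = {}
        # Each nonzero A[i][k] scales row k of B and accumulates into C[i].
        # Which rows of B get touched depends on A's sparsity pattern, so
        # the access stream is irregular and hard to predict statically.
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                c_row[j] = c_row.get(j, 0.0) + a_ik * b_kj
        if c_row:
            C[i] = c_row
    return C

# Example: a 2x2 times 2x2 sparse product.
A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}}
print(spmspm(A, B))  # {0: {1: 4.0, 0: 10.0}, 1: {0: 15.0}}
```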

Experience

  • ByteDance, AML Heterogeneous Computing Group (Apr 2025 – Present)
    • Research Intern. My work involves deploying LLM inference services on novel hardware such as AiMx (an in-memory computing chip) and evaluating the performance of emerging technologies (PIM, RRAM) for LLM applications. I have also contributed to the design and optimization of AutoNDP, a system for automatically mapping and scheduling LLM inference on near-data processing platforms.