Saturday, June 27

The workshop / tutorial portion of ISCA 2026 begins!

View the workshops / tutorials program. Please visit the respective webpage of a workshop/tutorial for its detailed schedule.

Sunday, June 28

The workshop / tutorial portion of ISCA 2026 continues!

View the workshops / tutorials program. Please visit the respective webpage of a workshop/tutorial for its detailed schedule.

Monday, June 29

8:30 EDT
TBD
8:45
Debbie Marr (AheadComputing)
Abstract

For decades, the computing industry advanced along relatively predictable trajectories: Moore's Law, Dennard scaling, increasing abstraction, and the steady expansion of general-purpose computing. But many of the assumptions that shaped computer architecture over the last thirty years are now being challenged simultaneously. This keynote will reflect on several of the key trajectories that brought the industry to this moment, examine the forces driving today's transition, and explore what may define the next era of computing. It will also consider how changing economic realities, ecosystem dynamics, and leadership transitions are influencing innovation across the industry. At a time when long-standing assumptions are being reconsidered across the computing stack, the role of the computer architecture community has never been more important. The next era of computing will be shaped not only by new technologies, but by the architectural choices, research directions, and leadership decisions being made today.


Bio

Debbie Marr is the CEO and Co-Founder of AheadComputing. AheadComputing, founded in July 2024, aims to create breakthrough 64-bit RISC-V application processors. Prior to AheadComputing, Debbie was an Intel Fellow, and she spent 30+ years at Intel leading research and development for CPUs from the 386SL to Intel's current leading-edge products. Debbie was the server architect of Intel® Pentium™ Pro, Intel's first Xeon Processor. She brought Intel Hyperthreading Technology from concept to product on the Pentium 4 Processor. She was the chief architect of the 4th Generation Intel Core™ (Haswell) and led advanced research and development for Intel's Core/Xeon CPUs. Debbie also spent 7 years in Intel Labs as the Director of Accelerator Architecture Lab, where she led research in machine learning and acceleration techniques for CPU, GPU, FPGA, and AI Accelerators. Debbie has published over 30 papers and holds over 40 patents in many aspects of CPU, AI accelerator, and FPGA architecture/microarchitecture. Debbie has a PhD in electrical and computer engineering from the University of Michigan, an MS in electrical engineering and computer science from Cornell University, and a BS in electrical engineering and computer science from the University of California, Berkeley.

10:00
10:15
Session Chair: TBD
10:15 – 10:35
MLX: Multi-Layer Execution for Structured LLM Workload Acceleration on Spatial Architectures
Haibin Wu (ICT,CAS), Wenming Li (Chinese Academy of Sciences), Zhihua Fan (Institute of Computing Technology, Chinese Academy of Science), Zirui Ma (Institute of Computing Technology, Chinese Academy of Science), Yuqun Liu (Institute of Computing Technology, Chinese Academy of Science), Tengfei Xia (Institute of Computing Technology, Chinese Academy of Sciences), Yanhuan Liu (Institute of Computing Technology, Chinese Academy of Science), Kunming Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Xiaochun Ye (Chinese Academy of Sciences), Dongrui Fan (Institute of Computing Technology, Chinese Academy of Sciences), Jian Weng (KAUST)

10:35 – 10:55
CODO: An Automated Compiler for Comprehensive Dataflow Optimization
Weichuang Zhang (School of Computer Science, Shanghai Jiao Tong University), Yiquan Wang (School of Computer Science, Shanghai Jiao Tong University), Xinzhou Zhang (School of Computer Science, Shanghai Jiao Tong University), Chi Zhang (School of Computer Science, Shanghai Jiao Tong University), Yu Feng (School of Computer Science, Shanghai Jiao Tong University), Xiaofeng Hou (School of Computer Science, Shanghai Jiao Tong University), Chao Li (School of Computer Science, Shanghai Jiao Tong University), Jieru Zhao (School of Computer Science, Shanghai Jiao Tong University), Minyi Guo (School of Computer Science, Shanghai Jiao Tong University)

10:55 – 11:15
COSM: A Cooperative Scheduling Framework for Concurrent PIM and CPU Execution on Mobile Devices
Yilong Zhao (Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Onur Mutlu (ETH Zurich), Mingyu Gao (Tsinghua University), Jian Liu (Beijing University of Aeronautics and Astronautics), Haibing Guan (Shanghai Jiao Tong University), Li Jiang (Shanghai Jiao Tong University)

11:15 – 11:35
Cerberus: Cross-Layer ECC Co-Design for Robust and Efficient Memory Protection
Junhwan Kim (Sungkyunkwan University), Seunghyun Kim (Sungkyunkwan University), Yesin Ryu (Sungkyunkwan University), Saeid Gorgin (University of Hertfordshire), Jungrae Kim (Sungkyunkwan University)

11:35 – 11:55
Patterns Behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference
Zhongkai Yu (University of California, San Diego), Yue Guan (University of California, San Diego), Zihao Yu (Indiana University, Bloomington), Chenyang Zhou (Columbia University), Zhengding Hu (University of California, San Diego), Shuyi Pei (Samsung Semiconductor, Inc.), Yangwook Kang (Samsung Semiconductor, Inc.), Yufei Ding (University of California, San Diego), Po-An Tsai (NVIDIA)
11:55
12:15
Session Chair: TBD
12:15 – 12:35
Cassandra: Enabling Reasoning LLMs at Edge via Self-Speculative Decoding
Soongyu Choi (KAIST), Yuntae Kim (KAIST), Muyoung Son (KAIST), Joo-Young Kim (KAIST)

12:35 – 12:55
Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference
Haoran Wu (University of Cambridge), Can Xiao (Imperial College London), Jiayi Nie (University of Cambridge), Xuan Guo (Imperial College London), Binglei Lou (Imperial College London), Jeffrey T.H. Wong (Imperial College London), Zhiwen Mo (Imperial College London), Cheng Zhang (Imperial College London), Przemyslaw Forys (Imperial College London), Chengyang Ai (University of Edinburgh), Timi Adeniran (University of Cambridge), Wayne Luk (Imperial College London), Hongxiang Fan (Imperial College London), Jianyi Cheng (University of Edinburgh), Timothy M. Jones (University of Cambridge), Rika Antonova (University of Cambridge), Robert Mullins (University of Cambridge), Aaron Zhao (Imperial College London)

12:55 – 13:15
Approaching Shannon Bound with Lossless LLM Weight Compression
Hongshi Tan (National University of Singapore), Yao CHEN (National University of Singapore), Gustavo Alonso (ETH Zurich), Weng-Fai Wong (National University of Singapore), Bingsheng He (National University of Singapore)
Session Chair: TBD
12:15 – 12:35
ECC Enabled Reliable and Performant Processing-in-Memory
Jeageun Jung (The University of Texas at Austin), Margaret Lee (The University of Texas at Austin), Mattan Erez (The University of Texas at Austin)

12:35 – 12:55
HBM-CASO: A Coordinated Approach to HBM System-Level and On-Die ECC
Ruizhi Zhu (University of Central Florida), Yanan Guo (University of Rochester), Huize Li (University of Central Florida), Weidong Cao (George Washington University), Qian Lou (University of Central Florida), Xin Xin (University of Central Florida)

12:55 – 13:15
ATX: Accelerator Task Extensions
Gerasimos Gerogiannis (University of Illinois at Urbana-Champaign), Stijn Eyerman (Intel), Josep Torrellas (University of Illinois at Urbana-Champaign), Wim Heirman (Intel)
Session Chair: TBD
12:15 – 12:35
RHODES: Robust Optimization for Uncertainty-Aware Design of CO2-Efficient Computing Systems
Mariam Elgamal (Harvard University), Abdulrahman Mahmoud (MBZUAI), Gu-Yeon Wei (Harvard University), David Brooks (Harvard University), Gage Hills (Harvard University)

12:35 – 12:55
Rearchitecting the Datacenter Lifecycle for AI
Jovan Stojkovic (University of Texas at Austin), Chaojie Zhang (Microsoft Azure), Íñigo Goiri (Microsoft Azure), Ricardo Bianchini (Microsoft Azure)

12:55 – 13:15
CAPA: Manufacturing Carbon Estimation for Advanced-Packaged Architectures
Jingyang Liu (University of Toronto), Gwenith Bowker-Bafna (University of Toronto), Yuke Zhang (University of Toronto), Natalie Enright Jerger (University of Toronto)
Session Chair: TBD
12:15 – 12:35
Five-Minute Rule 40 Years Later: A First-Principles Revisit for Modern Memory Hierarchy
Tong Zhang (ScaleFlux), Vikram Sharma Mailthody (NVIDIA), Fei Sun (ScaleFlux), Linsen Ma (ScaleFlux), Chris J. Newburn (NVIDIA), Teresa Zhang (Stanford), Yang Liu (ScaleFlux), Jiangpeng Li (ScaleFlux), Hao Zhong (ScaleFlux), Wen-Mei W. Hwu (NVIDIA)

12:35 – 12:55
LOONG: Utilizing Long-Stride Reprogram to Enhance the Performance of SSDs
Congming Gao (Xiamen University), Jiancong Zheng (Xiamen University), Xufeng Yang (Xiamen University), Qiao Li (Mohamed bin Zayed University of Artificial Intelligence), Jian Chen (Tsinghua University), Tianyu Ren (Tsinghua University), Zheng Wan (Xiamen University), Yina Lv (Xiamen University), Xin Xin (University of Central Florida), Min Ye (City University of Hong Kong), Chun Jason Xue (Mohamed bin Zayed University of Artificial Intelligence), Jiwu Shu (Xiamen University)

12:55 – 13:15
Don't Surrender to Low QPS/$: Fast and Cost-Efficient ANNS with TridentANN
Yuchen Huang (East China Normal University), Baiteng Ma (East China Normal University), Erci Xu (Shanghai Jiaotong University), Chuliang Weng (East China Normal University)
13:15 EDT
14:30
Session Chair: TBD
14:30 – 14:50
Shining Light on Silicon Photonic DNN Accelerators
Avilash Mukherjee (University of British Columbia), Mieszko Lis (University of British Columbia), Sudip Shekhar (University of British Columbia)

14:50 – 15:10
TensorPrism: Rethinking Sparse High-order Tensor Acceleration via Co-occurrence Graph
Fangzhou Ye (University of Central Florida), Shilin Tian (University of Central Florida), Amir Ghazizadeh Ahsaei (University of Central Florida), Hao Zheng (University of Central Florida)

15:10 – 15:30
OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration
Xueying Wu (Duke University), Baijun Zhou (Duke University), Zhihui Gao (Duke University), Yuzhe Fu (Duke University), Qilin Zheng (Duke University), Yintao He (Duke University), Hai Li (Duke University)

15:30 – 15:50
Omni-LUT: Energy-Efficient LUT-based Accelerator with Hardware-Aware KV Cache Quantization
Cheng-Han Tsai (National Yang Ming Chiao Tung University), Kuan-Chen Chou (National Yang Ming Chiao Tung University), Yu-Hsin Wang (National Yang Ming Chiao Tung University), Chieh-Dun Wen (National Yang Ming Chiao Tung University), Tsung Tai Yeh (National Yang Ming Chiao Tung University)

15:50 – 16:10
QiMeng-Tensify: Scaling up Tensor Computation Optimization via Architecture-Aware LLM-Guided MCTS
Shouyang Dong (University of Science and Technology of China), Jun Bi (ICT, CAS), Yuanbo Wen (Institute of Computing Technology, Chinese Academy of Sciences), Xiyue Yu (University of Science and Technology of China), Jianxing Xu (School of Computer Science and Technology, University of Science and Technology of China), Guanglin Xu (Institute of Computing Technology, Chinese Academy of Sciences), Ling Li (Institute of Software, Chinese Academy of Sciences), Xuehai Zhou (University of Science and Technology of China), Tianshi Chen (Cambricon Technologies), Qi Guo (ICT, CAS)
Session Chair: TBD
14:30 – 14:50
Taking Analytic Databases to the Bank
Alexandar Devic (Pennsylvania State University), Martin Prammer (Carnegie Mellon University), Kevin Gaffney (Microsoft), Siddhartha Balakrishna Rai (AMD), Anand Sivasubramaniam (Pennsylvania State University), Jignesh Patel (Carnegie Mellon University), Ameen Akel (Micron Technology)

14:50 – 15:10
Altering Processing-using-DRAM Operation Results Without Changing Operands: A Study of Real DRAM Chips and Implications for Future Systems
Daichi Tokuda (The University of Tokyo), Ismail Emir Yuksel (ETH Zurich), Tatsuya Kubo (The University of Tokyo), Ataberk Olgun (ETH Zurich), Haocong Luo (ETH Zurich), Nisa Bostanci (ETH Zurich), Jikun Wang (ETH Zurich), Abdullah Giray Yağlıkçı (CISPA), Shinya Takamaeda-Yamazaki (The University of Tokyo / RIKEN), Onur Mutlu (ETH Zurich)

15:10 – 15:30
MERIDIAN: In-Memory Acceleration for RAG with Document Attention Decomposition
Chaoqiang Liu (Huazhong University of Science and Technology), Yu Huang (Huazhong University of Science and Technology), Haifeng Liu (Huazhong University of Science and Technology), Yi Zhang (Huazhong University of Science and Technology), Qihang Qiu (Huazhong University of Science and Technology), Xueqi Li (Institute of Computing Technology, Chinese Academy of Sciences), Long Zheng (Huazhong University of Science and Technology), Xiaofei Liao (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology), Jingling Xue (UNSW Sydney)

15:30 – 15:50
PipeIMC: a Pipelined In-SRAM Computing Architecture
Yikai Cui (Department of Computer Science and Technology, Tsinghua University), Renhao Fan (Department of Computer Science and Technology, Tsinghua University), Weike Li (Department of Computer Science and Technology, Tsinghua University), Mingzhao Li (Suzhou Taihao HuiXin Microelectronics Co., Ltd.), Mingyu Wang (School of Microelectronics Science and Technology, Sun Yat-Sen University), Zhaolin Li (Department of Computer Science and Technology, Tsinghua University)

15:50 – 16:10
BAAP: Coupling Compute-in-SRAM with DRAM Banks for Near-Memory Processing
Cecilio C. Tamarit (Cornell University), Socrates Wong (Cornell University), Akshati Vaishnav (Cornell University), José Martínez (Cornell University)
Session Chair: TBD
14:30 – 14:50
Towards Practical Interrupt Side-Channel Attacks on macOS for Apple Silicon
Xin Zhang (Peking University), Chang Liu (Tsinghua University), Jiajun Zou (Peking University), Yi Yang (Peking University), Qingni Shen (Peking University), Zhi Zhang (The University of Western Australia), Trevor E. Carlson (National University of Singapore (NUS))

14:50 – 15:10
Helium: Quantifying Microarchitectural Side-Channel Leakage with Probabilistic Guarantees
Samantha Archer (Stanford University), Mohammad Rahmani Fadiheh (LUBIS EDA, RPTU Kaiserslautern), Caroline Trippel (Stanford University)

15:10 – 15:30
LÆGIS: Pinpointing and Addressing Performance Overheads of GPU-based Confidential Computing
Yang Yang (University of Virginia), Adwait Jog (University of Virginia)

15:30 – 15:50
MC-ORAM: A Mask-Assisted and Counter-Based Non-Deterministic ORAM inside VM-Based TEEs
Yongqin Wang (University of Southern California), Rachit Rajat (University of Southern California), Jonghyun Lee (University of Southern California), Mengyuan Li (USC), Murali Annavaram (University of Southern California)

15:50 – 16:10
TimeGaps Channels: Exploiting CPU Halted Time for Fun and Profit
Yusi Feng (Southern University of Science and Technology), Xin Zhang (Peking University), Sioli O'Connell (University of Adelaide), Liangwei Qiu (Southern University of Science and Technology), Chitchanok Chuengsatiansup (Hasso-Plattner-Institut and University of Potsdam), Daniel Genkin (Georgia Tech), Yuval Yarom (Ruhr University Bochum), Yinqian Zhang (Southern University of Science and Technology), Zhi Zhang (The University of Western Australia)
Session Chair: TBD
14:30 – 14:50
Dorado: Clustered Hardware Cache Coherence for 1,000+ Cores
Jovan Stojkovic (University of Illinois Urbana-Champaign), Abraham Farrell (University of Illinois Urbana-Champaign), Gerasimos Gerogiannis (University of Illinois Urbana-Champaign), Zhangxiaowen Gong (Intel), Christopher J. Hughes (Intel), Josep Torrellas (University of Illinois Urbana-Champaign)

14:50 – 15:10
Hierarchical Wakeup Logic of the Issue Queue for High Scalability
Hideki Ando (Nagoya University), Hajime Shimada (Nagoya University)

15:10 – 15:30
RUNLTS: Branch Prediction with Register-Value Correlations and Hierarchical Table Orchestration
Toru Koizumi (Nagoya Institute of Technology), Toshiki Maekawa (Nagoya Institute of Technology), Masanari Mizuno (Nagoya Institute of Technology), Maru Kuroki (The University of Tokyo), Tomoaki Tsumura (Nagoya Institute of Technology), Ryota Shioya (The University of Tokyo)

15:30 – 15:50
Augmenting the Branch Predictor with a Squashed-Branch Reuse Buffer
Rohit Singh (North Carolina State University), Jiayang Li (North Carolina State University), Eric Rotenberg (North Carolina State University)

15:50 – 16:10
Revisiting Global Value Prediction: A Resurgent Complement to Local Predictors
Ling Yang (National University of Defense Technology), Libo Huang (National University of Defense Technology), Zhong Zheng (National University of Defense Technology), Bingcai Sui (National University of Defense Technology), Sheng Ma (National University of Defense Technology), Yongwen Wang (National University of Defense Technology), Li Shen (National University of Defense Technology), Junhui Wang (National University of Defense Technology), Gang Chen (Sun Yat-Sen University), Qianming Yang (National University of Defense Technology), Songwen Pei (Shanghai Institute of Technology), Weixia Xu (National University of Defense Technology)
16:10
16:30
Session Chair: TBD
16:30 – 16:50
HybridSpec: Exploiting Hybrid-bonding Memory to Accelerate LLM Serving through Heterogeneous Architecture and Speculative Decoding
Zongle Huang (Tsinghua University), Wenbin Jia (Tsinghua University), Yaolei Li (Tsinghua University), Xinyuan Lin (Tsinghua University), Shupei Fan (Tsinghua University), Chen Tang (Tsinghua University), Shuwen Deng (Tsinghua University), Yongpan Liu (Tsinghua University)

16:50 – 17:10
P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats
Yuzong Chen (Cornell University), Chao Fang (KU Leuven), Xilai Dai (Cornell University), Yuheng Wu (Stanford University), Thierry Tambe (Stanford University), Marian Verhelst (KU Leuven), Mohamed Abdelfattah (Cornell University)

17:10 – 17:30
CHIME: A Case for Efficient Long-Context Attention-FC Disaggregated Inference with DIMM-PIM
Qingyuan Liu (Shanghai Jiao Tong University), Liyan Chen (Shanghai Jiao Tong University), Haocheng Wang (Shanghai Jiao Tong University), Yanning Yang (Shanghai Jiao Tong University), Dong Du (Shanghai Jiao Tong University), Zhigang Mao (Shanghai Jiao Tong University), Naifeng Jing (Shanghai Jiao Tong University), Yubin Xia (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)

17:30 – 17:50
SMOOTH: Hardware-Assisted Fine-Grained On-Chip Memory Management for Efficient On-Device LLM Inference
Seulki Kim (DGIST), Bokyeong Kim (Samsung Research), Kyeonghyeon Ryu (DGIST), Yeji Jung (DGIST), Hwanjun Lee (DGIST), Sungju Kim (Yonsei University), Yunhyeong Jeon (DGIST), Daehoon Kim (Yonsei University)

17:50 – 18:10
SHyLA: 3D-Stacked NVM-DRAM Hybrid LLM-Inference Architecture Exploiting Data and Memory Heterogeneity
Liu He (Tsinghua University), Fuyao Zhou (Tsinghua University), Cheng Peng (Tsinghua University), Shunan Dong (Tsinghua University), Ziming Zhang (Tsinghua University), Huazhong Yang (Tsinghua University), Yongpan Liu (Tsinghua University), Guowei Zhang (HiSilicon Technologies Co., Ltd.), Hongyang Jia (Tsinghua University)
Session Chair: TBD
16:30 – 16:50
SPEC CPU: The Next Generation
Mahesh Madhav (Ampere Computing), Allen Lee (IEIT), Andres Mejia (Intel), Branden Moore (AMD), Charan Soppadandi (Dell Technologies), Chris Cambly (IBM), Christoph Müllner (VRULL), Daniel Bowers (SPEC), David Reiner (AMD), Denis Bakhvalov (Rivos), Di Zhao (Ampere Computing), Duane Voth (AMD), Feng Xue (Ampere Computing), Frédérique Silber-Chaussumier (ARM), James Bucek (SPEC), James Southern (HPE), Jiangning Liu (Ampere Computing), Jim Himer (SPEC), John Henning (SPEC), Kevin Smith (Ampere Computing), Kristen Yang (AMD), Kunal Kashyap (AMD), Mason Guy (Intel), Mat Colgrove (NVIDIA), Michael Berg (SiFive), Prasad Battini (Intel), Prasad Joshi (Intel), Rohit Prasad (Intel), Shayantika Bhattacharya (AMD), Sriyash Caculo (Ampere Computing), Stefan Reimbold (IBM), Sundar Iyengar (Intel), Van Smith (AMD), Zarko Todorovski (IBM)

16:50 – 17:10
A Silicon-Proven Unified Low-Latency CXL Controller and Port-Based Routing Switch for Memory-Centric Fabrics
Miryeong Kwon (Panmnesia, Inc.), Seungjun Lee (Panmnesia, Inc.), Donghyun Gouk (Panmnesia, Inc.), Hongjoo Jung (Panmnesia, Inc.), Eojin Ryu (Panmnesia, Inc.), Seyeong Huh (Panmnesia, Inc.), Junseok Moon (Panmnesia, Inc.), Hyein Woo (Panmnesia, Inc.), Junhee Kim (Panmnesia, Inc.), Kyungkuk Nam (Panmnesia, Inc.), Jinwoo Baek (Panmnesia, Inc.), Hyunkyu Choi (Panmnesia, Inc.), Woojin Choi (Panmnesia, Inc.), Yongjin Cho (Panmnesia, Inc.), Myoungsoo Jung (Panmnesia, Inc.)

17:10 – 17:30
Vistara: Making CXL Real—Full Path from ASIC Design and OS Support to Hyperscale Deployment
Neha Gholkar (Meta Platforms), Jovan Stojkovic (Meta Platforms and UT Austin), Hasan Al Maruf (Meta Platforms), Gregory Price (Meta Platforms), Prakash Chauhan (Meta Platforms), Hiral Patel (Meta Platforms), Cedric Van Goethem (Meta Platforms), Kiran Vemuri (Meta Platforms), Kiran Malwankar (Meta Platforms), Kishore Sriadibhatla (Meta Platforms), Kalyan Subramanian (Meta Platforms), Shobhit Kanaujia (Meta Platforms), Chunqiang Tang (Meta Platforms), Abhishek Dhanotia (Meta Platforms)

17:30 – 17:50
From Lab to Fleet: Building and Deploying a Practical Rowhammer Defense in Cloud SoCs
Stefan Saroiu (Microsoft), Sujay Yadalam (UT Austin), Alec Wolman (Microsoft), Will Remaklus (Microsoft), Daniel Berger (Microsoft), Isaac Hernandez Luna (Microsoft), Ishwar Agarwal (Meta), Jacob R. Lorch (Microsoft)

17:50 – 18:10
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
Gang Liao (Meta), Hongsen Qin (Meta), Ying Wang (Meta), Alicia Golden (Meta Platforms, Inc.), Michael Kuchnik (Meta), Yavuz Yetim (Meta), Ruichao Xiao (Meta), Jia Jiunn Ang (Meta), Chunli Fu (Meta), Yihan He (Meta), Samuel Hsia (Meta), Zewei Jiang (Meta), Roman Levenstein (Meta), Dianshi Li (Meta), Liyuan Li (Meta), Ajit Mathews (Meta), Varna Puvvada (Meta), Feng Shi (Meta), Nathan Yan (Meta), Xiayu Yu (Meta), Uladzimir Pashkevich (Meta), Matt Steiner (Meta), Carole-Jean Wu (Meta), Gaoxiang Liu (Meta)
Session Chair: TBD
16:30 – 16:50
SwiftGraph: A Domain-Specific Language for Easy and Efficient Graph Accelerator Design
Feiyang Wu (Peking University), Xuxiao Yang (Peking University), Zhuohang Bian (Peking University), Jing Wang (Shanghai Jiao Tong University), Ruifan Xu (Peking University), Guangyu Sun (Peking University), Yun Liang (Peking University), Youwei Zhuo (Peking University)

16:50 – 17:10
Accelerator Polymorphism: Transcending Domain-Specific Architectures with Robotics
Hanyang Xu (University of California San Diego), Seongryong Oh (KAIST), Yubin Lee (KAIST), Ashwin Rohit Alagiri Rajan (University of California San Diego), Rohan Mahapatra (University of California San Diego), Om Patil (University of California San Diego), Yuchuan Li (University of California San Diego), Jongse Park (KAIST), Hadi Esmaeilzadeh (University of California San Diego)

17:10 – 17:30
GRAINS: Enabling High-Performance and Low-Cost Graph-Based Genome Analysis via Storage-Aware Algorithm-Architecture Co-Design
Nika Mansouri Ghiasi (ETH Zurich), Harun Mustafa (ETH Zurich, Johns Hopkins University), Talu Güloglu (ETH Zurich), Rakesh Nadig (ETH Zurich), Konstantina Koliogeorgi (ETH Zurich), Susana Rebolledo Ruiz (ETH Zurich, University of Cantabria), Marc Rautmann (ETH Zurich), Furkan Eris (ETH Zurich), Mohammad Sadrosadati (ETH Zürich), Jisung Park (POSTECH (Pohang University of Science and Technology)), Onur Mutlu (ETH Zurich)

17:30 – 17:50
Lembas: An Appliance for Scalable Genome Alignment
Seongyoung Kang (University of California, Irvine), Se-Min Lim (Kookmin University), Sang-Woo Jun (University of California, Irvine)

17:50 – 18:10
LoRA: Towards Improved Applicability of Reconfigurable Architecture for Versatile Nonlinear Functions
Yuan Dai (State Key Laboratory of Integrated Chips and Systems, Fudan University), Guibin Zou (State Key Laboratory of Integrated Chips and Systems, Fudan University), Yuanda Yang (State Key Laboratory of Integrated Chips and Systems, Fudan University), Huan Lin (State Key Laboratory of Integrated Chips and Systems, Fudan University), Jiahang Lou (State Key Laboratory of Integrated Chips and Systems, Fudan University), Yiwen Luo (State Key Laboratory of Integrated Chips and Systems, Fudan University), Xinyu Cai (State Key Laboratory of Integrated Chips and Systems, Fudan University), Wenbo Yin (State Key Laboratory of Integrated Chips and Systems, Fudan University), Wai-Shing Luk (State Key Laboratory of Integrated Chips and Systems, Fudan University), Lingli Wang (State Key Laboratory of Integrated Chips and Systems, Fudan University)
Session Chair: TBD
16:30 – 16:50
Triage: An Adaptive Parallel Window Decoding Scheduler for Real-time Fault-Tolerant Quantum Computation
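Jiahan Chen (The Hong Kong University of Science and Technology (Guangzhou)), Chenghong Zhu (The Hong Kong University of Science and Technology (Guangzhou)), Ge Bai (The Hong Kong University of Science and Technology (Guangzhou)), Xin Wang (The Hong Kong University of Science and Technology (Guangzhou))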

16:50 – 17:10
Coset Ensemble Decoder for Quantum Error Correction with Algorithm-Hardware Co-Design
Shuang Liang (Imperial College London), Jubo Xu (Imperial College London), Giulio Bassanino (Imperial College London), Qianzhou Wang (Imperial College London), Yidong Zhou (Rutgers University), Yuncheng Lu (Imperial College London), Zhiwen Mo (Imperial College London), Paul Kelly (Imperial College London), Bo Yuan (Rutgers University), Wayne Luk (Imperial College London), Hongxiang Fan (Imperial College London)

17:10 – 17:30
A Streaming Architecture for Quantum Error Syndrome Compression at 4 Kelvin
Panagiotis Papanikolaou (University of Wisconsin-Madison), Ryan Hou (University of Wisconsin-Madison), Jennifer Volk (University of Wisconsin-Madison), George Tzimpragos (University of Wisconsin-Madison)

17:30 – 17:50
Transpiler-Architecture Co-Design to Curb Clifford Costs in Fault-Tolerant Quantum Computing
Meng Wang (The University of British Columbia), Chenxu Liu (Pacific Northwest National Laboratory), Samuel Stein (Pacific Northwest National Laboratory), Yufei Ding (University of California San Diego), Poulami Das (University of Texas at Austin), Prashant Nair (The University of British Columbia), Ang Li (PNNL and University of Washington)

17:50 – 18:10
Kernpiler: Compiler Optimization for Quantum Hamiltonian Simulation with Partial Trotterization
Ethan Decker (University of Pennsylvania), Lucas Goetz (ETH Zurich), Evan McKinney (Yale University), Erik Gustafson (USRA), Junyu Zhou (University of Pennsylvania), Yuhao Liu (University of Pennsylvania), Alex Jones (Syracuse University), Ang Li (PNNL and UW), Alexander Schuckert (Joint Quantum Institute, UMD), Samuel Stein (Pacific Northwest National Laboratory), Eleanor Crane (MIT), Gushu Li (University of Pennsylvania)
18:10 EDT
Evening

Tuesday, June 30

8:30 EDT
Fred Chong (University of Chicago, Infleqtion); Jay Gambetta (IBM)
Moderator: Margaret Martonosi
Abstract

Quantum computing technologies have advanced dramatically in the past decade. With fault-tolerant machines on the horizon and near-term machines showing improved applications through integration with classical high-performance computing, the future is clear: computing will be heterogeneous and accelerator-based. Much work remains to be done to achieve this vision. Architects are needed to bridge the gap between theory and physical technologies. A systems view will be essential in the design of applications, software, error-correction protocols, workflow management, and machine organization. In this dialog, we will discuss these design challenges and how techniques from the architecture community can help address them.


Bios

Fred Chong is the Seymour Goodman Professor in the Department of Computer Science at the University of Chicago and the Chief Scientist for Quantum Software at Infleqtion. He was the Lead Principal Investigator for the EPiQC Project (Enabling Practical-scale Quantum Computing), an NSF Expedition in Computing from 2018-2024. Chong was a member of the National Quantum Initiative Advisory Committee (NQIAC) from 2020-2025, which provided advice to the President on the National Quantum Initiative Program. In 2020, he co-founded Super.tech, a quantum software company, which was acquired by Infleqtion (formerly ColdQuanta) in 2022.

Chong received his Ph.D. in EECS from MIT in 1996 and was a faculty member and Chancellor's fellow at UC Davis from 1997-2005. He was also a Professor of Computer Science, Director of Computer Engineering, and Director of the Greenscale Center for Energy-Efficient Computing at UCSB from 2005-2015. He is a fellow of the ACM and the IEEE, a recipient of the NSF CAREER award, the Intel Outstanding Researcher Award, and 17 best paper awards. He is also a recipient of the Quantrell Award, the oldest undergraduate teaching award in the United States, as well as the University of Chicago's Graduate Teaching and Mentoring Award.

Dr. Jay M. Gambetta is Director of Research and IBM Fellow. In this role, he leads the company's global research initiatives, spearheading the company's strategy and vision to develop the future of computing, which includes artificial intelligence, semiconductors, and quantum computing.

Previously, Jay served as Vice President of IBM Quantum and was named an IBM Fellow in 2018 for his leadership in advancing superconducting quantum computing and establishing IBM's quantum strategy to bring quantum computing to the world. Jay has been at the forefront of quantum information science for over two decades, pioneering both the theoretical foundations and practical engineering of quantum systems.

Under his direction, IBM was first to demonstrate a cloud-based quantum computing platform, which has since grown into the world's most widely used quantum computing service. He also spearheaded the development of Qiskit, the preeminent open-source quantum software development kit, enabling a global community to build, optimize, and execute quantum circuits on multiple hardware platforms. Jay's vision has shaped IBM's quantum roadmap and fostered a thriving ecosystem with more than 600,000 registered users, over 3 trillion quantum circuits executed, and a network of 280+ academic, industry, and government partners.

With more than 130 scientific publications and 50,000+ citations, his research has advanced quantum error correction, superconducting qubits, quantum algorithms, and scalable quantum architecture. He is also a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and the American Physical Society (APS). Jay received his Ph.D. in Physics from Griffith University in Australia.

10:00
10:20
Session Chair: TBD
10:20 – 10:40
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
Qijun Zhang (Hong Kong University of Science and Technology), Chen Zhang (Shanghai Jiao Tong University), Zhuoshan Zhou (Shanghai Jiao Tong University), Haibo Wang (Huawei), Zhe Zhou (Huawei), Zhipeng Tu (Huawei), Guangyu Sun (Peking University), Zhiyao Xie (Hong Kong University of Science and Technology), Yijia Diao (Shanghai Jiao Tong University), Zhigang Ji (Shanghai Jiao Tong University), Jingwen Leng (Shanghai Jiao Tong University), Guanghui He (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)

10:40 – 11:00
ConServe: Contiguity-Preserving Memory Management for Multi-Turn LLM Serving
Bingyao Li (University of California, Riverside)

11:00 – 11:20
Mapping and Communication Optimizations with Fault Tolerance for Wafer-Scale LLM Inference
Junwei Cui (The Hong Kong University of Science and Technology (Guangzhou)), Le Qin (The Hong Kong University of Science and Technology (Guangzhou)), Weilin Cai (The Hong Kong University of Science and Technology (Guangzhou)), Jiayi Huang (The Hong Kong University of Science and Technology (Guangzhou))

11:20 – 11:40
DynoPipe: Heterogeneous Edge-Cloud LLM Serving with Dynamically Orchestrated Pipeline Boundaries
Yanying Lin (University of Chinese Academy of Sciences, UCSD), Baicheng Chen (University of California San Diego), Xinyu Zhang (University of California San Diego), Chengzhong Xu (University of Macau), Kejiang Ye (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)

11:40 – 12:00
DIAMoND: Dynamic Inference for Adaptive Edge MoE with Heterogeneous In-NAND and Near-DRAM Compute Architecture
Ling Liang (Peking University), Tianyang Luo (Peking University), Shuzhang Zhong (Peking University), Dongxue Zhao (Peking University), Qichao Ma (Peking University), Renjie Wei (Peking University), Jingyu Wang (Xiaomi Corporation), Meng Li (Peking University), Guangyu Sun (Peking University), Zongwei Wang (Peking University), Yimao Cai (Peking University)

12:00 – 12:20
SingularBit: Exploiting Synergy of Singular Value Decomposition and Low-Bit Quantization for Weight and KV Compression in LLM Inference
Seongyon Hong (KAIST), Hyundeok Kong (KAIST), Jungwan Lee (KAIST), Sangjin Kim (KAIST), Hoi-Jun Yoo (KAIST)
Session Chair: TBD
10:20 – 10:40
Optimized Memory Tagging on AmpereOne® Processors
Shivnandan Kaushik (Ampere Computing), Mahesh Madhav (Ampere Computing), Nagi Aboulenein (Ampere Computing), Jason Bessette (Ampere Computing), Sandeep Brahmadathan (Ampere Computing), Benjamin Chaffin (Ampere Computing), Matthew Erler (Ampere Computing), Stephan Jourdan (Ampere Computing), Thomas Maciukenas (Ampere Computing), Ramya Jayaram Masti (Ampere Computing), Jon Perry (Ampere Computing), Massimo Sutera (Ampere Computing), Scott Tetrick (Ampere Computing), Bret Toll (Ampere Computing), David Turley (Ampere Computing), Carl Worth (Ampere Computing), Atiq Bajwa (Ampere Computing)

10:40 – 11:00
BoostX™-NTI: Fast, Scalable and Flexible Storage Architecture with NVMe/TCP Initiator Acceleration
Hamin Jang (MangoBoost Inc. & Seoul National University), Jinha Jeong (MangoBoost Inc. & Seoul National University), Wonseok Lee (MangoBoost Inc. & Seoul National University), Jun Heo (MangoBoost Inc.), Jongcheon Lee (MangoBoost Inc.), Hyunjae Chu (MangoBoost Inc. & Seoul National University), Taehyun Kim (MangoBoost Inc.), Youngwoo Jeong (MangoBoost Inc.), Dongu Kim (MangoBoost Inc.), Heetaek Jeong (MangoBoost Inc. & Seoul National University), Changsu Kim (MangoBoost Inc.), Dongup Kwon (MangoBoost Inc.), Jangwoo Kim (MangoBoost Inc.)

11:00 – 11:20
M100: An Orchestrated Dataflow Architecture Powering General AI Computing
Yan Xie (Li Auto), Changkui Mao (Li Auto), Changsong Wu (Li Auto), Chao Lu (Li Auto), Chao Suo (Li Auto), Cheng Qian (Li Auto), Chun Yang (Li Auto), Danyang Zhu (Li Auto), Hengchang Xiong (Li Auto), Hongzhan Lu (Li Auto), Hongzhen Liu (Li Auto), Jiafu Liu (Li Auto), Jie Chen (Li Auto), Jie Dai (Li Auto), Junfeng Tang (Li Auto), Kai Liu (Li Auto), Kun Li (Li Auto), Lipeng Ge (Li Auto), Meng Sun (Li Auto), Min Luo (Li Auto), Peng Chen (Li Auto), Peng Wang (Li Auto), Shaodong Yang (Li Auto), Shibin Tang (Li Auto), Shibo Chen (Li Auto), Weikang Zhang (Li Auto), Xiao Ling (Li Auto), Xiaobo Du (Li Auto), Xin Wu (Li Auto), Yang Liu (Li Auto), Yi Jiang (Li Auto), Yihua Jin (Li Auto), Yin Huang (Li Auto), Yuli Zhang (Li Auto), Zhen Yuan (Li Auto), Zhiyuan Man (Li Auto), Zhongxiao Yao (Li Auto)

11:20 – 11:40
Prometheus: Toward Resilient Datacenters through Optimized Cooling Infrastructure
Sourav Patel (Google), Pratyush Kumar (Google), Youzhi Liang (Google), Thomas Kowalski (Google), Nick Care (Google), Wenjie Dong (Google), Greg Imwalle (Google), Rita Lu (Google), Jeremy Rice (Google), Nick Saddock (Google), Rick Vengalath (Google), Urs Hoelzle (Google), Benjamin C. Lee (Google, University of Pennsylvania), Parthasarathy Ranganathan (Google)

11:40 – 12:00
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
Moiz Arif (Micron Technology, Inc.), Avinash Maurya (Argonne National Laboratory), Sudharshan Vazhkudai (Micron Technology, Inc.), Bogdan Nicolae (Argonne National Laboratory)

12:00 – 12:20
MTIA-3: Meta’s First Training Chip Featuring Built-in NICs and Collective Offloading Engines
MTIA Team (Meta Platforms)
Session Chair: TBD
10:20 – 10:40
μRNG: A Framework for Assessing Randomness in Intermittent Computing Devices
Prakhar Sah (Virginia Tech), Matthew Hicks (Virginia Tech)

10:40 – 11:00
IroKnight: Ownership-Preserving Neural Acceleration for Inference Serving
Harsha Santhanam (UC San Diego), Ashwin Rohit Alagiri Rajan (UC San Diego), Hadi Esmaeilzadeh (UC San Diego)

11:00 – 11:20
Intermittence-aware Speculative Page Coloring for Secure NVM
Jongouk Choi (University of Central Florida), Junyeong Park (University of Central Florida), Nicholas L’Heureux (University of Central Florida), Yan Solihin (University of Central Florida), Hyunwoo Joe (Electronics and Telecommunications Research Institute (ETRI)), Changhee Jung (Purdue University)

11:20 – 11:40
AutoFHE: An Automatic Hardware Generation Framework for Domain-Specific FHE Accelerator
Yibo Du (1.University of Chinese Academy of Sciences. 2.Institute of Computing Technology, Chinese Academy of Sciences), Cangyuan Li (Institute of Computing Technology, Chinese Academy of Sciences), Bing Li (Institute of Microelectronics, Chinese Academy of Sciences), Mengdi Wang (Institute of Computing Technology, Chinese Academy of Sciences), Lian Liu (1.University of Chinese Academy of Sciences. 2.Institute of Computing Technology, Chinese Academy of Sciences), Shixin Zhao (1.University of Chinese Academy of Sciences. 2.Institute of Computing Technology, Chinese Academy of Sciences), Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences), Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences)

11:40 – 12:00
LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity
Erfan Iravani (Virginia Tech), Lalit Prasad Peri (Virginia Tech), Mohannad Ismail (Virginia Tech), Charitha Tumkur Siddalingaradhya (Virginia Tech), Changwoo Min (Igalia), Elif Bilge Kavun (TU Dresden), Wenjie Xiong (Virginia Tech)

12:00 – 12:20
DarkStream: Exploiting Internal Throughput Contention in Data Streaming Accelerator for Timing Attacks
Hyosang Kim (DGIST), Ki-Dong Kang (ETRI), Gyeongseo Park (ETRI), Sungju Kim (Yonsei University), Daehoon Kim (Yonsei University)
Session Chair: TBD
10:20 – 10:40
L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization
Yiming Gao (1. Nanjing University of Posts and Telecommunications 2. University of Florida), Jieming Yin (Nanjing University of Posts and Telecommunications), Yuxiang Wang (Nanjing University of Posts and Telecommunications), Xiangru Chen (University of Florida), Zhilei Chai (Jiangnan University), Bowen Jiang (Jiangnan University), Jiliang Zhang (Nanjing University of Posts and Telecommunications), Herman Lam (University of Florida)

10:40 – 11:00
NS-FPS: Accelerating Farthest Point Sampling via Neighbor Search in Large-Scale Point Clouds
Jiapei Zheng (Fudan University), Shuan Yang (Fudan University), Siqi He (Fudan University), Qi Liu (Fudan University), Chixiao Chen (Fudan University)

11:00 – 11:20
RoCC: Harnessing Raster Operations Pipeline for Efficient Tensor Collective Communication
Yuan Feng (University of California Merced), Daniel Wong (University of California, Riverside), Hyeran Jeon (University of California Merced)

11:20 – 11:40
STEP: Spatial Footprint Prefetcher with Multi-Point Temporal Triggers
Yuanji Ye (Technical University of Munich), Oliver Lenke (Technical University of Munich), Thomas Wild (Technical University of Munich), Andreas Herkersdorf (Technical University of Munich)

11:40 – 12:00
TDMSim: Enabling High-Density and Energy-Efficient GPU DRAM Caches with 2D-Materials for Data-Intensive Applications
Chao Fu (ShaoXin Laboratory, Fudan University), Jingyang Zheng (Fudan University), Xinliu He (Fudan University), Xiangqi Dong (Fudan University), Zheng Cao (Fudan University), Yuning Zhan (Fudan University), WenZhong Bao (Fudan University), Peng Zhou (Fudan University), Jun Han (Fudan University)

12:00 – 12:20
RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs
Hanum Ko (Sungkyunkwan University), Sangheum Yeon (Sungkyunkwan University), Jong Hwan Ko (Sungkyunkwan University), Jungrae Kim (Sungkyunkwan University)
12:20 EDT
14:15
Session Chair: TBD
14:15 – 14:35
Tetris: Efficient Long-context LLM Serving with Chunkwise Dynamic Sequence Parallelism
Cong Li (Peking University), Yuzhe Yang (ByteDance Seed), Xuegui Zheng (ByteDance Seed), Qifan Yang (ByteDance Seed), Yijin Guan (ByteDance), Size Zheng (ByteDance Seed), Li-Wen Chang (ByteDance Seed), Shufan Liu (ByteDance Seed), Xin Liu (ByteDance Seed), Guangyu Sun (Peking University)

14:35 – 14:55
SMoE: An Algorithm-System Co-Design for Pushing MoE to the Edge via Expert Substitution
Guoying Zhu (Nanjing University), Meng Li (Nanjing University), Haipeng Dai (Nanjing University), Xuechen Liu (Nanjing University), Weijun Wang (Tsinghua University), Keran Li (Nanjing University), Jun Xiao (Honor Device Co., Ltd), Ligeng Chen (Honor Device Co., Ltd), Wei Wang (Nanjing University)

14:55 – 15:15
ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs
Jinwu Yang (Institute of Computing Technology, Chinese Academy of Sciences), Jiaan Wu (University of Chinese Academy of Sciences), Zedong Liu (Institute of Computing Technology, Chinese Academy of Sciences), Xinyang Ma (University of Chinese Academy of Sciences), Hairui Zhao (Jilin University), Yida Gu (Institute of Computing Technology, Chinese Academy of Sciences), Yuanhong Huang (Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences), Xingchen Liu (Institute of Computing Technology, Chinese Academy of Sciences), Wenjing Huang (University of Chinese Academy of Sciences), Zheng Wei (Institute of Computing Technology, Chinese Academy of Sciences), Jing Xing (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences), Yili Ma (Institute of Computing Technology, Chinese Academy of Sciences), Qingyi Zhang (Huawei Cloud, Huawei Technologies Company Ltd.), Baoyi An (Network Technology Lab, Huawei Technologies CO., LTD), Zhongzhe Hu (Huawei Noah’s Ark Lab, Huawei Technologies Company Ltd.), Shaoteng Liu (Network Technology Laboratory, Huawei Technologies Company Ltd.), Xia Zhu (Huawei Technologies Co., Ltd), Jiaxun Lu (Network Routing Technology Laboratory, Huawei Technologies Company Ltd.), Guangming Tan (Chinese Academy of Sciences), Dingwen Tao (Institute of Computing Technology, Chinese Academy of Sciences)

15:15 – 15:35
STEP: Adaptive Spatio-Temporal Expert Prefetching for Low-Latency and Memory-Efficient MoE Inference
Fangxin Liu (Shanghai Jiao Tong University), Ning Yang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Chenyang Guan (Shanghai Jiao Tong University), Haomin Li (Shanghai Jiao Tong University), Yu Feng (Shanghai Jiao Tong University), Liqiang Lu (Zhejiang University), Xiang Li (Alibaba Group), Siran Yang (Alibaba Group), Jiamang Wang (Alibaba Group), Lin Qu (Alibaba Group), Li Jiang (Shanghai Jiaotong University), Haibing Guan (Shanghai Jiao Tong University)

15:35 – 15:55
EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture
Bowen Duan (Duke University), Cong Guo (Duke University), Chiyue Wei (Duke University), Haoxuan Shan (Duke University), Yuzhe Fu (Duke University), Xinhua Chen (Duke University), Yifan Xu (Duke University), Ziyue Zhang (Duke University), Changchun Zhou (Duke University), Hai Li (Duke University), Yiran Chen (Duke University)
Session Chair: TBD
14:15 – 14:35
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
Liang Luo (Meta AI), Yinbin Ma (Meta AI), Quanyu Zhu (Meta AI), Vasiliy Kuznetsov (Meta AI), Yuxin Chen (Meta AI), Jian Jiao (Meta AI), Jiecao Yu (Meta AI), Buyun Zhang (Meta AI), Tongyi Tang (Meta AI), Xiaohan Wei (Meta AI), Yanli Zhao (Meta AI), Zeliang Chen (Meta AI), Yuchen Hao (Meta AI), Venkatesh Ranganathan (Meta AI), Sandeep Parab (Meta AI), Yantao Yao (Meta AI), Maxim Naumov (Meta AI), Chunzhi Yang (Meta AI), Shen Li (Meta AI), Ellie Wen (Meta AI), Wenlin Chen (Meta AI), Chunqiang Tang (Meta AI)

14:35 – 14:55
Bringing Near Data Processing into the Low-Bit Floating-Point Era
Tongxin Xie (Tsinghua University), Mingyu Gao (Tsinghua University), Zehao Wang (Tsinghua University), Zhihao Jia (Imperial College London), Yuechen Xi (Nankai University), Bing Li (Institute of Microelectronics, Chinese Academy of Sciences), Mo Guang (Li Auto), Jiale Yan (Li Auto), Kaiwen Long (Li Auto), Xingcheng Zhang (Shanghai AI Laboratory), Huazhong Yang (Tsinghua University), Yuan Xie (HKUST), Zhenhua Zhu (Tsinghua University, HKUST), Yu Wang (Tsinghua University)

14:55 – 15:15
NasZip: Software and Hardware Co-design to Accelerate Approximate Nearest Neighbor Search with DIMM-based Near-Data Processing
Cheng Zou (Shanghai Jiao Tong University), Shuo Yang (Shanghai Jiao Tong University), Chen Nie (Shanghai Jiao Tong University, Shanghai AI Laboratory), Yu Zou (Institute of Information Engineering, Chinese Academy of Sciences), Yu He (Lenovo Research), Chao Jiang (Lenovo Research), Limin Xiao (Lenovo Research), Weifeng Zhang (Lenovo Research), Zhezhi He (Shanghai Jiao Tong University, Shanghai AI Laboratory)

15:15 – 15:35
AQuant: Repurposing CODEC for VLM Acceleration via Adaptive Quantization
Zhuoran Song (Shanghai Jiao Tong University), Chunyu Qi (Shanghai Jiao Tong University), Jian Weng (KAUST), Xiaoyao Liang (SJTU), Haibing Guan (Shanghai Jiao Tong University)

15:35 – 15:55
Random-Access Hardware Sequence Compression
Nolan Chu (Virginia Tech), Yoon Lee (Virginia Tech), Gagandeep Panwar (AMD), Xun Jian (Virginia Tech)
Session Chair: TBD
14:15 – 14:35
Observability-aided GPU Memory Oversubscription
Pratheek B (Indian Institute of Science), Khushit Shaileshbhai Shah (Indian Institute of Science), Arkaprava Basu (Indian Institute of Science)

14:35 – 14:55
Coarse-Grained Duplication First, Fine-Grained Deduplication Later: Duplication-Centric Multi-GPU Memory Management
Xiangyue Huang (UC-Santa Cruz), Yanan Guo (University of Rochester), Yuanchao Xu (University of California, Santa Cruz)

14:55 – 15:15
Reducing Page Faults via Invalidation-based Mapping Propagation in Multi-GPU Systems
Junsung Kim (Yonsei University), Dongho Ha (Yonsei University), Sungwoo Kim (Yonsei University), Wonho Cho (Yonsei University), Sungbin Kim (Yonsei University), Yufei Ding (University of California San Diego), Won Woo Ro (Yonsei University)

15:15 – 15:35
sCROOGe: Circuit-level Design and Optimization Framework for RISC-V Out-of-Order GPUs
Maria Zerva (National Technical University of Athens), Panagiotis-Eleftherios Eleftherakis (National Technical University of Athens), Alexis Maras (National Technical University of Athens), Konstantinos Iliakis (National Technical University of Athens), Alexandros Moiras (National Technical University of Athens), Sotirios Xydis (National Technical University of Athens)

15:35 – 15:55
æSIP: μArch-aware ASIP-ISA Co-Design via Program Synthesis, Equality Saturation, and External Don't Cares
Haoran Jin (University of Michigan), Jirong Yang (University of Michigan), Barry Lyu (University of Michigan), Ruijie Gao (University of Michigan), Nathan Bleier (University of Michigan)
Session Chair: TBD
14:15 – 14:35
ColumnKeeper: Efficient Solutions for Mitigating ColumnDisturb in DRAM-based Systems
Andreas Kosmas Kakolyris (ETH Zurich), F. Nisa Bostanci (ETH Zurich), Ataberk Olgun (ETH Zurich), Ismail Emir Yuksel (ETH Zurich), Harsh Songara (ETH Zurich), Konstantinos Marios Sgouras (ETH Zurich), Umut Baser (TOBB ETU, ETH Zurich), Konstantinos Kanellopoulos (ETH Zurich), A. Giray Yaglikci (CISPA), Onur Mutlu (ETH Zurich)

14:35 – 14:55
PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting
Jumin Kim (Seoul National University), Seungmin Baek (Seoul National University), Hwayong Nam (Seoul National University), Minbok Wi (Samsung Electronics), Nam Sung Kim (University of Illinois, Urbana-Champaign), Jung Ho Ahn (Seoul National University)

14:55 – 15:15
Loaded Dice: Solving the Non-Selection Problem for Scalable Probabilistic RowHammer Defense
Jeonghyun Woo (The University of British Columbia), Junsu Kim (The University of British Columbia), Aamer Jaleel (NVIDIA), Prashant Nair (The University of British Columbia)

15:15 – 15:35
PRowhammer: Propagating Bit-flips from CPU to GPU
Mrityunjay Shukla (Indian Institute of Technology Bombay), Shubham Roy (Indian Institute of Technology Bombay), Sayandeep Saha (Indian Institute of Technology Bombay), Biswabandan Panda (Indian Institute of Technology Bombay)

15:35 – 15:55
DejaVu: Why You Should Write to Your DRAM Rows Twice
Haocong Luo (ETH Zurich), Ismail Emir Yuksel (ETH Zurich), Ataberk Olgun (ETH Zurich), Nisa Bostanci (ETH Zurich), Orhun Ecemiş (TOBB ETÜ), Abdullah Giray Yaglikci (CISPA), Onur Mutlu (ETH Zurich)
15:55
16:30

Abstract
TBD


Bio
TBD

18:00 EDT
Evening
TBD

Wednesday, July 1

8:30 EDT
Babak Falsafi (EPFL)
Abstract

AI is pushing cloud infrastructure toward an energy wall, where demand for compute is growing faster than the industry can sustainably provision power, cooling, and datacenter capacity. At the same time, in the post-Moore era, gains in silicon density and energy efficiency have fallen far short of what is needed to keep pace with AI's compute demand. This keynote challenges the long-standing assumption that single-thread performance should dominate server design and operation. I argue that moving beyond the AI energy wall requires full-stack optimization across hardware, systems, workloads, and operating constraints—not simply scaling accelerators or building larger facilities. A new golden age of server design has finally arrived: one in which computer architects can redefine how useful computing is delivered under real constraints of power, silicon, cooling, and scale.


Bio

Babak Falsafi is a Professor of Computer and Communication Sciences at EPFL. His contributions to computer systems include the first NUMA multiprocessors built by Sun Microsystems (WildFire/WildCat), spatial memory streaming in ARM A-72 cores onwards, temporal memory streaming in IBM BlueGene P/Q and ARM Neoverse N2 cores, and performance evaluation methodologies adopted by AMD, HP and Google. His work on cloud-native CPU design laid the foundation for Cavium's first generation of ARM server CPUs, ThunderX. He is the founding president of the Swiss Datacenter Efficiency Association with an online platform and a label that helps operators quantify their electricity and water efficiency. He is a recipient of an Alfred P. Sloan Research Fellowship, and a fellow of ACM and IEEE.

9:45
10:00
Session Chair: TBD
10:00 – 10:20
DICE: Detailed Inter-Chiplet End-to-End PHY Modeling for Accurate Chiplet Simulation
Rashid Aligholipour (Uppsala University), Stefanos Kaxiras (Uppsala University), Yuan Yao (Uppsala University)

10:20 – 10:40
Omelet: A Packaging-Aware Hierarchical Interconnect Simulator for 2.5D/3D Chiplet Architectures
Jiho Kim (Georgia Institute of Technology), Danish Baig (Georgia Institute of Technology), Faaiq Waqar (Georgia Institute of Technology), Ashita Victor (Georgia Institute of Technology), Shimeng Yu (Georgia Tech), Muhannad Bakir (Georgia Institute of Technology), Cong (Callie) Hao (Georgia Institute of Technology)

10:40 – 11:00
PhaseWeave: Phase-Aware Execution on Heterogeneous Chiplet Architectures for Datacenters
Joshua Kim (University of Texas at Austin), Chaojie Zhang (Microsoft), Íñigo Goiri (Microsoft), Christopher J. Rossbach (University of Texas at Austin and Microsoft), Jovan Stojkovic (University of Texas at Austin)

11:00 – 11:20
ConBin: A Performance-Convergence Framework for Wafer-Scale Chip Binning
Huiqing Xu (Research Center for Intelligent Computing Systems, SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences), Mengdi Wang (Institute of Computing Technology, Chinese Academy of Sciences), Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences), Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences)

11:20 – 11:40
WaferBRAIN: Whole-Brain Scale Neuromorphic Architecture Based on Wafer-Scale Integration
Yukun Feng (University of Macau), Hao Jia (Guangdong Institute of Intelligence Science and Technology), Liangyu Gan (University of Macau), Haoming Chu (Guangdong Institute of Intelligence Science and Technology), Yufan He (Guangdong Institute of Intelligence Science and Technology), Jiaxin Yin (Guangdong Institute of Intelligence Science and Technology), Lirong Zheng (Guangdong Institute of Intelligence Science and Technology), Ning Ma (Guangdong Institute of Intelligence Science and Technology), Yuxiang Huan (Guangdong Institute of Intelligence Science and Technology)

11:40 – 12:00
DS-ISA: Instruction Set Architecture for Dynamical System Units
Chunshu Wu (Pacific Northwest National Laboratory), Ruibing Song (Rice University), Chuan Liu (Rice University), Tong Geng (Rice University), Ang Li (PNNL and UW)
Session Chair: TBD
10:00 – 10:20
NeRArch-Sim: A Unified Simulator for Benchmarking and DSE of Neural Rendering Accelerators
Cheng-Jhih Shih (Georgia Institute of Technology), Chaojian Li (Georgia Institute of Technology), Chihao Yu (Georgia Institute of Technology), Hsuan-Chen Fang (Georgia Institute of Technology), Sixu Li (Georgia Institute of Technology), Wei-Po Hsin (Georgia Institute of Technology), Lexington Allen Whalen (Georgia Institute of Technology), Hyewon Suh (Georgia Institute of Technology), Greg Eisenhauer (Georgia Institute of Technology), Ling Liu (College of Computing, Georgia Institute of Technology), Yingyan (Celine) Lin (Georgia Tech)

10:20 – 10:40
BULLET TIME: Time Dilation for High-Fidelity Tracing
Michael Wu (Yale University), Sibren Isaacman (Loyola University Maryland), Abhishek Bhattacharjee (Yale University), Anurag Khandelwal (Yale University)

10:40 – 11:00
PIPEWEAVE: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction
Kaixuan Zhang (Shanghai Jiao Tong University), Yunfan Cui (Shanghai Jiao Tong University), Shuhao Zhang (Shanghai Jiao Tong University), Chutong Ding (Shanghai Jiao Tong University), Shiyou Qian (Shanghai Jiao Tong University), Luping Wang (Alibaba Group), Jian Cao (Shanghai Jiaotong University), Guangtao Xue (Shanghai Jiao Tong University), Cheng Huang (Alibaba Group), Guodong Yang (Alibaba Group), Liping Zhang (Alibaba Group)

11:00 – 11:20
SSBench: Automated Characterization of Memory Dependence Predictors on Modern CPUs
Chang Liu (Tsinghua University), Yu Jin (Tsinghua University), Yuchen Fan (Tsinghua University), Tianrui Xiao (Tsinghua University), Lingfeng Yin (Zhongguancun Laboratory), Trevor E. Carlson (National University of Singapore), Shuwen Deng (Tsinghua University, Zhongguancun Laboratory), Dongsheng Wang (Tsinghua University, Zhongguancun Laboratory)

11:20 – 11:40
R2D2: Robotized Reconfigurable Network for Disaggregated Datacenters
Linus Y. Wong (University of Pennsylvania), Zhiyao Tang (University of Pennsylvania), Justin Ruihang Yu (University of Pennsylvania), Zhilei Zheng (University of Pennsylvania), Jonathan Smith (University of Pennsylvania), Andre DeHon (University of Pennsylvania), Jing "Jane" Li (University of Pennsylvania)

11:40 – 12:00
Lotus: A Task Dataflow Architecture for Cycle-Level Simulation
Fares Elsabbagh (MIT), Joel Emer (MIT), Daniel Sanchez (MIT)
Session Chair: TBD
10:00 – 10:20
SegFold: Accelerating Sparse GEMM with a Fine-Grained Dynamic Dataflow
Xinrui Wu (UCLA), Hanyu Wang (UCLA), Jason Cong (UCLA), Tony Nowatzki (UCLA)

10:20 – 10:40
ParetoES: Hardware-Accelerated Sparse Embedding Similarity via Pareto-Optimal Pruning
Jiaqi Zhai (Huazhong University of Science and Technology), Xuanhua Shi (Huazhong University of Science and Technology), Wenju Zhao (Huazhong University of Science and Technology), Kaiyi Huang (Huazhong University of Science and Technology), Chencheng Ye (Huazhong University of Science and Technology), Shunsen Lv (Huazhong University of Science and Technology), Zhongtian Long (Huazhong University of Science and Technology), Bingsheng He (National University of Singapore), Hai Jin (Huazhong University of Science and Technology)

10:40 – 11:00
ECHO: Efficient Head-Orientation-Guided Real-Time Sound Spatialization for Virtual Reality
Haiyu Wang (New York University), Tianhua Xia (New York University), Sai Qian Zhang (New York University)

11:00 – 11:20
DESSCam: An Event-Driven Architecture with In-Sensor Epitopological Sparse Sampling to Break the Latency-Power Tradeoff in Eye Tracking
Zhijie Jian (Tsinghua University, Beijing, China), Shangyu Yang (Tsinghua University, Beijing, China), Yuan Hua (Tsinghua University, Beijing, China), Jilin Zhang (Tsinghua University, Beijing, China), Ziyi Cheng (Tsinghua University, Beijing, China), Hong Chen (Tsinghua University, Beijing, China)

11:20 – 11:40
SLICE: A Selective Local Inference Framework with Codec Exploitation for Accelerating Video Super-Resolution
Mingu Jung (Yonsei University), Sungbin Kim (Yonsei University), Seunghyun Lee (Yonsei University), Seung Hyun Jin (Yonsei University), Hyunwuk Lee (Samsung Electronics), Won Woo Ro (Yonsei University)

11:40 – 12:00
Enabling Continuous, In-Field Introspection: The Programmable IPU Architecture
Ian McDougall (University of Wisconsin-Madison), Shayne Wadle (UW-Madison), Harish Babu Batchu (UW-Madison and NVIDIA), Karthikeyan Sankaralingam (U. of Wisconsin and NVIDIA)
Session Chair: TBD
10:00 – 10:20
FEnc2: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding
Ran Ran (North Carolina State University), Zhaoting Gong (North Carolina State University), Nuo Xu (University of Minnesota), Yuanchao Xu (University of California, Santa Cruz), Fan Yao (University of Central Florida), Wujie Wen (North Carolina State University)

10:20 – 10:40
FlashTFHE: A Scalable Architecture for Efficient Multi-bit Fully Homomorphic Encryption
Jiaao Ma (Duke University), Ceyu Xu (Duke University), Ning Liang (Duke University), Lisa Wu Wills (Duke University)

10:40 – 11:00
Unlocking Pipeline Parallelism for Bootstrapping: A Pipelined Multi-Chiplet TFHE Accelerator
Yibo Du (1.University of Chinese Academy of Sciences. 2.Institute of Computing Technology, Chinese Academy of Sciences), Mengdi Wang (Institute of Computing Technology, Chinese Academy of Sciences), Cangyuan Li (Institute of Computing Technology, Chinese Academy of Sciences), Yinhe Han (Institute of Computing Technology, Chinese Academy of Sciences), Ying Wang (Institute of Computing Technology, Chinese Academy of Sciences)

11:00 – 11:20
HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption
Shangyi Shi (SKLP, Institute of Computing Technology, Chinese Academy of Sciences), Husheng Han (Institute of Computing Technology, Chinese Academy of Sciences), Zhaoxuan Kan (Institute of Computing Technology, Chinese Academy of Sciences), Yinghao Yang (Institute of Computing Technology, Chinese Academy of Sciences), Jianan Mu (ICT,CAS), Tenghui Hua (ICT,CAS), Ge Yu (School of Advanced Interdisciplinary Sciences, University of Chinese Academy of Sciences), Xinyao Zheng (Institute of Computing Technology, Chinese Academy of Sciences), Ling Liang (DAMO Academy Alibaba Group, Peking University), Zidong Du (Institute of Computing Technology, Chinese Academy of Sciences), Xing Hu (Institute of Computing Technology, Chinese Academy of Sciences)

11:20 – 11:40
HyperDrive: Hierarchical Exploitation of Memory Efficiency for GPU-Based FHE Acceleration
Guang Fan (Ant Group), Yi Chen (Peking University), Lei Chen (Ant Group), Liang Kong (Ant Group), Chao Niu (Ant Group), Dian Jiao (Institute of Information Engineering, Chinese Academy of Sciences), Yilan Zhu (Ant Group), Geng Yang (Ant Group), Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Fangyu Zheng (University of Chinese Academy of Sciences), Jian Weng (KAUST), Meng Li (Peking University), Yisong Chang (Ant Group), Shoumeng Yan (Ant Group), Mingzhe Zhang (Ant Group)

11:40 – 12:00
MNEMOS: A GPU-based TFHE Acceleration Framework with Memory Access Optimization
Junyi Zhang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Xianglong Deng (Institute of Information Engineering, Chinese Academy of Sciences), Yi Chen (Peking University), Guang Fan (Ant Group), Lei Chen (Ant Group), Dian Jiao (Institute of Information Engineering, Chinese Academy of Sciences), Shengyu Fan (Institute of Information Engineering, Chinese Academy of Sciences), Zhiwei Wang (State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS), Mingzhe Zhang (Ant Group)
12:00
12:15
Session Chair: TBD
12:15 – 12:35
GauTracer: Extending Ray Tracing Accelerator for Gaussian-based Scene Representation
Lizhou Wu (State Key Laboratory of Integrated Chips & Systems, Fudan University), Kunchen Zou (State Key Laboratory of Integrated Chips & Systems, Fudan University), Yuzheng Lin (State Key Laboratory of Integrated Chips & Systems, Fudan University), Chixiao Chen (State Key Laboratory of Integrated Chips & Systems, Fudan University), Xiaoyang Zeng (State Key Laboratory of Integrated Chips & Systems, Fudan University), Haozhe Zhu (State Key Laboratory of Integrated Chips & Systems, Fudan University)

12:35 – 12:55
TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing
Yavuz Tozlu (North Carolina State University), Anshul Naithani (North Carolina State University), Huiyang Zhou (North Carolina State University)

12:55 – 13:15
Optimizing 3D Gaussian Splatting with Axis-Shared Rasterization and Order-independent Transmittance
Zhican Wang (Shanghai Jiao Tong University), Guanghui He (Shanghai Jiao Tong University), Lingjun Gao (Imperial College London), Dantong Liu (University of Cambridge), Shell Xu Hu (Samsung AI), Chen Zhang (Shanghai Jiao Tong University), Zhuoran Song (Shanghai Jiao Tong University), Nicholas Lane (University of Cambridge and Samsung AI), Hongxiang Fan (Imperial College London)
Session Chair: TBD
12:15 – 12:35
Bumper: Hinting Instruction Usefulness for Robust Unified Caches
Georgios Vavouliotis (Huawei), Tom Rollet (Huawei), Davide Basilio Bartolini (Huawei), Boris Grot (University of Edinburgh), Leeor Peled (Huawei), Yang Lixia (Huawei)

12:35 – 12:55
ICP: Exploiting Instruction Correlation for Prefetching Irregular Memory Accesses
Mengming Li (Hong Kong University of Science and Technology), Chenlu Miao (NVIDIA), Buqing Xu (Hong Kong University of Science and Technology), Qijun Zhang (Hong Kong University of Science and Technology), Xiangfeng Sun (The Hong Kong University of Science and Technology), Ceyu Xu (The Hong Kong University of Science and Technology), Yuan Xie (HKUST), Wenkai Li (Hong Kong University of Science and Technology), Shang Liu (Hong Kong University of Science and Technology), Zhiyao Xie (Hong Kong University of Science and Technology)

12:55 – 13:15
Revelator: Rapid Data Fetching via OS-Guided Hash-based Speculative Address Translation
Konstantinos Kanellopoulos (ETH Zürich), Konstantinos Sgouras (ETH Zurich), Harsh Songara (ETH Zurich), Andreas Kosmas Kakolyris (ETH Zurich), Vlad-Petru Nitu (ETH Zürich), Spiros Galanopoulos (National Technical Univ. of Athens, ETH Zurich), Rahul Bera (ETH Zürich), Konstantina Koliogeorgi (ETH Zurich), Rakesh Kumar (NTNU), Onur Mutlu (ETH Zurich)
Session Chair: TBD
12:15 – 12:35
Distilling Magic States in the Bicycle Architecture
Shifan Xu (Yale University), Kun Liu (Yale University), Patrick Rall (IBM Quantum), Zhiyang He (MIT), Yongshan Ding (Yale University)

12:35 – 12:55
O3LS: Optimizing Lattice Surgery via Automatic Layout Searching and Loose Scheduling
Chenghong Zhu (Guangzhou), Xian Wu (Guangzhou), Jiahan Chen (Guangzhou), Keming He (The Hong Kong University of Science and Technology (Guangzhou)), Junjie Wu (National University of Defense Technology), Xin Wang (The Hong Kong University of Science and Technology (Guangzhou)), Lingling Lao (National University of Defense Technology)

12:55 – 13:15
Leveraging Phase Polynomials for Quantum Circuit Optimization
Zihan Chen (Rutgers University), Henry Chen (Rutgers University), Yuwei Jin (Rutgers University), Enhyeok Jang (Yonsei University), Mingkuan Xu (Carnegie Mellon University), Vannessa Chan (Rutgers University), Won Woo Ro (Yonsei University), Eddy Z. Zhang (Rutgers University)
Session Chair: TBD
12:15 – 12:35
Harmonia: A Unified Hierarchical Scheduling Framework for Sparse Matrix Multiplication
Jingkui Yang (National University of Defense Technology), Fangxin Liu (Shanghai Jiao Tong University), Xin Ju (National University of Defense Technology), Ning Yang (Shanghai Jiao Tong University), Chenyang Guan (Shanghai Jiao Tong University), Junjie Wang (Shanghai Jiao Tong University), Zongwu Wang (Shanghai Jiao Tong University), Mei Wen (National University of Defense Technology), Jian Liu (Beijing University of Aeronautics and Astronautics), Li Jiang (Shanghai Jiaotong University), Haibing Guan (Shanghai Jiao Tong University)

12:35 – 12:55
PipeComm: Maximizing Link Utilization through Pipeline-Aware Collective Communication Synthesis
Ruifan Xu (Peking University), Yuze Luo (Peking University), Yuhao Meng (Peking University), Size Zheng (Peking University), Meng Li (Peking University), Yun Liang (Peking University)

12:55 – 13:15
Breaking Barriers in Atomic Scaling: A Hardware–Software-Collaborated Framework to Deconstruct RDMA Atomic
Guangyang Deng (Xiamen University), Qiangsheng Su (Xiamen University), Zhirong Shen (Xiamen University), Qing Wang (Nanjing University), Yina Lv (Xiamen University), Ronglong Wu (Xiamen University), Jiwu Shu (Xiamen University)
13:15 EDT
14:30
Session Chair: TBD
14:30 – 14:50
Democratizing and Accelerating Hardware Verification with Software-Native Optimization
Yunlong Xie (Institute of Computing Technology, Chinese Academy of Sciences), Zhicheng Yao (ICT, CAS & University of Chinese Academy of Sciences), Fangyuan Song (Institute of Computing Technology, Chinese Academy of Sciences), Jincheng Liu (Institute of Computing Technology, Chinese Academy of Sciences), Junyue Wang (Institute of Computing Technology, Chinese Academy of Sciences), Haojin Tang (Institute of Computing Technology, Chinese Academy of Sciences), Lu Chen (Institute of Computing Technology, Chinese Academy of Sciences), Yinan Xu (Institute of Computing Technology, Chinese Academy of Sciences), Ziqing Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Ziyuan Gao (Institute of Computing Technology, Chinese Academy of Sciences), Duan Yu (Imperial College London), Hongtao Zhou (Northwestern Polytechnical University), Jiayi Rao (Institute of Computing Technology, Chinese Academy of Sciences), Junyu Yue (Institute of Computing Technology, Chinese Academy of Sciences), Xiaolong Li (Institute of Computing Technology, Chinese Academy of Sciences), Yunqi Lu (Institute of Computing Technology, Chinese Academy of Sciences), Zechen Yang (Institute of Computing Technology, Chinese Academy of Sciences), Hang Zhu (Institute of Computing Technology, Chinese Academy of Sciences), Shan Liu (Beijing Institute of Open Source Chip), Xu An (Beijing Institute of Open Source Chip), Qi Ge (Beijing Institute of Open Source Chip), Jiuyue Ma (Beijing Institute of Open Source Chip), Jianyi Meng (Beijing Institute of Open Source Chip), Kan Shi (Institute of Computing, Chinese Academy of Sciences), Dan Tang (Institute of Computing Technology, Chinese Academy of Sciences), Tianyi Liu (Institute of Computing Technology, Chinese Academy of Sciences), Sa Wang (Institute of Computing Technology, Chinese Academy of Science), Yungang Bao (ICT, CAS)

14:50 – 15:10
HartBreaker: Deterministic Fuzzing of Multi-Hart RISC-V CPUs with Non-Deterministic Programs
Quentin Bordier (ETH Zurich), Tobias Kovats (ETH Zurich), Flavien Solt (UC Berkeley), Kaveh Razavi (ETH Zurich)

15:10 – 15:30
QED: Scalable Consistency Verification of Memory Instruction Reordering in Hardware
Gokulan Ravi (Purdue University), Xiaokang Qiu (Purdue University), Mithuna Thottethodi (Purdue University), T. N. Vijaykumar (Purdue University)

15:30 – 15:50
täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies
Pranav Srinivasan (University of Michigan), Manos Kapritsos (University of Michigan), Yatin A. Manerkar (University of Michigan)
Session Chair: TBD
14:30 – 14:50
LIBRA: A High-Accuracy, Cost-Aware, and Coordinated Multi-GPU Page Prefetcher
Xiangyue Huang (UC-Santa Cruz), Yanan Guo (University of Rochester), Yuanchao Xu (University of California, Santa Cruz)

14:50 – 15:10
Enhancing Instruction Prefetching via Cache and TLB Management
Alexandre Valentin Jamet (Barcelona Supercomputing Center), Georgios Vavouliotis (Huawei Research), Marti Torrents (Barcelona Supercomputing Center), Dimitrios Chasapis (Barcelona Supercomputing Center), Marc Casas (Barcelona Supercomputing Center)

15:10 – 15:30
R-Max: Extending Bélády's MIN with Prefetching to Bound Realistic Cache Performance
Lei Wang (Texas A&M University), Chia-Hang Lee (Texas A&M University), Maccoy Merrell (Texas A&M University), Gino Chacon (AheadComputing Inc.), Daniel A. Jiménez (Texas A&M University), Paul V. Gratz (Texas A&M University)

15:30 – 15:50
From Memorization to Generalization: A Practical Neural Network Prefetching Framework
Xuan Tang (National University of Defense Technology), Zicong Wang (National University of Defense Technology), Shuiyi He (National University of Defense Technology), Hao Tang (National University of Defense Technology), Dezun Dong (National University of Defense Technology), Xiangke Liao (National University of Defense Technology)
Session Chair: TBD
14:30 – 14:50
Unifying Qubit Routing Across Diverse Quantum ISAs via Canonical Representation
Zhaohui Yang (The Hong Kong University of Science and Technology), Kai Zhang (Tsinghua University), Xinyang Tian (Tsinghua University), Xiangyu Ren (University of Edinburgh), Yingjian Liu (Leiden University), Yunfeng Li (The University of Hong Kong), Dawei Ding (Tsinghua University), Jianxin Chen (Tsinghua University), Yuan Xie (The Hong Kong University of Science and Technology)

14:50 – 15:10
TUSQ: Tracking, Uncomputation, and Sampling for Noisy Quantum Simulation
Siddharth Dangwal (University of Chicago), Tina Oberoi (University of Chicago), Ajay Sailopal (University of Chicago), Dhirpal Shah (University of Chicago), Fred Chong (University of Chicago)

15:10 – 15:30
Photonic Quantum Computing on Spin Memory Architecture with Tree-Encoded Fusion
Xiangyu Ren (University of Edinburgh), Yuexun Huang (The University of Chicago), Zhemin Zhang (The Chinese University of Hong Kong), Yuchen Zhu (Northwestern University), Tsung-Yi Ho (The Chinese University of Hong Kong), Antonio Barbalace (The University of Edinburgh), Zhiding Liang (The Chinese University of Hong Kong)

15:30 – 15:50
SATIC: An Optimizing Ising Compiler for SAT(isfiability)
Ahmet Efe (University of Minnesota), Husrev Cilasun (University of Minnesota), Abhimanyu Kumar (University of Minnesota), Nafisa Prova (University of Minnesota), Ziqing Zeng (University of Minnesota), Tahmida Islam (University of Minnesota), Ruihong Yin (University of Minnesota), Chaohui Li (University of Minnesota), Peter Kreye (University of Minnesota), Chris Kim (University of Minnesota), Sachin S. Sapatnekar (University of Minnesota), Ulya R. Karpuzcu (University of Minnesota)
Session Chair: TBD
14:30 – 14:50
PowerGrad: Hierarchical Power Management for Power-Limited ML Inference Clusters
Hyoungwook Nam (University of Illinois at Urbana-Champaign), Raghavendra Pradyumna Pothukuchi (University of North Carolina at Chapel Hill), Alper Buyuktosunoglu (IBM Research), Aporva Amarnath (AMD), Pradip Bose (IBM), Josep Torrellas (University of Illinois at Urbana-Champaign)

14:50 – 15:10
Power Sloshing in Compound Servers for Large-Scale AI Inference Workloads
Albert Y Cho (Georgia Tech), Jovan Stojkovic (Meta), Leonardo Piga (Meta), Abhishek Dhanotia (Meta), Sultan Mahmud Sajal (Meta), Gefei Zuo (Meta), Krishna Malladi (Meta), Devon Akers (Meta), Kalyan Subramanian (Meta), Shobhit Kanaujia (Meta), Alexandros Daglis (Georgia Tech)

15:10 – 15:30
PowerWeave: Unlocking Energy-Efficient ML on GPUs with OS-Level Spatial Power Management
Vasileios Kypriotis (Carnegie Mellon University), Eric Dubberstein (Carnegie Mellon University), Patrick H. Coppock (Carnegie Mellon University), Eliot H. Solomon (Carnegie Mellon University), Rayyan Zamir (Carnegie Mellon University), Tathagata Srimani (Carnegie Mellon University), Dimitrios Skarlatos (Carnegie Mellon University)

15:30 – 15:50
Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs
Marco Kurzynski (University of Central Florida), Shaizeen Aga (AMD), Di Wu (University of Central Florida)
15:50
16:20
Session Chair: TBD
16:20 – 16:40
Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs
Changhai Man (Georgia Institute of Technology), Joongun Park (Georgia Institute of Technology), Hanjiang Wu (Georgia Institute of Technology), Huan Xu (Georgia Institute of Technology), Srinivas Sridharan (Nvidia Inc.), Tushar Krishna (Georgia Institute of Technology)

16:40 – 17:00
DisDP: Disaggregating Compute, Network, and Storage for Model-Sharded Data-Parallel Training
Mo Sun (Zhejiang University), Zihan Yang (Zhejiang University), Changyue Liao (Zhejiang University), Yingtao Li (Zhejiang University), Jie Zhang (Zhejiang University), Kaiqi Chen (Zhejiang University), Fei Wu (Zhejiang University), Zeke Wang (Zhejiang University)

17:00 – 17:20
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
Zhuoshan Zhou (Shanghai Jiao Tong University), Chen Zhang (Shanghai Jiao Tong University), Shuyi Zhang (Shanghai Jiao Tong University), Qijun Zhang (Shanghai Jiao Tong University), Haibo Wang (Huawei), Zhe Zhou (Huawei), Zhipeng Tu (Huawei), Guangyu Sun (Peking University), Yijia Diao (Shanghai Jiao Tong University), Zhigang Ji (Shanghai Jiao Tong University), Jingwen Leng (Shanghai Jiao Tong University), Guanghui He (Shanghai Jiao Tong University), Minyi Guo (Shanghai Jiao Tong University)

17:20 – 17:40
Symbiotic MLLM Serving: Dynamically Balancing Parallelism Across GPUs and Resources Within GPUs
Zhicheng Li (Institute of Computing Technology, Chinese Academy of Sciences), Jiacheng Zhao (Institute of Computing Technology, Chinese Academy of Sciences), Yangyu Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Zhaolin Duan (Tianjin University), Xinyu Liu (Institute of Computing Technology, Chinese Academy of Sciences), Siqi Li (Beijing University of Technology), Shuoming Zhang (Institute of Computing Technology, Chinese Academy of Sciences), Shuaijiang Li (Institute of Computing Technology), Donglin Yu (University of Illinois Urbana-Champaign), Yuan Wen (University of Aberdeen), Chunwei Xia (University of Leeds), Xiyu Shi (Institute of Computing Technology at Chinese Academy of Sciences), Huimin Cui (Institute of Computing Technology, CAS)

17:40 – 18:00
Session Chair: TBD
16:20 – 16:40
Dynamic Scheduling for AI Accelerators via TISA
Guanghui Song (Hunan University), Xiaoqiang Dan (EVAS Intelligence), Chengke Wang (EVAS Intelligence), Fei Liu (EVAS Intelligence), Wenyuan Lv (EVAS Intelligence), Zhongzhou Jiang (EVAS Intelligence), Jianjian Guan (EVAS Intelligence), Teng Lu (EVAS Intelligence), Lin Tao (EVAS Intelligence), Cheng Li (EVAS Intelligence), Weixing Pan (EVAS Intelligence), Wei Huang (EVAS Intelligence), Zirong Shen (EVAS Intelligence), Yi Yang (EVAS Intelligence), Hui Liu (EVAS Intelligence), Jie Zhao (Hunan University)

16:40 – 17:00
MXFFP: Microscaling Flexible Floating Point Format for Large-Scale AI Model Acceleration
Sungwoo Kim (Yonsei University), Sungbin Kim (Yonsei University), Dongho Ha (Meta), Hyunwuk Lee (Yonsei University), Junsung Kim (Yonsei University), Seunghyun Lee (Yonsei University), Mingu Jung (Yonsei University), Murali Annavaram (USC), Won Woo Ro (Yonsei University)

17:00 – 17:20
UniCore: A Bit-Width Scalable GEMM Unit for Unified LLM Inference
Yonghao Chen (The Hong Kong University of Science and Technology (Guangzhou)), Jiaxiang Zou (The Hong Kong University of Science and Technology (Guangzhou)), Xingyu Chen (The Hong Kong University of Science and Technology (Guangzhou)), Chenxi Xu (The Hong Kong University of Science and Technology (Guangzhou)), Jingyu Guo (The Hong Kong University of Science and Technology (Guangzhou)), Xinyu Chen (The Hong Kong University of Science and Technology (Guangzhou))

17:20 – 17:40
XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
Feng Yu (National University of Singapore), Hongshi Tan (National University of Singapore), Yao Chen (National University of Singapore), Weng-Fai Wong (National University of Singapore), Bingsheng He (National University of Singapore)

17:40 – 18:00
ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing
Kang You (Shanghai Jiao Tong University), Chen Nie (Shanghai Jiao Tong University), Lee Jun Yan (Shanghai Jiao Tong University), Ziling Wei (Shanghai Jiao Tong University), Cheng Zou (Shanghai Jiao Tong University), Zekai Xu (Shanghai Jiao Tong University), Yu Feng (Shanghai Jiao Tong University), Honglan Jiang (Shanghai Jiao Tong University), Zhezhi He (Shanghai Jiao Tong University)
Session Chair: TBD
16:20 – 16:40
AXLE: Coordinated Offloading with Asynchronous Back-Streaming in Computational Memory Systems
Suyeon Lee (Georgia Tech), Kangkyu Park (SK Hynix), Kwangsik Shin (SK Hynix), Ada Gavrilovska (Georgia Institute of Technology)

16:40 – 17:00
DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures
Peiming Yang (University of Toronto), Sankeerth Durvasula (University of Toronto), Ivan Fernandez (Barcelona Supercomputing Center (Spain)), Mohammad Sadrosadati (ETH Zürich), Onur Mutlu (ETH Zurich), Gennady Pekhimenko (NVIDIA/University of Toronto), Christina Giannoula (Max Planck Institute for Software Systems)

17:00 – 17:20
Optimizing Spatial Data Structure with Near-Cache Acceleration by Exploiting Physical Locality
Hongyi Li (Tsinghua University), Yijia Liu (Tsinghua University), Haoran Pei (Dalian University of Technology), Qingyuan Yang (Tsinghua University), Zijian Pan (Tsinghua University), Songchen Ma (The Hong Kong University of Science and Technology), Leshan Li (Tsinghua University), Rong Zhao (Tsinghua University), Xinglong Ji (Tsinghua University)

17:20 – 17:40
Bridging Efficiency and Scalability in LLM System via 3D Hybrid PIM with 2D In-Transit Computation
Hongyi Li (Tsinghua University), Songchen Ma (The Hong Kong University of Science and Technology), Huanyu Qu (University of Macau, Guangdong Institute of Intelligence Science and Technology), Weihao Zhang (Lynxi Technologies Co. Ltd), Jia Chen (The Hong Kong University of Science and Technology), Junfeng Lin (Tsinghua University), Fengbin Tu (The Hong Kong University of Science and Technology), Rong Zhao (Tsinghua University)

17:40 – 18:00
Early Silicon of Raptor: The First 3D-DRAM Accelerator for Generative Inference
Prashant J. Nair (d-Matrix and the University of British Columbia), Ramyad Hadidi (d-Matrix), Subramani Ganesh (d-Matrix), Sangamesh Kodge (d-Matrix), Shubhankit Rathore (d-Matrix), Neil Thanawala (d-Matrix), Nikitha Reddy (d-Matrix), Gyanesh Saharia (d-Matrix), Vinayak Patankar (d-Matrix), Arun Tiruvur (d-Matrix), Nithesh Kurella (d-Matrix), Sudeep Bhoja (d-Matrix)
Session Chair: TBD
16:20 – 16:40
DICE: Enabling Efficient General-Purpose SIMT Execution with Statically Scheduled Coarse-Grained Reconfigurable Arrays
Jiayi Wang (University of Washington), Ang Da Lu (University of Washington), Zhichen Zeng (University of Washington), Ang Li (University of Washington)

16:40 – 17:00
TAGT: An Efficient Graph Transformer Accelerator with Topology-aware Sparsification and Merging
Hui Yu (The Hong Kong University of Science and Technology), Wei Zhang (Hong Kong University of Science and Technology), Ligang He (University of Warwick), Jin Zhao (Huazhong University of Science and Technology), Yu Zhang (Service Computing Technology and System Lab, Huazhong University of Science and Technology), Zixiao Wang (National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, China)

17:00 – 17:20
GenZA: A General and Efficient Accelerator for Diverse Zero-Knowledge Proof Protocols
Cheng Wang (Xi'an Jiaotong University), Jiangbin Dong (Xi'an Jiaotong University), Mingyu Gao (Tsinghua University)

17:20 – 17:40
HiT: A Unified Sparsity-Adaptive Architecture for High-Throughput Matrix Multiplication
Tingting Xiang (National University of Singapore), Xiaochen Wang (Zhejiang University), Miao Yu (National University of Singapore), Trevor E. Carlson (National University of Singapore (NUS))

17:40 – 18:00
DiTPA: A DiT-based Action Planner Accelerator Exploiting Action–Denoising–Multimodality Redundancy for Embodied Artificial Intelligence
Xin Zhao (The Hong Kong University of Science and Technology), Longke Yan (The Hong Kong University of Science and Technology), Jiancong Li (The Hong Kong University of Science and Technology), Yongkun Wu (The Hong Kong University of Science and Technology), Fengbin Tu (The Hong Kong University of Science and Technology)
18:00 EDT
TBD
Evening