|
SELECT CONFERENCES and PUBLICATIONS in 1H 2025
POPL 2025, The 52nd ACM Symposium on Principles of Programming Languages, January 19-25, 2025, Denver, CO
A Research Career in Balance
Andrew Myers Cornell University
Universal Composability is Robust Compilation
Marco Patrignani University of Trento, Robert Künnemann CISPA Helmholtz Center for Information Security, Riad S. Wahby Stanford University, USA, Ethan Cecchetti University of Wisconsin-Madison
A Demonic Outcome Logic for Randomized Nondeterminism
Noam Zilberstein Cornell University, Dexter Kozen Cornell University, Alexandra Silva Cornell University, Joseph Tassarotti New York University
Flo: a Semantic Foundation for Progressive Stream Processing
Shadaj Laddad University of California at Berkeley, Alvin Cheung University of California at Berkeley, Joseph M. Hellerstein UC Berkeley, Mae Milano Princeton University
FAST '25 - The 23rd USENIX Conference on File and Storage Technologies - February 25-27, 2025, Santa Clara, CA, USA
Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot
Ruoyu Qin, Moonshot AI and Tsinghua University; Zheming Li, Weiran He, and Jialei Cui, Moonshot AI; Feng Ren, Mingxing Zhang, Yongwei Wu, and Weimin Zheng, Tsinghua University; Xinran Xu, Moonshot AI
Awarded Best Paper!
Sambhav Satija, Chenhao Ye, Ranjitha Kosgi, Aditya Jain, Romit Kankaria, Yiwei Chen, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison; Kiran Srinivasan, NetApp
HPCA 2025 - The 31st IEEE International Symposium on High-Performance Computer Architecture - March 1-5, 2025, Las Vegas, NV
Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms
Wan-Hsuan Lin (UCLA), Daniel Bochen Tan (UCLA), Jason Cong (UCLA)
The Importance of Generalizability in Machine Learning for Systems
Varun Gohil (Massachusetts Institute of Technology), Sundar Dev (Google), Gaurang Upasani (Google), David Lo (Google), Parthasarathy Ranganathan (Google), Christina Delimitrou (Massachusetts Institute of Technology)
SELECTED BEST OF COMPUTER ARCHITECTURE LETTERS FOR 2024
SPARK – Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems
Siddhartha Raman Sundara Raman (The University of Texas at Austin), Lizy Kurian John (UT Austin), Jaydeep Kulkarni (University of Texas, Austin)
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI
Arya Tschand (Harvard University), Arun Tejusve Raghunath Rajan (Self / Meta), Sachin Idgunji (NVIDIA), Anirban Ghosh (NVIDIA), Jeremy Holleman (UNC Charlotte / Syntiant), Csaba Kiraly (Codex), Pawan Ambalkar (Dell), Ritika Borkar (NVIDIA), Ramesh Chukka (Intel), Trevor Cockrell (Dell), Oliver Curtis (SMC), Grigori Fursin (FlexAI / cTuning), Miro Hodak (AMD), Hiwot Kassa (Meta), Anton Lokhmotov (KRAI), Dejan Miskovic (NVIDIA), Yuechao Pan (Google), Manu Prasad Manmathan (Intel), Liz Raymond (Dell), Tom St. John (Decompute), Arjun Suresh (GATE Overflow), Rowan Taubitz (SMC), Sean Zhan (SMC), Scott Wasson (MLCommons), David Kanter (MLCommons), Vijay Janapa Reddi (Harvard University)
Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization
Jianbo Dong (Alibaba Group), Bin Luo (Alibaba Group), Jun Zhang (Alibaba Group), Pengcheng Zhang (Alibaba Group), Fei Feng (Alibaba Group), Yikai Zhu (Alibaba Group), Ang Liu (Alibaba Group), Zian Chen (Alibaba Group), Yi Shi (Alibaba Group), Yang Liu (Alibaba Group), Hairong Jiao (Alibaba Group), Gang Lu (Alibaba Group), Yu Guan (Alibaba Group), Ennan Zhai (Alibaba Group), Wencong Xiao (Alibaba Group), Hanyu Zhao (Alibaba Group), Man Yuan (Alibaba Group), Siran Yang (Alibaba Group), Xiang Li (Alibaba Group), Jiamang Wang (Alibaba Group), Rui Men (Alibaba Group), Jianwei Zhang (Alibaba Group), Chang Zhou (Alibaba Group), Dennis Cai (Alibaba Group), Yuan Xie (Alibaba Group), Binzhang Fu (Alibaba Group)
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis (Meta), Michael Kuchnik (Meta), John Hoffman (Meta), Adithya Kumar (Meta), Parth Malani (Meta), Faye Ma (Meta), Zachary DeVito (Meta), Shubho Sengupta (Meta), Kalyan Saladi (Meta), Carole-Jean Wu (Meta)
CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems
Mariam Elgamal (Harvard University), Doug Carmean (Meta), Elnaz Ansari (Meta), Okay Zed (Meta), Ramesh Peri (Meta), Srilatha Manne (Meta), Udit Gupta (Meta), Gu-Yeon Wei (Harvard University), David Brooks (Harvard University), Gage Hills (Harvard University), Carole-Jean Wu (Meta)
ARTEMIS: Agile Discovery of Efficient Real-Time Systems-on-Chips in the Heterogeneous Era
Subhankar Pal (IBM Research), Aporva Amarnath (IBM Research), Behzad Boroujerdian (University of Texas at Austin / Harvard University), Augusto Vega (IBM Research), Alper Buyuktosunoglu (IBM Research), John-David Wellman (IBM Research), Vijay Janapa Reddi (Harvard University), Pradip Bose (IBM Research)
LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
Yujun Lin (MIT), Zhekai Zhang (MIT), Song Han (MIT)
Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices
Yu Liang (ETH Zürich), Aofeng Shen (ETH Zürich), Chun Jason Xue (MBZUAI), Riwei Pan (City University of Hong Kong), Haiyu Mao (ETH Zürich), Nika Mansouri Ghiasi (ETH Zürich), Qingcai Jiang (ETH Zürich and University of Science and Technology of China), Rakesh Nadig (ETH Zürich), Lei Li (City University of Hong Kong), Rachata Ausavarungnirun (MangoBoost), Mohammad Sadrosadati (ETH Zürich), Onur Mutlu (ETH Zürich)
ASPLOS 2025 - The 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 30-April 3, 2025, Rotterdam, The Netherlands
Keynote (Joint with EuroSys 2025): Has Machine Learning for Systems Reached an Inflection Point?
Martin Maas, Google
CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing
Mayank Kabra (ETH Zurich), Rakesh Nadig (ETH Zurich), Harshita Gupta (ETH Zurich), Manos Frouzakis (ETH Zurich), Rahul Bera (ETH Zurich), Vamanan Arulchelvan (ETH Zurich), Yu Liang (ETH Zurich), Haiyu Mao (ETH Zurich), Mohammad Sadrosadati (ETH Zurich), Onur Mutlu (ETH Zurich)
Composing Distributed Computations Through Task and Kernel Fusion
Rohan Yadav (Stanford University), Shiv Sundram (Stanford University), Wonchan Lee (NVIDIA), Michael Garland (NVIDIA), Michael Bauer (NVIDIA), Alex Aiken (Stanford University), Fredrik Kjolstad (Stanford University)
Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms
Benjamin Reidys (University of Illinois Urbana-Champaign), Pantea Zardoshti (Microsoft), Íñigo Goiri (Microsoft), Celine Irvene (Microsoft), Daniel S. Berger (Microsoft, University of Washington), Haoran Ma (University of California-Los Angeles), Kapil Arya (Microsoft), Eli Cortez (Microsoft), Taylor Stark (Microsoft), Eugene Bak (Microsoft), Mehmet Iyigun (Microsoft), Stanko Novaković (Google), Lisa Hsu (Meta), Karel Trueba (Microsoft), Abhisek Pan (Microsoft), Chetan Bansal (Microsoft), Saravan Rajmohan (Microsoft), Jian Huang (University of Illinois Urbana-Champaign), Ricardo Bianchini (Microsoft)
Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies
Divyanshu Saxena (The University of Texas at Austin), William Zhang (The University of Texas at Austin), Shankara Pailoor (The University of Texas at Austin), Isil Dillig (The University of Texas at Austin), Aditya Akella (The University of Texas at Austin)
Necro-reaper: Pruning away Dead Memory Traffic in Warehouse-Scale Computers
Sotiris Apostolakis (Google), Chris Kennelly (Google), Xinliang David Li (Google), Parthasarathy Ranganathan (Google)
ReCA: Integrated Acceleration for Real-Time and Efficient Cooperative Embodied Autonomous Agents
Zishen Wan (Georgia Institute of Technology), Yuhang Du (University of Minnesota, Twin Cities), Mohamed Ibrahim (Georgia Institute of Technology), Jiayi Qian (Georgia Institute of Technology), Jason Jabbour (Harvard University), Yang (Katie) Zhao (University of Minnesota, Twin Cities), Tushar Krishna (Georgia Institute of Technology), Arijit Raychowdhury (Georgia Institute of Technology), Vijay Janapa Reddi (Harvard University)
SuperNoVA: Algorithm-Hardware Co-Design for Resource-Aware SLAM
Seah Kim (University of California, Berkeley), Roger Hsiao (University of California, Berkeley), Borivoje Nikolić (University of California, Berkeley), James Demmel (University of California, Berkeley), Yakun Sophia Shao (University of California, Berkeley)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
Shiyi Cao (UC Berkeley), Shu Liu (UC Berkeley), Tyler Griggs (UC Berkeley), Peter Schafhalter (UC Berkeley), Xiaoxuan Liu (UC Berkeley), Ying Sheng (Stanford University), Joseph E. Gonzalez (UC Berkeley), Matei Zaharia (UC Berkeley), Ion Stoica (UC Berkeley)
PCcheck: Persistent Concurrent Checkpointing for ML
Foteini Strati (ETH Zurich), Michal Friedman (ETH Zurich), Ana Klimovic (ETH Zurich)
POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
Aditya K Kamath (Paul G Allen School of Computer Science and Engineering, University of Washington), Ramya Prabhu (Microsoft Research India), Jayashree Mohan (Microsoft Research India), Simon Peter (Paul G Allen School of Computer Science and Engineering, University of Washington), Ramachandran Ramjee (Microsoft Research India), Ashish Panwar (Microsoft Research India)
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Byungsoo Jeon (NVIDIA), Mengdi Wu (Carnegie Mellon University), Shiyi Cao (UC Berkeley), Sunghyun Kim (Massachusetts Institute of Technology), Sunghyun Park (NVIDIA), Neeraj Aggarwal (Carnegie Mellon University), Colin Unger (Stanford University), Daiyaan Arfeen (Carnegie Mellon University), Peiyuan Liao (Carnegie Mellon University), Xupeng Miao (Carnegie Mellon University), Mohammad Alizadeh (Massachusetts Institute of Technology), Gregory R. Ganger (Carnegie Mellon University), Tianqi Chen (Carnegie Mellon University), Zhihao Jia (Carnegie Mellon University)
EUROSYS 2025 - March 30-April 3, 2025, Rotterdam, The Netherlands
DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs
Xiaozhe Yao (ETH Zurich), Qinghao Hu (MIT), Ana Klimovic (ETH Zurich)
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao (UC Berkeley), Tian Xia (UC Berkeley), Zhanghao Wu (UC Berkeley), Wei-Lin Chiang (UC Berkeley), Tyler Griggs (UC Berkeley), Romil Bhardwaj (UC Berkeley), Zongheng Yang (UC Berkeley), Scott Shenker (ICSI and UC Berkeley), Ion Stoica (UC Berkeley)
Impeller: Stream Processing on Shared Logs
Zhiting Zhu (Lepton AI), Zhipeng Jia (Google), Newton Ni (University of Texas at Austin), Dixin Tang (UT Austin), Emmett Witchel (UT Austin)
NSDI '25 - The 21st USENIX Symposium on Networked Systems Design and Implementation - April 28-30, 2025, Philadelphia, PA
Jiaxin Lin, UT Austin; Zhiyuan Guo, UCSD; Mihir Shah, NVIDIA; Tao Ji, Microsoft; Yiying Zhang, UCSD; Daehyeok Kim and Aditya Akella, UT Austin
NDD: A Decision Diagram for Network Verification
Zechun Li, Peng Zhang, and Yichi Zhang, Xi'an Jiaotong University; Hongkun Yang, Google
Awarded Outstanding Paper!
Smart Casual Verification of the Confidential Consortium Framework
Heidi Howard, Markus A. Kuppe, Edward Ashton, and Amaury Chamayou, Azure Research, Microsoft; Natacha Crooks, Azure Research, Microsoft and UC Berkeley
Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage
Hamid Hajabdolali Bazzaz, Yingjie Bi, and Weiwu Pang, Google; Minlan Yu, Harvard University; Ramesh Govindan, University of Southern California; Neal Cardwell, Nandita Dukkipati, Meng-Jung Tsai, Chris DeForeest, and Yuxue Jin, Google; Charles Carver, Columbia University; Jan Kopanski, Liqun Cheng, and Amin Vahdat, Google
White-Boxing RDMA with Packet-Granular Software Control
Chenxingyu Zhao and Jaehong Min, University of Washington; Ming Liu, University of Wisconsin-Madison; Arvind Krishnamurthy, University of Washington
SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision
Xizheng Wang, Alibaba Cloud and Tsinghua University; Qingxu Li, Yichi Xu, and Gang Lu, Alibaba Cloud; Dan Li, Tsinghua University; Li Chen, Zhongguancun Laboratory; Heyang Zhou, Alibaba Cloud; Linkang Zheng, Alibaba Cloud and South China University of Technology; Sen Zhang, Yikai Zhu, Yang Liu, Pengcheng Zhang, Kun Qian, Kunling He, Jiaqi Gao, and Ennan Zhai, Alibaba Cloud; Dennis Cai, Alibaba Group; Binzhang Fu, Alibaba Cloud
SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads
Alind Khare and Dhruv Garg, Georgia Institute of Technology; Sukrit Kalra, UC Berkeley; Snigdha Grandhi, Adobe; Ion Stoica, UC Berkeley; Alexey Tumanov, Georgia Institute of Technology
High-level Programming for Application Networks
Xiangfeng Zhu, Yuyao Wang, Banruo Liu, Yongtong Wu, and Nikola Bojanic, University of Washington; Jingrong Chen, Duke University; Gilbert Louis Bernstein and Arvind Krishnamurthy, University of Washington; Sam Kumar, University of Washington and UCLA; Ratul Mahajan, University of Washington; Danyang Zhuo, Duke University
Eden: Developer-Friendly Application-Integrated Far Memory
Anil Yelam, Stewart Grant, and Saarth Deshpande, UC San Diego; Nadav Amit, Technion, Israel Institute of Technology; Radhika Niranjan Mysore, VMware Research Group; Amy Ousterhout, UC San Diego; Marcos K. Aguilera, VMware Research Group; Alex C. Snoeren, UC San Diego
MLSys 2025 - May 12-15, 2025, Santa Clara, CA
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Mingyu Liang, Hiwot Kassa, Wenyin Fu, Brian Coutinho, Louis Feng, Christina Delimitrou
Scaling Deep Learning Training with MPMD Pipeline Parallelism
Anxhelo Xhebraj, Sean Lee, Hanfeng Chen, Vinod Grover
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution
Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
Optimizing LLM Queries in Relational Data Analytics Workloads
Shu Liu, Asim Biswal, Audrey Cheng, Amog Kamsetty, Luis Gaspar Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph Gonzalez, Matei Zaharia
ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki
ISCA 52 - The International Symposium on Computer Architecture - June 21-25, 2025, Tokyo, Japan
In-Storage Acceleration of Retrieval Augmented Generation as a Service
Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu, Hadi S. Esmaeilzadeh
DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign
Derrick Quinn, E. Ezgi Yicel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh Patel, Jose F. Martinez, Mohammad Alian
Transitive Array: An Efficient GEMM Accelerator with Result Reuse
Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai "Helen" Li, Yiran Chen
REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing
Kangqi Chen, Rakesh Nadig, Nika Mansouri Ghiasi, Yu Liang, Haiyu Mao, Jisung Park, Manos Frouzakis, Mohammad Sadrosadati, Onur Mutlu
Reconfigurable Stream Network Architecture
Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe
|