AI & Cloud Research Updates from IAP

Welcome to the IAP Newsletter with recent and upcoming research publications, news and events. Research includes applications and infrastructure for AI and machine learning, security, hardware acceleration, networking, and storage.

SAVE the DATE: BERKELEY WORKSHOP on the FUTURE of AI in the CLOUD

Date: Tuesday November 4, 2025

Venue: Woz Lounge, Soda Hall, UCB

Expect a full day of talks by leading experts in academia and industry working in AI and machine learning, security, hardware acceleration, networking, and storage.

The Berkeley Workshop is co-organized by Prof. Sagar Karandikar (see the Awards section below) and the IAP in collaboration with EECS faculty.

UW WORKSHOP ON THE FUTURE OF AI and CLOUD COMPUTING

Friday May 9, 2025 @ UW, Seattle, WA

UW faculty (Stephanie Wang, Tom Anderson, Baris Kasikci, and Simon Peter) congratulate the Best Poster winners Vic Li and Matthew Giordano. Joining them are Ulf Hanebutte and Mats Oberg (Marvell), Liguang Xie (Bytedance), Victor Cao (Futurewei), and Brad Beckmann (AMD).

Speakers on May 9 (by order of appearance):

Keynote: Dr. Ricardo Bianchini, Technical Fellow and Corporate Vice President at Microsoft, “Challenges and Opportunities in Datacenter Power and Sustainability in the AI Era”

Prof. Ratul Mahajan, University of Washington, "Application-defined Networking"

Dr. Ulf Hanebutte, Distinguished Engineer, Marvell, "Towards a Flexible Infrastructure Supporting Diverse AI Workloads of Today and Tomorrow"

Prof. Natasha Jaques, University of Washington, "Reinforcement Learning Fine-tuning of Large Language Models"

Keynote: Vinod Grover, Senior Distinguished Engineer, Nvidia, "The Essence of CUDA C++ : Past, Present, and Future"

Prof. Stephanie Wang, University of Washington, "Towards ML System Extensibility"

Prof. Arvind Krishnamurthy, University of Washington, "Optimizing Data Movement for Machine Learning"

Dr. Brad Beckmann, Fellow in Research and Advanced Development, AMD, "Advancing Energy Efficient AI Communication"

Prof. Baris Kasikci, University of Washington, "The Quest For Blazingly Fast LLM Serving"

This was the fourth AI and Cloud Workshop hosted by UW. Please see the UW WORKSHOP WEB PAGE for the speaker bios, abstracts and videos of the presentations.

AWARDS for IAP COLLEAGUES

2024 ACM Thacker Breakthrough in Computing Award (announced in April 2025)

Jason Cong, UCLA

ACM SIGARCH Maurice Wilkes Award

Carole-Jean Wu, Meta

ACM SIGARCH Alan D. Berenbaum Distinguished Service Award

Joel Emer, MIT and Nvidia

American Academy of Arts and Sciences - New Academy Members 2025

Kavita Bala, Cornell University

Christopher Manning, Stanford University

Dawn Song, University of California, Berkeley

2025 Sloan Fellows

Natacha Crooks, University of California, Berkeley

2024 ACM Fellows (announced in January 2025)

Nate Foster, Cornell

DK Panda, The Ohio State University

Presidential Early Career Award for Scientists and Engineers

Christina Delimitrou, MIT

BEST DISSERTATION and PAPER AWARDS

2025 ACM SIGARCH/IEEE CS TCCA Outstanding Dissertation Award

Presented this week at ISCA in Tokyo!

Advancing the state-of-the-art in agile hardware/software co-design tools and deploying them at a hyperscaler to architect custom hardware that eliminates costly system-level WSC inefficiencies

Sagar Karandikar, UC Berkeley

BEST OF COMPUTER ARCHITECTURE LETTERS FOR 2024

The Importance of Generalizability in Machine Learning for Systems

Varun Gohil, Sundar Dev, Gaurang Upasani, David Lo, Parthasarathy Ranganathan, and Christina Delimitrou

ASPLOS 2025 BEST PAPER AWARDS

xUI: extended User Interrupts

Berk Aydogmus, Linsong Guo, Danial Zuberi, Tal Garfinkel, Dean Tullsen, Amy Ousterhout, Kazem Taram

H-Houdini: Scalable Invariant Learning

Sushant Dinesh, Yongye Zhu, Christopher W. Fletcher

ASPLOS 2025 ARTIFACT EVALUATION AWARDS

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference

Aditya K Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, Ramachandran Ramjee, Ashish Panwar

BLOGS AND OPINION

We Can’t Regulate Our Way to Crypto Leadership. We Still Need Science. NSF Cuts Threaten to Devastate US Research

Dan Boneh (Stanford), Joseph Bonneau (NYU), Giulia Fanti (Carnegie Mellon), Ben Fisch (Yale), Ari Juels (Cornell), Farinaz Koushanfar (U.C. San Diego), Andrew Miller (University of Illinois at Urbana Champaign), Ciamac Moallemi (Columbia), David Tse (Stanford), Pramod Viswanath (Princeton).

June 2, 2025

The Academic Pipeline Stall: Why Industry Must Stand for Academia

Vijay Janapa Reddi

ACM SIGARCH BLOG

May 6, 2025

See IEEE Spectrum’s Interview of Vijay about Funding in the News Section Below.

Energy Is Physics, but Emissions Is Accounting: What’s Really Green?

David Patterson

ACM SIGARCH BLOG

March 5, 2025

Holistic Evaluation of Large Language Models for Medical Applications

Nigam Shah, Mike Pfeffer, Percy Liang

Stanford HAI Blog

February 28, 2025

SELECT CONFERENCES and PUBLICATIONS in 1H 2025

POPL 2025, The 52nd ACM Symposium on Principles of Programming Languages, January 19-25, 2025, Denver, CO

A Research Career in Balance

Andrew Myers Cornell University

Universal Composability is Robust Compilation

Marco Patrignani University of Trento, Robert Künnemann CISPA Helmholtz Center for Information Security, Riad S. Wahby Stanford University, USA, Ethan Cecchetti University of Wisconsin-Madison

A Demonic Outcome Logic for Randomized Nondeterminism

Noam Zilberstein Cornell University, Dexter Kozen Cornell University, Alexandra Silva Cornell University, Joseph Tassarotti New York University

Flo: a Semantic Foundation for Progressive Stream Processing

Shadaj Laddad University of California at Berkeley, Alvin Cheung University of California at Berkeley, Joseph M. Hellerstein UC Berkeley, Mae Milano Princeton University

FAST '25 - The 23rd USENIX Conference on File and Storage Technologies - February 25-27, 2025, Santa Clara, CA, USA

Mooncake: Trading More Storage for Less Computation — A KVCache-centric Architecture for Serving LLM Chatbot

Ruoyu Qin, Moonshot AI and Tsinghua University; Zheming Li, Weiran He, and Jialei Cui, Moonshot AI; Feng Ren, Mingxing Zhang, Yongwei Wu, and Weimin Zheng, Tsinghua University; Xinran Xu, Moonshot AI

Awarded Best Paper!

Cloudscape: A Study of Storage Services in Modern Cloud Architectures

Sambhav Satija, Chenhao Ye, Ranjitha Kosgi, Aditya Jain, Romit Kankaria, Yiwei Chen, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison; Kiran Srinivasan, NetApp

HPCA 2025 - The 31st IEEE International Symposium on High-Performance Computer Architecture - March 1-5, 2025, Las Vegas, NV

Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms

Wan-Hsuan Lin (UCLA), Daniel Bochen Tan (UCLA), Jason Cong (UCLA)

The Importance of Generalizability in Machine Learning for Systems

Varun Gohil (Massachusetts Institute of Technology), Sundar Dev (Google), Gaurang Upasani (Google), David Lo (Google), Parthasarathy Ranganathan (Google), Christina Delimitrou (Massachusetts Institute of Technology)

SELECTED BEST OF COMPUTER ARCHITECTURE LETTERS FOR 2024

SPARK – Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems

Siddhartha Raman Sundara Raman (The University of Texas at Austin), Lizy Kurian John (UT Austin), Jaydeep Kulkarni (University of Texas, Austin)

MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI

Arya Tschand (Harvard University), Arun Tejusve Raghunath Rajan (Self / Meta), Sachin Idgunji (NVIDIA), Anirban Ghosh (NVIDIA), Jeremy Holleman (UNC Charlotte / Syntiant), Csaba Kiraly (Codex), Pawan Ambalkar (Dell), Ritika Borkar (NVIDIA), Ramesh Chukka (Intel), Trevor Cockrell (Dell), Oliver Curtis (SMC), Grigori Fursin (FlexAI / cTuning), Miro Hodak (AMD), Hiwot Kassa (Meta), Anton Lokhmotov (KRAI), Dejan Miskovic (NVIDIA), Yuechao Pan (Google), Manu Prasad Manmathan (Intel), Liz Raymond (Dell), Tom St. John (Decompute), Arjun Suresh (GATE Overflow), Rowan Taubitz (SMC), Sean Zhan (SMC), Scott Wasson (MLCommons), David Kanter (MLCommons), Vijay Janapa Reddi (Harvard University)

Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization

Jianbo Dong (Alibaba Group), Bin Luo (Alibaba Group), Jun Zhang (Alibaba Group), Pengcheng Zhang (Alibaba Group), Fei Feng (Alibaba Group), Yikai Zhu (Alibaba Group), Ang Liu (Alibaba Group), Zian Chen (Alibaba Group), Yi Shi (Alibaba Group), Yang Liu (Alibaba Group), Hairong Jiao (Alibaba Group), Gang Lu (Alibaba Group), Yu Guan (Alibaba Group), Ennan Zhai (Alibaba Group), Wencong Xiao (Alibaba Group), Hanyu Zhao (Alibaba Group), Man Yuan (Alibaba Group), Siran Yang (Alibaba Group), Xiang Li (Alibaba Group), Jiamang Wang (Alibaba Group), Rui Men (Alibaba Group), Jianwei Zhang (Alibaba Group), Chang Zhou (Alibaba Group), Dennis Cai (Alibaba Group), Yuan Xie (Alibaba Group), Binzhang Fu (Alibaba Group)

Revisiting Reliability in Large-Scale Machine Learning Research Clusters

Apostolos Kokolis (Meta), Michael Kuchnik (Meta), John Hoffman (Meta), Adithya Kumar (Meta), Parth Malani (Meta), Faye Ma (Meta), Zachary DeVito (Meta), Shubho Sengupta (Meta), Kalyan Saladi (Meta), Carole-Jean Wu (Meta)

CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems

Mariam Elgamal (Harvard University), Doug Carmean (Meta), Elnaz Ansari (Meta), Okay Zed (Meta), Ramesh Peri (Meta), Srilatha Manne (Meta), Udit Gupta (Meta), Gu-Yeon Wei (Harvard University), David Brooks (Harvard University), Gage Hills (Harvard University), Carole-Jean Wu (Meta)

ARTEMIS: Agile Discovery of Efficient Real-Time Systems-on-Chips in the Heterogeneous Era

Subhankar Pal (IBM Research), Aporva Amarnath (IBM Research), Behzad Boroujerdian (University of Texas at Austin / Harvard University), Augusto Vega (IBM Research), Alper Buyuktosunoglu (IBM Research), John-David Wellman (IBM Research), Vijay Janapa Reddi (Harvard University), Pradip Bose (IBM Research)

LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications

Yujun Lin (MIT), Zhekai Zhang (MIT), Song Han (MIT)

Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices

Yu Liang (ETH Zürich), Aofeng Shen (ETH Zürich), Chun Jason Xue (MBZUAI), Riwei Pan (City University of Hong Kong), Haiyu Mao (ETH Zürich), Nika Mansouri Ghiasi (ETH Zürich), Qingcai Jiang (ETH Zürich and University of Science and Technology of China), Rakesh Nadig (ETH Zürich), Lei Li (City University of Hong Kong), Rachata Ausavarungnirun (MangoBoost), Mohammad Sadrosadati (ETH Zürich), Onur Mutlu (ETH Zürich)

ASPLOS 2025 - The 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 30-April 3, Rotterdam, The Netherlands

Keynote (Joint with EuroSys 2025): Has Machine Learning for Systems Reached an Inflection Point?

Martin Maas, Google

CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing

Mayank Kabra (ETH Zurich), Rakesh Nadig (ETH Zurich), Harshita Gupta (ETH Zurich), Manos Frouzakis (ETH Zurich), Rahul Bera (ETH Zurich), Vamanan Arulchelvan (ETH Zurich), Yu Liang (ETH Zurich), Haiyu Mao (ETH Zurich), Mohammad Sadrosadati (ETH Zurich), Onur Mutlu (ETH Zurich)

Composing Distributed Computations Through Task and Kernel Fusion

Rohan Yadav (Stanford University), Shiv Sundram (Stanford University), Wonchan Lee (NVIDIA), Michael Garland (NVIDIA), Michael Bauer (NVIDIA), Alex Aiken (Stanford University), Fredrik Kjolstad (Stanford University)

Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms

Benjamin Reidys (University of Illinois Urbana-Champaign), Pantea Zardoshti (Microsoft), Íñigo Goiri (Microsoft), Celine Irvene (Microsoft), Daniel S. Berger (Microsoft, University of Washington), Haoran Ma (University of California-Los Angeles), Kapil Arya (Microsoft), Eli Cortez (Microsoft), Taylor Stark (Microsoft), Eugene Bak (Microsoft), Mehmet Iyigun (Microsoft), Stanko Novaković (Google), Lisa Hsu (Meta), Karel Trueba (Microsoft), Abhisek Pan (Microsoft), Chetan Bansal (Microsoft), Saravan Rajmohan (Microsoft), Jian Huang (University of Illinois Urbana-Champaign), Ricardo Bianchini (Microsoft)

Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies

Divyanshu Saxena (The University of Texas at Austin), William Zhang (The University of Texas at Austin), Shankara Pailoor (The University of Texas at Austin), Isil Dillig (The University of Texas at Austin), Aditya Akella (The University of Texas at Austin)

Necro-reaper: Pruning away Dead Memory Traffic in Warehouse-Scale Computers

Sotiris Apostolakis (Google), Chris Kennelly (Google), Xinliang David Li (Google), Parthasarathy Ranganathan (Google)

ReCA: Integrated Acceleration for Real-Time and Efficient Cooperative Embodied Autonomous Agents

Zishen Wan (Georgia Institute of Technology), Yuhang Du (University of Minnesota, Twin Cities), Mohamed Ibrahim (Georgia Institute of Technology), Jiayi Qian (Georgia Institute of Technology), Jason Jabbour (Harvard University), Yang (Katie) Zhao (University of Minnesota, Twin Cities), Tushar Krishna (Georgia Institute of Technology), Arijit Raychowdhury (Georgia Institute of Technology), Vijay Janapa Reddi (Harvard University)

SuperNoVA: Algorithm-Hardware Co-Design for Resource-Aware SLAM

Seah Kim (University of California, Berkeley), Roger Hsiao (University of California, Berkeley), Borivoje Nikolić (University of California, Berkeley), James Demmel (University of California, Berkeley), Yakun Sophia Shao (University of California, Berkeley)

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Shiyi Cao (UC Berkeley), Shu Liu (UC Berkeley), Tyler Griggs (UC Berkeley), Peter Schafhalter (UC Berkeley), Xiaoxuan Liu (UC Berkeley), Ying Sheng (Stanford University), Joseph E. Gonzalez (UC Berkeley), Matei Zaharia (UC Berkeley), Ion Stoica (UC Berkeley)

PCcheck: Persistent Concurrent Checkpointing for ML

Foteini Strati (ETH Zurich), Michal Friedman (ETH Zurich), Ana Klimovic (ETH Zurich)

POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference

Aditya K Kamath (Paul G Allen School of Computer Science and Engineering, University of Washington), Ramya Prabhu (Microsoft Research India), Jayashree Mohan (Microsoft Research India), Simon Peter (Paul G Allen School of Computer Science and Engineering, University of Washington), Ramachandran Ramjee (Microsoft Research India), Ashish Panwar (Microsoft Research India)

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Byungsoo Jeon (NVIDIA), Mengdi Wu (Carnegie Mellon University), Shiyi Cao (UC Berkeley), Sunghyun Kim (Massachusetts Institute of Technology), Sunghyun Park (NVIDIA), Neeraj Aggarwal (Carnegie Mellon University), Colin Unger (Stanford University), Daiyaan Arfeen (Carnegie Mellon University), Peiyuan Liao (Carnegie Mellon University), Xupeng Miao (Carnegie Mellon University), Mohammad Alizadeh (Massachusetts Institute of Technology), Gregory R. Ganger (Carnegie Mellon University), Tianqi Chen (Carnegie Mellon University), Zhihao Jia (Carnegie Mellon University)

EUROSYS 2025 - March 30-April 3, 2025, Rotterdam, The Netherlands

DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs

Xiaozhe Yao (ETH Zurich), Qinghao Hu (MIT), Ana Klimovic (ETH Zurich)

SkyServe: Serving AI Models across Regions and Clouds with Spot Instances

Ziming Mao (UC Berkeley), Tian Xia (UC Berkeley), Zhanghao Wu (UC Berkeley), Wei-Lin Chiang (UC Berkeley), Tyler Griggs (UC Berkeley), Romil Bhardwaj (UC Berkeley), Zongheng Yang (UC Berkeley), Scott Shenker (ICSI and UC Berkeley), Ion Stoica (UC Berkeley)

Impeller: Stream Processing on Shared Logs

Zhiting Zhu (Lepton AI), Zhipeng Jia (Google), Newton Ni (University of Texas at Austin), Dixin Tang (UT Austin), Emmett Witchel (UT Austin)

NSDI '25 - The 21st USENIX Symposium on Networked Systems Design and Implementation - April 28-30, 2025, Philadelphia, PA

Enabling Portable and High-Performance SmartNIC Programs with Alkali

Jiaxin Lin, UT Austin; Zhiyuan Guo, UCSD; Mihir Shah, NVIDIA; Tao Ji, Microsoft; Yiying Zhang, UCSD; Daehyeok Kim and Aditya Akella, UT Austin

NDD: A Decision Diagram for Network Verification

Zechun Li, Peng Zhang, and Yichi Zhang, Xi'an Jiaotong University; Hongkun Yang, Google

Awarded Outstanding Paper!

Smart Casual Verification of the Confidential Consortium Framework

Heidi Howard, Markus A. Kuppe, Edward Ashton, and Amaury Chamayou, Azure Research, Microsoft; Natacha Crooks, Azure Research, Microsoft and UC Berkeley

Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage

Hamid Hajabdolali Bazzaz, Yingjie Bi, and Weiwu Pang, Google; Minlan Yu, Harvard University; Ramesh Govindan, University of Southern California; Neal Cardwell, Nandita Dukkipati, Meng-Jung Tsai, Chris DeForeest, and Yuxue Jin, Google; Charles Carver, Columbia University; Jan Kopanski, Liqun Cheng, and Amin Vahdat, Google

White-Boxing RDMA with Packet-Granular Software Control

Chenxingyu Zhao and Jaehong Min, University of Washington; Ming Liu, University of Wisconsin-Madison; Arvind Krishnamurthy, University of Washington

SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision

Xizheng Wang, Alibaba Cloud and Tsinghua University; Qingxu Li, Yichi Xu, and Gang Lu, Alibaba Cloud; Dan Li, Tsinghua University; Li Chen, Zhongguancun Laboratory; Heyang Zhou, Alibaba Cloud; Linkang Zheng, Alibaba Cloud and South China University of Technology; Sen Zhang, Yikai Zhu, Yang Liu, Pengcheng Zhang, Kun Qian, Kunling He, Jiaqi Gao, and Ennan Zhai, Alibaba Cloud; Dennis Cai, Alibaba Group; Binzhang Fu, Alibaba Cloud

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads

Alind Khare and Dhruv Garg, Georgia Institute of Technology; Sukrit Kalra, UC Berkeley; Snigdha Grandhi, Adobe; Ion Stoica, UC Berkeley; Alexey Tumanov, Georgia Institute of Technology

High-level Programming for Application Networks

Xiangfeng Zhu, Yuyao Wang, Banruo Liu, Yongtong Wu, and Nikola Bojanic, University of Washington; Jingrong Chen, Duke University; Gilbert Louis Bernstein and Arvind Krishnamurthy, University of Washington; Sam Kumar, University of Washington and UCLA; Ratul Mahajan, University of Washington; Danyang Zhuo, Duke University

Eden: Developer-Friendly Application-Integrated Far Memory

Anil Yelam, Stewart Grant, and Saarth Deshpande, UC San Diego; Nadav Amit, Technion, Israel Institute of Technology; Radhika Niranjan Mysore, VMware Research Group; Amy Ousterhout, UC San Diego; Marcos K. Aguilera, VMware Research Group; Alex C. Snoeren, UC San Diego

MLSys 2025 - May 12-15, 2025, Santa Clara, CA

Pipe Fill: Using GPUs During Bubbles in Pipeline-parallel LLM Training

Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training

Mingyu Liang, Hiwot Kassa, Wenyin Fu, Brian Coutinho, Louis Feng, Christina Delimitrou

Scaling Deep Learning Training with MPMD Pipeline Parallelism

Anxhelo Xhebraj, Sean Lee, Hanfeng Chen, Vinod Grover

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, Stephanie Wang, Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze

AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

Optimizing LLM Queries in Relational Data Analytics Workloads

Shu Liu, Asim Biswal, Audrey Cheng, Amog Kamsetty, Luis Gaspar Schroeder, Liana Patel, Shiyi Cao, Xiangxi Mo, Ion Stoica, Joseph Gonzalez, Matei Zaharia

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments

Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki

ISCA 52 - The International Symposium on Computer Architecture

June 21-25, 2025, Tokyo, Japan

In-Storage Acceleration of Retrieval Augmented Generation as a Service

Rohan Mahapatra, Harsha Santhanam, Christopher Priebe, Hanyang Xu, Hadi S. Esmaeilzadeh

DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hardware Codesign

Derrick Quinn, E. Ezgi Yicel, Martin Prammer, Zhenxing Fan, Kevin Skadron, Jignesh Patel, Jose F. Martinez, Mohammad Alian

Transitive Array: An Efficient GEMM Accelerator with Result Reuse

Cong Guo, Chiyue Wei, Jiaming Tang, Bowen Duan, Song Han, Hai "Helen" Li, Yiran Chen

REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storage Processing

Kangqi Chen, Rakesh Nadig, Nika Mansouri Ghiasi, Yu Liang, Haiyu Mao, Jisung Park, Manos Frouzakis, Mohammad Sadrosadati, Onur Mutlu

Reconfigurable Stream Network Architecture

Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe

GRANTS

2025 UW Tsukuba NVIDIA Amazon Cross-Pacific AI Initiative, “Inferring Strategic Behavior to Ensure Trustworthy Multi-agent AI Systems”, $720,000. Lead PI: Natasha Jaques, UW. Co-PIs: Lillian Ratliff, Max Kleiman-Weiner, Simon Du, Kevin Jamieson

PROJECTS

PROJECTS in ML SYSTEMS

ML Systems with Tiny ML

An update to this community-driven project is imminent. Its content has been generated collaboratively by numerous contributors over time, and the content creation process may have involved various editing tools, including generative AI technology. As the main author, editor, and curator, Prof. Vijay Janapa Reddi maintains human oversight and editorial control to ensure the accuracy and relevance of the content. Have questions or feedback? Feel free to e-mail Prof. Vijay Janapa Reddi directly, or start a discussion thread on GitHub.

JOBS

Prof. Nate Foster notified us that the P4 project is hiring an Ecosystem Lead to promote, nurture, and grow the P4 ecosystem.

NEWS ITEMS

June 12, 2025

AMD reveals next-generation AI chips with OpenAI CEO Sam Altman

May 28, 2025

How a Harvard Engineer Lost Three Grants in One Day: A Harvard professor worries about brain drain and the impact on innovation

April 7, 2025

Stanford HAI’s 2025 AI Index Reveals Record Growth in AI Capabilities, Investment, and Regulation

March 14, 2025

Mobile World Congress 2025: AI’s Next Leap Requires More Than Data Centers

January 29, 2025

Alibaba Unveils Upgraded AI Model, Claims It Surpasses Rival DeepSeek-V3

The announcement follows a market frenzy triggered by Chinese AI startup DeepSeek

January 14, 2025

Alibaba, Apple and Synopsys Join Ultra Accelerator Link Consortium Board

IAP Workshop Testimonials

Professor David Patterson, the Pardee Professor of Computer Science, UC Berkeley, “I saw strong participation at the Cloud Workshop, with some high energy and enthusiasm; and I was delighted to see industry engineers bring and describe actual hardware, representing some of the newest innovations in the data center.”

Professor Christos Kozyrakis, Professor of Electrical Engineering & Computer Science, Stanford University, “As a starting point, I think of these IAP workshops as ‘Hot Chips meets ISCA’, i.e., an intersection of industry’s newest solutions in hardware (Hot Chips) with academic research in computer architecture (ISCA); but more so, these workshops additionally cover new subsystems and applications, and in a smaller venue where it is easy to discuss ideas and cross-cutting approaches with colleagues.”

Professor Hakim Weatherspoon, Professor of Computer Science, Cornell University, “I have participated in three IAP Workshops since the first one at Cornell in 2013 and it is great to see that the IAP premise was a success now as it was then, bringing together industry and academia in a focused workshop and an all-day exchange of ideas. It was a fantastic experience and I look forward to the next IAP Workshop.”

Professor Ken Birman, the N. Rama Rao Professor of Computer Science, Cornell University, “I actually thought it was a fantastic workshop, an unquestionable success, starting from the dinner the night before, through the workshop itself, to the post-event reception for the student Best Poster Awards.”

Dr. Carole-Jean Wu, Research Scientist, AI Infrastructure, Facebook Research, and Professor of CSE, Arizona State University, “The IAP Cloud Computing workshop provides a great channel for valuable interactions between faculty/students and the industry participants. I truly enjoyed the venue learning about research problems and solutions that are of great interest to Facebook, as well as the new enabling technologies from the industry representatives. The smaller venue and the poster session fostered an interactive environment for in-depth discussions on the proposed research and approaches and sparked new collaborative opportunities. Thank you for organizing this wonderful event! It was very well run.”

Nathan Pemberton, PhD student, UC Berkeley (currently Applied Scientist at AWS), "IAP workshops provide a valuable chance to explore emerging research topics with a focused group of participants, and without all the time/effort of a full-scale conference. Instead of rushing from talk to talk, you can slow down and dive deep into a few topics with experts in the field."

Dr. Pankaj Mehra, VP Product Planning, Samsung (currently CEO Elephance Memory and Professor at The Ohio State University), "Terrifically organized Workshops that give all parties -- students, faculty, industry -- valuable insights to take back"

Professor Vishal Shrivastav, Purdue University, “Attending the IAP workshops as a PhD student at Cornell was a great experience and very rewarding. I really enjoyed the many amazing talks from both the industry and academia. My personal conversations with several industry leaders at the workshop will definitely guide some of my future research."

Professor Ana Klimovic, ETH Zurich, “I attended three IAP workshops as a PhD student at Stanford, and I am consistently impressed by the quality of the talks and the breadth of the topics covered. These workshops bring top-tier industry and academia together to discuss cutting-edge research challenges. It is a great opportunity to exchange ideas and get inspiration for new research opportunities."

Dr. Richard New, VP Research, Western Digital, “IAP workshops provide a great opportunity to meet with professors and students working at the cutting edge of their fields. It was a pleasure to attend the event – lots of very interesting presentations and posters.”

Would you like to support a unique tech forum that brings together academia and industry under your company's banner?

Please feel free to contact us regarding sponsorship opportunities, and for more info about any of the items above.

Best,

Jim Ballingall

Executive Director

Industry-Academia Partnership (IAP)

www.industry-academia.org

jim.ballingall@gmail.com

cell: 408-212-1035

About the IAP

The Industry-Academia Partnership (IAP) is in its 13th year hosting events and conducting projects about applications and infrastructure for AI and machine learning, hardware acceleration, networking, security, and storage. For more info, please see www.industry-academia.org

Stanford Prof. Christos Kozyrakis (left) and UCSC Prof. Heiner Litz welcome attendees at the 8:30am kick-off of the 2018 Stanford/UCSC Workshop at the UCSC Silicon Valley campus.