Reflections on the panel "The Future of AI: From LLMs to Agents and Beyond" – DP2E-AI 2025
On June 18, I had the honor of taking part in a panel on the future of large-scale artificial intelligence at the DP2E-AI 2025 workshop. Moderated by Ian Foster (Argonne National Laboratory), the session brought together Torsten Hoefler (ETH Zurich), Horst Simon (formerly of LBNL), Kun Tan (Huawei), and me to explore what might come after large language models and autonomous agents.
Here is a synthesis of our discussion, centered on a key challenge: how can we keep advancing AI while controlling its costs?
🧮 Modern AI has a massive appetite for compute
Today’s AI, especially large language models (LLMs), relies on training through stochastic gradient descent. This method demands:
- huge datasets, typically around 15 trillion tokens (about 45TB of text — essentially the entire internet),
- hundreds of billions of parameters, roughly 600 billion for DeepSeek,
- and, above all, many training passes to iteratively adjust the network weights (a minimal sketch of this update loop follows the list).
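To make this concrete, here is a minimal, illustrative sketch (Python with NumPy, not code from the panel) of the update loop that stochastic gradient descent repeats: sample a mini-batch, compute the gradient of the loss, and nudge the weights. LLM training performs the same kind of step at a vastly larger scale, over trillions of tokens.

```python
import numpy as np

# Toy SGD loop: fit a linear model y = X @ w to synthetic data.
# LLM training repeats this kind of update billions of times,
# with billions of parameters instead of 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))             # "dataset": 10,000 samples, 50 features
true_w = rng.normal(size=50)
y = X @ true_w + 0.01 * rng.normal(size=10_000)

w = np.zeros(50)                              # model weights, adjusted pass after pass
lr, batch_size = 0.01, 64
for step in range(2_000):
    idx = rng.integers(0, len(X), size=batch_size)    # draw a mini-batch
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size      # gradient of the mean squared error
    w -= lr * grad                                    # the SGD update itself

print("relative error on the weights:",
      np.linalg.norm(w - true_w) / np.linalg.norm(true_w))
```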
This process is extremely expensive in time, energy, hardware, and money. A single training run can cost several hundred million dollars. Reducing these costs without hurting performance has become a strategic priority.
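As a back-of-envelope check on that order of magnitude, one can use the common rule of thumb that training takes roughly 6 × parameters × tokens floating-point operations; the sustained GPU throughput and the price per GPU-hour below are illustrative assumptions, not figures quoted during the panel.

```python
# Rule of thumb: training FLOPs ≈ 6 × (parameters) × (training tokens).
params = 600e9             # ~600 billion parameters (figure cited above)
tokens = 15e12             # ~15 trillion training tokens
flops = 6 * params * tokens                  # ≈ 5.4e25 floating-point operations

# Illustrative assumptions: a modern GPU sustaining ~4e14 FLOP/s in mixed
# precision, rented at ~$2 per GPU-hour.
sustained_flops_per_gpu = 4e14
gpu_hours = flops / sustained_flops_per_gpu / 3600
print(f"{gpu_hours:,.0f} GPU-hours")                  # ≈ 37 million GPU-hours
print(f"~${2 * gpu_hours:,.0f} in compute alone")     # tens of millions of dollars
```

Even under these optimistic assumptions, the raw compute alone runs into tens of millions of dollars; experimentation, failed runs, data preparation, and infrastructure push the total far higher.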
🧠 HPC paved the way for the democratization of GPU computing
About twenty years ago, GPUs were used almost exclusively for graphics (video games, 3D visualization…). But the HPC community, always hungry for compute power, opened a new path: using GPUs for general-purpose scientific calculations — the GPGPU (General Purpose GPU) movement.
One of the first operations exploited at scale was the matrix product, the foundation of many scientific algorithms… and now at the core of every neural network.
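That last point is easy to see in code: a fully connected neural-network layer is literally a matrix product between a batch of activations and a weight matrix. A minimal sketch with arbitrary sizes:

```python
import numpy as np

# A dense layer computes X @ W + b: the same GEMM kernel that
# scientific codes have relied on for decades.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 1024))        # batch of 32 activation vectors
W = rng.normal(size=(1024, 4096))      # layer weights
b = np.zeros(4096)

Y = np.maximum(X @ W + b, 0.0)         # matrix product, bias, then ReLU
print(Y.shape)                         # (32, 4096)
```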
⚡ AI leveraged this shift to access affordable compute power
Because AI models depend heavily on matrix products, they were able to immediately benefit from GPU progress, without requiring specialized hardware. This made it possible to train increasingly large models with growing performance… at a reasonable cost, at least initially.
This convergence — GPUs, scientific computing, AI — enabled the rise of the first large-scale deep learning architectures.
🔧 HPC expertise has driven dramatic cost reductions in AI
Techniques born in HPC were reinvested in AI pipelines, enabling extreme optimization of computations:
- reducing floating-point precision (e.g., FP32 → FP16 → FP4),
- exploiting sparse matrices (sparsity),
- designing new network topologies to reduce contention points.
Each of these levers delivered roughly a 10x cost reduction, or about 1000x combined (10 × 10 × 10). These techniques are now well known, widely used, and starting to plateau. This raises a new question: where will the next 1000x come from?
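As a small illustration of the first lever, the sketch below compares the same matrix product with inputs kept in FP32 versus rounded to FP16: half the memory, at the price of a few digits of accuracy (sizes and values are arbitrary).

```python
import numpy as np

# The "reduced precision" lever: same product, inputs in FP32 vs FP16.
rng = np.random.default_rng(0)
A = rng.normal(size=(512, 512)).astype(np.float32)
B = rng.normal(size=(512, 512)).astype(np.float32)
ref = A.astype(np.float64) @ B.astype(np.float64)    # high-precision reference

A16, B16 = A.astype(np.float16), B.astype(np.float16)
print(A.nbytes, "->", A16.nbytes)                    # FP16 halves the memory footprint

def rel_err(C):
    return np.linalg.norm(C - ref) / np.linalg.norm(ref)

print("FP32 inputs:", rel_err(A @ B))                # close to FP32 rounding error
print("FP16 inputs:", rel_err(A16.astype(np.float32) @ B16.astype(np.float32)))
# the FP16 version loses a few digits of accuracy in exchange for the savings
```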
🔁 In return, AI has reshaped hardware—and revived innovation in HPC
Faced with AI’s exponential demand, chip makers have adapted their architectures:
- accelerators dedicated to matrix products (e.g., Tensor Cores, TPUs),
- high-performance interconnects,
- specialized compute units with extremely low precision.
These hardware advances, initially designed for AI, are now opening new possibilities for scientific computing. For instance, emulating high-precision (FP64) operations on FP16 hardware is becoming feasible (see Jack Dongarra's work), paving the way for faster, more efficient HPC.
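The sketch below is a toy version of the splitting idea behind such emulation; for simplicity it recovers roughly FP32 accuracy from FP16 pieces, whereas real schemes (Ozaki-style splitting, mixed-precision iterative refinement) use more pieces and much more careful error control to reach FP64. Each matrix is written as a sum of two FP16 parts, and the partial products are computed with FP16 inputs and FP32 accumulation, the way tensor cores operate.

```python
import numpy as np

# Toy emulation of a higher-precision matrix product on low-precision units:
# split each FP32 matrix into two FP16 pieces (hi + lo), compute partial
# products with FP16 inputs and FP32 accumulation, then sum the corrections.
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 256)).astype(np.float32)
B = rng.normal(size=(256, 256)).astype(np.float32)
ref = A.astype(np.float64) @ B.astype(np.float64)

def split_fp16(M):
    hi = M.astype(np.float16)
    lo = (M - hi.astype(np.float32)).astype(np.float16)   # residual piece
    return hi, lo

def fp16_gemm(X16, Y16):
    # FP16 inputs, FP32 accumulation (tensor-core style)
    return X16.astype(np.float32) @ Y16.astype(np.float32)

A_hi, A_lo = split_fp16(A)
B_hi, B_lo = split_fp16(B)

plain = fp16_gemm(A_hi, B_hi)                                      # plain FP16 product
emulated = plain + fp16_gemm(A_hi, B_lo) + fp16_gemm(A_lo, B_hi)   # add correction terms

rel = lambda C: np.linalg.norm(C - ref) / np.linalg.norm(ref)
print("plain FP16 product:", rel(plain))       # accuracy limited by FP16 inputs
print("split emulation:   ", rel(emulated))    # much closer to the FP32 result
```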
⛓️ Yet AI has also locked itself into a rigid technical model
This hardware specialization has a flip side: it locks AI into a highly constrained paradigm. Any high-performance algorithm must be expressed as a series of matrix products; otherwise, it cannot exploit the available hardware.
This constraint limits exploration of new approaches, even though such exploration is essential for reaching the next milestones.
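The standard workaround makes the constraint tangible: a convolution, for instance, is usually rewritten as a matrix product (the im2col trick) so that it can run on matmul accelerators. A minimal sketch with arbitrary sizes:

```python
import numpy as np

# "Everything becomes a matrix product": a 1-D convolution is unrolled
# into a matrix of sliding windows (im2col) so that one GEMM does the work.
rng = np.random.default_rng(0)
x = rng.normal(size=100)           # input signal
w = rng.normal(size=5)             # convolution kernel

windows = np.lib.stride_tricks.sliding_window_view(x, len(w))   # (96, 5) matrix
y_matmul = windows @ w                                          # convolution as a matmul
y_direct = np.correlate(x, w, mode="valid")                     # direct computation

print(np.allclose(y_matmul, y_direct))    # True: same result, matmul-friendly form
```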
📈 Compute needs explode with multi-agent approaches
To explore new strategies—reasoning, coordination, collaboration—recent models gravitate toward multi-agent setups where multiple AIs interact, complement each other, and even self-evaluate. Combining techniques like chain-of-thought and test-time compute requires dramatically multiplying the tokens processed to produce a single response. These systems are even more resource-hungry, because they multiply parallel computations and combinations of hypotheses.
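A purely illustrative calculation (every per-step figure below is an assumption, not a measurement discussed during the panel) shows how quickly these techniques multiply the tokens behind a single answer.

```python
# Illustrative orders of magnitude: tokens behind one final answer.
direct_answer = 500              # tokens for a plain, single-shot reply

chain_of_thought = 4_000         # tokens of intermediate reasoning per attempt
samples = 16                     # parallel attempts scored or voted on (test-time compute)
agents = 3                       # cooperating agents reviewing each other
debate_rounds = 2                # rounds of critique and revision

multi_agent_total = chain_of_thought * samples * agents * debate_rounds
print(multi_agent_total)                       # 384,000 tokens
print(multi_agent_total / direct_answer)       # roughly 770x more work per answer
```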
The economic and ecological limits of this trajectory are becoming apparent.
🧠 The “intuitive” ideas have been exhausted: a conceptual shift is needed
As Torsten Hoefler pointed out, all the straightforward ideas to accelerate or improve models have already been explored:
- compressing weights,
- simplifying architectures,
- optimizing infrastructure…
To keep making progress, we will likely need to change our perspective. That means gaining a deeper understanding of what information really is and how it is represented, structured, and manipulated by a neural network.
🌍 A promising avenue: letting AI interact with the real world
One direction we discussed is connecting AI to the physical world so it can learn by acting and observing, not just from textual or static image data.
Two complementary approaches stand out:
- learning robots, which interact with their physical environment to infer broader behavioral laws than those learned from human-produced content alone;
- scientist agents, capable of designing and running experiments in the lab (e.g., Ian Foster’s work at Argonne).
By confronting reality, AI could generate its own data, uncover new regularities, and even formulate hypotheses.
🧠 In conclusion: a new frontier for AI—and for HPC
As Horst Simon summarized at the end of the panel: AI is today the “killer app” for HPC — the application that pushes extreme computing to innovate and reinvent itself.
Yet despite the advances made, we are only scratching the surface of what’s possible. There are many challenges left to tackle, and with them immense opportunities for innovation, creativity, and research. Tomorrow’s AI will be built at the intersection of disciplines: algorithms, physics, engineering, and, more broadly, our ability to think differently.