Despite remarkable advances in artificial intelligence, the fundamental principles underlying learning and intelligent systems have yet to be identified. What makes our world and its data inherently learnable? How do natural or artificial brains learn? Physicists are well positioned to address these questions. They seek fundamental understanding and construct effective models without being bound by the strictures of mathematical rigor or the need for state-of-the-art engineering performance. This mindset, recognized by the 2024 Nobel Prize in Physics awarded to AI pioneers, is needed to uncover the fundamental principles of learning. Our group advances this research frontier.
Research Focus and Synergies
Faculty members Brice Ménard, Matthieu Wyart, Soledad Villar, and Jared Kaplan, together with their research teams, are developing theoretical foundations for artificial intelligence. Their work addresses central questions in the physics of learning:
- How is learned information encoded in neural representations?
- Do neural networks exhibit universal properties? Can we construct a thermodynamic theory of learning?
- What determines how performance scales with model size and computational resources?
- What properties of data make it learnable?
These questions are essential for constructing a comprehensive, unifying theory of neural learning and computation. The answers will illuminate not only how artificial systems learn, but may also reveal fundamental principles that govern learning in biological brains.
The group will expand significantly over the coming years, with several new faculty appointments along with their graduate students and postdoctoral researchers. This growth reflects the recognition that physics-based approaches to understanding learning represent a vital frontier in scientific inquiry.
Research in the physics of learning is inherently interdisciplinary. Our group members collaborate extensively with colleagues across the departments of cognitive science, neuroscience, computer science, and applied mathematics & statistics.
Additional faculty members incorporating AI methods into their research programs include Alex Szalay, Ben Wandelt, Petar Maksimovic, Yi Li, and Tyrel McQueen.
Join Our Research Community
PhD Program in the Physics of Learning
This program, created in 2024, prepares graduate students to become leaders in AI research in academia or industry. It offers advanced courses in machine learning and statistical physics. Students have the opportunity to work on semester-long research projects before focusing on a specific topic for the completion of the PhD.
The first graduate students working on the physics of learning started at Johns Hopkins in 2025. We are now recruiting the next cohort. If you are completing your undergraduate degree and are passionate about understanding the fundamental principles of learning and intelligence, we encourage you to apply. We welcome applications from candidates with backgrounds in physics, mathematics, computer science, neuroscience, or related fields.
Research Highlights
Opening the black box of neural networks (B. Ménard)
What do neural networks learn? Do different networks learn to perform a task in the same way? What can we say about the learned encoding? We explore the universality of neural encodings in convolutional neural networks trained on image classification tasks. Our results reveal the existence of universal neural encodings. They explain, at a more fundamental level, the success of transfer learning and the origin of foundation models. In collaboration with neuroscientists, we have shown that signatures of universality in learning are also found in the visual cortex of humans and mice.
How does a network encode a dataset as a function of depth? Our team has developed new techniques to quantify the number of features learned at each layer. Examples are shown for ImageNet (1 million images), shuffled versions of it, and various Gaussian random fields.
Creativity by compositionality in machine learning (M. Wyart)
Generative models, such as large language models or diffusion models, manage to learn high-dimensional distributions. This feat is generically impossible unless data are highly structured. What is the nature of this structure? In a sequence of works, the Wyart group has shown that if data hierarchically compose features at different levels (as illustrated in Fig. 1A), then deep nets (but not shallow ones) can learn a classification task with a number of training examples that is polynomial, instead of exponential, in the dimension of the problem. The same holds true for transformers learning the data distribution by training on next-token prediction. An intriguing consequence of this viewpoint is the existence of a phase transition in diffusion models at some noise level, which Forward-Backward experiments (see Fig. 1B) display. Overall, this analysis supports the view that the success of generative models lies in their ability to compose a new whole from previously observed low-level elements, a fact relevant for tasks ranging from reasoning to the composition of a new image.
A – Sketch of the latent structure of data that are composed hierarchically. B – Forward-Backward experiments in diffusion models, where noise is added from the initial leftmost image, and then removed. For small noise, low-level elements of the snow leopard (such as eye color) are affected. At larger noise, the theory predicts that the class will change, but will still use low-level elements of the initial picture: the fox shares the nose, eyes and ears of the snow leopard. For even larger noise, a butterfly hijacks the leopard spots.
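To make the idea of hierarchically composed data concrete, the following Python sketch generates toy sequences from a tree of production rules: a class label is expanded into lower-level symbols, each of which is expanded again, and so on. It is an illustration under stated assumptions, not the group's actual model. The function names, the vocabulary size, the number of synonymous rules per symbol, the branching factor, and the depth are all hypothetical choices.

```python
import itertools
import random


def make_level_rules(num_symbols, num_synonyms, branching, rng):
    """Assign each symbol `num_synonyms` distinct tuples of lower-level symbols.

    Tuples are drawn without replacement, so no tuple is shared by two
    symbols (an assumption made here to keep the hierarchy unambiguous).
    Requires num_symbols * num_synonyms <= num_symbols ** branching.
    """
    tuples = list(itertools.product(range(num_symbols), repeat=branching))
    rng.shuffle(tuples)
    return {
        s: tuples[s * num_synonyms:(s + 1) * num_synonyms]
        for s in range(num_symbols)
    }


def sample(label, rules_per_level, rng):
    """Expand a top-level label into a sequence of lowest-level symbols."""
    sequence = [label]
    for rules in rules_per_level:
        expanded = []
        for symbol in sequence:
            # Pick one of the synonymous expansions of this symbol at random.
            expanded.extend(rng.choice(rules[symbol]))
        sequence = expanded
    return sequence


# Hypothetical toy settings: 4 symbols, 2 synonyms, binary tree of depth 3.
rng = random.Random(0)
num_symbols, num_synonyms, branching, depth = 4, 2, 2, 3
rules_per_level = [
    make_level_rules(num_symbols, num_synonyms, branching, rng)
    for _ in range(depth)
]
x = sample(label=1, rules_per_level=rules_per_level, rng=rng)
print(len(x), x)
```

Each sample has `branching ** depth` low-level symbols, and two samples of the same class can differ in their low-level elements while sharing the same latent structure.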
Neural scaling laws (J. Kaplan)
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model and dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
Scaling Laws for Neural Language Models
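A power law is a straight line in log-log space, so its exponent can be estimated with an ordinary least-squares fit. The Python sketch below illustrates this on synthetic data. The prefactor, the exponent, and the function name `fit_power_law` are assumptions chosen for illustration; they are not measured values from the work described here.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size ** (-alpha) by least squares in log-log space.

    Returns the pair (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free data spanning several orders of magnitude.
# These numbers are illustrative assumptions, not experimental results.
true_a, true_alpha = 10.0, 0.1
sizes = [10 ** k for k in range(3, 10)]
losses = [true_a * n ** (-true_alpha) for n in sizes]

a_hat, alpha_hat = fit_power_law(sizes, losses)
print(f"a = {a_hat:.3f}, alpha = {alpha_hat:.3f}")
```

With real measurements the points scatter around the line, and the fit is only meaningful over the range where the power-law trend actually holds.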
Machine learning with symmetries, structured like classical physics (S. Villar)
Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led to important results in physics. In machine learning, the most visible passive symmetry is the relabeling or permutation symmetry of graphs. The active symmetries are those that must be established by observation and experiment. They include, for instance, the translation or rotation invariance of physical law. In our work we design machine learning models that satisfy both the active and the passive symmetries. To do so, we take inspiration from classical invariant theory and classical physics.
(Left) Equivariant convolution of a scalar image with a geometric pseudovector filter on top, and a geometric pseudoscalar filter on the bottom. (Right) Illustration of the results of a U-Net with geometric filters trained to predict the dynamics of the compressible Navier-Stokes equations. The corresponding paper won a best paper award at the workshop Machine Learning for the Physical Sciences at NeurIPS 2024.
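As a small illustration of the relabeling symmetry of graphs, the Python sketch below checks that a simple neighbor-sum update is permutation equivariant: relabeling the nodes before the update gives the same result as relabeling them afterwards. The update rule and the helper names `aggregate` and `relabel` are hypothetical and far simpler than the models designed in this work.

```python
import random


def aggregate(adj, feats):
    """One neighbor-sum update: each node adds its neighbors' features to its own."""
    n = len(feats)
    return [
        feats[i] + sum(adj[i][j] * feats[j] for j in range(n))
        for i in range(n)
    ]


def relabel(adj, feats, perm):
    """Relabel the graph so that original node perm[k] becomes node k."""
    n = len(perm)
    new_adj = [[adj[perm[a]][perm[b]] for b in range(n)] for a in range(n)]
    new_feats = [feats[perm[a]] for a in range(n)]
    return new_adj, new_feats


rng = random.Random(0)
n = 5

# Random undirected graph with integer node features (integers keep the
# comparison exact, with no floating-point rounding).
adj = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        adj[i][j] = adj[j][i] = rng.randint(0, 1)
feats = [rng.randint(-3, 3) for _ in range(n)]

perm = list(range(n))
rng.shuffle(perm)

out = aggregate(adj, feats)
p_adj, p_feats = relabel(adj, feats, perm)
out_relabeled = aggregate(p_adj, p_feats)

# Equivariance: updating the relabeled graph equals relabeling the update.
is_equivariant = out_relabeled == [out[perm[a]] for a in range(n)]
print(is_equivariant)
```

The check passes for any permutation because the sum over neighbors does not depend on the order in which nodes are listed, which is the sense in which the node labeling is an arbitrary investigator choice.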



