Paul Riechers & Adam Shai present "Simplex — Building the Science of Predictive Systems to Enable AI Safety" at the FAR Seminar series, June 2024.
What can you learn from next-token prediction? Following the math makes modern AI less of a black box. We predict and discover internal representations: fractal geometries of activations that correspond to optimal beliefs about the future. We can understand why and how models like ChatGPT synchronize to their users as context grows.
Our co-founder, Dr. Paul Riechers, describes the foundational theory behind how we can understand and intervene on the internal representations and behaviors of modern and future artificial intelligence.
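To make the synchronization claim concrete, here is a minimal Python sketch (our illustration, not code from the talk) of the computational-mechanics picture: an observer who sees only the symbols emitted by a hidden Markov model Bayes-updates a belief over its hidden states, and the observer's uncertainty shrinks as context accumulates. The two-state "even process"-style HMM and all parameter choices below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Even process"-style HMM (a standard computational-mechanics example; the
# parameter choices here are ours): T[s][i, j] = P(emit s, move to j | state i).
T = np.array([
    [[0.5, 0.0],    # symbol 0: from A stay in A w.p. 1/2; B never emits 0
     [0.0, 0.0]],
    [[0.0, 0.5],    # symbol 1: from A go to B w.p. 1/2; from B return to A
     [1.0, 0.0]],
])

# Sample 30 symbols from the hidden process.
state, symbols = 0, []
for _ in range(30):
    p_sym = T[:, state, :].sum(axis=1)                 # P(symbol | state)
    s = int(rng.choice(2, p=p_sym))
    state = int(rng.choice(2, p=T[s, state] / T[s, state].sum()))
    symbols.append(s)

# An observer seeing only the symbols Bayes-updates a belief over hidden
# states; its entropy drops as context grows -- that is synchronization.
belief = np.array([0.5, 0.5])                          # ignorant prior
for t, s in enumerate(symbols):
    new = belief @ T[s]
    belief = new / new.sum()
    p = belief[belief > 0]
    H = -(p * np.log2(p)).sum()
    print(f"t={t:2d} symbol={s} belief={belief.round(3)} uncertainty={H:.3f} bits")
```

In this toy process the observer synchronizes fully the first time it sees a 0 (only state A can emit 0), after which its uncertainty stays at zero bits; richer processes synchronize more gradually.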
The 2024 Comp Mech Hackathon was organized by Simplex, PIBBSS, and Apart Labs as a way of further introducing computational mechanics to the AI safety field. This new approach to AI safety and interpretability is built on a rigorous mathematical framework from physics and enables precise predictions about the internal geometry of AI systems, overcoming limitations of current interpretability methods.
The final presentations by teams at the Computational Mechanics Hackathon, June 2024. Talk titles, in order:
1. Exploring Hierarchical Structure Representation in Transformer Models through Computational Mechanics
2. Handcrafting a Network to Predict Next Token Probabilities for the Random-Random-XOR Process
3. Steering Model’s Belief States
4. RNNs represent belief state geometry in hidden state
5. Belief State Representations in Transformer Models on Nonergodic Data
6. Investigating the Effect of Model Capacity Constraints on Belief State Representations
7. Looking forward to posterity: what past information is transferred to the future?
8. Unsupervised Recovery of Hidden Markov Models from Transformers with Evolutionary Algorithms
Sep 28, 2024 • AXRP interview
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics studying how to predict random processes, to neural networks.
We anticipated and found that transformer neural networks represent fractal belief-state geometry in their residual stream. These fractals correspond to optimal beliefs about the future. At Simplex, we are leveraging and scaling these insights to bridge mechanism and behavior in AI, attaining a new level of understanding and control.
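For readers who want to see where the fractal comes from, the sketch below is our hedged illustration, not Simplex's code: the specific HMM and the parameters x and a are assumptions, loosely modeled on the mess3 process studied in the paper. It enumerates the belief states an optimal observer can reach by Bayes-updating on every short symbol sequence; for suitable parameters these beliefs form a self-similar set in the probability simplex, and the paper's finding is that a linear projection of the transformer's residual stream recovers this geometry.

```python
import numpy as np

# Toy 3-state, 3-symbol HMM in the spirit of the mess3 process (parameters
# and factorization are illustrative assumptions, not the paper's exact T):
# T[s][i, j] = P(emit symbol s and move to state j | current state i).
x, a = 0.05, 0.85
b = (1 - a) / 2
y = 1 - 2 * x
T = np.zeros((3, 3, 3))
for s in range(3):
    for i in range(3):
        for j in range(3):
            emit = a if s == i else b      # P(symbol s | state i)
            move = y if j == i else x      # P(next state j | state i)
            T[s, i, j] = emit * move

def bayes_update(belief, s):
    """Belief over hidden states after observing symbol s (None if impossible)."""
    new = belief @ T[s]
    z = new.sum()
    return new / z if z > 0 else None

# Enumerate the beliefs reachable from the uniform prior via all words of
# length <= 8; each belief is a point in the 2-simplex.
beliefs = [np.ones(3) / 3]
frontier = list(beliefs)
for _ in range(8):
    frontier = [upd for bel in frontier for s in range(3)
                if (upd := bayes_update(bel, s)) is not None]
    beliefs.extend(frontier)

pts = np.vstack(beliefs)
print(pts.shape)   # ~ (9841, 3): a self-similar cloud of belief states
```

Plotting pts in barycentric coordinates displays the self-similar structure; in the paper, the analogous point cloud is matched against a learned linear projection of residual-stream activations at each context position.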