“Imagine that a million monkeys have been trained to hit the keys of ... a million typewriters ... after a year, their volumes would contain the exact copy of books of all kinds and in all languages.”

– Émile Borel, Statistical Mechanics and Irreversibility, 1913

Infinite Monkey is a team initially focused on discovering new architectures for artificial general intelligence (AGI) via search and compute.

The search for AGI

Humanity’s rate of progress is intelligence- and energy-constrained. Building an AI system that can achieve or surpass human-level general intelligence and efficiency is one of the few technologies that could inflect our overall rate of progress.

To date (2024), all known artificial intelligence systems fall far short of human performance curves in terms of training-data efficiency, energy efficiency[1], and general intelligence[2].

Human brains exist. Therefore a computational architecture with desirable general intelligence and efficiency properties exists.

Transformers also exist. While inefficient and not generally intelligent, transformer-based AI systems have achieved relatively high domain generality over the tasks[3] they are trained on. Transformers currently sit atop the sequence-to-sequence tech tree of AI systems and demonstrate remarkable task generalization that emerged with scale.

We expect many more architectures to exist, with varying properties such as task-domain generality, efficiency, and intelligence. And in fact, they do: RNNs, CNNs, LSTMs, SSMs, GANs, VAEs, and DDPMs. The architecture space is indeed rich, with transformers currently on top.

But we want AGI, and transformers (along with all other known AI systems) leave us wanting. Fortunately, we know human brains exist; therefore, discovering AGI is a search problem, not an invention problem.

“The Bitter Lesson” tells us search and learning are the only techniques that have “worked” in AI, but they are computationally intensive and only recently viable[4]. Sutton writes:

Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.

But to date, Sutton’s lesson has one critical exception: all AI architectures are artisanally hand-crafted by AI researchers who then bolt on the industrial pipeline of data and compute. We have not yet effectively leveraged compute for AI architecture discovery.

By way of simplified example: transformers emerged in 2017, many years downstream of machine translation research (e.g., translating English to Spanish). Human insight led to using RNNs and CNNs to handle variable-length inputs and outputs (English and Spanish sentences are not the same length). Human insight also led to the concept of “attention,” so the system could weigh different parts of the input when predicting the output (English adjectives come before nouns, Spanish after). The final human insight was realizing that “attention is all you need”: the transformer was created by dropping the RNNs and CNNs and optimizing the remaining architecture, which enabled new scale and capability emergence[5][6].
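The “attention” idea described above can be sketched in a few lines. This is a minimal NumPy rendering of scaled dot-product attention as defined in the transformer paper, not the full multi-head machinery; the toy shapes and random inputs are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: each output is a weighted average of
    the value vectors V, weighted by how well its query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax over keys, so each query's weights sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # attended output: weighted sum of values

# Toy example: 2 target-side queries attending over 3 source tokens (dim 4).
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # one attended vector per query: (2, 4)
```

The translation intuition maps directly: the queries come from the output being generated, and the softmax weights let each output position “consider different parts of the input,” exactly the behavior the pre-transformer translation systems needed.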

Architecture search as a concept is not new; it is commonly referred to as neural architecture search, or NAS[7]. Empirically, only one unsupervised search has ever resulted in the discovery of a general-intelligence architecture: evolution → human brains. The current research situation is actually worse than that, though: NAS has not played an important role in any contemporary AI system, and current state-of-the-art NAS is no better than a random strategy[8].
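To make the random-strategy baseline concrete, here is a minimal sketch of random-search NAS. The search space, its dimensions, and the `proxy_score` placeholder are all hypothetical illustrations; in real NAS the scoring step is the expensive part (training each candidate and measuring validation accuracy).

```python
import random

# Hypothetical toy search space: depth, width, and a block type per layer.
SEARCH_SPACE = {
    "depth": [2, 4, 8, 16],
    "width": [64, 128, 256, 512],
    "block": ["conv", "attention", "recurrent"],
}

def sample_architecture(rng):
    """Draw one candidate uniformly at random from the search space."""
    depth = rng.choice(SEARCH_SPACE["depth"])
    return {
        "depth": depth,
        "width": rng.choice(SEARCH_SPACE["width"]),
        "blocks": [rng.choice(SEARCH_SPACE["block"]) for _ in range(depth)],
    }

def proxy_score(arch, rng):
    """Placeholder for the expensive evaluate step (train, then measure
    validation accuracy). Here it is just a random score in [0, 1)."""
    return rng.random()

def random_search(n_trials, seed=0):
    """The baseline that state-of-the-art NAS struggles to beat [8]:
    sample n_trials architectures independently, keep the best-scoring one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch, rng)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

best, score = random_search(100)
print(best["depth"], best["width"], round(score, 3))
```

That this embarrassingly simple loop remains competitive with sophisticated NAS methods[8] is exactly the gap: compute has not yet been effectively leveraged for architecture discovery.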

Dwelling on the gap between existence proof and results:

Existence proofs are critical: the ultimate prize, AGI, is possible. This motivates a fresh search[10]. The project’s namesake is a nod to the fun theorem that an infinite number of monkeys typing on an infinite number of keyboards will eventually produce the complete works of Shakespeare. Fortunately for us, it did not take evolution an infinite amount of time or energy to discover AGI.


🐵 🐵 🐵