
Researchers release 13B-parameter language model trained exclusively on pre-1931 data

TL;DR

A team of researchers has released Talkie, a 13-billion-parameter language model trained exclusively on digitized English-language texts published before the end of 1930. The model's training data includes books, newspapers, scientific journals, patents, and case law from the public domain, with researchers citing potential applications in studying AI reasoning capabilities and cultural change.

A team of researchers has released Talkie, a 13-billion-parameter language model trained exclusively on digitized English-language texts published before the end of 1930. Its corpus consists entirely of public domain materials: books, newspapers, periodicals, scientific journals, patents, and case law.

The training data cutoff follows US copyright law: works published through 1930 are currently in the public domain in the United States. According to the researchers, Talkie is the largest vintage language model they are aware of, though they note that other vintage models, trained on Victorian literature and pre-1900 scientific texts, already exist.

Research applications

David Duvenaud, associate professor of computer science and statistics at the University of Toronto and one of Talkie's three creators, outlined three primary research objectives. First, the team aims to test AI's ability to make scientific discoveries using only historical knowledge. The researchers cite a test proposed by Google DeepMind CEO Demis Hassabis: whether an AI with a knowledge cutoff of 1911 could derive general relativity from the same information Einstein had in 1915.

Second, the model could help evaluate long-term forecasting methods, since any prediction it makes concerns events that have, from today's vantage point, already occurred and can be scored against the historical record (a sketch of such a backtest appears below). Third, researchers hope to study cultural change and historical interpretation. "We can use these models to try to understand how a law would have been interpreted at the time it was written, based on the implicit assumptions and meaning of language at the time," Duvenaud told The Register.
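
To make the forecasting idea concrete, here is a minimal sketch of how such a backtest might work. The generate callable, the question set, and the yes/no scoring rule are all illustrative assumptions, not part of the Talkie team's published setup.

    # Hypothetical backtest: ask a model whose knowledge ends in 1930 about
    # developments that are "future" to it but historical to us, then score
    # its yes/no answers against the known record. generate() and the
    # question set are illustrative placeholders, not Talkie's actual API.

    KNOWN_OUTCOMES = {
        "Will a human reach the Moon by 1970?": True,
        "Will television become a common household fixture by 1960?": True,
        "Will the airship dominate transatlantic travel in 1960?": False,
    }

    def backtest(generate, questions=KNOWN_OUTCOMES):
        """Return the fraction of historical yes/no forecasts the model gets right."""
        correct = 0
        for question, outcome in questions.items():
            prompt = f"The year is 1930. {question} Answer only yes or no."
            predicted = generate(prompt).strip().lower().startswith("yes")
            correct += int(predicted == outcome)
        return correct / len(questions)

Because every question resolves within the historical record, a scheme like this can grade long-horizon forecasts immediately instead of waiting decades for outcomes.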

Performance limitations

In Python programming tests comparing Talkie to an identical-architecture model trained on modern data, the vintage model generated only simple one-line solutions or small modifications to in-context examples. "There is still a long way to go before this capability is notable," the research team stated.
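
As a rough illustration of the kind of side-by-side test described above, here is a minimal pass/fail coding harness; the task, the check, and the generate callables are assumptions, not the team's published evaluation.

    # Illustrative harness for the side-by-side coding comparison: give each
    # model the same Python task and check whether its completion passes a
    # unit test. The generate_* callables stand in for however the models
    # are actually queried.

    def solves_task(generate, prompt, check):
        """Execute a generated solution in a scratch namespace and test it."""
        namespace = {}
        try:
            # Executing model output is only acceptable in a sandboxed setup.
            exec(generate(prompt), namespace)
            return check(namespace)
        except Exception:
            return False

    PROMPT = ("Write a Python function fib(n) that returns the n-th "
              "Fibonacci number, with fib(0) == 0 and fib(1) == 1.")
    CHECK = lambda ns: ns["fib"](10) == 55

    # vintage_pass = solves_task(generate_talkie, PROMPT, CHECK)
    # modern_pass  = solves_task(generate_modern, PROMPT, CHECK)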

Duvenaud acknowledged a significant capability gap between Talkie and modern AI models. "As an amateur research effort, we never expect to be able to fully close this gap, in data or compute," he said. The team nonetheless plans to keep scaling the model.

What this means

Talkie represents a novel approach to studying AI capabilities by constraining training data to a specific historical period. The model's inability to produce more than simple coding solutions underscores how much modern AI performance depends on contemporary training data. More notably, the research could offer insight into how language models form a self-conception: Talkie does not know what an LLM is, which may help reveal how a model's behavior is shaped by its training data's implicit assumptions about AI itself.

