David Heineman

Hey! I'm David 👋

I'm a pre-doctoral young investigator at the Allen Institute for AI, working to improve language model pre-training and evaluation.


Research interests

Building language models can, and should, be a rigorous science: I believe our field's biggest bottleneck is the quality of our experimental methodology [1] and the strength of our evaluation signal [2]. Improving both requires better interpretations of our existing measures of capability [3], new tools for observing how language models express behavior [4], and evaluation tasks that meaningfully connect to our ability to learn and generate language [5, 6].

I work on these problems at Ai2 as part of the Open Language Model (OLMo) project, advised by Kyle Lo and Jesse Dodge. Previously, I completed my undergrad at Georgia Tech 🐝, where I was fortunate to be advised by Prof. Wei Xu and to work with Yao Dou and Mounica Maddela. I've also spent a few summers as an intern at AWS and at Patientco, a healthcare startup. I enjoy reading, hiking, and making homebrew nitrogen cold brew. ☕️ ⛰️


Publications & Preprints ✒️

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation [code, data]

David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge
preprint, 2025

2 OLMo 2 Furious [code, models, data]

Pete Walsh*, Luca Soldaini*, Dirk Groeneveld*, Kyle Lo*, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, ..., David Heineman, ..., Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi
COLM, 2025

Establishing Task Scaling Laws via Compute-Efficient Model Ladders [code]

Akshita Bhagia*, Jiacheng Liu*, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
COLM, 2025

Evaluating LLMs on Chinese Idiom Translation

Cai Yang, Yao Dou, David Heineman, Xiaofeng Wu, Wei Xu
COLM, 2025

DataDecide: How to Predict Best Pretraining Data with Small Experiments [code, models]

Ian Magnusson*, Nguyen Tai*, Ben Bogin*, David Heineman, Jena D. Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge
ICML, 2025

Improving Minimum Bayes Risk Decoding with Multi-Prompt [code]

David Heineman, Yao Dou, Wei Xu
EMNLP, 2024

Towards a Path Dependent Account of Category Fluency [code]

David Heineman, Reba Koenen, Sashank Varma
CogSci, 2024

Thresh: Unified, Customizable and Deployable Fine-Grained Text Evaluation [live tool]

David Heineman, Yao Dou, Wei Xu
EMNLP Demo, 2023

Edit-level Simplification Evaluation using SALSA 💃 [code/data, metric]

David Heineman, Yao Dou, Mounica Maddela, Wei Xu
EMNLP, 2023

LENS: A Learnable Evaluation Metric for Text Simplification [code/data, metric]

Mounica Maddela*, Yao Dou*, David Heineman, Wei Xu
ACL, 2023

* = equal contribution


My past work 🌳

Recommendations

A few interesting corners of the internet (and bookshelf) that I find worth checking out!

... to flip through


Games, Puzzles, and Computation by Erik Demaine

The Corrections by Jonathan Franzen

Society Must be Defended by Michel Foucault

Oblivion by David Foster Wallace


I also enjoy trying new coffee shops. Here are some recommendations across Atlanta from my undergrad years, and a growing list across Seattle.