Name
Alexander Volfovsky, "New Experimental Designs with Text as Treatment: Can a Large Language Model be Too Large?"
Date & Time
Thursday, October 16, 2025, 2:00 PM - 2:25 PM
Speakers
Alexander Volfovsky, Duke University, "New Experimental Designs with Text as Treatment: Can a Large Language Model be Too Large?"

Alexander Volfovsky is an Associate Professor of Statistical Science at Duke University, where he also serves as co-director of the Polarization Lab. His research lies at the intersection of causal inference, network analysis, and machine learning, with applications to understanding social behavior, online interactions, and decision-making in complex systems. He develops novel statistical methodologies for estimating causal effects in the presence of interference, modeling relational data, and designing adaptive experiments. Volfovsky’s work often bridges methodological innovation with large-scale empirical studies, integrating tools from Bayesian statistics, randomized experiments, and computational social science. His recent projects focus on human–AI interaction, trust calibration, and the design of artificial agents that foster constructive discourse. His research has been supported by the National Science Foundation, the Department of Defense, and the Templeton Foundation, among others.

Description
Given two texts, we may ask which one is more persuasive. Such a comparison tells us only about those two texts; it does not tell us which elements of the text drive the causal mechanism. Since the mechanism is of interest, a tempting design is to show many texts, measure their effects, and use natural language processing to learn which features of the texts should be treated as components of a causal analysis. However, such a black-box approach (e.g., a large language model) provides insufficient control over the causal model and may lead to spurious or nonsensical results.
We develop a novel experimental design for text as treatment that controls which elements of the text are being studied and admits simple estimators. We use this design to study the effect of intellectually humble language on the persuasiveness of text. Using our data, we demonstrate two major issues with existing machine learning tools for inferring causal effects of text. Transformer models that use learned representations of text as confounders overfit the data, inducing positivity violations. Other tools that try to adjust for text indirectly underfit the data and behave like estimators that never looked at the text confounders at all.
Location Name
Kline Tower 14th Floor
Full Address
Kline Tower
219 Prospect St, 14th Floor
New Haven, CT 06511
United States
Session Type
Lecture
Title
New Experimental Designs with Text as Treatment: Can a Large Language Model be Too Large?
Abstract
Given two texts, we may ask which one is more persuasive. Such a comparison tells us only about those two texts; it does not tell us which elements of the text drive the causal mechanism. Since the mechanism is of interest, a tempting design is to show many texts, measure their effects, and use natural language processing to learn which features of the texts should be treated as components of a causal analysis. However, such a black-box approach (e.g., a large language model) provides insufficient control over the causal model and may lead to spurious or nonsensical results.
We develop a novel experimental design for text as treatment that controls which elements of the text are being studied and admits simple estimators. We use this design to study the effect of intellectually humble language on the persuasiveness of text. Using our data, we demonstrate two major issues with existing machine learning tools for inferring causal effects of text. Transformer models that use learned representations of text as confounders overfit the data, inducing positivity violations. Other tools that try to adjust for text indirectly underfit the data and behave like estimators that never looked at the text confounders at all.
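To make the positivity point concrete, the following is a minimal synthetic sketch in Python using scikit-learn. It is not the design or estimator from the talk; the sample size, embedding dimension, and variable names are invented for illustration. It fits a propensity model for the treatment (humble language) once on a high-dimensional stand-in for a learned text representation and once on a handful of designed covariates, and compares the spread of the fitted scores.

# Minimal synthetic sketch (not the authors' method): all sizes and names are made up.
# With many noisy "confounder" dimensions relative to the sample size, a flexible
# propensity model can overfit and push fitted scores toward 0 and 1 (a practical
# positivity violation); a small designed covariate set keeps overlap intact.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 400, 300                                 # 400 texts, 300-dim "embedding" (invented)
embedding = rng.normal(size=(n, d))             # stand-in for a learned text representation
humble = (embedding[:, 0] > 0).astype(int)      # treatment: humble language present or not
rep_confounders = embedding[:, 1:]              # 299 dimensions treated as confounders
designed_covariates = embedding[:, 1:6]         # 5 designed covariates

def score_range(X, t):
    # Fit P(treatment | X) in-sample and report the spread of the fitted propensities.
    scores = LogisticRegression(C=100.0, max_iter=10000).fit(X, t).predict_proba(X)[:, 1]
    return float(scores.min()), float(scores.max())

print("learned-representation confounders:", score_range(rep_confounders, humble))
print("designed covariates:", score_range(designed_covariates, humble))

In this toy setup the treatment is independent of every dimension the propensity models see, yet the high-dimensional fit typically drives the fitted scores close to 0 and 1, so inverse-propensity weights blow up; the low-dimensional fit keeps the scores near 0.5.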