Paper title
The Limits to Learning a Diffusion Model
Paper authors
Abstract
This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models, including the Bass model (used in modeling consumer adoption) and the SIR model (used in modeling epidemics). We show that one cannot hope to learn such models until quite late in the diffusion. Specifically, we show that the time required to collect a number of observations that exceeds our sample complexity lower bounds is large. For Bass models with low innovation rates, our results imply that one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate has peaked. This lower bound in estimation further translates into a lower bound in regret for decision-making in epidemic interventions. Our results formalize the challenge of accurate forecasting and highlight the importance of incorporating additional data sources. To this end, we analyze the benefit of a seroprevalence study in an epidemic, where we characterize the size of the study needed to improve SIR model estimation. Extensive empirical analyses on product adoption and epidemic data support our theoretical findings.
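To make the "two-thirds of the way to the peak" statement concrete, the sketch below uses the standard deterministic Bass model, dF/dt = (p + qF)(1 - F), where F is the cumulative fraction of adopters, p is the innovation rate, and q is the imitation rate. This textbook form is an assumption here, since the abstract does not spell out the exact formulation the paper uses, and the parameter values are hypothetical, chosen only for illustration. In this model the adoption rate peaks at t* = ln(q/p) / (p + q), which requires q > p.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form cumulative adoption fraction F(t) of the deterministic Bass model."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)


def bass_adoption_rate(t, p, q):
    """Adoption rate dF/dt = (p + q F)(1 - F) evaluated at time t."""
    f = bass_cumulative_fraction(t, p, q)
    return (p + q * f) * (1.0 - f)


def bass_peak_time(p, q):
    """Time at which the adoption rate is maximal (assumes q > p > 0)."""
    if not (q > p > 0):
        raise ValueError("peak time formula assumes q > p > 0")
    return math.log(q / p) / (p + q)


# Hypothetical parameters with a low innovation rate (illustration only).
p, q = 0.01, 0.4

t_peak = bass_peak_time(p, q)
t_two_thirds = 2.0 * t_peak / 3.0  # the horizon highlighted in the abstract

frac_at_two_thirds = bass_cumulative_fraction(t_two_thirds, p, q)
frac_at_peak = bass_cumulative_fraction(t_peak, p, q)

print(f"peak time:                {t_peak:.3f}")
print(f"two-thirds of peak time:  {t_two_thirds:.3f}")
print(f"adopted by two-thirds:    {frac_at_two_thirds:.3f}")
print(f"adopted by peak:          {frac_at_peak:.3f}")
```

With these illustrative values, cumulative adoption at the two-thirds point is still well below its value at the peak. That gives some intuition for why few observations are available at that stage, although the sketch does not reproduce the paper's sample complexity bounds.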