Paper Title

Sample Size Considerations for Bayesian Multilevel Hidden Markov Models: A Simulation Study on Multivariate Continuous Data with highly overlapping Component Distributions based on Sleep Data

Authors

Jasper Ginn, Sebastian Mildiner Moraga, Emmeke Aarts

Abstract

Spurred in part by the ever-growing number of sensors and web-based methods of collecting data, the use of Intensive Longitudinal Data (ILD) is becoming more common in the social and behavioural sciences. The ILD collected in this field are often hypothesised to be the result of latent states (e.g. behaviour, emotions), and the promise of ILD lies in its ability to capture the dynamics of these states as they unfold in time. In particular, by collecting data for multiple subjects, researchers can observe how such dynamics differ between subjects. The Bayesian Multilevel Hidden Markov Model (mHMM) is a relatively novel model that is suited to modelling ILD of this kind while taking into account heterogeneity between subjects. While the mHMM has been applied in a variety of settings, large-scale studies that examine the required sample size for this model are lacking. In this paper, we address this research gap by conducting a simulation study to evaluate the effect of changing (1) the number of subjects, (2) the number of occasions, and (3) the between-subject variability on parameter estimates obtained by the mHMM. We frame this simulation study in the context of sleep research, where the data are multivariate and continuous and display considerable overlap in the state-dependent component distributions. In addition, we generate a set of baseline scenarios with more general data properties. Overall, the number of subjects has the largest effect on model performance. However, the number of occasions is important to adequately model latent state transitions. We discuss how the characteristics of the data influence parameter estimation and provide recommendations to researchers seeking to apply the mHMM to their own data.
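
To make the three design factors named in the abstract concrete, here is a minimal Python sketch of how data for one simulation scenario might be generated. It is an illustration under stated assumptions, not the authors' code: the function name `simulate_mhmm`, the three-state transition matrix, the group-level means, and the emission standard deviation are all hypothetical values chosen for this example. For simplicity, between-subject variability is applied only to the emission means, and the transition matrix is shared by all subjects.

```python
import random


def simulate_mhmm(n_subjects, n_occasions, between_sd, seed=0):
    """Simulate multivariate continuous data from a simple multilevel HMM.

    All parameter values below are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)

    # Assumed group-level transition matrix for 3 latent states (rows sum to 1).
    trans = [
        [0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
    ]
    # Assumed group-level emission means for 2 continuous variables per state.
    # They are closely spaced relative to emission_sd, so the state-dependent
    # distributions overlap considerably.
    group_means = [[0.0, 0.0], [0.5, -0.5], [1.0, 0.5]]
    emission_sd = 1.0
    n_states = len(trans)

    data = []
    for subject in range(n_subjects):
        # Subject-specific means: group mean plus a normal random effect
        # whose spread is controlled by between_sd.
        means = [
            [mu + rng.gauss(0.0, between_sd) for mu in state_means]
            for state_means in group_means
        ]
        state = rng.randrange(n_states)  # uniform initial state (assumption)
        for occasion in range(n_occasions):
            obs = [rng.gauss(mu, emission_sd) for mu in means[state]]
            data.append((subject, occasion, state, obs))
            # Draw the next latent state from the current state's row.
            state = rng.choices(range(n_states), weights=trans[state])[0]
    return data


# One scenario: 10 subjects, 200 occasions, moderate between-subject variability.
scenario = simulate_mhmm(n_subjects=10, n_occasions=200, between_sd=0.25)
```

In a simulation study of this kind, each scenario would be generated many times over a grid of `n_subjects`, `n_occasions`, and `between_sd` values, the model would be fitted to every data set, and the estimates would be compared with the known generating parameters.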
