Title
The chiPower transformation: a valid alternative to logratio transformations in compositional data analysis
Authors
Abstract
The approach to analysing compositional data has been dominated by the use of logratio transformations, to ensure exact subcompositional coherence and, in some situations, exact isometry as well. A problem with this approach is that data zeros, found in most applications, have to be replaced to allow the logarithmic transformation. An alternative new approach, called the 'chiPower' transformation, which allows data zeros, is to combine the standardization inherent in the chi-square distance in correspondence analysis, with the essential elements of the Box-Cox power transformation. The chiPower transformation is justified because it defines between-sample distances that tend to logratio distances for strictly positive data as the power parameter tends to zero, and are then equivalent to transforming to logratios. For data with zeros, a value of the power can be identified that brings the chiPower transformation as close as possible to a logratio transformation, without having to substitute the zeros. Especially in the area of high-dimensional data, this alternative approach can present such a high level of coherence and isometry as to be a valid approach to the analysis of compositional data. Furthermore, in a supervised learning context, if the compositional variables serve as predictors of a response in a modelling framework, for example generalized linear models, then the power can be used as a tuning parameter in optimizing the accuracy of prediction through cross-validation. The chiPower-transformed variables have a straightforward interpretation, since they are each identified with single compositional parts, not ratios.
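The abstract describes the transformation only in words, so the following Python sketch is one possible reading and may differ from the paper's exact definition in scaling or ordering of steps. It assumes the steps are: close each sample, raise to the power, re-close, divide by the square roots of the column means (the chi-square standardization), and rescale by one over the power (the Box-Cox element). The names `close`, `chipower_sketch`, `clr`, and `pairwise_dist`, and the small data matrices, are hypothetical and introduced here for illustration only.

```python
import numpy as np

def close(X):
    """Rescale each row to sum to 1 (compositional closure)."""
    X = np.asarray(X, dtype=float)
    return X / X.sum(axis=1, keepdims=True)

def chipower_sketch(X, power):
    """Assumed reading of the chiPower idea; not the paper's verified formula."""
    if power <= 0:
        raise ValueError("power must be positive")
    P = close(close(X) ** power)      # power-transform, then re-close; zeros stay zero
    col_means = P.mean(axis=0)        # average profile, as in correspondence analysis
    # Chi-square scaling by sqrt(column mean), Box-Cox-style rescaling by 1/power.
    # A column that is zero in every sample would give a zero divisor here.
    return P / (power * np.sqrt(col_means))

def clr(X):
    """Centred logratio transformation (strictly positive data only)."""
    L = np.log(close(X))
    return L - L.mean(axis=1, keepdims=True)

def pairwise_dist(Z):
    """Euclidean distances between all pairs of rows."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Strictly positive toy data: with a small power, the sketch's distances
# should approach logratio distances (here with equal part weights 1/J).
X_pos = np.array([[0.2, 0.3, 0.5],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])
J = X_pos.shape[1]
d_chi = pairwise_dist(chipower_sketch(X_pos, 1e-4))
d_lr = pairwise_dist(clr(X_pos)) / np.sqrt(J)

# Toy data with zeros: no zero replacement is needed for a positive power.
X_zero = np.array([[0.0, 0.4, 0.6],
                   [0.5, 0.0, 0.5],
                   [0.3, 0.3, 0.4]])
Z_zero = chipower_sketch(X_zero, 0.5)
```

Under these assumptions, `d_chi` and `d_lr` agree closely for the positive data, which illustrates the limiting behaviour stated in the abstract, and `Z_zero` stays finite with zeros mapped to zero. In a supervised setting, `power` would be the argument to vary over a grid inside a cross-validation loop.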