Paper title
Double logistic regression approach to biased positive-unlabeled data
Paper authors
Paper abstract
Positive and unlabelled learning is an important problem which arises naturally in many applications. A significant limitation of almost all existing methods is the assumption that the propensity score function is constant (the SCAR assumption), which is unrealistic in many practical situations. Avoiding this assumption, we consider a parametric approach to the problem of jointly estimating the posterior probability and propensity score functions. We show that, under mild assumptions, when both functions have the same parametric form (e.g. logistic with different parameters), the corresponding parameters are identifiable. Motivated by this, we propose two approaches to their estimation: a joint maximum likelihood method, and a second approach based on alternating maximization of two Fisher consistent expressions. Our experimental results show that the proposed methods are comparable to or better than existing methods based on the Expectation-Maximisation scheme.
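The abstract does not spell out the likelihood, so the following is only a minimal sketch of what a joint maximum likelihood fit could look like, not the authors' implementation. It assumes the usual positive-unlabelled factorisation P(s=1|x) = y(x)e(x), where y(x) is the posterior probability and e(x) the propensity score, both taken to be logistic with separate parameter vectors. The names `beta`, `gamma`, `simulate` and `fit`, the backtracking gradient descent optimiser, and the simulated data are all illustrative assumptions.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def neg_log_likelihood(beta, gamma, X, s):
    # Average negative log-likelihood of the observed labels s under the
    # assumed model P(s=1|x) = sigmoid(beta.x) * sigmoid(gamma.x).
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, si in zip(X, s):
        p = sigmoid(dot(beta, x)) * sigmoid(dot(gamma, x))
        total -= si * math.log(p + eps) + (1 - si) * math.log(1.0 - p + eps)
    return total / len(X)

def gradient(beta, gamma, X, s):
    # Analytic gradient of the average negative log-likelihood.
    g_beta = [0.0] * len(beta)
    g_gamma = [0.0] * len(gamma)
    for x, si in zip(X, s):
        y = sigmoid(dot(beta, x))   # posterior probability y(x)
        e = sigmoid(dot(gamma, x))  # propensity score e(x)
        p = y * e
        c = (p - si) / max(1.0 - p, 1e-12)
        for j, xj in enumerate(x):
            g_beta[j] += c * (1.0 - y) * xj
            g_gamma[j] += c * (1.0 - e) * xj
    n = len(X)
    return [g / n for g in g_beta], [g / n for g in g_gamma]

def fit(X, s, beta, gamma, steps=200, lr=1.0):
    # Gradient descent with step halving (an assumed optimiser, chosen for
    # simplicity). The product y(x)e(x) is unchanged when beta and gamma are
    # swapped, so the two starting points should differ.
    loss = neg_log_likelihood(beta, gamma, X, s)
    for _ in range(steps):
        g_beta, g_gamma = gradient(beta, gamma, X, s)
        step = lr
        while step > 1e-8:
            new_beta = [b - step * g for b, g in zip(beta, g_beta)]
            new_gamma = [c - step * g for c, g in zip(gamma, g_gamma)]
            new_loss = neg_log_likelihood(new_beta, new_gamma, X, s)
            if new_loss < loss:
                beta, gamma, loss = new_beta, new_gamma, new_loss
                break
            step *= 0.5
        else:
            break  # no improving step found; stop early
    return beta, gamma, loss

def simulate(n, beta, gamma, seed=0):
    # Hypothetical data generator: intercept plus one uniform feature.
    rng = random.Random(seed)
    X, s = [], []
    for _ in range(n):
        x = [1.0, rng.uniform(-2.0, 2.0)]
        is_positive = rng.random() < sigmoid(dot(beta, x))
        # Only true positives can receive a label, with probability e(x).
        is_labelled = is_positive and rng.random() < sigmoid(dot(gamma, x))
        X.append(x)
        s.append(1 if is_labelled else 0)
    return X, s

X, s = simulate(300, beta=[0.5, 2.0], gamma=[0.0, -1.0])
beta_hat, gamma_hat, loss = fit(X, s, beta=[0.0, 0.1], gamma=[0.0, -0.1])
print(beta_hat, gamma_hat, loss)
```

Setting `gamma` to a fixed intercept-only vector would recover the constant-propensity (SCAR) special case that the abstract argues against. The second proposed approach, alternating maximization of two Fisher consistent expressions, is not sketched here because the abstract does not define those expressions.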