Paper Title

Imbalanced Continual Learning with Partitioning Reservoir Sampling

Paper Authors

Chris Dongjoo Kim, Jinseo Jeong, Gunhee Kim

Paper Abstract

Continual learning from a sequential stream of data is a crucial challenge for machine learning research. Most studies on this topic have been conducted under the single-label classification setting, along with an assumption of balanced label distribution. This work expands the research horizon towards multi-label classification. In doing so, we identify an unanticipated adversity innately present in many multi-label datasets: the long-tailed distribution. We jointly address two problems that have so far been solved independently, catastrophic forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the study of both intra- and inter-task imbalances. Lastly, we propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS), which allows the model to maintain a balanced knowledge of both head and tail classes. We publicly release the datasets and the code on our project page.
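
To make the idea behind PRS concrete, here is a minimal sketch of a class-partitioned replay memory: the buffer is split into per-class partitions, each maintained with standard reservoir sampling over that class's stream. This is a hypothetical simplification, not the authors' exact algorithm; `BalancedReservoir`, `capacity`, and `num_classes` are illustrative names, it assumes single-label data, and the real PRS additionally handles multi-label examples and allocates partition sizes adaptively.

```python
import random
from collections import defaultdict

class BalancedReservoir:
    """Sketch of a class-balanced replay memory (simplified PRS idea).

    The memory of size `capacity` is split into equal per-class
    partitions; each partition is filled via reservoir sampling over
    the stream of examples observed for that class, so tail classes
    keep a fixed share of memory regardless of their stream frequency.
    """

    def __init__(self, capacity, num_classes):
        self.part_size = capacity // num_classes   # fixed target partition per class
        self.partitions = defaultdict(list)        # class label -> stored examples
        self.seen = defaultdict(int)                # class label -> examples seen so far

    def update(self, example, label):
        """Offer one streamed example to the memory."""
        self.seen[label] += 1
        part = self.partitions[label]
        if len(part) < self.part_size:
            part.append(example)                   # partition not full: always keep
        else:
            # Reservoir step: keep with probability part_size / seen[label],
            # evicting a uniformly random stored example of the same class.
            j = random.randrange(self.seen[label])
            if j < self.part_size:
                part[j] = example

    def sample(self, batch_size):
        """Draw a replay batch from the union of all partitions."""
        pool = [x for part in self.partitions.values() for x in part]
        return random.sample(pool, min(batch_size, len(pool)))
```

In the paper, partition sizes follow a power of each class's running frequency rather than being fixed and equal as in this sketch, which is what lets PRS interpolate between frequency-proportional and fully balanced memory allocation.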
