Paper Title
Dynamic Rectification Knowledge Distillation
Paper Authors
Paper Abstract
Knowledge distillation is a technique that uses dark knowledge to compress and transfer information from a large, well-trained neural network (the teacher model) to a smaller, less capable neural network (the student model) with improved inference efficiency. Distillation has gained popularity because such cumbersome models are prohibitively complex to deploy on edge computing devices. However, the teacher models used to teach smaller student models are themselves typically cumbersome and expensive to train. To eliminate the need for a cumbersome teacher model entirely, we propose a simple yet effective knowledge distillation framework that we term Dynamic Rectification Knowledge Distillation (DR-KD). Our method turns the student into its own teacher; if this self-teacher makes a wrong prediction while distilling information, the error is rectified before the knowledge is distilled. Specifically, the teacher targets are dynamically adjusted using the ground truth while distilling the knowledge gained from conventional training. Our proposed DR-KD performs remarkably well without a sophisticated, cumbersome teacher model and achieves performance comparable to existing state-of-the-art teacher-free knowledge distillation frameworks using only a low-cost, dynamically rectified teacher. Our approach is general and can be applied to training any deep neural network for classification or object recognition. DR-KD improves test accuracy on Tiny ImageNet by 2.65% over prominent baseline models, which is significantly better than any other knowledge distillation approach, while requiring no additional training cost.
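To illustrate the idea described in the abstract, the following is a minimal PyTorch-style sketch of a DR-KD-like loss: the student's own softened output acts as the teacher target, and targets whose argmax disagrees with the ground truth are rectified before distillation. The swap-based rectification rule, the temperature, and the alpha weighting used here are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dr_kd_loss(logits, labels, temperature=4.0, alpha=0.5):
    """Sketch of a DR-KD-style loss (assumed formulation, not the paper's exact one).

    The student's own softened predictions serve as the self-teacher target;
    wrong targets are rectified with the ground truth before distillation.
    """
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(logits, labels)

    # Self-teacher target: the student's own softened output, detached so no
    # gradient flows through the teacher side.
    teacher_probs = F.softmax(logits.detach() / temperature, dim=1)

    # Rectification (assumed rule): where the self-teacher's argmax disagrees
    # with the ground truth, swap the probability mass of the predicted class
    # and the true class so the true class becomes the highest-scoring one.
    pred = teacher_probs.argmax(dim=1)
    wrong = pred != labels
    if wrong.any():
        rows = wrong.nonzero(as_tuple=True)[0]
        p_pred = teacher_probs[rows, pred[rows]].clone()
        p_true = teacher_probs[rows, labels[rows]].clone()
        teacher_probs[rows, pred[rows]] = p_true
        teacher_probs[rows, labels[rows]] = p_pred

    # Distillation term: KL divergence between the student's softened output
    # and the rectified self-teacher target (scaled by T^2, as is conventional).
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        teacher_probs,
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * ce + (1.0 - alpha) * kd
```

In this sketch the rectified target is computed from the current batch at no extra cost beyond the forward pass the student already performs, which is consistent with the abstract's claim that no additional training cost is incurred.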