Paper Title
A3: An Automatic Topology-Aware Malfunction Detection and Fixation System in Data Center Networks
Authors
Abstract
Link failures and cable miswirings are not uncommon when building data center networks, and they prevent existing automatic address configuration methods from functioning correctly. Accurately detecting such malfunctions is not easy, however, because they may produce no observable change in node degree. Fixing or correcting them is even harder, since almost no existing work provides accurate fixation suggestions. To solve these problems, we design and implement A3, an automatic topology-aware malfunction detection and fixation system. A3 formulates the problem of finding a minimal fixation as the problem of computing the minimum graph difference (NP-hard), and for FatTree it solves this problem in O(k^6) for any fewer than k/2 undirected link malfunctions and in O(k^3) for any fewer than k/4. Our evaluation demonstrates that for fewer than k/2 undirected link malfunctions, A3 detects malfunctions with 100% accuracy and provides the minimum fixation result. For k/2 or more undirected link malfunctions, A3 still achieves about 100% accuracy and provides a near-optimal fixation result.
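To illustrate why node degrees alone can miss a miswiring, the following is a minimal sketch and not the A3 algorithm. It assumes that device labels are already matched to blueprint positions, which sidesteps the NP-hard mapping step the abstract refers to. The two-pod fragment, the device names (`e1`, `a1`, and so on), and the helper functions are all hypothetical.

```python
from collections import Counter

def norm(edges):
    """Represent undirected links as a set of frozensets."""
    return {frozenset(e) for e in edges}

def degrees(edges):
    """Count how many links touch each device."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts

def graph_difference(blueprint, physical):
    """Return (missing, unexpected) links, assuming labels are known."""
    b, p = norm(blueprint), norm(physical)
    return b - p, p - b

# Hypothetical fragment: two pods, each a complete bipartite graph
# between two edge switches (e*) and two aggregation switches (a*).
blueprint = [
    ("e1", "a1"), ("e1", "a2"), ("e2", "a1"), ("e2", "a2"),  # pod 1
    ("e3", "a3"), ("e3", "a4"), ("e4", "a3"), ("e4", "a4"),  # pod 2
]

# Miswiring: the cables e1-a1 and e3-a3 are crossed between pods.
physical = [
    ("e1", "a3"), ("e1", "a2"), ("e2", "a1"), ("e2", "a2"),
    ("e3", "a1"), ("e3", "a4"), ("e4", "a3"), ("e4", "a4"),
]

same_degrees = degrees(blueprint) == degrees(physical)
missing, unexpected = graph_difference(blueprint, physical)

print("degrees unchanged:", same_degrees)
print("missing links:", sorted(sorted(link) for link in missing))
print("unexpected links:", sorted(sorted(link) for link in unexpected))
```

In this sketch every device keeps its degree, yet the edge-set difference exposes the two crossed cables. Without known labels, the difference would have to be minimized over all possible mappings of devices to blueprint positions, which is the hard part that the abstract says A3 addresses.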