Summarizing Unstructured Logs in Online Services

Paper Title

Summarizing Unstructured Logs in Online Services

Authors

Weibin Meng, Federico Zaiter, Yuheng Huang, Ying Liu, Shenglin Zhang, Yuzhe Zhang, Yichen Zhu, Tianke Zhang, En Wang, Zuomin Ren, Feng Wang, Shimin Tao, Dan Pei

Abstract

Logs are one of the most valuable data sources for managing large-scale online services. After a failure is detected, diagnosed, or predicted, operators still have to inspect the raw logs to gain a summarized view before taking action. However, manual or rule-based log summarization has become inefficient and ineffective. In this work, we propose LogSummary, an automatic, unsupervised, end-to-end log summarization framework for online services. LogSummary obtains the summarized triples of important logs for a given log sequence. It integrates a novel information extraction method, which takes both semantic information and domain knowledge into consideration, with a new triple ranking approach that uses the global knowledge learned from all logs. Given the lack of a publicly available gold standard for log summarization, we have manually labelled the summaries of four open-source log datasets and made them publicly available. The evaluation on these datasets, as well as the case studies on real-world logs, demonstrates that LogSummary produces highly representative summaries (average ROUGE F1 score of 0.741). We have packaged LogSummary into an open-source toolkit and hope that it can benefit future NLP-powered summarization work.
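
The abstract reports quality as an average ROUGE F1 score but does not say which ROUGE variant was used. As a rough illustration only, the sketch below computes ROUGE-1 F1 (unigram overlap) between a candidate summary and a reference summary. The function name `rouge1_f1`, the whitespace tokenization, and the sample log phrases are assumptions made for this example and are not taken from the LogSummary toolkit.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall.

    Assumption: lowercase whitespace tokenization; the paper's actual
    preprocessing and ROUGE variant may differ.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries, invented for illustration.
score = rouge1_f1("connection refused by server", "server refused connection")
print(round(score, 3))  # 0.857 (precision 3/4, recall 3/3)
```

An average such as the 0.741 quoted above would then be the mean of per-summary scores over a labelled dataset, under whatever ROUGE configuration the authors used.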
