Paper title
Summarizing Unstructured Logs in Online Services
Paper authors
Abstract
Logs are one of the most valuable data sources for managing large-scale online services. After a failure is detected, diagnosed, or predicted, operators still have to inspect the raw logs to gain a summarized view before taking action. However, manual or rule-based log summarization has become inefficient and ineffective. In this work, we propose LogSummary, an automatic, unsupervised, end-to-end log summarization framework for online services. LogSummary obtains the summarized triples of important logs for a given log sequence. It integrates a novel information extraction method, which takes both semantic information and domain knowledge into consideration, with a new triple ranking approach that uses the global knowledge learned from all logs. Given the lack of a publicly available gold standard for log summarization, we have manually labelled the summaries of four open-source log datasets and made them publicly available. The evaluation on these datasets, as well as case studies on real-world logs, demonstrates that LogSummary produces highly representative summaries (average ROUGE F1 score of 0.741). We have packaged LogSummary into an open-source toolkit and hope that it can benefit future NLP-powered summarization work.
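To make the reported metric concrete, the following is a minimal sketch of how a ROUGE-style F1 score can be computed between a generated summary and a gold-standard one. The abstract does not state which ROUGE variant or tokenization was used, so this sketch assumes ROUGE-1 (unigram overlap) with lowercased whitespace tokens; the function name `rouge1_f1` and the example strings are hypothetical and are not taken from the LogSummary toolkit.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Assumption: ROUGE-1 with lowercased whitespace tokenization; the
    paper may use a different ROUGE variant or preprocessing.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a unigram counts at most as often as it occurs
    # in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)  # share of candidate tokens matched
    recall = overlap / len(ref_tokens)      # share of reference tokens covered
    return 2 * precision * recall / (precision + recall)


# Hypothetical summaries, invented for illustration only.
score = rouge1_f1("connection closed by peer",
                  "connection closed by remote peer")
print(round(score, 3))
```

In this example the candidate matches 4 of its 4 tokens (precision 1.0) and covers 4 of the 5 reference tokens (recall 0.8), so the F1 score is about 0.889. An average of such per-summary scores over a labelled dataset is the kind of figure the abstract reports.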