Paper Title
Towards Evaluating Exploratory Model Building Process with AutoML Systems
Paper Authors
Paper Abstract
The use of Automated Machine Learning (AutoML) systems is highly open-ended and exploratory. While rigorously evaluating how end-users interact with AutoML is crucial, establishing a robust evaluation methodology for such exploratory systems is challenging. First, AutoML is complex: it includes multiple sub-components that support a variety of sub-tasks for synthesizing ML pipelines, such as data preparation, problem specification, and model generation, making it difficult to yield insights that tell us which components were successful or not. Second, because the usage pattern of AutoML is highly exploratory, it is not possible to rely solely on widely used task efficiency and effectiveness metrics as success metrics. To tackle these evaluation challenges, we propose an evaluation methodology that (1) guides AutoML builders to divide their AutoML system into multiple sub-system components and (2) helps them reason about each component through visualization of end-users' behavioral patterns and attitudinal data. We conducted a study to understand when, how, and why applying our methodology can help builders better understand their systems and end-users. We recruited 3 teams of professional AutoML builders. The teams prepared their own systems and had 41 end-users use those systems. Using our methodology, we visualized the end-users' behavioral and attitudinal data and distributed the results to the teams. We analyzed the results in two directions: (1) what types of novel insights the AutoML builders learned from end-users, and (2) how the evaluation methodology helped the builders understand the workflows and effectiveness of their systems. Our findings present new insights that point to future design opportunities in the AutoML domain, and show how using our methodology helped the builders identify insights and draw concrete directions for improving their systems.