Paper Title
An External Stability Audit Framework to Test the Validity of Personality Prediction in AI Hiring
Paper Authors
Paper Abstract
Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Our approach is to (a) develop a methodology for an external audit of stability of predictions made by algorithmic personality tests, and (b) instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Crucially, rather than challenging or affirming the assumptions made in psychometric testing -- that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit, and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with the audit of two real-world personality prediction systems, namely Humantic AI and Crystal. The application of our audit framework demonstrates that both these systems show substantial instability with respect to key facets of measurement, and hence cannot be considered valid testing instruments.
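To make the notion of a stability audit concrete, the sketch below shows how rank-order stability might be checked between two measurement facets, for example the same resume submitted as a PDF versus as raw text. The candidate scores, file-format labels, and the 0.9 threshold are illustrative assumptions, not data from the audit and not the interface of the paper's accompanying open-source library.

```python
# Hypothetical sketch of a rank-order stability check between two measurement
# facets (e.g., the same resumes scored from a PDF upload vs. raw text).
# Scores, trait ordering, and the 0.9 threshold are illustrative assumptions.
from scipy.stats import spearmanr

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Predicted Big Five scores for the same three candidates under each facet.
facet_pdf = {
    "candidate_1": [0.72, 0.65, 0.40, 0.81, 0.33],
    "candidate_2": [0.55, 0.70, 0.62, 0.47, 0.58],
    "candidate_3": [0.61, 0.52, 0.75, 0.66, 0.44],
}
facet_txt = {
    "candidate_1": [0.69, 0.60, 0.45, 0.78, 0.39],
    "candidate_2": [0.50, 0.74, 0.58, 0.52, 0.61],
    "candidate_3": [0.64, 0.49, 0.71, 0.70, 0.40],
}

def rank_order_stability(a, b, threshold=0.9):
    """Per-trait Spearman correlation of predictions across candidates."""
    candidates = sorted(a)
    results = {}
    for i, trait in enumerate(TRAITS):
        scores_a = [a[c][i] for c in candidates]
        scores_b = [b[c][i] for c in candidates]
        rho, _ = spearmanr(scores_a, scores_b)
        results[trait] = (rho, rho >= threshold)
    return results

for trait, (rho, stable) in rank_order_stability(facet_pdf, facet_txt).items():
    print(f"{trait:>18}: rho={rho:.2f} {'stable' if stable else 'UNSTABLE'}")
```

In the full framework, checks of this kind would be repeated across multiple facets of measurement and, plausibly, complemented with measures of the stability of individual candidates' scores; the specific statistics and thresholds used by the audited systems' evaluation are defined in the paper and its software library rather than in this sketch.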