Investigating African-American Vernacular English in Transformer-Based Text Generation

Paper Title

Investigating African-American Vernacular English in Transformer-Based Text Generation

Authors

Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Honnavalli, Sharon Levy, Diba Mirza, William Yang Wang

Abstract

The growth of social media has encouraged the written use of African American Vernacular English (AAVE), which has traditionally been used only in oral contexts. However, NLP models have historically been developed using dominant English varieties, such as Standard American English (SAE), due to text corpora availability. We investigate the performance of GPT-2 on AAVE text by creating a dataset of intent-equivalent parallel AAVE/SAE tweet pairs, thereby isolating syntactic structure and AAVE- or SAE-specific language for each pair. We evaluate each sample and its GPT-2 generated text with pretrained sentiment classifiers and find that while AAVE text results in more classifications of negative sentiment than SAE, the use of GPT-2 generally increases occurrences of positive sentiment for both. Additionally, we conduct human evaluation of AAVE and SAE text generated with GPT-2 to compare contextual rigor and overall quality.
