Linguistic Fingerprints of Internet Censorship: the Case of SinaWeibo

Paper title

Linguistic Fingerprints of Internet Censorship: the Case of SinaWeibo

Authors

Ng, Kei Yin; Feldman, Anna; Peng, Jing

Abstract

This paper studies how the linguistic components of blogposts collected from Sina Weibo, a Chinese microblogging platform, might affect the blogposts' likelihood of being censored. Our results are consistent with the Collective Action Potential (CAP) theory of King et al. (2013), which states that a blogpost's potential to cause riots or assemblies in real life is the key determinant of whether it gets censored. Although there is no definitive measure of this construct, the linguistic features that we identify as discriminative are consistent with the CAP theory. We build a classifier that significantly outperforms non-expert humans in predicting whether a blogpost will be censored. The crowdsourcing results suggest that while humans tend to see censored blogposts as more controversial and more likely to trigger real-life action than their uncensored counterparts, they generally cannot make a better guess than our model when it comes to 'reading the mind' of the censors in deciding whether a blogpost should be censored. We do not claim that censorship is determined only by linguistic features; many other factors contribute to censorship decisions. The focus of the present paper is on the linguistic form of blogposts. Our work suggests that it is possible to use linguistic properties of social media posts to automatically predict whether they are going to be censored.
