Paper Title
Categorical Exploratory Data Analysis: From Multiclass Classification and Response Manifold Analytics perspectives of baseball pitching dynamics
Paper Authors
Paper Abstract
From two coupled perspectives, Multiclass Classification (MCC) and Response Manifold Analytics (RMA), we develop Categorical Exploratory Data Analysis (CEDA) on the PITCHf/x database to extract the information content of Major League Baseball's (MLB) pitching dynamics. The MCC and RMA information contents are represented, respectively, by a collection of multiscale pattern categories derived from mixing geometries and a collection of global-to-local geometric localities derived from response-covariate manifolds. These collections shed light on the pitching dynamics and map out the uncertainty of popular machine learning approaches. In the MCC setting, a label embedding tree based on an indirect distance measure leads to the discovery of asymmetry in the mixing geometries among the labels' point clouds. A selected chain of complementary covariate feature groups collectively brings out multi-order mixing geometric pattern categories. Such categories then reveal the true nature of MCC predictive inferences. In the RMA setting, multiple response features couple with multiple major covariate features to reveal manifolds that bear physical principles and carry a lattice of natural localities. With the heterogeneous effects of minor features identified locally, these localities jointly weave their focal characteristics into an understanding of the system and provide a platform for RMA predictive inferences. CEDA works for universal data types, accommodates nonlinear associations, and facilitates efficient feature selection and inference.
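To make the idea of asymmetric mixing among labels' point clouds concrete, the following is a minimal sketch and not the paper's indirect distance measure, which the abstract does not specify. It assumes a simple stand-in: for each label, the average share of each other label among the k nearest neighbors of its points. The function name `mixing_matrix`, the parameter `k`, and the toy pitch-type labels are all hypothetical.

```python
import math


def mixing_matrix(points, labels, k=3):
    """Return mix[a][b]: the average share of label-b points among the
    k nearest neighbours of each label-a point.

    This k-NN share is an illustrative assumption, not the measure
    used in the paper. In general mix[a][b] != mix[b][a].
    """
    names = sorted(set(labels))
    counts = {a: {b: 0 for b in names} for a in names}
    sizes = {a: 0 for a in names}
    for i, (p, a) in enumerate(zip(points, labels)):
        # Euclidean distances from point i to every other point, nearest first.
        others = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        for _, b in others[:k]:
            counts[a][b] += 1
        sizes[a] += 1
    # Normalise so that every row sums to 1.
    return {
        a: {b: counts[a][b] / (k * sizes[a]) for b in names}
        for a in names
    }


# Toy data (hypothetical): one isolated "CU" point next to a tight "FF" cloud.
toy_points = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (1.2, 0.0)]
toy_labels = ["CU", "FF", "FF", "FF"]
toy_mix = mixing_matrix(toy_points, toy_labels, k=1)
```

In this toy case the single "CU" point has an "FF" point as its nearest neighbor, while every "FF" point has another "FF" point as its nearest neighbor, so the two directions of mixing differ. A matrix of this kind could then feed a hierarchical grouping of labels, which is one possible reading of a label embedding tree.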