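A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis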

Paper Title

A Hybrid Algorithm Based Robust Big Data Clustering for Solving Unhealthy Initialization, Dynamic Centroid Selection and Empty clustering Problems with Analysis

Authors

Joarder, Y. A., Ahmed, Mosabbir

Abstract

Big data refers to massive volumes of structured and unstructured data that are too large and too difficult to process with traditional techniques. Clustering algorithms have emerged as a powerful learning tool for accurately analyzing the volume of data produced by modern applications. In data mining, clustering is the grouping of a set of objects based on their characteristics. Its main aim is to partition data into clusters such that objects with similar features fall into the same cluster. To date, K-MEANS is the most widely used algorithm, applied across a wide range of fields to identify groups in which the separation between clusters is much larger than the separation within them. Our algorithm builds on K-MEANS to achieve high-quality clustering of big data. The proposed algorithm, EG K-MEANS (Extended Generation K-MEANS), addresses three main issues of K-MEANS: unhealthy initialization, dynamic centroid selection, and empty clusters. It is designed to prevent these three problems in order to obtain high-quality clustering.
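To make two of the problems named in the abstract concrete, the following is a minimal sketch of plain Lloyd-style K-means in which a poorly placed initial centroid leaves a cluster empty. It is not the authors' EG K-MEANS, whose details the abstract does not give. The function names `sq_dist` and `kmeans`, the `init` parameter, and the repair rule (reseed an empty cluster at the point farthest from its current centroid) are all assumptions chosen for illustration; the repair rule is a common generic heuristic, not something taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init=None, iters=100, seed=0):
    """Plain K-means with a generic empty-cluster repair (illustrative only)."""
    rng = random.Random(seed)
    # Initialization: either caller-supplied centroids (useful for showing a
    # bad start) or k points sampled at random from the data.
    if init is not None:
        centroids = [list(c) for c in init]
    else:
        centroids = [list(p) for p in rng.sample(points, k)]

    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]

        used = set()  # points already taken as replacement centroids
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: centroid becomes the mean of its members.
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(dim)])
            else:
                # Empty cluster (assumed repair): reseed at the point that is
                # farthest from the centroid it is currently assigned to.
                far = max((i for i in range(len(points)) if i not in used),
                          key=lambda i: sq_dist(points[i],
                                                centroids[labels[i]]))
                used.add(far)
                new_centroids.append(list(points[far]))

        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return centroids, labels


# Two well-separated groups; the second initial centroid is far from all data,
# so it attracts no points in the first pass and its cluster starts out empty.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2, init=[(0, 0), (100, 100)])
print(labels)
```

Without the repair branch, the second centroid would have no members to average, which is the empty-cluster failure the abstract refers to. With it, the run recovers the two groups.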
