Cluster Analysis Explained: From Fixed-Number Clustering to Practical Applications

#### 1. Fixed-Number Clustering

During clustering it is sometimes necessary to force a preset number of clusters. The following examples show what such forced clustering produces. First, we generate two clearly separated input clouds:

```mathematica
Clear["Global`*"];
<<CIP`Cluster`
<<CIP`Graphics`
<<CIP`CalculatedData`

(* Two Gaussian clouds with 500 inputs each and a small standard deviation *)
standardDeviation = 0.05;
numberOfCloudInputs = 500;
centroid1 = {0.3, 0.7};
cloudDefinition1 = {centroid1, numberOfCloudInputs, standardDeviation};
inputs1 = CIP`CalculatedData`GetDefinedGaussianCloud[cloudDefinition1];
centroid2 = {0.7, 0.3};
cloudDefinition2 = {centroid2, numberOfCloudInputs, standardDeviation};
inputs2 = CIP`CalculatedData`GetDefinedGaussianCloud[cloudDefinition2];
inputs = Join[inputs1, inputs2];

(* Plot all inputs *)
labels = {"x", "y", "Inputs to be clustered"};
points2DWithPlotStyle = {inputs, {PointSize[0.01], Blue}};
points2DWithPlotStyleList = {points2DWithPlotStyle};
CIP`Graphics`PlotMultiple2dPoints[points2DWithPlotStyleList, labels]
```

When these two optimal (natural) clusters are forced into 3 clusters:

```mathematica
numberOfClusters = 3;
clusterInfo = CIP`Cluster`GetFixedNumberOfClusters[inputs, numberOfClusters];
CIP`Cluster`ShowClusterResult[
  {"NumberOfClusters", "EuclideanDistanceDiagram", "ClusterStatistics"},
  clusterInfo
]
```

the result is:

| Cluster | Members | Share | Distance |
| --- | --- | --- | --- |
| 1 | 500 | 50% | 0 |
| 2 | 271 | 27.1% | 0.561643 |
| 3 | 229 | 22.9% | 0.573776 |

The inputs are split into one large cluster and two adjacent small clusters: the second natural cluster has simply been cut in half. A silhouette-width check reveals one good cluster (identical to the first natural cluster) and two poor ones. If the inputs are partitioned into 4 clusters:

```mathematica
numberOfClusters = 4;
clusterInfo = CIP`Cluster`GetFixedNumberOfClusters[inputs, numberOfClusters];
CIP`Cluster`ShowClusterResult[
  {"NumberOfClusters", "EuclideanDistanceDiagram", "ClusterStatistics"},
  clusterInfo
]
```

the result is:

| Cluster | Members | Share | Distance |
| --- | --- | --- | --- |
| 1 | 282 | 28.2% | 0 |
| 2 | 218 | 21.8% | 0.0842652 |
| 3 | 265 | 26.5% | 0.568283 |
| 4 | 235 | 23.5% | 0.587472 |

The inputs are split into four small clusters of similar size, each being one half of one of the two natural clusters, and the silhouette widths show that all 4 clusters are poor. These examples suggest that partitioning the inputs into more and more clusters is of little use here: the less natural the clusters become, the lower the clustering quality.

#### 2. Obtaining Representatives

An important application of forced fixed-number clustering is generating a small set of representatives for a set of inputs. These representatives should have a spatial diversity similar to that of the full input set.

##### 2.1 Example: Uniformly Distributed Inputs

First, we use 5000 randomly distributed inputs as an example:

```mathematica
Clear["Global`*"];
<<CIP`Graphics`
<<CIP`Cluster`
<<CIP`CalculatedData`

(* 5000 uniformly random inputs in the square [0.05, 0.95] x [0.05, 0.95] *)
SeedRandom[1];
inputs = Table[{RandomReal[{0.05, 0.95}], RandomReal[{0.05, 0.95}]}, {5000}];

argumentRange = {0.0, 1.0};
functionValueRange = {0.0, 1.0};
labels = {"x", "y", "Inputs"};
allInputVectorsWithPlotStyle = {inputs, {PointSize[0.01], Green}};
points2DWithPlotStyleList = {allInputVectorsWithPlotStyle};
CIP`Graphics`PlotMultiple2dPoints[points2DWithPlotStyleList, labels,
  GraphicsOptionArgumentRange2D -> argumentRange,
  GraphicsOptionFunctionValueRange2D -> functionValueRange]
```

Inspect the statistics of the individual input components:

```mathematica
indexOfComponentList = {1, 2};
numberOfIntervals = 5;
argumentRange = {0.0, 1.0};
functionValueRange = {0.0, 30.0};
CIP`Cluster`ShowComponentStatistics[inputs, indexOfComponentList,
  ClusterOptionNumberOfIntervals -> numberOfIntervals,
  GraphicsOptionArgumentRange2D -> argumentRange,
  GraphicsOptionFunctionValueRange2D -> functionValueRange]
```

The result shows that the inputs are approximately uniformly distributed. If 20 representatives are needed, one option is random selection:

```mathematica
numberOfRepresentatives = 20;
randomRepresentatives = CIP`Cluster`GetRandomRepresentatives[inputs, numberOfRepresentatives];

labels = {"x", "y", "Random representatives"};
argumentRange = {0.0, 1.0};
functionValueRange = {0.0, 1.0};
(* Larger white points behind the black ones make the representatives stand out *)
randomRepresentativesBackground = {randomRepresentatives, {PointSize[0.025], White}};
randomRepresentativesWithPlotStyle = {randomRepresentatives, {PointSize[0.02], Black}};
points2DWithPlotStyleList = {allInputVectorsWithPlotStyle,
  randomRepresentativesBackground, randomRepresentativesWithPlotStyle};
CIP`Graphics`PlotMultiple2dPoints[points2DWithPlotStyleList, labels,
  GraphicsOptionArgumentRange2D -> argumentRange,
  GraphicsOptionFunctionValueRange2D -> functionValueRange]
```

In this example the randomly selected representatives cover the input space satisfactorily, although they are not strictly evenly spaced. The alternative is cluster-based selection:

```mathematica
clusterRepresentatives = CIP`Cluster`GetClusterRepresentatives[inputs, numberOfRepresentatives];

labels = {"x", "y", "Cluster representatives"};
clusterRepresentativesBackground = {clusterRepresentatives, {PointSize[0.025], White}};
clusterRepresentativesWithPlotStyle = {clusterRepresentatives, {PointSize[0.02], Black}};
points2DWithPlotStyleList = {allInputVectorsWithPlotStyle,
  clusterRepresentativesBackground, clusterRepresentativesWithPlotStyle};
CIP`Graphics`PlotMultiple2dPoints[points2DWithPlotStyleList, labels,
  GraphicsOptionArgumentRange2D -> argumentRange,
  GraphicsOptionFunctionValueRange2D -> functionValueRange]
```

The cluster-based representatives appear more evenly distributed. In this example the two approaches give comparable results, with cluster-based selection slightly ahead.

##### 2.2 Example: Non-Uniformly Distributed Inputs

The situation changes when the input set has different densities across the input space. We generate inputs with different densities:

```mathematica
centroid1 = {0.3, 0.7
```
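For readers without the CIP package, the cluster-based selection of representatives from section 2.1 can be approximated with built-in Wolfram Language functions. The following is only a sketch under stated assumptions: it uses `FindClusters` with the k-means method and picks, for each cluster, the member nearest to the cluster centroid. The source does not spell out how `GetClusterRepresentatives` chooses its representatives, so both choices, and the helper name `nearestToCentroid`, are illustrative rather than a description of CIP's implementation.

```mathematica
(* Sketch only: built-in functions, not the CIP package. *)
SeedRandom[1];
inputs = Table[{RandomReal[{0.05, 0.95}], RandomReal[{0.05, 0.95}]}, {5000}];
numberOfRepresentatives = 20;

(* Force a fixed number of clusters; k-means is an assumed choice of method. *)
clusters = FindClusters[inputs, numberOfRepresentatives, Method -> "KMeans"];

(* Hypothetical helper: the cluster member closest to the cluster centroid. *)
nearestToCentroid[cluster_] := First[Nearest[cluster, Mean[cluster]]];
clusterRepresentatives = nearestToCentroid /@ clusters;

(* Random selection for comparison. *)
randomRepresentatives = RandomSample[inputs, numberOfRepresentatives];

(* Plot the inputs with the cluster-based representatives on top. *)
ListPlot[{inputs, clusterRepresentatives},
  PlotStyle -> {Directive[PointSize[0.01], Green], Directive[PointSize[0.02], Black]},
  PlotRange -> {{0, 1}, {0, 1}}, AspectRatio -> 1, AxesLabel -> {"x", "y"}]
```

Replacing `clusterRepresentatives` with `randomRepresentatives` in the `ListPlot` call gives the random selection for a side-by-side comparison. Because each representative is an actual member of its cluster, the selection stays within the original input set.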
