可视化算法VxOrd论文研读

原文名：Cluster Stability and the Use of Noise in Interpretation of Clustering

- Clustering algorithms
- Data visualization
- Stability analysis
- Algorithm design and analysis
- Best practices

Introduction

VxInsight使用一个地形隐喻来描述大量的数据，通过在地形上相互靠近，来总结相似元素的簇。

In the Results and Discussion section, we describe how this analysis suggests important strategies for testing the robustness of clustering algorithms.

方法

计算基因的相似性

Using the raw correlations unduly weights the low similarities and does not adequately represent the information content contained in a strong similarity.

The non-linearity of this information, or rareness, is extreme and can change the total range of observed similarity weights by orders of magnitude.

We created all of the clusters reported here using gene pair similarities based on the t-statistic of the correlation coefficient, not on the correlation coefficient itself

1963年，为了报告相关性的统计意义，奥斯特尔[Ostel, 1963] 的常见做法是测试下一种情况
Ho : The observed n-sample correlation is consistent with observing two processes with a true correlation 0 = ρ ,

using a t-test with 2 n degrees of freedom and reject the hypothesis with some level of confidence α .

VxInsight序化程序

2000个顶点图（顶部）的布局，以及已知的K 5和双K 5（底部）的解决方案。

Fruchterman和Reingold 12的工作与我们的方法特别相关。

1. 由边连接的顶点应该相互靠近。
2. 非连接的顶点应该相互远离。
3. 结果应该对随机的初始条件不敏感。
4. 计算的复杂性应该降低到最小值。

原则1和2

Fruchterman等人计算了吸引力和排斥力。然后，这些术语用于为图形顶点生成新的位置。

Ki(x,y) = 一个顶点在一个特定的x，y位置的能量
ni      = 连接到顶点i的边数
wi,j    = 顶点i与顶点j连接的顶点之间的边权值。
l2i,j   = 顶点i的平方距离和边j的另一端的顶点之间的距离。
Dx,y    = 一个力项与xy附近顶点的密度成比例。

原则3

（如图所示，图中柱状图的和应为6000，上面的表示大部分节点在布局结束后都能发现45个左右的相同的邻居节点，下面的表示大部分节点在每次布局后都没有/0个相同的邻居节点）

<< 更多精彩尽在『程序萌部落』>>
<< https://www.cxmoe.com >>

原则4

Fruchterman8所讨论的网格变量算法使用一种binning技术来考虑特定区域内的那些顶点。

评估方法

We then visually compared the results of the two ordinations by coloring all of the genes in a cluster found in the first ordination and seeing where those colored genes were placed in the other ordination (so that a similar ordination would not break up the group of colored genes, but would still have them co-located;

结论

• 确保聚类工具对随机起始条件的稳定性
• 确保可能的聚类的范围是充分覆盖(通过系统地搜索一个大范围的选择,或使用一个工具,不需要先验判断聚类的数量)
• 使用聚类工具应对逐渐添加噪声的反映来深入了解聚簇的实际强度。

The two important analysis strategies presented here are: (1) use a probability weighted transformation of the correlation coefficients for the similarities, and (2) compute a small series of clusters with similarities mixed with increasing noise.

• 使用相似度的概率加权变换
• 计算一个具有相似性和不断增加的噪声的小系列的聚类。

References

[ 1 ] Davidson, G.S., Hendrickson, B., Johnson, D.K., Meyers, C.E. & Wylie, B.N. Knowledge mining with VxInsight: discovery through interaction . Journal of Intelligent Information Systems 11, 1998, 259-285.
[ 2 ] Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization , Mol. Biology of the Cell, 1998, 9:3273-3297.
[ 3 ] Wise, J.A., Thomas, J.J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., & Crow, V. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents , Proceedings of InfoVis ‘95, IEEE, 1995, 51-58.
[ 4 ] Kohonen, T. Self-organized formation of topologically correct feature maps . Biological Cybernetics, 1982, 43:59-69.
[ 5 ] MacQueen, J. Some methods for classification and analysis of multivariate observations . Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Volume I: Statistics,. University of California Press, Berkeley and Los Angeles, CA, 1967, pages 281-297
[ 6 ] York, J., Bohn, S., Pennock, K., & Lantrip, D. Clustering and Dimensionality Reduction in SPIRE . Symp. on Advanced Intelligence Processing and Analysis, (1995), 73
[ 7 ] Wise, J.A. The ecological approach to text visualization . Journal of the American Society for Information Science 50(13), (1999), 1224-1233.
[ 8 ] Wilcox, R.R., Introduction to Robust Estimation and Hypothesis Testing , Academic Press, 1997, ISBN 0-12-751545-3
[ 9 ] Ostel, B., Statistics In Research Basic Concepts and Techniques for Research Workers , Iowa State University Press, Ames, Iowa, USA, 1963
[ 10 ] Fisher, R.A., On the probable error of a coefficient of correlation deduced from a small sample . Metron., 1921, 1 (No.4):3.
[ 11 ] Eades, P., A heuristic for graph drawing , Congressus Numerantium, 42, 1984, 149-160
[ 12 ] Fruchtermann, T. and Rheingold, E. Graph drawing by force-directed placement . Technical Report UIUCDCS-R-90- 1609, Computer Science, Univ. Illinois, Urbana-Champagne, Il., 1990
[ 13 ] Quinn, N. and Breur M., A force directed component placement procedure for printed circuit boards , IEEE Trans on Circuits and Systems, 1979, CAS-26, (6), 377-388
[ 14 ] Otten, R. and van Ginneken, L., The Annealing Algorithm , Kluwer Academic Publishers, Boston MA., 1989
[ 15 ] Kamada, T. and Kawai, S., Automatic display of network structures for human understanding , Technical Report 88-007, Department of Information Science, Tokyo University, 1988
[ 16 ] Davidson, R. and Harel, D., Drawing graphs nicely using simulated annealing , Technical Report CS89-13, Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehovot, Israel., 1989
[ 17 ] Kamada, T. and Kawai, S., An algorithm for drawing general undirected graphs , Information Processing Letters, 1989, 31, (1), 7-15
[ 18 ] Kamada, T. and Kawai, S., A simple method for computing general position is displaying three-dimensional objects , Computer Vision, Graphics, and Image Processing, 1988, 41, 43-56
[ 19 ] Kirkpatrick, S. Gelatt, C.D. and Vecchi, M.P., Optimization by simulated annealing , Science, 1983, 220, (4598), 671-680

😒 留下您对该文章的评价 😄