Identifying the Patterns of Hematopoietic Stem Cells Gene Expressions Using Clustering Methods: Comparison and Summary

by Jie Chen, Xi He, and Linheng Li

Journal of Data Science, v.2, no.3, 297-309

Abstract

Clustering algorithms have been used to analyze microarray gene expression data in many recent applications. In this paper, we compare widely used clustering methods, including hierarchical clustering with average, complete, and single linkages, k-means clustering, k-means clustering with hierarchical initialization, and the self-organizing map (SOM), using our hematopoietic stem cell (HSC) microarray data. Statistical clustering is an important tool for understanding the biological pathways from HSC to proliferative multipotent progenitor (MPP), and from MPP to either common lymphoid progenitor (CLP) or common myeloid progenitor (CMP). Our results demonstrated that the HSC microarray data set poses a challenge for clustering algorithms, as different algorithms produced clusters that were not all consistent. We compared the results using the total within-cluster sum of squares of dispersions and the biological functions of the genes, and concluded that k-means clustering with hierarchical average or complete linkage initialization performed best among all the methods we compared. Our investigation of the clustering methods with HSC microarray data provides a useful approach and guide for medical researchers who use clustering algorithms to analyze their microarray or related data sets.
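The idea of k-means clustering with hierarchical initialization, scored by the total within-cluster sum of squares, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the Euclidean distance, the function names, and the tiny made-up data set (which is not the HSC microarray data). It is not the authors' code.

```python
# Illustrative sketch only: Euclidean distance, helper names, and the toy
# data are assumptions, not taken from the paper.

def sqdist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def average_linkage(data, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.
    Returns a list of clusters, each a list of row indices."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(sqdist(data[i], data[j]) ** 0.5
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters

def kmeans(data, init_centers, max_iter=100):
    """Lloyd's k-means iterations starting from the given centers."""
    centers = [list(c) for c in init_centers]  # do not mutate the input
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda c: sqdist(p, centers[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

def total_within_ss(data, labels, centers):
    """Total within-cluster sum of squared distances to cluster centers."""
    return sum(sqdist(p, centers[l]) for p, l in zip(data, labels))

# Made-up toy "expression profiles": two visibly separated groups.
data = [
    [0.1, 0.2, 0.0], [0.0, 0.1, 0.2], [0.2, 0.0, 0.1],
    [2.0, 2.1, 1.9], [1.9, 2.0, 2.2], [2.1, 1.8, 2.0],
]
k = 2

# Step 1: hierarchical clustering gives an initial partition.
hier_clusters = average_linkage(data, k)
hier_centers = [centroid([data[i] for i in c]) for c in hier_clusters]
hier_labels = [0] * len(data)
for c, members in enumerate(hier_clusters):
    for i in members:
        hier_labels[i] = c
w_hier = total_within_ss(data, hier_labels, hier_centers)

# Step 2: k-means refines the partition, starting from those centroids.
labels, centers = kmeans(data, hier_centers)
w_kmeans = total_within_ss(data, labels, centers)

print("hierarchical only:", w_hier)
print("k-means with hierarchical init:", w_kmeans)
```

Because each k-means step can only lower or preserve the within-cluster sum of squares, the refined partition scores no worse on this criterion than the hierarchical partition it starts from. Swapping average linkage for complete or single linkage would only change how the distance between two clusters is computed inside `average_linkage`.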
