Title: Cluster ANOVA with Mixtures for Microarray Data (利用混合模型對微陣列資料做分群變異數分析)
Authors: 柳超毅
Chao-I Liu
盧鴻興
Henry Horng-Shing Lu
Institute of Statistics
Keywords: biochip; mixture model; microarray; mixture; ANOVA; EM
Issue Date: 2003
Abstract: Fitted residuals of ANOVA models for microarray data typically follow a sparse distribution. We are therefore motivated to model microarray data by ANOVA with mixtures, which keeps the model for the experimental factors simple while allowing flexibility for the sparse residuals. The parameters of the mixtures are estimated by generalized EM algorithms, which have low complexity and converge monotonically. The number of clusters in the mixture is determined by the Bayesian information criterion (BIC), and the initial estimate is generated by projection onto principal components. Genes are then clustered so that the expression of the genes in each cluster can be modeled by a simple ANOVA model with multivariate Gaussian residuals. Statistical estimation and inference for each cluster can therefore proceed as in classical ANOVA, including least squares estimation and F tests. With this new approach, clustered ANOVA with mixtures (CANOVAM), genes are clustered flexibly through simple ANOVA models. Empirical studies confirm the practical feasibility of CANOVAM for microarray data from a variety of experiments.
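Illustration (not taken from the thesis): the abstract combines EM estimation of a mixture with BIC selection of the number of clusters. The sketch below shows that combination in a much simpler setting, a univariate Gaussian mixture fitted with plain EM, and is not the CANOVAM procedure itself. The function names, the synthetic data, the variance floor, and the sorted-chunk initialization (a one-dimensional stand-in for projecting onto principal components) are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def log_normal_pdf(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_mixture_em(xs, k, max_iter=500, tol=1e-9, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (log_likelihood, weights, means, variances); the log-likelihood
    is the value computed in the last E-step that was run.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (an assumption): cut the sorted data into k chunks.
    chunks = [ordered[j * n // k:(j + 1) * n // k] for j in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(xs) / n
    total_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(total_var, var_floor)] * k
    weights = [1.0 / k] * k
    loglik, previous = -math.inf, -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities via log-sum-exp to avoid underflow.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(w) + log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(lg - top) for lg in logs))
            loglik += log_total
            resp.append([math.exp(lg - log_total) for lg in logs])
        if loglik - previous < tol:
            break  # log-likelihood has stopped improving
        previous = loglik
        # M-step: weighted updates of weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,  # floor keeps a component from collapsing
            )
    return loglik, weights, means, variances


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in 1-D."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Synthetic data (an assumption): Gaussian quantiles around 0 and around 8.
data = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 30) for i in range(30)]
data += [NormalDist(8.0, 1.0).inv_cdf((i + 0.5) / 30) for i in range(30)]

# Fit k = 1..4 components and keep the k with the smallest BIC.
scores = {k: bic(fit_mixture_em(data, k)[0], k, len(data)) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

In the setting the abstract describes, each component would instead be an ANOVA model with multivariate Gaussian residuals, so the M-step would involve least squares fits per cluster rather than the simple weighted means used here.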
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009126510
http://hdl.handle.net/11536/55468
Appears in Collections: Thesis