Title: Gene expression microarray data generator using a reference training set from publicly available databases
Authors: 吳芝賢
Wu, Chih-Hsien
黃冠華
Huang, Guan-Hua
Institute of Statistics
Keywords: Affymetrix GeneChip; simulation
Issue Date: 2008
Abstract: Microarray expression analysis has become one of the most widely used functional genomics tools, and many analytical methods have been proposed for it. We set out to build empirical models that simulate the expression value of each gene, so that the simulated values can be used to assess analysis methods and testing approaches. To cover a diverse range of tissues, we used Microarray Retriever (MaRe) to collect publicly available raw expression data stored in the GEO and Affy databases, focusing on the Affymetrix HG-U133A GeneChip platform. The raw data were preprocessed with the R function justRMA, which gave an empirical expression intensity distribution for each of the 22283 genes. Of these, 5005 genes had distributions with two or more modes and 17278 genes had a single mode. Genes with a single mode are believed to be either expressed in all tissues or unexpressed in all tissues, whereas the 5005 multimodal genes are believed to be expressed in some tissues and unexpressed in others. We used these 22283 distributions to simulate gene expression values. This thesis describes the steps of the simulation procedure and simulates several spike-in data sets with different numbers of arrays to examine the differences between the simulated and the original expression values.
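The idea summarized in the abstract (an empirical distribution per gene, a count of its modes, and simulation from that distribution) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the thesis: the Gaussian kernel density estimate, the bandwidth of 0.5, the grid-based mode count, and the smoothed-bootstrap sampler are all choices made here for the example, and the probe names and expression values are hypothetical toy data standing in for justRMA output.

```python
import math
import random


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def count_modes(values, bandwidth=0.5, grid_size=200):
    """Count local maxima of the density estimate on an evenly spaced grid."""
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    density = gaussian_kde(values, bandwidth)
    step = (hi - lo) / (grid_size - 1)
    ys = [density(lo + i * step) for i in range(grid_size)]
    return sum(
        1 for i in range(1, grid_size - 1) if ys[i - 1] < ys[i] >= ys[i + 1]
    )


def simulate_gene(values, n_arrays, bandwidth=0.5, rng=random):
    """Smoothed bootstrap: pick an observed value, then add kernel noise."""
    return [rng.choice(values) + rng.gauss(0.0, bandwidth) for _ in range(n_arrays)]


# Hypothetical toy reference set: log2-scale expression values per probe set.
cluster = [0.1 * k for k in range(-5, 6)]
reference = {
    "probe_unimodal": [8.0 + d for d in cluster],
    "probe_bimodal": [4.0 + d for d in cluster] + [11.0 + d for d in cluster],
}

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for probe, values in reference.items():
        modes = count_modes(values)
        simulated = simulate_gene(values, n_arrays=5, rng=rng)
        print(probe, "modes:", modes, "simulated:", [round(x, 2) for x in simulated])
```

In this sketch a probe set with two or more modes would correspond to a gene that is expressed in some tissues and unexpressed in others, and the mode count depends strongly on the bandwidth, so any real use would need that choice justified against the data.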
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079626519
http://hdl.handle.net/11536/42679
Appears in Collections: Thesis


Files in This Item:

  1. 651901.pdf