Title: Gene Expression Microarray Data Simulator (基因表現量晶片資料模擬器)
Authors: Huang Guan-Hua (黃冠華)
Keywords: Affymetrix GeneChip; cloud computing; gene expression microarray; microarray database; parallel computing; simulation
Issue Date: 2011
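Abstract: Microarrays have become a widely used genomic technology, and many analysis methods have emerged in response. We attempt to build empirical models that simulate the expression level of each gene; the simulated expression values can then be used to evaluate analysis methods. To capture the diversity of tissues, we collect raw gene expression data stored in two databases, Gene Expression Omnibus and ArrayExpress, focusing on the HG-U133A GeneChip manufactured by Affymetrix. After preprocessing these data, we obtain empirical distribution models for the expression levels of 22283 genes, and we use these 22283 distributions to simulate gene expression values. In this project we provide the steps of the simulation method and simulate several sets of spike-in data with different numbers of arrays, observing the differences between the simulated and original expression values. The project also uses OpenMP and MPI parallel computing to shorten the time the program needs to preprocess a large number of microarrays, and runs it in three computing environments (a high-performance personal workstation, the National Center for High-performance Computing, and Amazon EC2 cloud computing) to observe their parallel efficiency. The results and experience gained can be used in future analyses of high-dimensional genomic data.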
Microarray gene expression analysis has become one of the most widely used functional genomics tools, and many analytical methods have been proposed as a result. It is desirable to develop realistic models that can simulate the expression values of each gene and can then be used to assess analysis methods and testing approaches. In this project, we plan to download publicly available raw data from the Affymetrix HG-U133A platform for various tissues from two public repositories: Gene Expression Omnibus and ArrayExpress. We then develop an empirical approach to determine the distribution of expression intensity for each gene, which can be used to simulate realistic gene expression data. The proposed method has several unique features that address the shortcomings of previous research. To evaluate the proposed simulation approach, we will examine the distributions of housekeeping genes, compare the simulated and real gene expression data, and simulate gene expression intensities that mimic the expression patterns in the HG-U133A tag spike-in dataset in order to determine the sensitivity and specificity of various methods for detecting differential expression. This project also uses OpenMP and MPI parallel computing to reduce computing time when preprocessing the large amount of downloaded microarray raw data. We will compare the parallel efficiency of OpenMP and MPI on a high-performance personal workstation, at the National Center for High-performance Computing, and in the Amazon EC2 cloud computing environment. The results and experience gained from this experiment can be applied to future high-dimensional genomic data computation.
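The abstract does not spell out the simulation procedure, so the following is only a minimal sketch of one way to draw simulated expression values from a per-gene empirical distribution. It assumes inverse-CDF sampling with linear interpolation between the observed order statistics, and it samples each gene independently, which ignores gene-gene correlation. The function names, probe set labels, and expression values are hypothetical and are not taken from the project.

```python
import random


def sample_empirical(observed, n, rng):
    """Draw n values from the empirical distribution of `observed`.

    Assumption: inverse-CDF sampling with linear interpolation between
    sorted observations, so draws stay inside [min(observed), max(observed)].
    """
    xs = sorted(observed)
    if not xs:
        raise ValueError("need at least one observed value")
    draws = []
    for _ in range(n):
        pos = rng.random() * (len(xs) - 1)  # fractional index into xs
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        frac = pos - lo
        draws.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return draws


def simulate_arrays(expr_by_gene, n_arrays, seed=0):
    """Simulate n_arrays expression values per gene, each gene independently."""
    rng = random.Random(seed)
    return {
        gene: sample_empirical(values, n_arrays, rng)
        for gene, values in expr_by_gene.items()
    }


# Hypothetical log2 expression values for two made-up probe sets.
observed = {
    "probe_A": [7.1, 7.4, 6.9, 7.8, 7.2],
    "probe_B": [10.2, 11.0, 10.7, 9.9, 10.4],
}
simulated = simulate_arrays(observed, n_arrays=4)
```

Here `simulated` maps each probe set to four simulated values. Interpolating between order statistics keeps every draw inside the observed range; resampling the observed values directly (a plain bootstrap) would be an equally simple alternative.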
Gov't Doc #: NSC100-2118-M009-004
URI: http://hdl.handle.net/11536/99193
Appears in Collections: Research Plans