Title: Selecting Tag SNPs Using an Intelligent Triobjective Genetic Algorithm (使用智慧型三目標基因演算法選取標籤單核苷酸多型性)
Author: Lin, Yu-Hsiang (林玉祥)
Ho, Shinn-Ying (何信瑩)
Institute of Bioinformatics and Systems Biology
Keywords: Tag SNP; Haplotype; Linkage Disequilibrium; Multi-Objective Genetic Algorithm; Multi-Objective Optimization
Issue Date: 2010
Abstract: Human DNA sequences contain many kinds of genetic variation, of which the single nucleotide polymorphism (SNP) is the most frequently observed. A haplotype is a set of multiple SNPs. The International HapMap Project used high-throughput microarray chip technology to map the haplotypes of the major human populations and revealed varying degrees of linkage disequilibrium between SNPs. As a result, genotyping only a representative group of SNPs is sufficient to capture the haplotype information of a large region. These representative SNPs are called Tag SNPs.
The Tag SNP selection problem has been proved to be NP-hard, so heuristic methods may be useful for solving it effectively. This study proposes a triobjective optimization approach to Tag SNP selection that seeks the smallest subset of SNPs able to distinguish all haplotype patterns. Previous studies emphasized improving the efficiency of the solution method but did not evaluate the differences in information among alternative optimal solutions. To pursue three goals, namely the fewest Tag SNPs, the greatest haplotype dissimilarity, and the least haplotype diversity, we combine the concept of feature selection with an Intelligent Triobjective Genetic Algorithm (ITOGA).
ITOGA can search for global optima, optimize a large number of parameters, and quickly obtain multiple non-dominated solutions, and this thesis applies it to the Tag SNP selection problem. Its objectives are (1) minimizing the number of selected Tag SNPs, (2) maximizing haplotype dissimilarity, and (3) minimizing haplotype diversity. The proposed method selects a small number of influential Tag SNPs, improves SNP coverage, and avoids repeatedly selecting highly similar SNPs. The performance of ITOGA is also compared with that of the existing multi-objective algorithm NSGA-2 to verify that the proposed method performs better.
The method was applied to Tag SNP selection for the vitamin D receptor (VDR) gene. Among 977 SNPs, 50 have been reported as important in the literature. The results show that the 18 SNPs selected by ITOGA form the most significant group of Tag SNPs. The three objectives proposed in this thesis allow optimal solutions to be found more accurately.
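The selection criterion and the notion of non-dominated solutions mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy haplotype matrix, the function names, and the objective tuples are hypothetical, and brute-force enumeration stands in for the genetic search, which is feasible only for tiny inputs. The abstract does not define how haplotype dissimilarity or diversity are computed, so they appear only as placeholder numbers, with dissimilarity negated so that every objective is minimized.

```python
from itertools import combinations

# Hypothetical toy data: each row is one haplotype, each column one SNP (0/1 alleles).
HAPLOTYPES = [
    (0, 0, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
]


def distinguishes_all(haplotypes, tag_indices):
    """True if the tag SNP columns alone keep every distinct haplotype distinct."""
    projected = {tuple(h[i] for i in tag_indices) for h in haplotypes}
    return len(projected) == len(set(haplotypes))


def smallest_tag_sets(haplotypes):
    """Brute-force stand-in for the search: all minimum-size distinguishing subsets."""
    n_snps = len(haplotypes[0])
    for k in range(1, n_snps + 1):
        found = [c for c in combinations(range(n_snps), k)
                 if distinguishes_all(haplotypes, c)]
        if found:
            return found
    return []


def dominates(a, b):
    """Pareto dominance for objective tuples in which every entry is minimized."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(solutions):
    """Keep (name, objectives) pairs not dominated by any other pair."""
    return [s for s in solutions
            if not any(dominates(t[1], s[1]) for t in solutions if t is not s)]


# Placeholder objective tuples: (number of tags, -dissimilarity, diversity).
CANDIDATES = [
    ("A", (2, -0.7, 0.5)),
    ("B", (3, -0.7, 0.5)),  # same scores as A but one more tag, so A dominates it
    ("C", (2, -0.6, 0.2)),
]

if __name__ == "__main__":
    print(smallest_tag_sets(HAPLOTYPES))
    print([name for name, _ in non_dominated(CANDIDATES)])
```

In the toy data, SNP 3 duplicates SNP 1 and SNP 2 is the complement of SNP 0, so no single SNP separates the four haplotypes and several two-SNP subsets are equally small. This is the situation in which additional objectives are needed to choose among alternative optimal solutions.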
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079751515
http://hdl.handle.net/11536/45823
Appears in Collections: Thesis


Files in This Item:

  1. 151501.pdf