Title: Using Next-Generation Sequencing Data to Investigate the Relationship Between Variation in Genetic Fragments and Disease
Association Studies of Short DNA Fragments from Next-Generation Sequencing
Author: 賴又菱
黃冠華
Institute of Statistics
Keywords: next-generation sequencing; UK10K; logistic regression; Poisson regression
Publication Date: 2013
Abstract: As next-generation sequencing technology advances, the large volumes of data it produces call for efficient and novel analysis methods. This thesis uses data from a case-control study in the UK10K project (http://www.uk10k.org/). The data, collected in Scotland, comprise exome sequences from 214 schizophrenia patients and whole-genome sequences from 740 healthy individuals. The main aim is to study the association between disease and short insertions or deletions detected in the target regions. Taking chromosome 12 as an example, we merge the 954 individuals into a single VCF file by target region. The analysis first determines the extent of each variant region: using the VCF output, neighboring fragments that overlap are treated as one region, which yields 1283 regions. Second, we determine the status of each individual in each region: no variation (reference), insertion, or deletion. Finally, we fit logistic regression and Poisson regression models to assess the association between the variation type in each region and disease.
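The first two steps of the abstract (grouping overlapping fragments into regions, then labeling a variant as an insertion or a deletion) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual code: the function names, the `max_gap` parameter, and the rule of classifying an indel by comparing the lengths of the VCF REF and ALT alleles are all hypothetical choices made here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def indel_span(pos: int, ref: str) -> Interval:
    """Reference span covered by a VCF record (assumed convention: POS plus REF length)."""
    return (pos, pos + len(ref) - 1)


def indel_type(ref: str, alt: str) -> str:
    """Label a variant by allele length (an assumed rule, not taken from the thesis)."""
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # same length: not an indel


def merge_regions(spans: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge spans that overlap; max_gap (hypothetical) also merges spans that lie
    within that many bases of each other."""
    merged: List[List[int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + max_gap:
            # Overlaps (or is close enough to) the current region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


spans = [(100, 105), (103, 110), (200, 200), (111, 112)]
regions = merge_regions(spans)
```

With the default `max_gap=0`, only spans sharing at least one base are merged. Whether directly adjacent fragments should also be joined depends on how "neighboring" is defined in the thesis, which the abstract does not specify.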
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070152602
http://hdl.handle.net/11536/74826
Appears in Collections: Thesis