Title: A Study of Bayesian Methods for Small-Sample, Many-Variable Models
Bayesian Methods in Small n Large p Problems
Author: 洪慧念
HUNG HUI-NIEN
Institute of Statistics, National Chiao Tung University
Keywords: high-dimensional data; Bayesian analysis
Issue date: 2010
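Abstract: Over the past decade or so, the invention of the gene chip has produced large amounts of high-density cDNA array data. As a result, many biologists, computer scientists, and statisticians have taken up this line of research and developed numerous methods for analyzing such genetic data. These data sets share a common feature: the number of samples is small but the number of genes is large, the so-called large p small n problem. Based on the existing literature, we believe that this kind of problem can be addressed in two steps: first, selecting the important genes, and second, using those genes in the analysis. On gene selection, several researchers in recent years have approached the problem from a Bayesian perspective. George Casella et al. used suitable prior distributions to derive shrinkage confidence intervals for the expression levels of a group of genes. W. Jiang considered Bayesian selection of suitable variables under generalized linear models. K.E. Lee et al. applied some simple Bayesian methods to real data to select important genes. W. Wong considered how to choose a suitable prior distribution for the problem of many simultaneous hypothesis tests. In our previous National Science Council project, we studied the following question: when the sample size is fixed and the number of measured genes grows rapidly, and the number of genes affecting a given disease is also fixed (or grows very slowly), how many genes should be selected for the data analysis? We tried several methods for this question, but in practice the computation proved very time-consuming and could not be completed quickly. In this project we therefore plan to approach the problem from a Bayesian perspective, in the hope of finding simpler computational methods. As for Bayesian methods, in traditional statistical problems the number of observations far exceeds the number of parameters, so the choice of prior distribution matters little. With genetic data, however, the dimension of the variables is often very large, so the dimension of the parameters is also very large, and the choice of prior becomes very important. In this project we plan to consider the setting in which, as the parameter dimension grows large, the parameter space approaches a nonparametric space in some sense. We first consider probability measures on this nonparametric space, and then consider the probability measures they induce on finitely many parameters.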
With advances in biological technology, high-throughput data such as microarray data are frequently seen in research. Such data sets usually contain only a few samples but a large number of variables. To analyze this kind of data, we first need to rank the importance of the variables (genes), and then choose an important subset of variables (genes) with which to analyze the microarray data (a classification problem). In this two-year project, we will try to solve these two problems systematically using Bayesian procedures. A Bayesian procedure first requires specifying a prior distribution for the problem. In traditional statistical models the choice of prior is usually not so important, but in the large p small n problem the number of parameters is huge and the choice of prior becomes important. In our research we have noticed that, as the number of parameters goes to infinity, the parameter space approaches an infinite-dimensional space in some sense. In this case we need to find a prior (a probability measure) on a nonparametric space. For computational purposes, we need to choose a prior for which the posterior is easy to manage. In the literature, Casella used a Bayesian method to derive confidence intervals for the expression levels of a group of genes. Jiang considered a Bayesian method for variable selection under the generalized linear model. K.E. Lee applied Bayesian methods to a real problem. Wong considered a prior based on the Polya tree method for multiple testing problems. In this study we will extend and compare their methods and then look for a better, reasonable prior on a nonparametric space. Our aim is to restrict this prior to a finite-dimensional space and then find an easy way to compute with the high-dimensional posterior. Finally, we will use our results to analyze data sets provided by Dr. Chen at the Moffitt Cancer Center & Research Institute, University of South Florida. We will compare our results with several existing results, in the hope of finding something new in those data sets.
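The abstract's claim that the prior matters little in the classical setting but a great deal when p is much larger than n can be illustrated with a small numerical sketch. The example below is not the project's method. It assumes a Gaussian linear model y = Xb + e with known noise variance and an isotropic normal prior b ~ N(0, tau2 I), chosen only because the posterior covariance then has a closed form. The function names, the simulated design matrix, and the two prior variances (0.1 and 10) are all hypothetical.

```python
import numpy as np


def mean_posterior_variance(X, tau2, sigma2=1.0):
    """Average posterior variance of the coefficients b in y = X b + e,
    assuming e ~ N(0, sigma2 I) and the prior b ~ N(0, tau2 I).
    In this conjugate model the posterior covariance does not depend on y."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    return np.trace(np.linalg.inv(precision)) / p


def prior_sensitivity(n, p, rng):
    """Ratio of the average posterior variance under a diffuse prior
    (tau2 = 10) to that under a tight prior (tau2 = 0.1)."""
    X = rng.standard_normal((n, p))  # simulated design, illustration only
    return mean_posterior_variance(X, 10.0) / mean_posterior_variance(X, 0.1)


rng = np.random.default_rng(0)
ratio_classical = prior_sensitivity(n=500, p=5, rng=rng)   # n much larger than p
ratio_large_p = prior_sensitivity(n=20, p=500, rng=rng)    # p much larger than n
print(f"n=500, p=5:  variance ratio = {ratio_classical:.2f}")
print(f"n=20, p=500: variance ratio = {ratio_large_p:.2f}")
```

Under these assumptions the ratio stays close to 1 when n is much larger than p, because the data dominate the prior. When p is much larger than n, most directions of the parameter space are not identified by the data, so the posterior variance there equals the prior variance and the ratio approaches the ratio of the two prior variances.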
Gov't Doc #: NSC99-2118-M009-004
URI: http://hdl.handle.net/11536/100192
https://www.grb.gov.tw/search/planDetail?id=2126098&docId=340700
Appears in Collections: Research Plans


Files in This Item:

  1. 992118M009004.PDF