Title: MuLiSA: Analysis and Identification of Functional Motifs and Residues in Proteins by Multiple Ligand-bound Structure Alignments
Authors: Lin Chien-Hung (林建宏)
Jinn-Moon Yang (楊進木)
Institute of Bioinformatics and Systems Biology
Keywords: Function Prediction; Essential Sites; ATP-binding Proteins; Multiple Structure Alignment; Multiple Ligand-bound Structure Alignment; Relative Entropy
Issue Date: 2003
Abstract: Predicting and identifying protein function from sequence alone is an important and urgent task, given the rapidly growing number and diversity of known protein sequences. In this thesis we develop a new approach for identifying conserved residues and motifs in ligand-binding proteins. The method, MuLiSA (Multiple Ligand-bound Structure Alignment), first superimposes the ligands of multiple ligand-bound proteins, so that the residues in their ligand-binding sites are aligned naturally. Important residue positions and characteristic sequence patterns are then identified from the z-scores of the residue entropy and the residue-segment entropy. For each new candidate pattern, a profile is built and used to predict the function of proteins for which only sequence information is available. We applied the approach to three kinds of ligand-binding proteins: ATP-binding, ADP-binding and HEM-binding proteins. The results show that the conserved residues and novel patterns we identified are substantially related to ligand-binding function, and that they include important residue positions already confirmed in the literature. The coverage of the identified patterns over proteins of a given function is still low: 23.51%, 47.64% and 13.60% for ATP-binding proteins, motor proteins and HEM-binding proteins, respectively. The prediction accuracy for kinesin function, however, reaches 86.49%. We therefore believe that enlarging the set of ligand-bound tertiary structures used for training would improve prediction accuracy and uncover new information for further research. We also found that multiple ligand-bound structure alignment can identify highly conserved patterns and, for some ligand-binding proteins, performs better than traditional structure or sequence alignment tools such as CE and CLUSTALW. We expect this technique to help researchers discover ligand-binding specificity-determining residues and functionally important patterns.
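The scoring step named in the abstract (z-scores of residue entropy over positions aligned by ligand superposition) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the following assumes plain Shannon entropy per aligned column, no gap handling, and a z-score cutoff of -1.0. The function names, the cutoff and the example sequences are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed in one aligned column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_z_scores(aligned):
    """Z-score of each column's entropy relative to all columns.

    `aligned` is a list of equal-length residue strings, one per protein,
    assumed to be already aligned (here: hypothetically, by ligand superposition).
    """
    columns = list(zip(*aligned))
    entropies = [column_entropy(col) for col in columns]
    mu = mean(entropies)
    sigma = pstdev(entropies)
    if sigma == 0:
        # All columns are equally variable, so no position stands out.
        return [0.0] * len(entropies)
    return [(h - mu) / sigma for h in entropies]


def conserved_positions(aligned, threshold=-1.0):
    """Indices of columns whose entropy z-score is at or below the (assumed) cutoff."""
    return [i for i, z in enumerate(entropy_z_scores(aligned)) if z <= threshold]


# Hypothetical binding-site residues from four proteins (illustration only).
example = ["GKSAT", "GKTLV", "GKSPE", "GKTQD"]
print(conserved_positions(example))  # [0, 1]
```

Low entropy means little residue variation at a position, so strongly negative z-scores mark candidate conserved residues. A segment-level score could be built the same way by averaging the entropies of consecutive columns, though that too would be an assumption rather than the thesis's definition.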
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009151507
http://hdl.handle.net/11536/61447
Appears in Collections: Thesis


Files in This Item:

  1. 150701.pdf