Title: Comprehensive Analysis of Residue Structural Environments, Conservation and Their Applications in Proteins (綜合分析蛋白質位點的結構環境,保守度與其應用)
Authors: Liu, Jen-Wei (劉人維)
Hwang, Jenn-Kang
Keywords: protein structure; sequence conservation; structural environment; available space; packing extent; residue structural environment; WCN; RSV; Voronoi; RSA
Issue Date: 2017
Abstract: Functionally or structurally important residues in a protein, such as enzyme catalytic sites, are usually evolutionarily conserved. To identify these conserved residues, a set of homologous sequences is generally collected and used to calculate sequence conservation scores. Interestingly, many recent studies have found that sequence conservation scores are highly correlated with residue structural environments. In other words, the evolutionary information implicit in a single protein structure can be estimated directly, even without information from homologous proteins. In 2008, our lab proposed the WCN model, a residue structural environment descriptor that quantifies how densely packed each residue is in a protein structure. In 2012, the WCN model was used to explore the relationship between protein structure and sequence conservation. More recently, we developed another structural model, the relative space of Voronoi volume (RSV), which rests on a different concept: it describes the space available to each residue. RSV is affected only by a residue's nearest neighbors, whereas WCN also takes more distant atoms into account. These two models are the main focus of this thesis. In addition, solvent exposure, depth, dynamic properties, and more detailed molecular interactions (van der Waals and electrostatic forces) are introduced. With these descriptors, each residue of any single protein structure can be assigned a number quantifying its structural environment and its sequence conservation. Using a dataset of 554 enzyme structures of known function, we comprehensively examine the correlations between the various structural properties and sequence conservation, and extend these correlations to applications such as identifying functionally important residues and assessing the quality of protein models. This information not only supports a systematic, large-scale understanding of the relationship between protein structure and evolution, but can also help biologists analyze the rapidly growing number of uncharacterized proteins simply and quickly. We believe that structural environments will find further applications in future research on protein functionality, protein stability, structure prediction, and protein evolution.
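The abstract describes WCN as a per-residue measure of packing that also accounts for distant atoms, but it does not state the formula. The following is a minimal sketch, assuming the commonly used inverse-square-distance form, w_i = sum over j != i of 1 / r_ij^2, with one representative atom (for example C-alpha) per residue. The function name `weighted_contact_numbers` and the toy coordinates are hypothetical and are not taken from the thesis.

```python
def weighted_contact_numbers(coords):
    """Return one WCN value per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    Assumed definition (not stated in the abstract):
        w_i = sum over j != i of 1 / r_ij**2
    where r_ij is the distance between residues i and j.
    """
    n = len(coords)
    wcn = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between residues i and j.
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if r2 == 0:
                raise ValueError("two residues share the same coordinates")
            contribution = 1.0 / r2
            # The pair (i, j) contributes equally to both residues.
            wcn[i] += contribution
            wcn[j] += contribution
    return wcn


# Toy example: three residues on a line, 1 unit apart (hypothetical data).
toy_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print(weighted_contact_numbers(toy_coords))  # [1.25, 2.0, 1.25]
```

In this toy case the middle residue has the highest value because it is closest to both neighbors, which matches the idea that a larger WCN indicates a more crowded position. Every residue pair contributes, so distant residues still add a small term; this is the long-range behavior that the abstract contrasts with RSV.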
