Computational Structural Biology Studies on: I. Conformational entropy II. Protein dynamics
|Keywords:||conformational entropy;protein dynamics;structural biology;bioinformatics|
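|Abstract:||A protein sequence usually folds into a unique three-dimensional structure, but previous studies have shown that short sequence fragments, whether synthetic or naturally occurring, can form different secondary structures in different structural environments. This property of short sequences was first discovered by Kabsch and Sander, who observed it while using sequence similarity to predict protein structures and named short sequences with this property "chameleon" sequences. They found many fragments of five amino acids that form different secondary structures in different proteins. Conversely, some short fragments form the same secondary structure regardless of the protein in which they occur; such fragments are said to have high structure conservation, whereas fragments with chameleon behavior have low structure conservation. In this part of the work, we used a support vector machine to develop a method for predicting the structure conservation of short protein sequence fragments. Comparing the predictions with hydrogen isotope exchange experiments, we found that residues with high structure conservation usually also have slow hydrogen isotope exchange rates.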
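The study of protein dynamics is an important topic in molecular biology. The most typical computational approach is molecular dynamics (MD) simulation, which computes the interactions between atoms: bond energies, charge interactions, and so on. Although MD is very accurate, its drawback is that it requires a large amount of computation time and parameter tuning. Another method is the Gaussian network model (GNM), which converts the protein structure into a network of connected Cα atoms; GNM can estimate the thermal fluctuation of individual residues and the correlation of motion between residues. The protein-fixed-point (PFP) model, recently developed in our laboratory, is a very simple and accurate method that requires only the spatial coordinates of the Cα atoms. The center of mass of the whole protein structure is determined first; we found that the thermal fluctuation of a residue is proportional to the square of its distance from the center of mass. Another method developed in our laboratory is the weighted-contact number (WCN) model, which counts the atoms surrounding a residue, giving closer atoms a higher weight and more distant atoms less influence. A residue in an environment crowded with atoms has a smaller thermal fluctuation. The results of the PFP and WCN models show that protein dynamics can be inferred from structure alone, without assuming any mechanical model (as GNM does). In this part of the study, to validate the PFP and WCN models, we used both methods to predict protein NMR order parameters and compared the predictions with experimental values. Both methods gave better predictions than previously published methods.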
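Protein function often involves large-scale structural motion. Normal mode analysis (NMA) has been used to study large-scale protein motions since as early as 1980. Its main feature is that it decomposes protein dynamics into motions of many different frequencies, including lower-frequency, larger-scale motions and higher-frequency, smaller-scale motions. Biologists are usually interested in the lower-frequency, larger-scale motions, because these are most directly related to protein function. NMA originally took the second derivative of the potential function used in MD simulation to obtain the Hessian matrix, then diagonalized it to obtain the motions of the protein at different frequencies. The drawback of this approach is that the computation time becomes very large for larger proteins. Another method is the elastic network model (ENM), which converts the protein structure into a network of connected Cα atoms; based on this network, earlier workers developed a simplified version of NMA that greatly reduces the computation time and still gives very good results. The most widely used ENM method today is the Gaussian network model (GNM). The PFP model developed in our laboratory is a simple method that accurately predicts residue thermal fluctuations, and based on it we developed another NMA method. In this part of the study, we compared the NMA results of the PFP model with those of GNM and found that they largely agree in the cases studied.|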
A complete protein sequence usually determines a unique structure; however, the situation is different for shorter subsequences. Studies have found that both designed and naturally occurring subsequences may adopt different secondary structures in different contexts. This property of short sequences is called “chameleon” behavior and was first reported by Kabsch and Sander when they used sequence homology to predict protein structures. They found several pentapeptides with identical sequences that adopt different secondary structures in different proteins. For naturally occurring proteins, a systematic search of the PDB shows that identical subsequences can have very different conformations. Here we developed a method to compute structure conservation from protein sequence. During protein folding, some structured regions resemble the folded conformation, and the hydrogen isotope exchange (HX) rate is commonly used to identify them. We applied our method to a set of proteins with known HX rate data and found a strong correlation between structure conservation and slow HX rates.
Understanding protein function is one of the most important topics in biological science, and it is well known that protein dynamics is closely related to function. Several computational methods have been developed to study protein dynamics. Molecular dynamics (MD) simulation has been widely used in the study of protein function and dynamics. It simulates the interactions between atoms: bonding forces, van der Waals forces, charge-charge interactions, and so on. However, the computation time becomes extremely long for large proteins, and selecting appropriate force-field parameters is itself a complicated problem. The Gaussian network model (GNM) converts the protein structure into a network in which each pair of Cα atoms is connected if their distance is smaller than a given cutoff value. Using this network, GNM can compute the theoretical thermal fluctuation of each atom and the correlation of motion between each atom pair. Recently we developed a model, called the protein-fixed-point (PFP) model, to predict thermal fluctuations from protein structure. The PFP model uses only the coordinates of the Cα atoms and simply determines the center of mass of the protein. We found that the thermal fluctuation of an atom is proportional to the squared distance from that atom to the center of mass of the structure. Another model, the weighted contact number (WCN) model, computes the number of neighboring atoms weighted by the inverse distance between each atom pair. The PFP and WCN models show that protein dynamics can be extracted directly from the intrinsic properties of protein structure without the use of any mechanical model. The order parameter obtained from NMR experiments is widely used to study dynamics-related protein functions. Here, we use the PFP and WCN models to predict the N-H S² order parameter directly from the protein structure. Our results show that the WCN model reproduces the experimental order parameters more accurately than previously published methods.
The biological function of proteins is closely related to cooperative motions and correlated fluctuations that involve large portions of the structure. Normal mode analysis (NMA) has been used to study biomolecules since the early 1980s. It decomposes protein dynamics into a collection of motions ranging from large-scale, low-frequency motions to small-scale, high-frequency motions. Biologists usually focus on the large-scale, low-frequency motions, which are relevant to protein function. The major contribution of NMA to biological research is its ability to provide information on large, domain-scale protein motions that are hard to compute by other methods. The classical approach to NMA is to diagonalize the Hessian matrix, i.e. the second derivative of the potential function of a molecular dynamics (MD) simulation. The major shortcoming of classical NMA is that the computation time increases dramatically with the size of the protein. The elastic network model (ENM), which is able to describe protein dynamics without the amino acid sequence or all-atom coordinates, has been widely used in studies of protein dynamics and structure-function relationships. The ENM views the protein structure as an elastic network whose nodes are the Cα atoms of individual residues. Residue pairs within a cutoff distance are connected by springs with a uniform force constant. Based on the ENM, a coarse-grained version of NMA has been developed and is widely used because of its low computational cost and its ability to extend the dynamics to longer timescales and larger motions. Coarse-grained NMA has been applied to various topics, for example protein function and catalytic residues. One of the most widely used ENM-based methods is the Gaussian network model (GNM). The protein-fixed-point (PFP) model is a simple method that computes protein dynamics using only the coordinates of the Cα atoms. Despite its simplicity, it has been shown to accurately predict the B-factors for a dataset of 972 proteins. Here, we compared the results of NMA based on the PFP model with those from the GNM.
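As a rough illustration of the two structure-based profiles described in the abstract, the sketch below computes a PFP-style profile (squared distance of each Cα atom from the center of the structure) and a WCN-style profile (sum of inverse distances to all other Cα atoms). It is a minimal sketch and not the thesis implementation: the function names `pfp_profile` and `wcn_profile`, the equal-mass centroid, the `power` parameter, and the toy coordinates are all assumptions made for illustration. The abstract only says "inverse distance", so the exponent is exposed as a parameter rather than fixed.

```python
import math

def pfp_profile(ca_coords):
    """PFP-style profile: squared distance of each C-alpha atom from the centroid.

    Assumption: every residue has equal mass, so the center of mass
    is the plain average of the coordinates.
    """
    n = len(ca_coords)
    center = [sum(p[k] for p in ca_coords) / n for k in range(3)]
    return [sum((p[k] - center[k]) ** 2 for k in range(3)) for p in ca_coords]

def wcn_profile(ca_coords, power=1.0):
    """WCN-style profile: sum over all other atoms of 1 / distance**power.

    Assumption: `power` is a hypothetical knob; power=1.0 follows the
    "inverse distance" wording of the abstract.
    """
    profile = []
    for i, p in enumerate(ca_coords):
        total = 0.0
        for j, q in enumerate(ca_coords):
            if i != j:  # skip the atom itself
                total += 1.0 / math.dist(p, q) ** power
        profile.append(total)
    return profile

# Toy example: three collinear "C-alpha atoms" spaced 2 units apart.
toy = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(pfp_profile(toy))  # [4.0, 0.0, 4.0]
print(wcn_profile(toy))  # [0.75, 1.0, 0.75]
```

In the toy example the middle atom is closest to the center and has the most neighbors nearby, so it gets the smallest PFP value and the largest WCN value. This matches the qualitative picture in the abstract: buried, densely packed residues are expected to fluctuate less. Both profiles are only relative measures; comparing them with B-factors or order parameters would need a scaling or correlation step that is not shown here.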
|Appears in Collections:||Thesis|