Predicting and analyzing pore-forming subunits of human membrane transport proteins
Keywords: membrane transport proteins; pore-forming subunits; database; code; Inheritable Bi-objective Genetic Algorithm; predictor
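Recent work on predicting membrane transport proteins has mostly predicted and classified the proteins and the TC families they belong to, rather than their pore-forming subunits. To date, no study has predicted the pore-forming subunits of membrane transport proteins, and no database is dedicated to them. In this study we therefore used the world's largest manually annotated protein database, together with a literature search, to confirm the pore-forming subunits of human membrane transport proteins and built a dataset of them (POSATS). On this dataset we then built a predictor of human pore-forming subunits (POSTPred 1.0) and examined its predictions.
POSATS contains 5176 human transmembrane proteins covering 9 superfamilies and 916 families: 728 pore-forming subunits, 190 potential pore-forming subunits, and 4258 non-pore-forming subunits, supported by 379 cited references. We randomly selected 728 pore-forming subunits and 728 non-pore-forming subunits, then randomly took 2/3 of each class to build POSTPred 1.0, using an intelligent genetic algorithm for feature selection and a support vector machine; the remaining 1/3 was used to test the performance of POSTPred 1.0.
POSTPred 1.0 uses 18 features; its five-fold cross-validation accuracy is 86.42% and its independent-test accuracy is 84.71%. Main-effect analysis of the 18 features shows that the two most influential features both describe the energy required to transfer an amino acid from the protein interior to its surface, which is closely related to pore formation by pore-forming subunits. This indicates that POSTPred 1.0 not only predicts with good accuracy but also uses a feature set that effectively distinguishes pore-forming from non-pore-forming subunits.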
A membrane transport protein is a protein complex composed of several subunits. Among these, the pore-forming subunits play the key role of forming the transmembrane pore. Transmembrane pores control the passage of substances across plasma membranes and organelle membranes, and so regulate many important biological processes. Because many diseases, such as channelopathies and hemophagocytic syndrome, are caused by defects in pore-forming subunits, and because many aspects of drug design, such as drug resistance and specificity, depend on pore-forming subunits, identifying and studying these subunits has long been a focus of biological and medical research. Many recent studies use bioinformatic tools for preliminary screening before identification, but the screening methods must be designed, and their results judged, by each user; the number of candidate proteins therefore differs from one screening method to another. For new classes of pore-forming subunits that have not been found before, the number of proteins remaining after preliminary screening is still large, which costs considerable time and money. A precise prediction tool for pore-forming subunits is therefore needed to shorten identification time and reduce cost. In recent related work on predicting membrane transport proteins, most studies predicted and classified only the membrane transport proteins and the TC families they belong to, not their pore-forming subunits. To date, there are no studies on predicting pore-forming subunits and no databases specialized for them. In this work, we therefore began with a well-known, manually annotated protein database and searched the literature for papers providing evidence for human pore-forming subunits.
After collecting sufficient information, we constructed the Pore-fOrming Subunits of humAn Transporter Set (POSATS). Based on this dataset, we also constructed a predictor for human pore-forming subunits (POSTPred 1.0) and analyzed the results. POSATS comprises 5176 human transmembrane proteins in 9 superfamilies and 916 families: 728 pore-forming subunits, 190 potential pore-forming subunits, and 4258 non-pore-forming subunits, with 379 curated literature references. For prediction, we first randomly chose 728 pore-forming subunits and 728 non-pore-forming subunits, then randomly chose 2/3 of them as input for IBCGA-SVM; 30 independent runs yielded an informative feature set. This optimized feature set was used to construct POSTPred 1.0, and the remaining 1/3 was used to test its performance. The optimized feature set contains 18 features, and the 5-fold cross-validation accuracy and independent-test accuracy were 86.42% and 84.71%, respectively. MED analysis showed that the top 2 of the 18 features are the energy requirements for transferring amino acids from inside to outside. These two features are also highly related to pore formation by pore-forming subunits. We conclude that this result is consistent with the good prediction accuracy of POSTPred 1.0, and that its optimized feature set can efficiently differentiate pore-forming from non-pore-forming subunits.
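
The sampling protocol described above (a balanced draw of 728 proteins per class, a 2/3 versus 1/3 split, and 5-fold cross-validation on the training part) can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build POSTPred 1.0: the function names, the identifier strings, the per-class split, and the rounding of 2/3 of 728 are all hypothetical choices, and the IBCGA-SVM feature selection itself is not reproduced.

```python
import random


def sample_and_split(pos_ids, neg_ids, n_per_class=728, train_fraction=2 / 3, seed=0):
    """Draw a balanced sample, then split each class into train and test parts.

    Splitting per class and rounding to the nearest integer are assumptions;
    the abstract gives only the 2/3 and 1/3 proportions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in ((1, pos_ids), (0, neg_ids)):
        chosen = rng.sample(list(ids), n_per_class)    # random draw without replacement
        n_train = round(n_per_class * train_fraction)  # assumed rounding rule
        train += [(protein, label) for protein in chosen[:n_train]]
        test += [(protein, label) for protein in chosen[n_train:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


def k_folds(items, k=5):
    """Partition items into k folds of near-equal size for cross-validation."""
    return [items[i::k] for i in range(k)]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


if __name__ == "__main__":
    # Placeholder identifiers standing in for POSATS entries (hypothetical).
    pos_ids = [f"POS{i:04d}" for i in range(728)]
    neg_ids = [f"NEG{i:04d}" for i in range(4258)]
    train, test = sample_and_split(pos_ids, neg_ids)
    folds = k_folds(train)
    print(len(train), len(test), [len(fold) for fold in folds])
```

Under the rounding assumption used here, each class contributes 485 training and 243 test proteins, so the training set holds 970 proteins and the independent test set 486. A classifier trained on four folds and scored on the fifth with `accuracy`, repeated over all five folds, gives a cross-validation estimate of the kind reported above.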