Title: LigSeeSVM: Support Vector Machines and Data Fusion for Ligand-based Compound Screening and Applications to GPCR and GABAA
Authors: 林柏村
楊進木
Institute of Bioinformatics and Systems Biology
Keywords: SVM; data fusion; computer-aided drug screening; ligand-based virtual screening; GPCR
Issue Date: 2004
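Abstract: Ligand-based drug design is used when the structure of the receptor (target protein) has not been solved or cannot be obtained, so analysis and generalization rely on the structures of active ligands instead. This thesis characterizes compounds with two sets of descriptors: (1) 825 atom-pair (AP) descriptors; and (2) 6 thermodynamic descriptors together with 13 Accelrys Cerius2 default descriptors. With LibSVM as the screening tool, each descriptor set produces its own result; the SVM outputs are converted to Z-scores, and the compounds are ranked by Z-score. The two results are then merged by rank combination (rank-based data fusion) to give a third result. The test sets consist of 10 each of TK (thymidine kinase) active ligands, ER (estrogen receptor) antagonists, and ER agonists, and 100 GPCR and GABAA active ligands in total, plus 990 compounds randomly selected from the ACD compound database and 7300 compounds randomly selected from the CMC compound database. The SVM predicts where the known active ligands rank among all compounds, which shows how well it performs at screening a compound database. On the test sets that use the 990 ACD compounds as the compound database, SVM was compared with other methods (Surflex-Sim; Ajay N. Jain, 2004). The SVM result obtained by rank combination was the best: at a true hit rate of 100%, the false positive rates were 0.3% (TK), 0.6% (ER antagonists), and 0% (ER agonists). In terms of ROC curves, SVM also performed better at virtual screening on the GPCR and GABAA active-ligand test sets. On the test set that uses the 7300 CMC compounds as the compound database, most of the top-ranked compounds with high Z-scores have structures very similar to the known active ligands. Some compounds have high Z-scores but structures dissimilar to the known active ligands, and these may well be novel lead compounds. Taking the results on these test sets together, we conclude that SVM is suitable for virtual screening and that it outperforms other ligand-based methods.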
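The scoring pipeline described above (SVM output, Z-score conversion, ranking, rank combination) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names are hypothetical, the input lists stand in for SVM decision values, and averaging the two ranks is an assumed fusion rule, since the abstract does not state how the ranks are combined.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to Z-scores (zero mean, unit standard deviation)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every compound sits exactly at the mean.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ranks(scores):
    """Rank compounds so that rank 1 is the highest score; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all compounds tied with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def rank_combination(scores_a, scores_b):
    """Fuse two models by averaging per-compound ranks (assumed rule; lower is better)."""
    ranks_a = ranks(z_scores(scores_a))
    ranks_b = ranks(z_scores(scores_b))
    return [(a + b) / 2 for a, b in zip(ranks_a, ranks_b)]


if __name__ == "__main__":
    # Made-up scores for four compounds from two hypothetical models.
    ap_scores = [2.0, 0.5, -1.0, 1.0]
    pc_scores = [0.1, 0.9, -0.3, 0.8]
    print(rank_combination(ap_scores, pc_scores))
```

Z-scoring is a monotonic transform, so it puts the two models on a common scale without changing the order of compounds within either model. The fusion step here depends only on ranks.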
A major benefit of ligand-based drug screening is that it can be applied even when the three-dimensional structure of the drug target is not known well enough to permit structure-based virtual screening. In this thesis, we developed a ligand-based screening tool, termed LigSeeSVM, that uses Support Vector Machines (SVM) and data fusion. We combine structural descriptors (825 atom-pair (AP) descriptors) and physicochemical descriptors (six thermodynamic and 13 default descriptors from Accelrys Cerius2) to characterize compounds. We used SVM to build an SVM-AP model from the 825 AP descriptors and an SVM-PC model from the 19 physicochemical descriptors. The predicted scores of both models are normalized by converting them to Z-scores. We then fused the SVM-AP and SVM-PC predictions by rank combination to produce the LigSeeSVM prediction. We used 10 thymidine kinase (TK) substrates, 11 estrogen receptor (ER) antagonists, 10 ER agonists, and 100 GPCR and GABAA ligands, combined with 990 randomly chosen compounds from the ACD or 7300 randomly chosen compounds from the CMC, as screening sets to verify the utility of LigSeeSVM for virtual screening. When the true hit rate was 100%, the false positive rates were 0.3% for TK, 0.6% for ER antagonists, and 0% for ER agonists. The ROC curves for the GPCR and GABAA screening sets also showed that LigSeeSVM performs better than other ligand-based virtual screening approaches on these data sets. With the 7300 randomly chosen CMC compounds as the compound database, most compounds with high Z-scores have structures similar to the known ligands. Some compounds with high Z-scores have structures that differ from the known ligands, and these are more likely to be novel potential lead compounds.
Our results suggest that LigSeeSVM is practically applicable to ligand-based virtual screening and offers performance competitive with other ligand-based virtual screening approaches on these data sets.
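The "false positive rate at a true hit rate of 100%" figures can be read as the fraction of non-active compounds that must be accepted before every known active has been retrieved. The sketch below computes that quantity under stated assumptions: the function name is hypothetical, higher scores mean "more likely active", and a non-active compound that ties with the lowest-scoring active counts as a false positive. The abstract does not spell out these details.

```python
def false_positive_rate_at_full_recall(scores, is_active):
    """Fraction of non-actives scoring at or above the lowest-scoring active.

    scores    -- one score per compound, higher = predicted more likely active
    is_active -- parallel list of booleans marking the known active ligands
    """
    active_scores = [s for s, a in zip(scores, is_active) if a]
    decoy_scores = [s for s, a in zip(scores, is_active) if not a]
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one non-active compound")
    # Lowering the cutoff to this score retrieves all actives (100% true hit rate).
    threshold = min(active_scores)
    false_positives = sum(1 for s in decoy_scores if s >= threshold)
    return false_positives / len(decoy_scores)


if __name__ == "__main__":
    # Made-up example: two actives and four non-actives.
    scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
    is_active = [True, False, True, False, False, False]
    print(false_positive_rate_at_full_recall(scores, is_active))
```

If the rate is taken over the 990 ACD compounds, which is an assumption about the denominator, a value of 0.3% would correspond to about 3 compounds ranked at or above the last retrieved active.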
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009251511
http://hdl.handle.net/11536/77493
Appears in Collections: Thesis


Files in This Item:

  1. 151101.pdf