Title: 基於Apache Spark分散式平台之廣告點擊率預測方法比較
Comparisons of Techniques for Predicting the Click-Through Rate of Advertisements based on the Apache Spark Platform
Authors: 賴仁偉
劉敦仁
Lai, Jen-Wei
Liu, Duen-Ren
College of Management, Information Management Program
Keywords: Distributed processing; Data mining; Advertisement recommendation; CTR prediction
Publication date: 2016
Abstract: The number of Internet users has grown sharply in recent years, and consumer purchasing has gradually shifted from physical stores to online stores. Online stores market themselves mainly through digital advertising, which has given rise to a variety of pricing models; among these, the click-through rate (CTR) is the most common indicator. With it, advertisers can understand consumers' needs and serve advertising content that interests them, with the goal of increasing profit. This research performs big-data analysis on advertisement click data and predicts the click-through rate using the machine learning methods provided by the Apache Spark distributed processing platform. Through data pre-processing, feature selection, optimal parameter tuning, and distributed computation, prediction models are built with Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), and Artificial Neural Networks (ANN), and their effectiveness is evaluated and compared. The experimental results show that every model is reliable to a certain degree, and that ANN and LR perform better than the other models.
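Illustrative sketch (not from the thesis): the abstract describes predicting click-through rate with, among other models, logistic regression over pre-processed features. The plain-Python example below shows that idea on a single machine, assuming a hypothetical toy click log with two categorical fields (`device`, `slot`), one-hot encoding, and stochastic gradient descent. It does not use Apache Spark, and the data, field names, and hyperparameters are invented for illustration only.

```python
import math
import random


def ctr(clicks, impressions):
    """Click-through rate: clicks divided by impressions (0.0 if there are none)."""
    return clicks / impressions if impressions else 0.0


def one_hot(record, index):
    """Turn a dict of categorical fields into the list of active feature ids."""
    return [index[(k, v)] for k, v in record.items() if (k, v) in index]


def predict(weights, bias, active):
    """Logistic model: sigmoid of the bias plus the weights of active features."""
    z = bias + sum(weights[i] for i in active)
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, n_features, epochs=50, lr=0.1):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for active, y in zip(rows, labels):
            err = predict(weights, bias, active) - y  # gradient of log loss w.r.t. z
            bias -= lr * err
            for i in active:
                weights[i] -= lr * err
    return weights, bias


# Hypothetical toy click log: mobile banner impressions are clicked more often.
random.seed(0)
logs = []
for _ in range(400):
    rec = {"device": random.choice(["mobile", "desktop"]),
           "slot": random.choice(["banner", "sidebar"])}
    p_click = 0.7 if rec == {"device": "mobile", "slot": "banner"} else 0.1
    logs.append((rec, 1 if random.random() < p_click else 0))

# Pre-processing: build the one-hot feature index from the observed field values.
index = {}
for rec, _ in logs:
    for item in rec.items():
        index.setdefault(item, len(index))

rows = [one_hot(rec, index) for rec, _ in logs]
labels = [y for _, y in logs]
weights, bias = train_logistic_regression(rows, labels, len(index))

observed_ctr = ctr(sum(labels), len(labels))
p_high = predict(weights, bias, one_hot({"device": "mobile", "slot": "banner"}, index))
p_low = predict(weights, bias, one_hot({"device": "desktop", "slot": "sidebar"}, index))
print(f"observed CTR: {observed_ctr:.3f}")
print(f"predicted CTR, mobile banner: {p_high:.3f}; desktop sidebar: {p_low:.3f}")
```

On a Spark cluster the same steps would typically be expressed with the platform's own machine learning library rather than hand-written loops; the sketch only makes the encode, train, and predict flow concrete.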
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070363420
http://hdl.handle.net/11536/138927
Appears in Collections: Thesis