Title: Publishing-time-based Topic Derivation in Twitter (基於發布時間之推特主題推論方法)
Authors: Chen, Shih-He (陳世和)
Wang, Kuo-Chen (王國禎)
College of Computer Science, Computer Science Program
Keywords: Topic derivation; publishing time; Twitter; non-negative matrix factorization (NMF)
Issue Date: 2017
Abstract: Since social media such as Twitter and Facebook emerged, they have become an essential part of daily life for most people, and the development of mobile communications and mobile phones has helped them thrive further. Browsing social media has become an important way to make daily life more convenient, for example through carpooling, price comparison, group buying, and product recommendation. Data scientists try to capture trends by deriving topics from social media. However, previous research that derives topics from the text and interaction features of Twitter has had difficulty achieving high purity. Although the related work tNMijF takes a temporal factor into account, its clustering accuracy is still skewed by the proportion of the mention feature among the interactions. We propose Publishing-time-based Topic Derivation in Twitter (PTD), an approach that builds on intJNMF and adds a temporal factor, the publishing time. PTD uses the interactions between tweets and their content to identify topics, and strengthens the interactions with publishing time, which reflects the freshness of the content, to achieve higher purity. Compared with intJNMF, a representative related work, PTD improves purity by 19.01% to 23.13% and F-measure by 31.76% to 32.66% for 20 to 100 topics. Compared with tNMijF, which also uses a temporal factor, PTD improves purity by 7.46% to 10.32% and F-measure by 10.16% to 12.17% for 20 to 100 topics. The overhead of PTD is a slight increase in running time, caused by computing the time difference between the publishing time and the current time. The experimental results show that strengthening the interaction features between tweets with this time difference further improves the purity and F-measure of PTD.
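The abstract reports results in terms of purity. As a minimal sketch, assuming the standard clustering definition (each derived topic is credited with its most frequent ground-truth label, and the credited tweets are divided by the total number of tweets), purity could be computed as follows. The function name `purity` and the toy labels are illustrative assumptions and do not come from the thesis.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the fraction of items that match their cluster's majority label.

    Assumes the standard definition of clustering purity; the thesis may
    compute it with a different evaluation setup.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each ground-truth label occurs inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Credit every cluster with the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six tweets assigned to three derived topics.
derived = [0, 0, 0, 1, 1, 2]
labeled = ["sports", "sports", "politics", "politics", "politics", "tech"]
score = purity(derived, labeled)
print(f"purity = {score:.4f}")
```

In the toy data, five of the six tweets carry the majority label of their derived topic, so the score is 5/6.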
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070356822
http://hdl.handle.net/11536/142580
Appears in Collections: Thesis