Publishing-time-based Topic Derivation in Twitter
Keywords: Topic derivation; publishing time; Twitter; NMF (non-negative matrix factorization)
Since social media such as Twitter and Facebook emerged, they have become an essential part of daily life for most people, and they have thrived further with the development of mobile communications and mobile phones. Social media have become an important aid in daily life, for example for carpooling, price comparison, group buying, and product recommendation. Data scientists try to capture trends by deriving topics from social media. However, previous research that derives topics from text and interaction features in Twitter has difficulty achieving high purity. Although the related work tNMijF takes a temporal factor into consideration, its clustering accuracy is still sensitive to the percentage of the mention feature. We propose the Publishing-time-based Topic Derivation in Twitter (PTD) approach, which builds on intJNMF and adds a temporal factor based on publishing time. PTD uses both Twitter interactions and tweet content to identify topics, and it uses publishing time to reflect the freshness of content, which yields higher purity. Compared to intJNMF, a representative related work, PTD improves purity (F-measure) by 19.01% (31.76%) to 23.13% (32.66%) for 20 to 100 topics. Compared to tNMijF, PTD improves purity (F-measure) by 7.46% (10.16%) to 10.32% (12.17%) for 20 to 100 topics. The overhead of PTD is a slight increase in running time, caused by computing the time difference between publishing time and current time. The performance results show that the purity and F-measure of PTD can be further enhanced by strengthening the interaction features between tweets using the time difference between current time and publishing time.
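The abstract does not give PTD's formulas, so the following is only a rough Python sketch of the general pipeline it describes, built on several assumptions. The freshness factor is assumed to be an exponential decay of the age (current time minus publishing time) controlled by a hypothetical `half_life` parameter; each tweet-to-tweet interaction is scaled by the freshness of both tweets; content and weighted interactions are placed side by side and factorized with standard Lee-Seung multiplicative-update NMF so that they share one tweet-topic matrix. That shared factorization is a simplified stand-in, not the thesis's intJNMF-based formulation. Purity follows its standard definition. All function names are hypothetical.

```python
import random
from collections import Counter


def freshness_weight(publish_time, current_time, half_life):
    """Hypothetical freshness factor from the age (current_time - publish_time).

    Exponential decay is an assumption: a brand-new tweet gets 1.0 and a tweet
    one half-life old gets 0.5. Future timestamps are clipped to age 0.
    """
    age = max(0.0, current_time - publish_time)
    return 0.5 ** (age / half_life)


def time_weighted_interactions(A, publish_times, current_time, half_life):
    """Scale interaction A[i][j] by the freshness of tweets i and j (assumed scheme)."""
    w = [freshness_weight(t, current_time, half_life) for t in publish_times]
    return [[a_ij * w[i] * w[j] for j, a_ij in enumerate(row)] for i, row in enumerate(A)]


def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
    return W, H


def assign_topics(W):
    """Give each tweet (row of W) the topic with the largest membership weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]


def purity(predicted, truth):
    """Purity = (1/N) * sum over clusters of the size of each cluster's majority class."""
    clusters = {}
    for p, t in zip(predicted, truth):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(truth)


def derive_topics(X, A, publish_times, current_time, k, half_life=24.0):
    """Simplified joint factorization: content X (tweet x term) and time-weighted
    interactions (tweet x tweet) are concatenated so both share one W (tweet x topic)."""
    A_w = time_weighted_interactions(A, publish_times, current_time, half_life)
    V = [list(x_row) + a_row for x_row, a_row in zip(X, A_w)]
    W, _ = nmf(V, k)
    return assign_topics(W)
```

For example, `derive_topics(X, A, publish_times, current_time=48.0, k=2)` returns one topic index per tweet, which can then be scored against labeled topics with `purity`. The default `half_life=24.0` assumes timestamps in hours; it is an illustrative choice, not a value from the thesis. In this sketch the temporal weighting adds only one pass over the interaction matrix before factorization.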
Appears in Collections: Thesis