
■ Pythia: A System for Online Topic Discovery of Social Media Posts

Pythia: A System for Online Topic Discovery of Social Media Posts, Iouliana Litou, Vana Kalogeraki, IEEE ICDCS 2017, Atlanta, GA, USA, June 5 - 8, 2017 (Demo)
Social media are nowadays among the most common communication media. Millions of users rely on them daily to share information with their community in the network via messages, referred to as posts. The massive volume of information shared is extremely diverse and covers a vast spectrum of topics and interests. Automatically identifying the topics of the posts is of particular interest, as this can assist in a variety of applications, such as event detection, trend discovery, and expert finding. However, designing an automated system that requires no human participation to identify the topics covered in posts published in Online Social Networks (OSNs) presents manifold challenges. First, posts are unstructured and commonly short, limited to just a few characters. The resulting sparseness of the text prevents existing classification schemes from being applied directly. Second, new information emerges constantly, hence a learning corpus built from past posts may fail to capture the ever-evolving information emerging in OSNs. To overcome these limitations we have designed Pythia, an automated system for short text classification that exploits the Wikipedia structure and articles to identify the topics of the posts. The topic discovery is performed in two phases. In the first phase, the system exploits Wikipedia categories and the articles of the corresponding categories to build the training corpus for the supervised learning. In the second phase, the text of a given post is augmented using a text enrichment mechanism that extends the post with relevant Wikipedia articles. After these phases are completed, we deploy a k-NN classifier to determine the topic(s) covered in the original post.
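To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the three stages (category-labelled corpus, post enrichment, k-NN classification). It is not the authors' implementation. The toy CORPUS stands in for Wikipedia articles, and the TF-IDF weighting, cosine similarity, similarity-weighted voting, and all function names (build_idf, vectorize, cosine, enrich, classify) are assumptions chosen for illustration. The sketch also returns a single topic, whereas the abstract mentions that a post may cover several.

```python
import math
from collections import Counter

# Toy stand-in for a Wikipedia-derived training corpus: (category, article text).
# The texts are invented for illustration; the real system uses Wikipedia articles.
CORPUS = [
    ("Sports", "football match goal league team player stadium"),
    ("Sports", "tennis player tournament match serve court"),
    ("Technology", "software computer programming code developer algorithm"),
    ("Technology", "smartphone device app mobile software update"),
    ("Politics", "election vote government parliament minister policy"),
    ("Politics", "president campaign vote election debate party"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; no stemming or stop words)."""
    return text.lower().split()


def build_idf(texts):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(texts)
    df = Counter(term for text in texts for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def enrich(post, texts, vecs, idf, n_articles=1):
    """Extend a short post with the text of its most similar corpus articles."""
    v = vectorize(post, idf)
    scored = sorted(((cosine(v, av), i) for i, av in enumerate(vecs)), reverse=True)
    extra = [texts[i] for sim, i in scored[:n_articles] if sim > 0]
    return " ".join([post] + extra)


def classify(post, corpus=CORPUS, k=3):
    """Return the top topic among the k nearest articles, or None if nothing matches."""
    labels = [label for label, _ in corpus]
    texts = [text for _, text in corpus]
    idf = build_idf(texts)
    vecs = [vectorize(t, idf) for t in texts]
    query = vectorize(enrich(post, texts, vecs, idf), idf)
    nearest = sorted(
        ((cosine(query, v), label) for v, label in zip(vecs, labels)),
        reverse=True,
    )[:k]
    votes = {}
    for sim, label in nearest:
        if sim > 0:  # neighbours with no shared terms do not vote
            votes[label] = votes.get(label, 0.0) + sim
    return max(votes, key=votes.get) if votes else None


if __name__ == "__main__":
    print(classify("great goal in the football match tonight"))
    print(classify("new app update for my smartphone"))
```

Weighting each neighbour's vote by its similarity keeps unrelated articles from deciding the outcome when fewer than k articles share any term with the enriched post. That is one plausible design choice, not something the abstract specifies.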
BibTeX Entry:
  author    = {Iouliana Litou and
               Vana Kalogeraki},
  title     = {Pythia: {A} System for Online Topic Discovery of Social Media Posts},
  booktitle = {37th {IEEE} International Conference on Distributed Computing Systems,
               {ICDCS} 2017, Atlanta, GA, USA, June 5-8, 2017},
  pages     = {2497--2500},
  year      = {2017},
  crossref  = {DBLP:conf/icdcs/2017},
  url       = {},
  doi       = {10.1109/ICDCS.2017.289},
  timestamp = {Fri, 21 Jul 2017 13:46:43 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}