Temporal-Textual Retrieval: Time and Keyword Search in Web Documents

Ali Khodaei
Cyrus Shahabi
Amir Khodaei

Abstract

As the web ages, many web documents become relevant only to certain time periods, such as web pages containing news and events or those documenting natural phenomena. Hence, to retrieve the most relevant pages, in addition to providing the relevant keywords, one may wish to specify the relevant time period(s) as well, e.g., "Barack Obama 1980-1985". Unfortunately, little work has been done by industry or academia to support this type of search. To the best of our knowledge, the only way that some search engines exploit the time information in the user query is to filter out those resulting web pages whose publication/modification times are not within the queried time interval. In this paper, we propose a new indexing and ranking framework for temporal-textual retrieval. The framework leverages the classical vector space model and provides a complete scheme for indexing, query processing, and ranking of temporal-textual queries. We propose a variety of approaches to exploit popular keyword and temporal index structures. We present a novel hybrid index structure which indexes both the temporal and the textual aspects of the documents in a unified, integrated manner. We also study how to rank documents by seamlessly combining their temporal and textual features. We develop a new scoring scheme called temporal tf-idf to compute the temporal relevance of a document to a query, and we combine this score with the textual relevance to compute the overall relevance score of the document to the query. We present both a cost model analysis and an extensive set of experiments over real-world datasets (New York Times Annotated Corpus and Freebase) to evaluate the proposed framework and demonstrate its efficiency and effectiveness.
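
To make the idea of combining temporal and textual relevance concrete, here is a minimal sketch in Python. The abstract does not give the definition of temporal tf-idf, so nothing below should be read as the paper's formula. As stated assumptions, the sketch uses a plain tf-idf sum for the textual part, the fraction of the query interval covered by a document interval for the temporal part, and a weighted sum with a hypothetical parameter `alpha` to merge the two. All function names are illustrative.

```python
import math
from collections import Counter


def overlap_fraction(doc_iv, query_iv):
    """Fraction of the query interval (inclusive years) covered by doc_iv."""
    (ds, de), (qs, qe) = doc_iv, query_iv
    overlap = min(de, qe) - max(ds, qs) + 1
    return max(0, overlap) / (qe - qs + 1)


def textual_score(doc_terms, query_terms, doc_freq, n_docs):
    """Plain tf-idf sum over the query terms (assumed, not the paper's scheme)."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if tf[term] and doc_freq.get(term):
            score += tf[term] * math.log(n_docs / doc_freq[term])
    return score


def temporal_score(doc_intervals, query_iv):
    """Best coverage of the query interval by any interval in the document."""
    return max((overlap_fraction(iv, query_iv) for iv in doc_intervals),
               default=0.0)


def combined_score(doc_terms, doc_intervals, query_terms, query_iv,
                   doc_freq, n_docs, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical tuning knob."""
    text = textual_score(doc_terms, query_terms, doc_freq, n_docs)
    time = temporal_score(doc_intervals, query_iv)
    return alpha * text + (1 - alpha) * time


# Query in the spirit of "Barack Obama 1980-1985", reduced to one keyword.
score = combined_score(
    doc_terms=["obama", "chicago", "obama"],
    doc_intervals=[(1979, 1983)],
    query_terms=["obama"],
    query_iv=(1980, 1985),
    doc_freq={"obama": 2},
    n_docs=4,
)
```

Unlike pure filtering on publication time, a document that only partly overlaps the queried interval still receives a nonzero temporal score here, which is the kind of graded ranking the abstract argues for. In this sketch the textual score is left unnormalized, so a real system would likely need to bring the two scores onto a comparable scale before mixing them.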

How to Cite
Khodaei, A., Shahabi, C., & Khodaei, A. (2012). Temporal-Textual Retrieval: Time and Keyword Search in Web Documents. International Journal of Next-Generation Computing, 3(3), 288–312. https://doi.org/10.47164/ijngc.v3i3.39
