Answer Quality Prediction in Q/A Social Networks by Leveraging Temporal Features

Yuanzhe Cai
Sharma Chakravarthy

Abstract

Community Question Answering (CQA) services (also known as Q/A social networks) have become widespread in the last several years. They are seen as a potential alternative to search, since using a Q/A service avoids sifting through the large number of ranked results returned by a typical search engine to get at the desired information. Currently, best answers in CQA services are determined either manually or through a voting process. Many CQA services calculate activity levels for users to approximate the notion of expertise. As large numbers of CQA services become available, it is important and challenging to predict best answers (not necessarily answers by an expert) using machine learning techniques. Previous approaches typically extract a set of features (primarily textual and non-textual) from the data set and use them in a classification system to determine the best answer. This paper posits that temporal features, different from the ones proposed and used in the literature, are better suited to Q/A data sets and can be quite effective for predicting the quality of answers. The suitability of temporal features is based on the observation that these services are dynamic in nature, in terms of the number of users participating in a given period and how many questions they choose to answer over an interval. We propose and analyze a small set of temporal features, and demonstrate that a few of these features work better than the large number of features used in the literature when the same traditional classification techniques are applied. We also argue that classification approaches measured by precision and recall are not well suited to this problem, because CQA data is unbalanced and the quality ranking of all answers needs to be predicted. We propose the use of learning-to-rank approaches, and show that the features identified in this paper work very well with them too.
We use multiple, diverse data sets to establish the utility and effectiveness of the features identified for predicting the quality of answers. This approach allows us to qualitatively predict the best answer as well as rank all answers. The long-term goal is to build a framework for identifying experts, at different levels of granularity such as global and concept-specific, for CQA services.

How to Cite
Yuanzhe Cai, & Sharma Chakravarthy. (2013). Answer Quality Prediction in Q/A Social Networks by Leveraging Temporal Features. International Journal of Next-Generation Computing, 4(1), 01–27. https://doi.org/10.47164/ijngc.v4i1.42
