Searching Complex Data Without an Index

Mahadev Satyanarayanan
Rahul Sukthankar
Adam Goode
Nilton Bila
Lily Mummert
Jan Harkes
Adam Wolbach
Larry Huston
Eyal de Lara

Abstract

We show how query-specific content-based computation can be used for interactive search when a pre-computed index is not available. Rather than text or numeric data, we focus on complex data such as digital photographs and medical images. We describe a system that can perform such interactive searches on stored data as well as live Web data. The system is able to narrow the focus of a non-indexed search by using structured data sources such as relational databases. It can also leverage domain-specific software tools in search computations. We report on the design and implementation of this system, and its use in the health sciences.

How to Cite
Mahadev Satyanarayanan, Rahul Sukthankar, Adam Goode, Nilton Bila, Lily Mummert, Jan Harkes, Adam Wolbach, Larry Huston, & Eyal de Lara. (2010). Searching Complex Data Without an Index. International Journal of Next-Generation Computing, 1(2), 146–167. https://doi.org/10.47164/ijngc.v1i2.17
