Searching Complex Data Without an Index
Abstract
We show how query-specific content-based computation can be used for interactive search when a pre-computed index is not available. Rather than text or numeric data, we focus on complex data such as digital photographs and medical images. We describe a system that can perform such interactive searches on stored data as well as live Web data. The system is able to narrow the focus of a non-indexed search by using structured data sources such as relational databases. It can also leverage domain-specific software tools in search computations. We report on the design and implementation of this system, and its use in the health sciences.
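The following is a minimal sketch of the general idea described above, not the authors' implementation. It assumes a hypothetical SQLite table `patients` holding structured metadata and a hypothetical in-memory `objects` store of raw bytes. The structured query narrows the candidate set, and each remaining object is then examined at query time by a chain of content filters, with an object dropped as soon as one filter rejects it. The filters `large` and `bright` are stand-ins for domain-specific image tests.

```python
import sqlite3
from typing import Callable, Dict, Iterable, Iterator, List

# A filter inspects raw object bytes and decides keep/drop (hypothetical type).
Filter = Callable[[bytes], bool]


def scoped_ids(conn: sqlite3.Connection, min_age: int) -> List[int]:
    """Use structured metadata to narrow the set of objects to scan."""
    rows = conn.execute(
        "SELECT image_id FROM patients WHERE age >= ? ORDER BY image_id",
        (min_age,),
    )
    return [row[0] for row in rows]


def search(
    objects: Dict[int, bytes], ids: Iterable[int], filters: List[Filter]
) -> Iterator[int]:
    """Run the filters on each object at query time, with no pre-built index."""
    for oid in ids:
        data = objects[oid]
        # all() short-circuits, so an object is dropped at its first failing filter.
        if all(f(data) for f in filters):
            yield oid


def demo() -> List[int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (image_id INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [(1, 34), (2, 61), (3, 70), (4, 55)],
    )
    # Toy byte strings stand in for image data.
    objects = {1: b"\x00" * 8, 2: b"\xff" * 8, 3: b"\x10" * 8, 4: b"\xf0" * 4}
    large = lambda d: len(d) >= 8               # cheap test runs first
    bright = lambda d: sum(d) / len(d) > 128    # stand-in for a content test
    return list(search(objects, scoped_ids(conn, 50), [large, bright]))


if __name__ == "__main__":
    print(demo())
```

Ordering the cheap filter before the more expensive one is a design choice in this sketch: it keeps per-object cost low for objects that are rejected early.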
This work is licensed under a Creative Commons Attribution 4.0 International License.
How to Cite
Mahadev Satyanarayanan, Rahul Sukthankar, Adam Goode, Nilton Bila, Lily Mummert, Jan Harkes, Adam Wolbach, Larry Huston, & Eyal de Lara. (2010). Searching Complex Data Without an Index. International Journal of Next-Generation Computing, 1(2), 146–167. https://doi.org/10.47164/ijngc.v1i2.17