Building Proximity Models for Cross Language Information Retrieval

Lam Tung Giang; Vo Trung Hung; Huynh Cong Phap

doi:10.31130/jst.2015.5

Lam Tung Giang, Office of Danang People’s Committee, Vietnam
Vo Trung Hung, The University of Danang, Vietnam
Huynh Cong Phap, College of Information Technology (CIT) - The University of Danang, Vietnam (UD)

Abstract

In information retrieval systems, the proximity of query terms has been used to let ranking models go beyond the "bag of words" assumption by promoting the scores of documents in which the matched query terms occur close to each other. In this article, we study the integration of proximity models into cross-language information retrieval systems. New proximity models are proposed and incorporated into existing cross-language information retrieval systems by combining the proximity score with the original score to re-rank the retrieved documents. Experimental results show that the proposed models improve retrieval performance by 4% to 7% in terms of Mean Average Precision.
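
The abstract does not give the proximity functions or the combination rule, so the following is only a minimal sketch of the general idea of re-ranking with a proximity score. It assumes a simple proximity measure (the inverse of the smallest distance between two different query terms) and a linear interpolation with a hypothetical weight `lam`. The function names and the `(doc_id, original_score, doc_tokens)` result layout are illustrative and are not taken from the article. In a cross-language setting, `query_terms` would be the query terms after translation into the document language.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest positional gap between occurrences of two different query terms.

    Returns None when fewer than two distinct query terms occur in the document.
    """
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    best = None
    for a, b in combinations(positions, 2):
        for pa in positions[a]:
            for pb in positions[b]:
                d = abs(pa - pb)
                if best is None or d < best:
                    best = d
    return best


def proximity_score(doc_tokens, query_terms):
    """Assumed proximity measure: 1 / minimum pair distance, or 0.0 if no pair."""
    d = min_pair_distance(doc_tokens, set(query_terms))
    return 0.0 if d is None else 1.0 / d


def rerank(results, query_terms, lam=0.3):
    """Re-rank results given as (doc_id, original_score, doc_tokens) tuples.

    The combined score is an assumed linear interpolation:
    (1 - lam) * original_score + lam * proximity_score.
    """
    rescored = []
    for doc_id, score, tokens in results:
        combined = (1.0 - lam) * score + lam * proximity_score(tokens, query_terms)
        rescored.append((doc_id, combined))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

With this sketch, a document whose matched terms are adjacent can overtake a document with a slightly higher original score whose matched terms are far apart. In practice the original scores would need to be normalized to a range comparable with the proximity score, and `lam` would be tuned on held-out queries.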

