Abstract

With the rapid growth of XML documents on the Web, how to index, store, and retrieve these documents has become an important and widely studied problem. At present, there are two common approaches to retrieving XML documents: structure-based retrieval, such as XPath and XQuery, and keyword-based retrieval. Most keyword-based XML retrieval systems and algorithms are built on a one-layer index. However, a one-layer index has two disadvantages: it may introduce redundancy, and it is difficult to update. This paper proposes a new XML index model, the two-layer index model, which combines the advantages of the traditional inverted index and the Dewey-code-based inverted index. It also proposes a new stack algorithm based on the two-layer index to quickly compute SLCA (smallest lowest common ancestor) results over XML document sets. Finally, results are presented for building the two-layer index on a Wiki document set and applying the stack algorithm to obtain SLCA results from it; they demonstrate the efficiency of the proposed index model and algorithm.
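For readers unfamiliar with the terms, the following minimal sketch illustrates what an SLCA result is when keyword occurrences are stored as Dewey codes in an inverted index. It is a naive baseline written for illustration only: it is neither the two-layer index nor the stack algorithm proposed in the paper, and the names `ancestors_or_self`, `slca`, and the sample index are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# A Dewey code identifies a node by its path from the root, e.g. (0, 1, 2).
Dewey = Tuple[int, ...]


def ancestors_or_self(code: Dewey) -> Set[Dewey]:
    """Return the node itself and all of its ancestors (every non-empty prefix)."""
    return {code[:i] for i in range(1, len(code) + 1)}


def slca(index: Dict[str, List[Dewey]], keywords: List[str]) -> List[Dewey]:
    """Naive SLCA: nodes whose subtree contains every keyword,
    and none of whose descendants also contains every keyword."""
    common: Optional[Set[Dewey]] = None
    for kw in keywords:
        # All nodes whose subtree contains at least one occurrence of kw.
        covered: Set[Dewey] = set()
        for code in index.get(kw, []):
            covered |= ancestors_or_self(code)
        common = covered if common is None else common & covered
    if not common:
        return []
    # The common set is closed under "parent of", so a node has a qualifying
    # descendant exactly when one of its children is also in the set.
    parents = {c[:-1] for c in common if len(c) > 1}
    return sorted(common - parents)


# Hypothetical Dewey-code inverted index: keyword -> occurrences.
sample_index = {
    "xml": [(0, 0, 0), (0, 1, 0)],
    "index": [(0, 0, 1), (0, 2, 0)],
}
print(slca(sample_index, ["xml", "index"]))
```

In this sample, both the root `(0,)` and the node `(0, 0)` contain both keywords, but only `(0, 0)` is reported because it is the smaller subtree. The baseline materializes every ancestor prefix, so it only serves to show the semantics and not an efficient method.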

  • Affiliation
    Beijing; Peking University

Full text