TGV & XLive
This work is a summary of the work I carried out during my thesis.
Efficiently integrating data from the many heterogeneous data sources available on the Internet is a crucial need for enterprise information systems.
To integrate data coming from distributed and heterogeneous sources under a single query interface, the well-known mediator/wrapper architecture is generally used. In this architecture, the user issues a query to the mediator, which in turn sends parts of the query to the wrappers associated with the data sources. The results are then sent back to the mediator, which integrates them accordingly.
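As a rough illustration of this flow, the following Python sketch (an assumption for illustration, not XLive code) models each wrapper as a function from a sub-query to XML elements, and lets a hypothetical `Mediator` class forward the sub-query to every wrapper and merge the answers under a single `<result>` element. For simplicity the sub-query is just an element tag and is sent unchanged to every source; a real mediator would decompose the user query per source.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical wrapper signature: takes a sub-query (here an element tag)
# and returns the matching XML elements from its source.
Wrapper = Callable[[str], List[ET.Element]]


def make_xml_wrapper(xml_text: str) -> Wrapper:
    """Wrap an in-memory XML document as a queryable source."""
    root = ET.fromstring(xml_text)

    def query(tag: str) -> List[ET.Element]:
        # Sub-query evaluation: every descendant element with this tag.
        return root.findall(f".//{tag}")

    return query


class Mediator:
    """Hypothetical mediator: forwards sub-queries and merges answers."""

    def __init__(self, wrappers: Dict[str, Wrapper]):
        self.wrappers = wrappers

    def query(self, tag: str) -> ET.Element:
        result = ET.Element("result")
        for source_name, wrapper in self.wrappers.items():
            for elem in wrapper(tag):
                # Integration step: copy each answer and record its source.
                item = ET.SubElement(result, elem.tag, source=source_name)
                item.text = elem.text
        return result


mediator = Mediator({
    "shopA": make_xml_wrapper("<books><book>XQuery Basics</book></books>"),
    "shopB": make_xml_wrapper("<catalog><item><book>XML in Practice</book></item></catalog>"),
})
print(ET.tostring(mediator.query("book"), encoding="unicode"))
# <result><book source="shopA">XQuery Basics</book><book source="shopB">XML in Practice</book></result>
```

Tagging each answer with its origin is only one possible integration choice; joins or duplicate elimination across sources would take place at the same merge point.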
XML is a popular language for representing business data and for exchanging information between business partners and applications. XQuery is a rich and complex language for querying XML that can produce very precise, fine-grained results, combining the expressive power of database query languages with document search functionalities.
In order to guarantee interoperability between the different components of a mediator/wrapper architecture, it is now natural to use XML for data representation and XQuery as the query language.
Yet, between the XQuery query submitted by the user and the XML result returned by the mediation system, a complex evaluation process has to be completed. Several issues arise, both from the distributed environment and from XQuery evaluation itself. The first point has been widely studied in many papers and projects [,,] using various query languages. XQuery has proved to be an expressive and powerful language for querying XML data on both structure and content, and for transforming the data. In addition, its query functionalities come both from the database community (filtering, join, selection, aggregation) and from the text community (supporting and defining functions such as text search). However, the complexity of XQuery makes its evaluation very difficult. To alleviate this problem, most systems support only a limited subset of the language.
The expressiveness of XQuery makes it difficult to obtain a single internal representation within a system. To this end, models based on tree patterns have been proposed, in particular Tree Pattern Queries (TPQ), later generalized by Generalized Tree Patterns (GTP). However, GTPs do not fully capture the expressiveness of XQuery, cannot handle mediation problems, and do not support extensible optimisation.
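A tree pattern can be read as a small tree of tag tests linked by child (`/`) or descendant (`//`) edges. The sketch below is a simplified illustration of tree pattern matching under that reading; it is neither the TPQ/GTP formalism nor TGV, and the `PatternNode` structure is hypothetical. It returns the document elements bound to the pattern root, roughly what the XPath `//book[title][.//last]` selects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternNode:
    tag: str
    axis: str = "/"  # edge from the parent: "/" = child, "//" = descendant
    children: List["PatternNode"] = field(default_factory=list)


def candidates(elem: ET.Element, axis: str) -> List[ET.Element]:
    """Elements reachable from elem along the given axis."""
    if axis == "/":
        return list(elem)
    # iter() yields elem itself first; skip it to keep proper descendants.
    return [d for d in elem.iter() if d is not elem]


def matches(elem: ET.Element, node: PatternNode) -> bool:
    """True if the pattern rooted at node embeds at elem."""
    if elem.tag != node.tag:
        return False
    # Every pattern child must be matched by at least one reachable element.
    return all(
        any(matches(c, child) for c in candidates(elem, child.axis))
        for child in node.children
    )


def evaluate(root: ET.Element, pattern: PatternNode) -> List[ET.Element]:
    """Return every document element that matches the pattern root."""
    return [e for e in root.iter() if matches(e, pattern)]


doc = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP</title><author><last>Stevens</last></author></book>"
    "<book><title>Data on the Web</title></book>"
    "</bib>"
)
pattern = PatternNode("book", children=[PatternNode("title", "/"), PatternNode("last", "//")])
print([b.findtext("title") for b in evaluate(doc, pattern)])  # ['TCP/IP']
```

Only the second book lacks a `last` descendant, so only the first one is returned. Such a pattern captures navigation and filtering, but not constructs like grouping, ordering, or result construction, which is where richer models are needed.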
Another challenge is to provide an extensible optimisation framework for efficient query evaluation. Search strategies for query optimization have been studied in Exodus, Starburst, Volcano and OPT++. The key idea of an extensible optimizer is to generate a query optimizer from rules that transform plans into alternative plans. Selecting and ordering such rules during optimization often relies on cost information, as with the expected cost factor introduced in  and . However, all these works apply only to relational or object contexts. To our knowledge, no rule-based optimizer has yet been proposed in a semi-structured context for tree pattern matching queries.
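To make the rule-based idea concrete, here is a toy sketch in the spirit of the extensible optimizers cited above (it is not Exodus, Volcano, OPT++, or TGV's rule model): plans are nested tuples, each rule is a function returning an alternative plan or `None`, the optimizer explores every plan reachable through the rules, and a hypothetical cost model with made-up cardinalities (`CARD`) picks the cheapest. It deliberately works on a relational-style plan, the setting to which prior work is limited.

```python
from typing import List, Optional, Set, Tuple

# Plan shapes: ("scan", rel) | ("select", rel, selectivity, child) | ("join", left, right)
Plan = tuple

CARD = {"books": 10_000, "authors": 1_000}  # hypothetical source statistics


def relations(plan: Plan) -> Set[str]:
    if plan[0] == "scan":
        return {plan[1]}
    if plan[0] == "select":
        return relations(plan[3])
    return relations(plan[1]) | relations(plan[2])


def commute_join(plan: Plan) -> Optional[Plan]:
    # A join B -> B join A
    return ("join", plan[2], plan[1]) if plan[0] == "join" else None


def push_select(plan: Plan) -> Optional[Plan]:
    # select_p(A join B) -> select_p(A) join B, when p only concerns A.
    if plan[0] == "select" and plan[3][0] == "join":
        _, rel, sel, (_, left, right) = plan
        if rel in relations(left):
            return ("join", ("select", rel, sel, left), right)
    return None


RULES = [commute_join, push_select]


def rewrites(plan: Plan) -> List[Plan]:
    """All plans obtained by applying one rule at any node of the plan."""
    out = []
    for rule in RULES:
        alt = rule(plan)
        if alt is not None:
            out.append(alt)
    if plan[0] == "select":
        out += [("select", plan[1], plan[2], c) for c in rewrites(plan[3])]
    elif plan[0] == "join":
        out += [("join", lp, plan[2]) for lp in rewrites(plan[1])]
        out += [("join", plan[1], rp) for rp in rewrites(plan[2])]
    return out


def cost(plan: Plan) -> Tuple[float, float]:
    """Return (cost, output cardinality) under a toy cost model."""
    if plan[0] == "scan":
        n = CARD[plan[1]]
        return n, n
    if plan[0] == "select":
        c, n = cost(plan[3])
        return c + n, n * plan[2]
    lc, ln = cost(plan[1])
    rc, rn = cost(plan[2])
    # Nested-loop join plus a per-outer-tuple overhead; toy output size.
    return lc + rc + ln * rn + ln, min(ln, rn)


def optimize(plan: Plan) -> Plan:
    """Explore every plan reachable through the rules; keep the cheapest."""
    seen, frontier = {plan}, [plan]
    while frontier:
        for alt in rewrites(frontier.pop()):
            if alt not in seen:
                seen.add(alt)
                frontier.append(alt)
    return min(seen, key=lambda p: cost(p)[0])


initial = ("select", "books", 0.01, ("join", ("scan", "authors"), ("scan", "books")))
print(optimize(initial))
# ('join', ('select', 'books', 0.01, ('scan', 'books')), ('scan', 'authors'))
```

Exhaustive exploration is only viable for tiny rule sets; the cost-guided rule selection and ordering mentioned above exist precisely to avoid enumerating the whole space.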
Finally, both for the optimization concerns above and to account for source specificities, it is important to take into account any piece of information that can influence query evaluation. Such information includes cost models, constraints on data sources (security, availability, preferences), limited functional capabilities, flexible results, etc.
We present a tree pattern-based model called TGV that
- supports the full functionality of XQuery
- uses an intuitive representation that provides a global view of the query in a mediation context
- provides a framework for extensible optimization based on a rule-definition model
- takes into account all knowledge useful for query evaluation (cost model, accuracy, etc.)
XQuery Evaluation in a Distributed Heterogeneous Environment
XLive Mediator System
The XLive prototype is designed to be a lightweight mediation system with high modularity and extensibility. It is a working research platform for assessing every stage of the integration process, from source extraction to the user interface, including query parsing and modeling, optimization and evaluation, and benchmarking.
As most mediation systems, XLive is composed of three layers: Presentation, Integration and Sources.
On the Source Layer, there are multiple heterogeneous data sources (relational, object and XML data sources, web services, files, etc.). These sources are queried by the XLive system through wrappers. A wrapper is a component for accessing a specific source, querying it, and retrieving results. As sources have specific access methods, the role of the wrapper is to translate the source-specific access method into a common access method.
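The translation role of a wrapper can be sketched for a relational source. In this hypothetical `RelationalWrapper` (its name and XML layout are assumptions, not the XLive wrapper API), the source-specific access method is an SQL query on an in-memory SQLite table, and the common access method is an XML element with one `row` child per tuple and one child element per column.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalWrapper:
    """Hypothetical wrapper exposing SQL tables through a common XML interface."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def query(self, table: str) -> ET.Element:
        # Source-specific access: an SQL query on the relational source.
        # The table name is interpolated directly, so it must be trusted.
        cursor = self.conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        # Common access method: one <row> per tuple, one child per column.
        root = ET.Element(table)
        for row in cursor:
            row_elem = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_elem, name).text = str(value)
        return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.execute("INSERT INTO book VALUES ('Data on the Web', 2000)")
print(ET.tostring(RelationalWrapper(conn).query("book"), encoding="unicode"))
# <book><row><title>Data on the Web</title><year>2000</year></row></book>
```

Because every wrapper returns XML, the mediator can combine this result with answers from XML files or web services without knowing that the data came from SQL.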
The Integration Layer is the heart of the mediation system: it processes XQuery queries over the sources and sends back the result in XML form. The Wrapper Information Manager integrates information about the wrapped sources: it provides the mediator with source metadata, capabilities, and cost statistics on source data. The XQuery Evaluator parses the XQuery query, builds logical plans and then physical plans using optimization rules, chooses the best execution plan, and then evaluates it by querying the relevant sources and merging the results. The XLive system provides a public API that takes an XQuery query, evaluates it on the sources, and returns the result as an XML document.
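A minimal sketch of the kind of decision that source metadata enables, assuming a made-up metadata record (`SourceInfo`), a hypothetical `WrapperInfoManager` class, and a naive transfer-cost model: if a source can evaluate a selection itself, only matching items are shipped to the mediator; otherwise the mediator must fetch everything and filter locally. All names, fields, and figures are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SourceInfo:
    """Hypothetical metadata a Wrapper Information Manager could keep."""
    capabilities: Set[str] = field(default_factory=set)  # e.g. {"select", "join"}
    cardinality: int = 0          # number of items in the source
    cost_per_item: float = 1.0    # cost of shipping one item to the mediator


class WrapperInfoManager:
    def __init__(self) -> None:
        self.sources: Dict[str, SourceInfo] = {}

    def register(self, name: str, info: SourceInfo) -> None:
        self.sources[name] = info

    def selection_plan(self, source: str, selectivity: float) -> Tuple[str, float]:
        """Choose where a selection runs and estimate the transfer cost.

        If the source can filter, only matching items are shipped; otherwise
        the whole source is shipped and the mediator filters it.
        """
        info = self.sources[source]
        if "select" in info.capabilities:
            return "source", info.cardinality * selectivity * info.cost_per_item
        return "mediator", info.cardinality * info.cost_per_item


wim = WrapperInfoManager()
wim.register("sql_db", SourceInfo({"select", "join"}, 10_000, 0.5))
wim.register("csv_files", SourceInfo(set(), 10_000, 0.5))
print(wim.selection_plan("sql_db", 0.25))     # ('source', 1250.0)
print(wim.selection_plan("csv_files", 0.25))  # ('mediator', 5000.0)
```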
Several clients built on the XLive public API have been implemented in the Presentation Layer: the Console is a graphical interface for managing and querying XLive; the Web service provides services for managing and querying XLive from the Web; and finally, the Benchmark is designed to test the XLive mediator and other sources in order to compare them.
The rest of the paper concentrates exclusively on the XQuery evaluation process. More details on the XLive architecture can be found in [1,3].
XQuery Evaluation Process
XQuery is a rich and complex language. Its expressive power allows a wide range of queries over XML documents. However, this richness makes XQuery processing very difficult, so it is necessary to reduce the XQuery domain under study (without restricting the domain of queries that must be recognized and evaluated!). To this end, we rewrite queries into equivalent syntactic forms using transformation rules. These rules preserve the semantics of queries and make them easier to manipulate. Each set of equivalent queries is thus reduced to a single canonical query.
XQuery canonization was introduced by  and we have extended the canonization rules to support the whole XQuery semantics (except type support). The canonization rules and their proofs are described in .
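The actual canonization rules are those of the cited work; the sketch below only illustrates the mechanism with two hypothetical rules on a toy tuple representation of queries (not a real XQuery AST). Rules are applied bottom-up until a fixpoint is reached, so two syntactically different but equivalent queries end up as the same canonical term: a path with a predicate is rewritten into a FLWR expression, and `a < b` is normalized to `b > a`.

```python
# Hypothetical toy query terms:
#   ("path", source, step, pred)        source/step[pred], pred may be None
#   ("flwr", var, in_expr, where, ret)  for var in in_expr where ... return ret
#   ("gt", a, b), ("lt", a, b)          comparisons; strings are paths


def rebind(pred, var):
    """Turn relative paths such as "price" into "$x/price"; numbers are kept."""
    def fix(v):
        return f"{var}/{v}" if isinstance(v, str) else v
    op, a, b = pred
    return (op, fix(a), fix(b))


def rule_path_predicate(t):
    # source/step[pred]  ->  for $x in source/step where pred return $x
    if t[0] == "path" and t[3] is not None:
        return ("flwr", "$x", ("path", t[1], t[2], None), rebind(t[3], "$x"), "$x")
    return None


def rule_lt_to_gt(t):
    # a < b  ->  b > a, so that only "gt" survives in canonical form
    if t[0] == "lt":
        return ("gt", t[2], t[1])
    return None


RULES = [rule_path_predicate, rule_lt_to_gt]


def canonize(t):
    """Rewrite bottom-up until no rule applies (a fixpoint)."""
    if not isinstance(t, tuple):
        return t
    t = tuple(canonize(x) for x in t)
    for rule in RULES:
        new = rule(t)
        if new is not None:
            return canonize(new)
    return t


# doc("bib.xml")/book[price > 10]
q1 = ("path", 'doc("bib.xml")', "book", ("gt", "price", 10))
# for $x in doc("bib.xml")/book where 10 < $x/price return $x
q2 = ("flwr", "$x", ("path", 'doc("bib.xml")', "book", None), ("lt", 10, "$x/price"), "$x")
print(canonize(q1) == canonize(q2))  # True
```

Later stages then only need to handle the canonical FLWR form, which is the point of reducing the domain under study without restricting the queries accepted.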
Figure describes the evaluation process: (1) the XQuery query is canonized into a canonical form; (2) the canonized query is then modeled in the internal TGV structure, which (3) can be restructured into equivalent structures using equivalence rules; (4) the TGV is then annotated with information needed for evaluation, such as data source locations, cost model information, and source functional capabilities, and the optimal annotated TGV is selected; (5) the logical TGV is transformed into an execution plan using a physical algebra (we have chosen XAlgebra, an extension of the relational algebra to XML); (6) finally, the execution plan is evaluated and produces an XML result.
The whole process is implemented in the XLive system, which handles all the W3C use cases that do not involve strong typing (our system covers 8 of the 9 categories of the W3C XQuery use cases).
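XAlgebra's actual operators are defined in the cited work; as a rough illustration of steps (5) and (6) only, the sketch below assumes a pipelined (iterator) execution model in which each hypothetical operator consumes and produces streams of variable bindings, mirroring relational scan, select, and join, with a final `construct` operator building the XML result. The operator names and signatures are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

Binding = Dict[str, ET.Element]  # a tuple binds variable names to XML nodes


def scan(doc: ET.Element, path: str, var: str) -> Iterator[Binding]:
    """Bind var to each node selected by an ElementPath expression."""
    for node in doc.iterfind(path):
        yield {var: node}


def select(child: Iterator[Binding], pred: Callable[[Binding], bool]) -> Iterator[Binding]:
    return (b for b in child if pred(b))


def join(left: Iterator[Binding], right: Iterator[Binding],
         pred: Callable[[Binding], bool]) -> Iterator[Binding]:
    """Nested-loop join; the right input is materialized once."""
    right_rows = list(right)
    for left_row in left:
        for right_row in right_rows:
            merged = {**left_row, **right_row}
            if pred(merged):
                yield merged


def construct(child: Iterator[Binding], build: Callable[[Binding], ET.Element],
              tag: str) -> ET.Element:
    """Build the XML result from the stream of bindings."""
    result = ET.Element(tag)
    for b in child:
        result.append(build(b))
    return result


books = ET.fromstring("<bib><book><title>A</title><price>30</price></book>"
                      "<book><title>B</title><price>70</price></book></bib>")
reviews = ET.fromstring("<reviews><entry><title>A</title><rating>5</rating></entry>"
                        "<entry><title>B</title><rating>3</rating></entry></reviews>")


def make_item(b: Binding) -> ET.Element:
    item = ET.Element("item", title=b["b"].findtext("title"))
    item.text = b["r"].findtext("rating")
    return item


plan = construct(
    join(select(scan(books, "book", "b"), lambda b: float(b["b"].findtext("price")) < 50),
         scan(reviews, "entry", "r"),
         lambda b: b["b"].findtext("title") == b["r"].findtext("title")),
    make_item, "result")
print(ET.tostring(plan, encoding="unicode"))
# <result><item title="A">5</item></result>
```

In a mediator, each `scan` would typically be replaced by a call to a wrapper, and the annotations from step (4) would decide which operators run at the sources and which at the mediator.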