|
Consider a web application where a user can query a variety of information about cities such as crime statistics, weather, hotels, demographics, etc. Traditionally, the information must exist in a single database with a single schema. Information of this breadth, however, is difficult and expensive for a single enterprise to collect. Even if the resources exist to gather the data, it would likely duplicate data in existing crime databases, weather websites, and census data.
A data integration solution addresses this problem by considering these external resources as materialized views over a virtual mediated schema. This means application developers construct a schema to best model the kinds of answers their users want. This virtual schema is called the mediated schema. Next, they design "wrappers" or adapters for each data source, such as the crime database and weather website. These adapters simply transform the local query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution. When an application-user queries the mediated schema, the data integration solution transforms this query into appropriate queries over the respective data sources. Finally, the results of these queries are combined into the answer to the user's query.
A convenience of this solution is that new sources can be added by simply constructing an adapter for them. This contrasts with ETL systems or a single database solution where the entire new dataset must be manually integrated into the system.
|