THE CASE FOR DATA VIRTUALIZATION

Maybe #datavaluehacking ?

Data virtualization (DV) has been around for a while, and it has become popular again since Gartner proposed its Logical Data Warehouse (LDW). So what are the LDW and DV?


The Logical Data Warehouse


The LDW is an architecture built around an ecosystem of collaborating analytical systems. Ideally, the LDW should "look like" one database by exposing standard interfaces. It includes not just highly processed data (data warehouses, data marts) but also one or more data lakes, which in this context hold structured or semi-structured data.
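To make the "look like one database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not how an LDW is built: the two in-memory databases, the table names (customers, events) and the view name (customer_activity) are all assumptions standing in for a warehouse and a data lake.

```python
import sqlite3

# Hypothetical stand-ins: "main" plays the warehouse, "lake" plays a data lake.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lake.events (customer_id INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO lake.events VALUES (?, ?)",
    [(1, "login"), (1, "purchase"), (2, "login")],
)

# A TEMP view may reference tables in several attached databases,
# so consumers see one logical object instead of two physical stores.
conn.execute(
    """
    CREATE TEMP VIEW customer_activity AS
    SELECT c.name AS name, COUNT(e.event) AS events
    FROM main.customers AS c
    LEFT JOIN lake.events AS e ON e.customer_id = c.id
    GROUP BY c.id, c.name
    """
)

rows = conn.execute(
    "SELECT name, events FROM customer_activity ORDER BY name"
).fetchall()
print(rows)
```

The consumer only queries customer_activity; where each table physically lives is hidden behind the standard SQL interface.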


There are three patterns of integration:


  1. The physical repository

  2. Data virtualization

  3. Distributed processing


Ten years ago, these patterns often appeared as separate systems, because it was hard to make them work together seamlessly. It is now standard practice to combine the patterns into one common system. Gartner calls it the Modern Logical Data Warehouse.

Integration patterns in the Logical Data Warehouse

The benefits:


  1. Many requirements served via one environment

  2. Data virtualization providing an aggregate view of all data stores, and a common front end


It is important to remember that the original purpose of the data warehouse was to enable shared access to data. Gartner says that:

The original aims of data warehousing remain valid. These were to provide a broad and historically deep, shared view of everything that is going on within the organization. It did this by integrating and keeping the data generated by the organization's business processes. What changed was the variety of data available, types of analysis possible and the requirements. A unified view is still desirable, but a single physical processing engine cannot provide it. The integration needs to be logical, not physical.

Data Virtualization

What is Data Virtualization?

"Data virtualization integrates data from disparate sources, locations and formats, without replicating or moving the data, to create a single 'virtual' data layer that delivers unified data services to support multiple applications and users." (James Serra, posted February 2, 2018)

In the past, agility in data warehouse systems was bought with a profligate use of data marts and other poorly integrated data stores. By providing an abstraction layer, a common interface and a rich alternative to ETL, DV can help in three ways:


  1. Regard the entire processing complex as a single system 

  2. Enable shared data access 

  3. Manage changes more easily by decoupling data sources from consumers
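The third point, decoupling data sources from consumers, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real DV platform: VirtualLayer, orders_from_warehouse, orders_from_lake and total_revenue are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = dict
Source = Callable[[], Iterable[Row]]


class VirtualLayer:
    """Hypothetical abstraction layer: maps logical names to live sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        # Registering the same logical name again swaps the physical source.
        self._sources[name] = source

    def scan(self, name: str) -> Iterator[Row]:
        # Rows are read from the source at query time; nothing is replicated here.
        yield from self._sources[name]()


def orders_from_warehouse() -> Iterable[Row]:
    # Stand-in for a table in a data warehouse.
    return [{"amount": 40.0}, {"amount": 60.0}]


def orders_from_lake() -> Iterable[Row]:
    # Stand-in for the same data served from a data lake.
    for amount in (60.0, 40.0):
        yield {"amount": amount}


def total_revenue(layer: VirtualLayer) -> float:
    # The consumer knows only the logical name "orders".
    return sum(row["amount"] for row in layer.scan("orders"))


layer = VirtualLayer()
layer.register("orders", orders_from_warehouse)
before = total_revenue(layer)

# Move the data to another store: the consumer code does not change.
layer.register("orders", orders_from_lake)
after = total_revenue(layer)
print(before, after)
```

The consumer function is never edited when the source changes, which is the kind of change management the abstraction layer is meant to make easier.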


The following diagram shows what this might look like:

The Data Virtualization Architecture

Technologies to look at


Denodo is one of the most mature platforms for virtualization. To find out more, visit their website; it has a lot of resources that will help you see the advantages of integrating data virtualization into your analytics enablement strategy.

Denodo's platform - rich and stable

Also, everyone knows my view of Snowflake. Denodo takes special, smart advantage of Snowflake to offer truly performant queries when dealing with large data. Here is a link to the details.



©2020 by Modern Data Analytics