
Data Vault 2.0 – Basics and Beyond

September 11, 2024

In today’s data-driven world, making informed decisions hinges on a solid data management strategy, and Data Vault 2.0 has swiftly gained popularity in this area. Enterprises have bought into the idea that Data Vault is a more scalable, flexible and agile way of data warehousing than the traditionally followed Kimball approach. In this article, we’ll look at a few details and nuances that can help stakeholders make the right choices on their transformation journey.

How does Data Vault live up to its claims?

According to Dan Lindstedt:

“The Data Vault is a detail oriented, historical tracking, and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise.”

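The Raw Vault achieves this flexibility, scalability and fast turnaround through a set of objects designed specifically to address traditional challenges. The core objects are Hubs (unique business keys), Links (relationships between business keys) and Satellites (descriptive attributes and their history). Let’s look at them briefly: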

[Figure: overview of the Raw Vault objects]

These objects share the common principles of immutability, history maintenance and complete traceability. Beyond every entry recording its time of arrival and its source, DV 2.0 provides a few more components that help adhere to and enforce these principles:
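To make these principles concrete, here is a minimal Python sketch, assuming an in-memory stand-in for a customer Hub and Satellite rather than real database tables. It shows insert-only loading, a load date and record source on every row, and a hash diff that triggers a new Satellite row only when the descriptive attributes change. The names (`CustomerVault`, `hash_key`, `hash_diff`), the MD5 hash and the trim/upper-case/`||` key normalisation are illustrative assumptions, not a prescribed standard.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Hash key from the normalised business key (assumed: trim, upper-case, '||' delimiter)."""
    normalised = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def hash_diff(attrs: dict) -> str:
    """Hash of the descriptive attributes in a fixed (sorted) order, used to detect changes."""
    payload = "||".join(f"{name}={attrs[name]}" for name in sorted(attrs))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


class CustomerVault:
    """Hypothetical in-memory stand-in for a customer Hub and Satellite; rows are only ever inserted."""

    def __init__(self) -> None:
        self.hub: dict[str, dict] = {}  # hash key -> hub row
        self.sat: list[dict] = []       # append-only satellite rows

    def load(self, customer_id: str, attrs: dict, record_source: str, load_date: datetime) -> None:
        hk = hash_key(customer_id)
        # Hub: record each business key once, with when and where it was first seen.
        if hk not in self.hub:
            self.hub[hk] = {"hk": hk, "customer_id": customer_id,
                            "load_date": load_date, "record_source": record_source}
        # Satellite: insert a new row only if the attributes differ from the latest row.
        hd = hash_diff(attrs)
        history = [row for row in self.sat if row["hk"] == hk]
        latest = max(history, key=lambda row: row["load_date"], default=None)
        if latest is None or latest["hash_diff"] != hd:
            self.sat.append({"hk": hk, "load_date": load_date, "record_source": record_source,
                             "hash_diff": hd, **attrs})


vault = CustomerVault()
days = [datetime(2024, 9, d, tzinfo=timezone.utc) for d in (1, 2, 3)]
vault.load("C001", {"name": "Ada", "city": "London"}, "CRM", days[0])
vault.load(" c001", {"name": "Ada", "city": "London"}, "CRM", days[1])  # same key, no change: no new row
vault.load("C001", {"name": "Ada", "city": "Paris"}, "CRM", days[2])    # change: new row, old row kept
print(len(vault.hub), len(vault.sat))  # 1 2
```

Nothing is ever updated or deleted: the London row stays in the Satellite, which is what gives the Raw Vault its full history and auditability.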

SATs – 

  • Multi-active Satellites – To hold multiple simultaneously active records for a single entity
  • Reference Satellites – To hold reference/master data
  • Effectivity Satellites – To track the effectivity of a relationship
  • Status Tracking Satellites – To track the status of an entity
  • Record Tracking Satellites – To track the status of a record
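As an illustration of one of these, the sketch below models an Effectivity Satellite for a driving-key relationship: an order that can be reassigned to a different customer. It is a hypothetical, simplified Python version: plain business keys stand in for link hash keys, `9999-12-31` is an assumed open end-date, and a relationship is closed by inserting an end-dated row rather than by updating the old one.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still active"


def track_effectivity(eff_sat: list, driving_key: str, related_key: str, load_date: date) -> None:
    """Record that driving_key is related to related_key as of load_date (insert-only)."""
    # Latest row per (driving, related) pair; the stable sort keeps insertion order on ties.
    latest = {}
    for row in sorted(eff_sat, key=lambda r: r["load_date"]):
        latest[(row["driving_key"], row["related_key"])] = row
    active = [row for (drv, _), row in latest.items()
              if drv == driving_key and row["end_date"] == OPEN_END]
    if any(row["related_key"] == related_key for row in active):
        return  # relationship already active: nothing new to record
    for row in active:
        # Close the previous relationship by inserting an end-dated copy, not by updating.
        eff_sat.append({**row, "end_date": load_date, "load_date": load_date})
    eff_sat.append({"driving_key": driving_key, "related_key": related_key,
                    "start_date": load_date, "end_date": OPEN_END, "load_date": load_date})


eff_sat = []
track_effectivity(eff_sat, "ORDER-1", "CUST-A", date(2024, 9, 1))
track_effectivity(eff_sat, "ORDER-1", "CUST-A", date(2024, 9, 2))  # unchanged: no row added
track_effectivity(eff_sat, "ORDER-1", "CUST-B", date(2024, 9, 3))  # reassigned: close A, open B
for row in eff_sat:
    print(row)
```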
LINKs – 

  • Hierarchical Links – To accommodate parent-child relationships
  • Same-As Links – To map entity keys that look different but represent the same entity
  • Non-Historized Links – For transactional relationships that do not require historical data
  • Reference Links – To capture the relationship of an entity key to reference/master data
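Same-As Links are typically resolved downstream by mapping every duplicate key to a single master key. A minimal sketch, assuming the link is available as `(duplicate_key, master_key)` pairs; the function name and the choice to follow chains of mappings are illustrative assumptions:

```python
def resolve_same_as(sal_rows: list[tuple[str, str]]) -> dict[str, str]:
    """Map every key in a Same-As Link to its master key.

    sal_rows: (duplicate_key, master_key) pairs.
    Chains are followed, so A -> B and B -> C resolve A to C.
    """
    parent: dict[str, str] = {}
    for duplicate, master in sal_rows:
        parent.setdefault(master, master)  # a master maps to itself unless later marked a duplicate
        parent[duplicate] = master

    def master_of(key: str) -> str:
        seen = set()
        while parent[key] != key:
            if key in seen:
                raise ValueError(f"cycle in same-as link at {key!r}")
            seen.add(key)
            key = parent[key]
        return key

    return {key: master_of(key) for key in parent}


mapping = resolve_same_as([("CRM-17", "ERP-9"), ("WEB-3", "CRM-17")])
print(mapping)  # {'ERP-9': 'ERP-9', 'CRM-17': 'ERP-9', 'WEB-3': 'ERP-9'}
```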

In the end, here’s what a Raw Vault model for a typical warehousing solution may look like:


Figure – A simple DV 2.0 Raw Vault model for an e-commerce business

Key takeaways

Looking at the tools the Raw Vault brings with it, we can draw out the following benefits:


Fully equipped to tackle business changes – Whether the business keys or the entity relationships change, isolating keys and relationships from the descriptive data around them makes such changes hassle-free to integrate.

Handles large data volumes and achieves parallelism – One satellite per source enables parallel processing, which in turn means load distribution and faster load cycles. Satellites can also be split by rate of change, which reduces data redundancy.
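The effect of splitting Satellites by rate of change can be shown with a small worked example. The hypothetical `stored_cells` function below counts how many attribute values insert-only Satellites store, assuming a Satellite writes a new row whenever any of its own attributes change.

```python
def stored_cells(snapshots: list[dict], satellites: list[tuple[str, ...]]) -> int:
    """Count attribute values stored when each snapshot is loaded into the given satellites.

    Each satellite is insert-only and writes a new row (all of its attributes)
    only when at least one of its own attributes has changed.
    """
    total = 0
    for attrs in satellites:
        previous = None
        for snap in snapshots:
            current = tuple(snap[a] for a in attrs)
            if current != previous:
                total += len(attrs)
                previous = current
    return total


# Five daily loads of one instrument: name and issuer never change, price changes every day.
snapshots = [{"name": "Bond X", "issuer": "ACME", "price": 100 + day} for day in range(5)]

combined = stored_cells(snapshots, [("name", "issuer", "price")])
split = stored_cells(snapshots, [("name", "issuer"), ("price",)])
print(combined, split)  # 15 7
```

With one combined Satellite, the daily price change forces the static name and issuer to be stored again on every load; after the split, they are stored once.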

Incremental data onboarding – Onboarding new sources and data points is easier. Data Vault’s design advocates pattern-driven development, which enables a quick time to market for new requirements.
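Pattern-driven development usually means every object of a given type is loaded by the same template, with only metadata varying. A minimal sketch using Python’s `string.Template`; the SQL pattern, table names and metadata keys are hypothetical, and the pattern assumes each staging table holds a single batch from one source so that `DISTINCT` returns one row per new key.

```python
from string import Template

# Hypothetical hub-load pattern. Assumes each staging table holds a single batch
# from one source, so DISTINCT returns one row per new business key.
HUB_LOAD = Template(
    "INSERT INTO $hub ($hk, $bk, load_date, record_source)\n"
    "SELECT DISTINCT s.$hk, s.$bk, s.load_date, s.record_source\n"
    "FROM $stage AS s\n"
    "WHERE NOT EXISTS (SELECT 1 FROM $hub AS h WHERE h.$hk = s.$hk);"
)


def generate_hub_loads(metadata: list[dict]) -> list[str]:
    """Render the same load pattern once per hub described in the metadata."""
    return [HUB_LOAD.substitute(entry) for entry in metadata]


metadata = [
    {"hub": "hub_customer", "hk": "hk_customer", "bk": "customer_id", "stage": "stg_crm_customer"},
    {"hub": "hub_product", "hk": "hk_product", "bk": "product_sku", "stage": "stg_shop_product"},
]
statements = generate_hub_loads(metadata)
for statement in statements:
    print(statement, end="\n\n")
```

Onboarding another source then becomes a matter of adding a metadata entry rather than hand-writing a new load.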

Fail early and rectify fast – With Data Vault, SMEs can identify problems in the source data or model well before they reach the Data Marts.

Beyond Raw Vault

So far, we’ve covered the advantages of implementing the Raw Vault. But does that mean it brings no challenges of its own?

As a matter of fact, it does. After a quick look at the model above, stakeholders often find themselves asking questions like “How do we proceed from here?”, “What about query performance on the Raw Vault?”, “Is there a standard for pulling data out of the Raw Vault?” and “Shouldn’t there be a Business Vault, and what does that entail?”

These are all valid questions, especially when building a robust and scalable architecture for an organization’s data management strategy. Some of the answers lie in the Business Vault. We’ll take a quick look at how it helps data flow from the Raw Vault to the Marts, and whether it is even mandatory for a fully compliant Vault implementation.

The Business Vault is an extension of the Raw Vault that applies selected business rules, denormalizations, calculations and other query-assistance functions. It also comes with a few standard data extraction approaches, detailed below:

PIT tables – Point-in-Time (PIT) tables specifically address the need for denormalization. Denormalization doesn’t just mean stitching together data that’s maintained independently in the source systems; it also means bringing back together data points that the Raw Vault split across multiple objects. For example, in the investment sector, one may want to look not just at the basic details of an instrument they have invested in, but also at the coupon attached to it, the factors that get updated on the 10th of every month and/or the ratings provided by various Credit Rating Agencies.
Similarly, for daily positions data, which pivots on the relationship between a portfolio and an instrument, denormalization across the Raw Vault objects is inevitable.
A PIT table helps tie this data together for a better 360-degree view of the entities, and ties the relationships stored in Links back to the driving datasets, ready for consumption in the Data Marts.
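The mechanics of a PIT table can be sketched as follows, assuming in-memory Satellites for the instrument example above. For each hub key and snapshot date, the PIT row records the load date of the latest row in each Satellite on or before that snapshot, so consumers can join to the right Satellite versions with simple equality joins. Real PIT tables usually carry the hash key as well; the function name, Satellite names and dates here are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def build_pit(hub_keys: list[str], satellites: dict[str, list[tuple[str, date]]],
              snapshot_dates: list[date]) -> list[dict]:
    """Build Point-in-Time rows.

    satellites: satellite name -> (hub_key, load_date) rows.
    For each hub key and snapshot date, record the load_date of the latest row in
    each satellite on or before the snapshot (None if nothing had arrived yet).
    """
    # Index sorted load dates per satellite and key so each lookup is a binary search.
    index = {}
    for sat_name, rows in satellites.items():
        per_key: dict[str, list[date]] = {}
        for key, load_date in rows:
            per_key.setdefault(key, []).append(load_date)
        index[sat_name] = {key: sorted(dates) for key, dates in per_key.items()}

    pit = []
    for key in hub_keys:
        for snapshot in snapshot_dates:
            row = {"hub_key": key, "snapshot_date": snapshot}
            for sat_name, per_key in index.items():
                dates = per_key.get(key, [])
                pos = bisect_right(dates, snapshot)  # number of loads on or before the snapshot
                row[sat_name] = dates[pos - 1] if pos else None
            pit.append(row)
    return pit


satellites = {
    "sat_instrument": [("INST-1", date(2024, 9, 1))],
    "sat_factor": [("INST-1", date(2024, 9, 10))],
    "sat_rating": [("INST-1", date(2024, 9, 5)), ("INST-1", date(2024, 9, 20))],
}
pit = build_pit(["INST-1"], satellites, [date(2024, 9, 3), date(2024, 9, 15), date(2024, 9, 30)])
for row in pit:
    print(row)
```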

Bridge tables – The concept of bridge tables has been around for a while and, as in other modeling approaches, they are created to optimize joins across datasets that are expected to be used together frequently.
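A Bridge table pre-computes the key combinations along a path of Links for each snapshot date, so that marts avoid repeating multi-hop joins. A simplified sketch, assuming two hypothetical Links (portfolio to instrument, instrument to issuer) and treating each link row as active from its load date onward, without consulting an Effectivity Satellite:

```python
from collections import defaultdict
from datetime import date


def build_bridge(link_position: list[tuple[str, str, date]],
                 link_issuer: list[tuple[str, str, date]],
                 snapshot_dates: list[date]) -> list[dict]:
    """Pre-join portfolio -> instrument -> issuer once per snapshot date.

    link_position: (portfolio, instrument, load_date) rows.
    link_issuer: (instrument, issuer, load_date) rows.
    Simplification: a link row counts as active from its load_date onward.
    """
    issuers_by_instrument = defaultdict(list)
    for instrument, issuer, loaded in link_issuer:
        issuers_by_instrument[instrument].append((issuer, loaded))

    bridge = []
    for snapshot in snapshot_dates:
        for portfolio, instrument, loaded_position in link_position:
            if loaded_position > snapshot:
                continue  # relationship not yet known at this snapshot
            for issuer, loaded_issuer in issuers_by_instrument[instrument]:
                if loaded_issuer <= snapshot:
                    bridge.append({"snapshot_date": snapshot, "portfolio": portfolio,
                                   "instrument": instrument, "issuer": issuer})
    return bridge


positions = [("PF-1", "INST-1", date(2024, 9, 1)), ("PF-1", "INST-2", date(2024, 9, 12))]
issuers = [("INST-1", "ISS-A", date(2024, 9, 1)), ("INST-2", "ISS-B", date(2024, 9, 1))]
bridge = build_bridge(positions, issuers, [date(2024, 9, 10), date(2024, 9, 15)])
for row in bridge:
    print(row)
```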

Satellites with predefined calculations – Quite often, some transformations, calculations or aggregations are a common requirement across multiple user groups and functions. The Business Vault is the recommended place for these transformations, since it makes the results available to all the Data Marts.
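For instance, a market value calculation shared by several user groups could live in a computed Business Vault Satellite. A minimal sketch, assuming quantities and prices that are already aligned to the same date; the field names and the record source label are hypothetical:

```python
from datetime import date
from decimal import Decimal


def compute_market_values(positions: list[dict], prices: dict[str, Decimal], load_date: date) -> list[dict]:
    """Derive rows for a computed Business Vault satellite holding market value per position.

    positions: dicts with position_key, instrument_key and quantity.
    prices: instrument_key -> price, assumed to be for the same date as the positions.
    """
    rows = []
    for position in positions:
        price = prices.get(position["instrument_key"])
        if price is None:
            continue  # no price available; a real rule might flag this instead of skipping it
        rows.append({
            "position_key": position["position_key"],
            "load_date": load_date,
            "record_source": "BV.market_value",  # hypothetical label for a derived record source
            "market_value": position["quantity"] * price,
        })
    return rows


positions = [
    {"position_key": "PF-1|INST-1", "instrument_key": "INST-1", "quantity": Decimal("1000")},
    {"position_key": "PF-1|INST-2", "instrument_key": "INST-2", "quantity": Decimal("250")},
]
prices = {"INST-1": Decimal("101.25")}
market_values = compute_market_values(positions, prices, date(2024, 9, 11))
print(market_values)
```

Because the rule is computed once in the Business Vault, every mart reads the same market value instead of re-implementing the calculation.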

All this being said, is the Business Vault a mandatory hop? The answer is no. One can be fully compliant with DV 2.0 and still not implement a Business Vault. The need for one arises from the complex scenarios that may come up during the warehouse implementation.

Conclusion

To conclude, let us take a moment to look at the factors of Data Vault that may affect the decision-making process.
Data Vault offers flexibility, scalability, governance and agility. On top of that, the pattern-driven approach to Raw Vault development results in high-speed delivery. There are frameworks and tools that can increase this velocity even further (read more in Astraa’s MDD framework success story here).

However, we have also seen that extracting data from the Raw Vault can be a complex task in itself and may require stakeholders to factor a Business Vault implementation into their time and cost estimates. An increased number of objects to maintain and monitor is another obvious byproduct.

These are all important points to consider before choosing Data Vault as the warehousing approach. The size of the enterprise is a critical factor too: a small enterprise with relatively few functions or lines of business would be advised against taking on the complexity of a Vault implementation.

We hope this gives you a fair understanding of Data Vault 2.0 and helps you gauge the impact of its implementation.

Tanya Tewary

Serves as Principal Engineer and delivery lead in the Asset Management Solutions space. Has contributed to Digital Transformation journeys in the capacity of a solutionist as well as a techno-functional collaborator. Has a proven track record of elevating Astraa’s accelerators and capabilities to resolve customers’ transformation challenges.
