Building Trust in Data Migrations with dbt: The Power of audit_helper
Case Study


During a major dbt refactoring, we faced challenges in ensuring data accuracy—and found a reliable solution in the audit_helper library.

Author: Ankit Devani

Overview

When migrating data to a new warehouse, optimizing dbt models, or cleaning up legacy transformations, even small changes can lead to unexpected discrepancies.

Traditional validation often depends on manual SQL, spreadsheets, or repetitive testing. These methods are slow, tedious, and prone to human error.

That’s where audit_helper comes in.

What Is audit_helper?

Developed by dbt Labs, audit_helper is an open-source dbt package designed to compare two datasets—typically your original and updated models—on a row-by-row and column-by-column basis.

Key features

Row-by-Row Comparison

Detects added, removed, modified, and unchanged records with precision.

Column-Level Discrepancy Detection

Provides insight into exactly which columns have changed, allowing for pinpointed debugging and deeper trust in the results.

Summary Statistics

High-level differences are clearly summarized, making it easy to assess impact quickly.

dbt-Native Integration

Since it’s designed to work directly with dbt, implementation is simple and non-disruptive to existing workflows.

Benefits Realized & Measured Impact

60%

Faster UT and UAT Cycles

By automating the comparison process, we drastically reduced time spent on Unit Testing (UT) and User Acceptance Testing (UAT).

80%

Enhanced Trust in Data

The detailed discrepancy reports gave us confidence that no issues slipped through the cracks.

20%

Less Manual Effort

Seamless integration into our dbt environment required no complicated setups or external tools.

How to Use audit_helper in Your Workflow

📦 Step 1: Install the Package

Add the package to your packages.yml and run dbt deps. Note that the compare_and_classify_relation_rows macro used in the next step requires a recent release of the package (it is not available in older versions such as 0.9.0):

packages:
  - package: dbt-labs/audit_helper
    version: 0.12.0
🔗 Step 2: Set Up an Audit Model

Create a model in models/audit/ named audit_by_row.sql:

{{ audit_helper.compare_and_classify_relation_rows(
    a_relation=ref('original_model'),
    b_relation=ref('refactored_model'),
    primary_key_columns=['id']
) }}
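The column-level discrepancy detection described above can be set up the same way. As a hedged sketch (assuming the compare_all_columns macro shipped in recent releases of the package, and reusing the illustrative model names and id key from above), a companion audit model might look like:

```sql
-- models/audit/audit_by_column.sql
-- Sketch: summarizes, per column, how many values match between the two
-- relations, so you can pinpoint exactly which columns changed.
-- `original_model`, `refactored_model`, and the `id` key are illustrative.
{{ audit_helper.compare_all_columns(
    a_relation=ref('original_model'),
    b_relation=ref('refactored_model'),
    primary_key='id'
) }}
```

Running this model alongside the row-level audit gives both the high-level summary and the column-by-column breakdown in one pass.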
▶️ Step 3: Run the Audit

dbt run --select audit_by_row

You can also tag your audit models to isolate them from your main runs.
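One way to implement that tagging (a sketch; the audit tag name is an illustrative choice, not prescribed by the package):

```sql
-- models/audit/audit_by_row.sql
-- Tag the audit model so the whole group can be selected or excluded at once.
{{ config(tags=['audit']) }}

{{ audit_helper.compare_and_classify_relation_rows(
    a_relation=ref('original_model'),
    b_relation=ref('refactored_model'),
    primary_key_columns=['id']
) }}
```

With that in place, dbt run --select tag:audit runs only the audit models, and dbt run --exclude tag:audit keeps them out of your main builds.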

A Word of Caution

  • It’s not designed for historical logging—each audit run replaces the previous result.
  • Best used in development or staging environments due to its table replacement behavior.
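Given that table-replacement behavior, one hedged way to keep audit tables away from production data is to route the audit folder to its own schema via dbt_project.yml (project and schema names below are illustrative):

```yaml
# dbt_project.yml (fragment)
models:
  my_project:                  # illustrative project name
    audit:
      +schema: audit_scratch   # keep audit tables in a dev/staging schema
      +tags: ['audit']         # tag the whole folder for easy selection
```

This keeps each run's replaced audit tables isolated and easy to clean up.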

Quantifying the Impact

Traditional Approach

  • 2–3 weeks of effort
  • Multiple analysts & engineers
  • Manual SQL for every model
  • High risk of human error

With Our Solution

  • Only 3 days of effort
  • Single engineer handled it
  • Automation replaced manual checks
  • 90% time reduction
  • Saved 120+ hours ⏱️

In one of our recent projects, we refactored 200+ dbt models with complex logic. What previously required a team and weeks of effort now took just 3 days with audit_helper.

Conclusion

By automating data validation with audit_helper, we streamlined complex model checks into a fast, reliable process—catching issues early and eliminating manual guesswork.

This approach not only delivered unmatched speed and accuracy, but also built trust into every row—giving our team full transparency into model changes and the assurance that nothing broke along the way.

When clients know their data is validated and production-ready, it transforms deliverables from “good enough” to confidently dependable—elevating both the work and the partnership.
