Dataset Lineage

Lineage Graphs provide a visual representation of dataset relationships across NinjaCat. They show how datasets, transformations, and views are connected — making it easier to trace dependencies, investigate issues, and assess downstream impact before making changes.

Why use Lineage Graphs

Lineage Graphs improve visibility into data dependencies and reduce the time required to troubleshoot broken or unexpected dataset behavior. Instead of manually tracing relationships across multiple assets, you can review a single visual graph showing what feeds into a dataset and what depends on it.

Common workflows where lineage helps:

Troubleshooting broken datasets
Understanding how source changes affect downstream assets
Identifying dependencies before deleting or modifying a dataset
Building confidence in data quality

See lineage at a glance

Lineage information is surfaced in several places throughout the platform so you can monitor dependency health without leaving your current workflow:

Lineage tab on every dataset — Open the Lineage tab on a dataset to see an interactive upstream/downstream graph of every dataset it depends on and every dataset that depends on it.
Lineage health column — Dataset and connector lists include a lineage health column so you can spot issues across your data at a glance.
Unhealthy source warnings on transformation pages — When you build a transformation, NinjaCat flags any unhealthy source datasets directly where you're working.

Errors and warnings on the Dataset Details page

Errors and warnings for a dataset are surfaced directly at the top of the Dataset Details page — above the Sync Details, Data Checks, and Lineage tabs — so you can see the dataset's current health without opening the lineage graph.

⚠️
A warning is not an error
Warnings are informational. They flag conditions you may want to be aware of — most commonly that a dataset is not account matched or not date matched. A warning does not mean the dataset is broken, and it does not always require action.
If account matching or date matching isn't relevant to how you're using the dataset, you can safely ignore the warning. For example, if your workflow doesn't depend on account-level joins, a "not account matched" warning is just a notice — no fix needed.
Errors, by contrast, indicate a broken or invalid dependency and typically need to be resolved before the dataset can be used reliably downstream.

Warnings and errors reflect the state of your data as of the last time it was received — not your current configuration. If you've recently updated a mapping or setting, trigger a sync to refresh the messages.

The same conditions can also be inspected per-node in the lineage graph — see Inspect errors and warnings below.

Working with the lineage graph

Dependency visualization

The lineage graph displays relationships between datasets, transformations, and views. For any selected dataset, you can see both upstream source dependencies and downstream usages in a single interface.

Filter by sources or usages

You can filter the graph to focus on:

Sources — assets that feed into the selected dataset
Usages — assets that depend on the selected dataset

This lets you narrow the view based on the troubleshooting or analysis task at hand.
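
As a conceptual illustration only, the difference between the two filters can be sketched as a walk over a dependency graph in opposite directions. The asset names, the `EDGES` list, and the `reachable` function below are hypothetical and are not part of any NinjaCat API. They only model the idea that Sources follows dependencies upstream while Usages follows them downstream.

```python
from collections import deque

# Hypothetical edges: each pair means "left asset feeds into right asset".
EDGES = [
    ("ads_raw", "ads_clean"),
    ("crm_raw", "ads_clean"),
    ("ads_clean", "weekly_report_view"),
    ("ads_clean", "budget_pacing"),
]


def reachable(start, edges, direction):
    """Return every asset reachable from `start`.

    direction="sources" follows edges backwards (upstream);
    direction="usages" follows edges forwards (downstream).
    """
    if direction not in ("sources", "usages"):
        raise ValueError("direction must be 'sources' or 'usages'")

    # Build an adjacency map pointing in the requested direction.
    neighbors = {}
    for src, dst in edges:
        if direction == "sources":
            neighbors.setdefault(dst, []).append(src)
        else:
            neighbors.setdefault(src, []).append(dst)

    # Breadth-first walk, visiting each asset at most once.
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In this sketch, `reachable("ads_clean", EDGES, "sources")` returns the two raw datasets, and `reachable("ads_clean", EDGES, "usages")` returns the view and the pacing dataset built on top of it.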

Inspect errors and warnings

Each node in the graph can expose validation details, including warnings and errors. You can inspect problematic nodes directly from the graph to identify issues affecting dataset integrity or downstream outputs.

Warnings vs. errors
Warnings are informational and highlight potential issues, such as a dataset that is not account matched or not date matched. A warning does not necessarily mean a dataset is unusable. Errors indicate broken or invalid dependencies that likely need attention.

Filter to errors only

In more complex graphs with many connected assets, you can isolate elements with errors to focus your investigation on broken or problematic dependencies.
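
To make the distinction concrete, here is a minimal sketch of how an errors-only filter could behave, assuming each node carries separate lists of warnings and errors. The `Node` class, the `health` property, and the sample messages are hypothetical and do not describe NinjaCat's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical lineage node with its validation messages."""
    name: str
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    @property
    def health(self):
        # Errors outrank warnings; a node with neither is healthy.
        if self.errors:
            return "error"
        if self.warnings:
            return "warning"
        return "healthy"


def errors_only(nodes):
    """Keep only the nodes that have at least one error."""
    return [node for node in nodes if node.health == "error"]


# Sample graph contents (invented for illustration).
NODES = [
    Node("ads_raw"),
    Node("ads_clean", warnings=["not account matched"]),
    Node("weekly_report_view", errors=["source dataset missing"]),
]
```

Here `errors_only(NODES)` keeps only `weekly_report_view`. The node with a warning is left out, which mirrors the idea that warnings are informational and do not mark a dependency as broken.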

Navigate to related assets

From the lineage graph, you can navigate directly to related datasets or views for additional review and remediation.

How lineage stays up to date

Lineage is event-driven. In nearly all cases the graph refreshes itself automatically — you shouldn't need to think about it.

Lineage rebuilds automatically when:

New data is finalized for a dataset — after a sync completes and data is finalized, the affected dataset's upstream chain is reconciled.
A transformation is saved — when a transformation is created or updated, both its upstream sources and its downstream consumers are reconciled. This runs on a short delay (~10 seconds) to allow the underlying view to finish building.

Because these events cover the vast majority of changes, the graph you see is almost always current.

Refresh Lineage

The Refresh Lineage button rebuilds your dataset's lineage graph — the nodes and connections shown in the diagram — using the most current information from the systems where your data and transformations live.

Use Refresh Lineage when something looks off, when a dataset has no lineage yet, or after you've resolved an issue and want to confirm the latest dependency state.

You can trigger Refresh Lineage from:

The Dataset Details page (the More Actions menu and the button on the Lineage tab)
The dataset list row menu
A connector's datasets table

What it does

Refresh Lineage re-walks the dataset's upstream and downstream dependency chain, revalidates every connected node against the source-of-truth services, and refreshes the lineage health status everywhere it's displayed (the Lineage tab, dataset and connector lists, and transformation source warnings).
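
The behavior described above can be pictured with a small sketch, under the assumption that a refresh collects every connected asset in both directions and then asks a validator for the current status of each one. The `refresh_lineage` function, the `validate` callback, and the sample data are hypothetical. This is not NinjaCat's implementation, only a model of "re-walk, revalidate, report".

```python
def refresh_lineage(dataset, edges, validate):
    """Re-walk both directions from `dataset` and revalidate each node.

    `edges` is a list of (source, target) pairs and `validate` is a
    callback returning a status string for an asset name. The inputs are
    only read, never modified, which models a non-destructive refresh.
    """
    upstream, downstream = {}, {}
    for src, dst in edges:
        upstream.setdefault(dst, set()).add(src)
        downstream.setdefault(src, set()).add(dst)

    def walk(start, neighbors):
        # Depth-first walk that visits each asset at most once.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbors.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    connected = {dataset} | walk(dataset, upstream) | walk(dataset, downstream)
    return {name: validate(name) for name in sorted(connected)}


# Sample data (invented for illustration).
SAMPLE_EDGES = [
    ("ads_raw", "ads_clean"),
    ("crm_raw", "ads_clean"),
    ("ads_clean", "weekly_report_view"),
]


def sample_validate(name):
    # Pretend one upstream source is currently broken.
    return "error" if name == "crm_raw" else "healthy"
```

Calling `refresh_lineage("ads_clean", SAMPLE_EDGES, sample_validate)` returns a status for all four connected assets, with `crm_raw` reported as `"error"`, and leaves `SAMPLE_EDGES` unchanged.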

When to use it manually

In day-to-day use you rarely need to click Refresh Lineage — automatic reconciliation handles new data and saved transformations. Reach for it when:

The lineage graph looks stale or doesn't match a change you know was made
A dataset is showing no lineage yet and you want to populate it
You've just resolved an upstream issue and want to confirm the health status updates immediately
You're troubleshooting and want to be certain you're looking at the latest state

Is it safe to click?

Yes. Refresh Lineage is non-destructive — it doesn't modify your source data, datasets, or transformations. It only re-reads from the source-of-truth services and updates the lineage graph. You can run it as often as you need.

📘
Renamed from "Reconcile Lineage"
This action was previously labeled Reconcile Lineage. It has been renamed to Refresh Lineage — both in the More Actions menu on Dataset Details and on the Lineage tab button — to better reflect what the action actually does.