Dataset Lineage

Lineage Graphs provide a visual representation of dataset relationships across NinjaCat. They show how datasets, transformations, and views are connected — making it easier to trace dependencies, investigate issues, and assess downstream impact before making changes.

Why use Lineage Graphs

Lineage Graphs improve visibility into data dependencies and reduce the time required to troubleshoot broken or unexpected dataset behavior. Instead of manually tracing relationships across multiple assets, you can review a single visual graph showing what feeds into a dataset and what depends on it.

Common workflows where lineage helps:

  • Troubleshooting broken datasets
  • Understanding how source changes affect downstream assets
  • Identifying dependencies before deleting or modifying a dataset
  • Building trust in data quality

See lineage at a glance

Lineage information is surfaced in several places throughout the platform so you can monitor dependency health without leaving your current workflow:

  • Lineage tab on every dataset — Open the Lineage tab on a dataset to see an interactive upstream/downstream graph of every dataset it depends on and every dataset that depends on it.
  • Lineage health column — Dataset and connector lists include a lineage health column so you can spot issues across your data at a glance.
  • Unhealthy source warnings on transformation pages — When you build a transformation, NinjaCat flags any unhealthy source datasets directly where you're working.

Errors and warnings on the Dataset Details page

Errors and warnings for a dataset are surfaced directly at the top of the Dataset Details page — above the Sync Details, Data Checks, and Lineage tabs — so you can see the dataset's current health without opening the lineage graph.

⚠️ A warning is not an error

Warnings are informational. They flag conditions you may want to be aware of — most commonly that a dataset is not account matched or not date matched. A warning does not mean the dataset is broken, and it does not always require action.

If account matching or date matching isn't relevant to how you're using the dataset, you can safely ignore the warning. For example, if your workflow doesn't depend on account-level joins, a "not account matched" warning is just a notice — no fix needed.

Errors, by contrast, indicate a broken or invalid dependency and typically need to be resolved before the dataset can be used reliably downstream.

Warnings and errors reflect the state of your data as of the last time data was received, not your current configuration. If you've recently updated a mapping or setting, trigger a sync to refresh the messages.

The same conditions can also be inspected per-node in the lineage graph — see Inspect errors and warnings below.

Working with the lineage graph

Dependency visualization

The lineage graph displays relationships between datasets, transformations, and views. For any selected dataset, you can see both upstream source dependencies and downstream usages in a single interface.

Filter by sources or usages

You can filter the graph to focus on:

  • Sources — assets that feed into the selected dataset
  • Usages — assets that depend on the selected dataset

This lets you narrow the view based on the troubleshooting or analysis task at hand.
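If it helps to think of the graph in code, the Sources and Usages filters can be pictured as two traversals over a directed graph. The sketch below is a conceptual model only, not NinjaCat's implementation: the dataset names, the `EDGES` list, and the `sources_of` and `usages_of` helpers are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edges, written as (source, dependent). Names are illustrative.
EDGES = [
    ("ads_raw", "ads_clean"),
    ("ads_clean", "ads_joined"),
    ("crm_raw", "ads_joined"),
    ("ads_joined", "exec_view"),
]


def build_adjacency(edges):
    """Index the edges in both directions so either side can be walked."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def _reachable(start, adjacency):
    """Breadth-first walk: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


UPSTREAM, DOWNSTREAM = build_adjacency(EDGES)


def sources_of(dataset):
    """Assets that feed into the dataset, directly or indirectly."""
    return _reachable(dataset, UPSTREAM)


def usages_of(dataset):
    """Assets that depend on the dataset, directly or indirectly."""
    return _reachable(dataset, DOWNSTREAM)
```

In this toy graph, `sources_of("ads_joined")` returns the three datasets that feed it, while `usages_of("ads_joined")` returns only `exec_view`.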

Inspect errors and warnings

Each node in the graph can expose validation details, including warnings and errors. You can inspect problematic nodes directly from the graph to identify issues affecting dataset integrity or downstream outputs.

Warnings vs. errors

Warnings are informational and may highlight potential issues such as account matching or date column concerns. A warning does not necessarily mean a dataset is unusable. Errors indicate broken or invalid dependencies that likely need attention.

Filter to errors only

In more complex graphs with many connected assets, you can isolate elements with errors to focus your investigation on broken or problematic dependencies.
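The difference between warnings and errors, and the effect of the errors-only filter, can be sketched as follows. This is a minimal illustration under assumed names: `Severity`, `Node`, `health`, and `errors_only` are hypothetical and do not correspond to any NinjaCat API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    WARNING = "warning"  # informational, may not need action
    ERROR = "error"      # broken or invalid dependency


@dataclass
class Node:
    name: str
    # Each issue is a (Severity, message) pair attached to the node.
    issues: list = field(default_factory=list)


def health(node):
    """Summarize a node: any error outranks any number of warnings."""
    severities = {sev for sev, _ in node.issues}
    if Severity.ERROR in severities:
        return "error"
    if Severity.WARNING in severities:
        return "warning"
    return "healthy"


def errors_only(nodes):
    """Keep only nodes with at least one error; warnings alone are dropped."""
    return [n for n in nodes if health(n) == "error"]


nodes = [
    Node("ads_clean"),
    Node("ads_joined", [(Severity.WARNING, "not account matched")]),
    Node("exec_view", [(Severity.WARNING, "not date matched"),
                       (Severity.ERROR, "missing source dataset")]),
]
```

Here only `exec_view` survives `errors_only(nodes)`. The `ads_joined` node carries a warning but is still treated as usable.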

Navigate to related assets

From the lineage graph, you can navigate directly to related datasets or views for additional review and remediation.

How lineage stays up to date

Lineage updates are event-driven in many common scenarios. For example, lineage refreshes automatically when:

  • New data enters a dataset
  • A transformation is saved and rebuilt

In some edge cases, lineage may not refresh immediately. When that happens, use Reconcile Lineage to retrieve the latest state.

Reconcile Lineage

Use Reconcile Lineage to refresh a dataset's lineage when something looks off, when a dataset has no lineage yet, or after you've resolved an issue and want to confirm the latest dependency state.

You can trigger Reconcile Lineage from:

  • The Dataset Details page
  • The dataset list row menu
  • A connector's datasets table

Reconciling re-walks the dataset's full upstream and downstream dependency chain, revalidates every node, and refreshes the health status everywhere lineage is shown.
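Conceptually, a reconcile pass resembles the sketch below: collect everything upstream and downstream of the dataset, then revalidate each node. This is an assumption-laden illustration, not NinjaCat's actual logic. The `reconcile` function, the edge format, and the `validate` callback are all hypothetical.

```python
from collections import defaultdict, deque


def reconcile(dataset, edges, validate):
    """Revalidate a dataset and its full upstream and downstream chain.

    edges: iterable of (source, dependent) pairs (hypothetical format).
    validate: callable taking a node name and returning a health string.
    Returns a {node: health} mapping for every node that was visited.
    """
    upstream, downstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)

    def walk(adjacency):
        # Breadth-first walk in one direction, starting at the dataset.
        seen, queue = set(), deque([dataset])
        while queue:
            for nxt in adjacency[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    chain = {dataset} | walk(upstream) | walk(downstream)
    return {node: validate(node) for node in sorted(chain)}


# Illustrative data: "other_raw" and "other_view" are unrelated to "ads_joined".
edges = [
    ("ads_raw", "ads_clean"),
    ("ads_clean", "ads_joined"),
    ("ads_joined", "exec_view"),
    ("other_raw", "other_view"),
]
status = reconcile(
    "ads_joined",
    edges,
    validate=lambda name: "error" if name == "ads_raw" else "healthy",
)
```

In this model, only nodes connected to the chosen dataset are revalidated, so the unrelated `other_raw` and `other_view` nodes do not appear in `status`.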