dlt Secrets Management: 3 Approaches for Secure Data Pipelines
From simple secrets.toml files to environment variables and secrets-manager integrations, this post compares their benefits and trade-offs
1. Intro
TL;DR
Mismanaging credentials in your dlt pipelines can lead to data leaks, compliance headaches, and painful key rotations. In this post, you’ll get three battle-tested patterns to keep your secrets safe.
When you and I started writing data pipelines, we had no idea how to handle security credentials. Even now, I am writing this in the hope that an internet sage will identify my errors and offer a few words of enlightenment.
In my first data projects, everything happened inside a database: not only the data, but also the code and the scheduler. There was no need to pass secrets around.
Then I moved on to projects where the code and scheduler lived outside the database, and suddenly credentials had to be passed everywhere. A senior developer advised me: "Never store credentials in the code." This was the only thing I knew. And with that, I started my journey into secrets management.
Purpose of this article
In this article, I’ll walk through three proven patterns for handling secrets in dlt projects, from simple file-based stores to full-blown enterprise vault integrations, so you can choose the right approach for your team’s size, risk profile, and operational maturity.
What You’ll Learn
File-based secrets with a local, Git-ignored secrets.toml
Runtime overrides using dlt SOURCES__… or DESTINATION__… environment variables
External vault integrations (for example, Vault or AWS Secrets Manager)
Before we dive into implementation details, let’s first ask: why does secrets management really matter? In the next section, we’ll explore the top risks and operational benefits that a solid strategy unlocks.
2. Why Secrets Management Matters
Security & Compliance
Hard-coding credentials in our pipeline code or configuration files is a ticking time bomb. A leaked key can expose sensitive data, open doors for unauthorized access, and trigger costly breach investigations. On top of that, regulations like GDPR and SOC 2 demand proof that we’re handling secrets responsibly. If we can’t demonstrate strict controls over who accessed which credentials and when they were rotated, we may face fines or audit failures.
Operational Flexibility
Imagine rebuilding and redeploying every pipeline whenever a database password changes. That’s a recipe for downtime and frustrated teams. With a robust secrets strategy, we can swap out credentials (dev, test, and production) without touching pipeline code. This separation of code and configuration lets us automate rotations and spin up new environments in minutes rather than days.
Auditing & Traceability
When every secret lives in a centralized store, we gain a full audit trail: who fetched each credential, which service used it, and when it was rotated. This level of visibility is invaluable during security reviews or incident investigations. It also encourages least-privilege access: services only receive the secrets they need, and we can revoke access instantly if something goes wrong.
3. Pattern 1: File-Based Secrets (secrets.toml)
When we start writing a dlt pipeline, the simplest approach is a local secrets.toml file. Here’s how it works:
Location
Store secrets.toml in the .dlt directory at the root of your project, alongside config.toml.
Structure
Define credentials under each destination (or source) section. For Postgres, for example, we’d use:
[destination.postgres.credentials]
database = "dlt_data"
username = "loader"
password = "<password>" # replace with your password
host = "localhost" # or the IP address of your database
...
Loading Behavior
dlt automatically picks up secrets.toml at runtime and merges its values into our pipeline configuration.
Note: This pattern works great for local development and quick prototypes, but it’s less suitable for shared or production environments where auditability, secure distribution, and automated rotation are critical.
Pros and Cons
Pros
Simplicity: no external systems to configure, just a file we git-ignore
Beginner friendly: ship a secrets.toml.template so new team members know exactly what to fill in
Quick iterations: ideal for proofs of concept and local development
Cons
Limited auditability: edits to secrets.toml aren’t tracked unless we add extra logging
Distribution risk: team members must share the file securely (or create their own copy)
Rotation friction: rotating credentials means editing the file, possibly on a server
Best Practices
Add secrets.toml to .gitignore immediately.
Keep an encrypted backup (for example, in a team password manager) for onboarding new members.
Secure file permissions (for example, chmod 600 .dlt/secrets.toml) so only the dlt process can read it.
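If you want to verify that permission bit programmatically, a small stdlib sketch (the temp file stands in for your real .dlt/secrets.toml):

```python
import os
import stat
import tempfile

# Create a stand-in secrets file and lock it down to owner read/write only
with tempfile.NamedTemporaryFile(delete=False, suffix=".toml") as f:
    secrets_path = f.name
os.chmod(secrets_path, 0o600)

# Group and "other" must have no permissions on the file
mode = stat.S_IMODE(os.stat(secrets_path).st_mode)
assert mode & (stat.S_IRWXG | stat.S_IRWXO) == 0, "secrets file is too permissive"
print("permissions OK:", oct(mode))

os.unlink(secrets_path)
```

A check like this can run as a pre-flight step before the pipeline starts.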
This file-based approach gets us up and running fast, but, as our projects scale, we’ll likely move on to more dynamic methods. Next, we’ll look at how to override credentials at runtime without touching the secrets file directly.
4. Pattern 2: Environment Variables
When we need to inject credentials at runtime, without touching any files, we can leverage environment variables. dlt will pick up any credential defined via an ENV var, and these values always take precedence over secrets.toml or built-in defaults.
Variable Naming
Uppercase section and key names, with double underscores replacing TOML delimiters:

<SECTION>__<SUBSECTION>__…__<KEY>

For our Postgres example:

export DESTINATION__POSTGRES__CREDENTIALS__USERNAME="ci_loader"
export DESTINATION__POSTGRES__CREDENTIALS__PASSWORD="$CI_DB_PASSWORD"
export DESTINATION__POSTGRES__CREDENTIALS__HOST="db.example.com"
export DESTINATION__POSTGRES__CREDENTIALS__DATABASE="dlt_data"
export DESTINATION__POSTGRES__CREDENTIALS__PORT="5432"

Loading Behavior
At pipeline start, dlt scans all ENV vars matching its credential naming scheme and merges them into your configuration, overriding any values from secrets.toml or config.toml.
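The double-underscore convention maps one-to-one onto the TOML nesting. Here’s a toy sketch of how such names could be expanded into a nested dict (dlt’s own resolver is more involved; `env_to_nested` is a name I made up for illustration):

```python
def env_to_nested(env_vars: dict[str, str]) -> dict:
    """Expand DOUBLE__UNDERSCORE env-var names into a nested config dict."""
    config: dict = {}
    for name, value in env_vars.items():
        node = config
        *sections, key = name.lower().split("__")
        for section in sections:
            node = node.setdefault(section, {})
        node[key] = value
    return config

env = {
    "DESTINATION__POSTGRES__CREDENTIALS__USERNAME": "ci_loader",
    "DESTINATION__POSTGRES__CREDENTIALS__HOST": "db.example.com",
}
print(env_to_nested(env)["destination"]["postgres"]["credentials"]["username"])
# -> ci_loader
```

The result mirrors the same `[destination.postgres.credentials]` structure a secrets.toml file would produce.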
Note: This pattern is ideal for CI/CD or containerized runs, but for long-lived production environments it still relies on secure injection (see Best Practices below).
Pros and Cons
Pros
No files required: credentials never touch the filesystem or repo
Dynamic overrides: perfect for CI jobs, one-off runs, or debugging
Immediate effect: changes take hold on next run without file edits
Cons
Injection risk: ENV vars can leak in process listings or CI logs if not handled carefully
Lack of audit trail: you won’t see in Git or in a vault who changed a value and when
Rotation friction: rotating a secret still requires updating the injector (e.g., CI variable or deploy script) and re-running the CI/CD pipeline.
Best Practices
Runtime-only injection: set variables in your CI/CD platform or orchestrator, never bake them into images or commit them to config files.
Use orchestrator secrets: leverage Kubernetes Secrets, GitHub Actions Secrets, or GitLab CI variables for secure runtime injection.
Limit exposure: avoid printing these ENV vars in logs; use built-in masking features of your CI system.
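As a complement to your CI system’s masking, it can pay to redact secrets before they reach any log line at all. A minimal sketch (key list and helper name are my own, not a dlt API):

```python
import logging

SENSITIVE_KEYS = {"password", "secret", "token", "key"}

def redact(config: dict) -> dict:
    """Return a copy of a config dict that is safe to log."""
    return {
        k: ("***" if k.lower() in SENSITIVE_KEYS else v)
        for k, v in config.items()
    }

creds = {"username": "loader", "password": "very-secret"}
logging.basicConfig(level=logging.INFO)
logging.info("resolved credentials: %s", redact(creds))  # password shown as ***
```

The original dict is untouched; only the logged copy is masked.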
Next up: the most robust pattern, integrating with an external secrets manager for full auditability, dynamic rotation, and enterprise-grade security.
5. Pattern 3: External Secrets Managers
As our pipelines scale and compliance becomes critical, using external secrets managers provides centralized, dynamic, and secure credential management.
Why Use a Secrets Manager?
Enterprise-grade security: Centralized credential management reduces leakage risk.
Automatic rotation: Simplifies regular updates of credentials.
Auditability: Provides clear visibility into who accessed credentials and when.
Common integrations are:
AWS Secrets Manager, AWS SSM Parameter Store
Azure Key Vault
GCP Secret Manager
When integrating secrets managers with dlt pipelines, we typically consider two main approaches:
Method 1: Fetch secrets at runtime and inject them as environment variables.
Method 2: At pipeline startup, use the Python SDK of your cloud provider to directly retrieve and inject credentials.
Let's briefly explore both options.
Method 1: Environment Variable Injection
In this approach, a small script retrieves credentials from our external secrets manager just before running the pipeline, setting them as environment variables.
Example (pseudo-code):
credentials=$(secrets-manager get-secret --name "prod/db_credentials")
export DESTINATION__POSTGRES__CREDENTIALS__USERNAME=$(echo $credentials | jq -r '.username')
export DESTINATION__POSTGRES__CREDENTIALS__PASSWORD=$(echo $credentials | jq -r '.password')
dlt run my_pipeline
Pros and Cons
Pros
Simplicity: Quick and straightforward to integrate with orchestrators and CI/CD platforms.
Separation of concerns: Pipeline logic remains clean and unchanged.
Dynamic: Easily updated without modifying pipeline code or files.
Cons
Leakage risk: Environment variables might leak into logs or system outputs if not handled properly.
Limited auditability: Less direct tracking of credential usage at runtime.
For additional considerations, see also the pros and cons discussed in Section 4.
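When the orchestrator task is itself Python, the same injection can happen in-process before the pipeline starts. A sketch where `fetch_secret_json` stands in for your real secrets-manager SDK call (boto3, google-cloud-secret-manager, etc.); the function and secret name are hypothetical:

```python
import json
import os

def fetch_secret_json(name: str) -> str:
    """Stand-in for a real secrets-manager call; returns the secret payload as JSON."""
    return json.dumps({"username": "loader", "password": "very-secret"})

creds = json.loads(fetch_secret_json("prod/db_credentials"))

# Export using dlt's double-underscore naming so the pipeline picks them up
os.environ["DESTINATION__POSTGRES__CREDENTIALS__USERNAME"] = creds["username"]
os.environ["DESTINATION__POSTGRES__CREDENTIALS__PASSWORD"] = creds["password"]

# ...then run the pipeline in the same process, e.g. pipeline.run(source)
```

This keeps the shell out of the loop entirely, which also avoids secrets appearing in shell history or process arguments.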
Method 2: Direct Python Integration
In this method, our pipeline script directly retrieves credentials from the secrets manager using the cloud platform's IAM role or managed identity. Credentials are passed straight into dlt without involving environment variables.
Example (Python pseudo-code snippet):
import dlt
from my_secret_manager_client import get_credentials  # hypothetical wrapper around your cloud SDK

# Retrieve the secret payload via the platform's IAM role / managed identity
credentials = get_credentials(secret_name="prod/db_credentials")

# Inject credentials directly into dlt's in-memory configuration
dlt.secrets["destination.postgres.credentials"] = {
    "username": credentials["username"],
    "password": credentials["password"],
    "database": "dlt_data",
    "host": "db.example.com",
    "port": 5432,
}

# Continue with pipeline execution
Loading Behavior
Credentials are injected directly into dlt’s runtime memory configuration, eliminating external exposure.
dlt already ships with integrations for the most common secrets managers; for additional reading, check this example for the GCP Secret Manager.
Pros and Cons
Pros
Highest security: Credentials remain confined to pipeline memory at runtime.
Leak-proof: Eliminates risks associated with environment variables.
Audit-friendly: Clearly identifies each pipeline's credential access via IAM roles.
Cons
Additional complexity: Pipeline code includes direct integration with secret management SDKs.
Dependency management: Additional SDKs or libraries might be needed in the pipeline.
Best Practices (Applicable to Both Methods)
Least privilege IAM roles: Grant pipelines minimal permissions for retrieving secrets.
Automated rotation: Regularly rotate credentials automatically via your secrets manager.
Careful logging: Ensure sensitive credentials never appear in logs or pipeline outputs.
Recommendations
While Method 1 (environment variables) provides a quick setup without changing pipeline logic, Method 2 (direct Python integration) offers the most secure, robust, and leak-proof approach. For production-grade pipelines and highly regulated workloads, the Python integration is my recommended choice.
6. Comparative Analysis & Decision Guide
As we’ve seen, each secrets-management pattern offers distinct trade-offs in simplicity, auditability, distribution, and rotation effort.
When to choose each pattern
File-based (secrets.toml)
Ideal for early-stage projects, quick proofs of concept, or small teams where simplicity and speed outweigh the need for rotation or audit trails.
ENV-variable overrides
A practical solution for teams already comfortable with CI/CD processes and pipeline deployments but who haven’t invested time in using a secrets manager. It works well initially, yet it can leave us vulnerable to credential leaks and audit failures.
External secrets manager
The recommended choice for production systems, regulated industries, and multi-team environments. When strong auditability, automatic rotation, and centralized policy enforcement are non-negotiable, this pattern delivers the highest guarantees.
By weighing our project requirements (simplicity versus security, speed versus compliance), we can select the pattern that fits best, or even combine approaches as our scale and needs evolve.
7. Conclusion & Next Steps
It took me longer than expected, but we have explored three progressively more robust patterns for dlt secrets management.
At this point what remains is for us to review our current pipelines and ask ourselves:
Are we rotating credentials automatically?
Can we trace every secret access?
Do we adhere to least-privilege principles?
I know, I am reading the questions I just wrote and I hate myself.
Besides writing posts on this select * from substack, I also work as a data consultant for my own company. It was an excuse to work with people that I like and take on more interesting projects. If you have a data challenge, if your delivery speed is too slow, or if you think you can do more with data, feel free to contact us.