Databases
The Challenge: Databases Change
SELECT * FROM customer_data WHERE signup_date >= '2024-01-01'
-- Returns: 50,000 rowsSELECT * FROM customer_data WHERE signup_date >= '2024-01-01'
-- Returns: 65,000 rows (15,000 new customers!)The Solution: Query + Snapshot Pattern
Authentication: Machine Identity vs. Credentials
Decision Tree
Machine Identity (Recommended)
Stored Credentials
Complete Workflow Example
Step 1: Query Database and Save Snapshot
Step 2: Create Dataset Version with Snapshot
Step 3: Train Using Versioned Dataset
Step 4: Update Snapshot (Monthly/Weekly)
Why Snapshots Matter: Real Example
Best Practices
Snapshot Every Query
Version Your Snapshots
Tag Snapshots with Metadata
Schedule Regular Snapshots
Separate Query from Training
Environment Variables for Credentials
Project-Level Variables
Name
Value
Secret
Organization-Level Variable Groups
Common Patterns
Daily Snapshot for Real-Time Features
Incremental Snapshots
Train/Validation/Test Splits from Database
Database-Specific Guides
Related Pages
Next Steps
Last updated
Was this helpful?
