In machine learning and AI development, separating your development and production environments is no longer just a best practice; it is a necessity.
There are many reasons why you’d want to split your environments, from ensuring the integrity of your production data to maintaining access control and resource segregation.
Why Separate Dev and Prod?
Maintain Separate Environments
Separating development and production environments is crucial for several reasons, including cost tracking, access control, and resource isolation. By doing so, you:
- Safeguard your production environment from unintended changes during development.
- Enable tracking of development costs versus production expenses.
Access Control and Data Segregation
Access control is a top priority in machine learning projects. Separating dev and prod environments helps prevent unauthorized access and data breaches, ensuring that:
- Production pipelines are not accidentally launched with development data and vice versa.
- Only authorized team members can promote data, models, and code to production, as well as launch and schedule production pipelines.
- Dev and test environments remain isolated, with no visibility into each other’s data or results, which is essential for data integrity and security.
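One way to catch a pipeline that mixes data across environments is to validate its input locations before launch. The sketch below is only an illustration: the bucket names, the `STORE_PREFIXES` mapping, and the `check_inputs` helper are hypothetical and not part of Valohai. It assumes each environment reads from its own storage prefix.

```python
# Hypothetical storage prefixes per environment; replace with your own stores.
STORE_PREFIXES = {
    "dev": ("s3://example-ml-dev/",),
    "prod": ("s3://example-ml-prod/",),
}


def check_inputs(environment: str, input_uris: list[str]) -> None:
    """Raise ValueError if any input lies outside the environment's own stores."""
    allowed = STORE_PREFIXES[environment]
    # str.startswith accepts a tuple of prefixes, so one call covers all stores.
    outside = [uri for uri in input_uris if not uri.startswith(allowed)]
    if outside:
        raise ValueError(
            f"{environment} pipeline references data from another environment: {outside}"
        )
```

Calling `check_inputs("prod", uris)` as a pre-launch step would reject a production run that points at a development bucket, and the reverse.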
Different Resources for Different Needs
Each environment, whether for development or production, may have distinct resource requirements, including machine types, networking configurations, and access rights. For example, in the production environment, you might:
- Require workloads to use approved Docker images and undergo thorough vulnerability scanning.
- Implement a separate virtual network with access to production data, inaccessible from development environments.
These measures ensure the robustness and security of your production pipeline.
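The approved-image requirement can be pictured as a simple allowlist check. This is a minimal sketch under assumptions: the registry names, the `APPROVED_IMAGES` set, and the `is_approved` function are invented for illustration, and in practice the list would come from your registry or scanning tool rather than from code.

```python
# Hypothetical allowlist of images that have passed vulnerability scanning.
APPROVED_IMAGES = {
    "registry.example.com/ml/train:1.4.2",
    "registry.example.com/ml/serve:2.0.1",
}


def is_approved(image: str, environment: str) -> bool:
    """Allow any image in development, but only approved images in production."""
    if environment != "prod":
        return True
    return image in APPROVED_IMAGES
```

Matching on the full image reference, tag included, means an unreviewed tag of an otherwise familiar image is still rejected in production.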
What to Consider?
Before separating your development and production environments in Valohai, work through the following checklist. Not every section will apply to your use case, but each point is worth reviewing:
Accounts and Resource Groups
- Create separate accounts, subscriptions, or resource groups for development and production.
- Restrict workloads to approved base Docker images.
- Ensure accurate and segregated cost tracking.
- Use separate deployment targets, either distinct namespaces within one cluster or entirely separate clusters.
User Access
- Define a distinct set of users for different environments.
- Implement access controls to enforce these restrictions.
Data Storage
- Consider separate data stores for different stages.
- Define some resources as read-only in one environment and read-write in another.
- Decide whether to promote data and models between stages or generate new datasets and models in the new environment.
- Determine if you need to share Valohai datasets and aliases between environments.
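The read-only versus read-write point can be made concrete with a small access matrix. The store names and helper functions below are hypothetical. The sketch assumes development writes candidate models that production may only read, while production data is invisible to development.

```python
# Hypothetical access matrix: (environment, store) -> "r" or "rw".
# A missing entry means the environment has no access to that store.
ACCESS = {
    ("dev", "dev-data"): "rw",
    ("dev", "candidate-models"): "rw",
    ("prod", "candidate-models"): "r",  # production promotes from here but never writes
    ("prod", "prod-data"): "rw",
}


def can_read(environment: str, store: str) -> bool:
    """Any listed mode ("r" or "rw") grants read access."""
    return (environment, store) in ACCESS


def can_write(environment: str, store: str) -> bool:
    """Only the "rw" mode grants write access."""
    return ACCESS.get((environment, store)) == "rw"
```

Writing the matrix down, even informally, makes it easier to decide which stores take part in promotion and which stay private to one environment.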
Version Control
- Decide which branches to pull from in different stages.
- Limit production projects to pulling from the main branch.
- Implement Git branch protection rules to ensure code changes are reviewed and approved.
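The branch rules above could be expressed as a per-environment policy. The following is a sketch, not a Valohai feature: `BRANCH_POLICY` and `branch_allowed` are hypothetical names. It assumes production pulls only from `main` while development may use any branch.

```python
import fnmatch

# Hypothetical policy: glob patterns of branches each environment may pull from.
BRANCH_POLICY = {
    "dev": ["*"],
    "prod": ["main"],
}


def branch_allowed(environment: str, branch: str) -> bool:
    """Return True if the branch matches any pattern allowed for the environment."""
    patterns = BRANCH_POLICY[environment]
    # fnmatchcase does a case-sensitive match; "*" also matches names containing "/".
    return any(fnmatch.fnmatchcase(branch, pattern) for pattern in patterns)
```

A check like this complements Git branch protection rather than replacing it: protection rules govern what reaches `main`, and the policy governs what production is willing to run.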
API Key Management
- Manage API keys separately for production and development.
- Consider implementing key rotation policies in different environments.
Quota Management
- Evaluate quota management if your environments reside in the same account, since development and production workloads may then draw on the same quotas.