Before you begin, make sure you have an existing BigQuery dataset that the Valohai workers are allowed to connect to.
Service Account
- Go to your GCP Project hosting Valohai resources.
- Navigate to IAM -> Service Accounts.
- Create a new service account.
- Grant the BigQuery User role (roles/bigquery.user) to the service account.
- Share the service account’s email with Valohai, specifying the environments to attach it to. Different Valohai environments can use different service accounts.
Accessing BigQuery from Another GCP Project
- If your BigQuery data is in a different GCP Project, grant the newly created service account the BigQuery User role in that project.
- In the project that hosts only Valohai resources (not the BigQuery data), the service account needs neither the BigQuery User nor the BigQuery Data Viewer role.
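The cross-project setup above can be sketched as follows. This is a minimal illustration rather than Valohai's own code: the helper functions and the project, dataset, and table names are hypothetical, and it assumes query jobs are created in the data project, since that is where the service account holds the BigQuery User role.

```python
def qualified_table(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified table ID for Standard SQL."""
    return f"`{project}.{dataset}.{table}`"


def build_query(project: str, dataset: str, table: str, limit: int = 10) -> str:
    """Build a small preview query against a table in the data project."""
    return f"SELECT * FROM {qualified_table(project, dataset, table)} LIMIT {int(limit)}"


def fetch_preview(data_project: str, dataset: str, table: str):
    """Run the preview query using the machine's attached service account."""
    # Imported lazily so the pure helpers above work without the client library.
    from google.cloud import bigquery

    # Assumption: jobs run in the data project, where the role was granted.
    client = bigquery.Client(project=data_project)
    query = build_query(data_project, dataset, table)
    return client.query(query).result().to_dataframe()


# Hypothetical names, for illustration only.
print(build_query("my-data-project", "analytics", "events"))
```

Passing the data project to `bigquery.Client` keeps job creation in the project where the role exists, which fits the note that the Valohai-only project needs no BigQuery roles.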
Example
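Use the Python Client for Google BigQuery (the google-cloud-bigquery package) to establish a connection to BigQuery in your code. Converting results with to_dataframe() also requires pandas and db-dtypes to be installed.
When launching Valohai executions, choose an environment that has the service account attached. The client then authenticates automatically with the machine’s service account credentials.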
from google.cloud import bigquery

bqclient = bigquery.Client(project='myproject')

# Download query results.
query_string = """
SELECT
    CONCAT(
        'https://stackoverflow.com/questions/',
        CAST(id AS STRING)) AS url,
    view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags LIKE '%google-bigquery%'
ORDER BY view_count DESC
"""

df = (
    bqclient.query(query=query_string)
    .result()
    .to_dataframe()
)
print(df.head())

df.to_csv("/valohai/outputs/dump.csv")
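The hard-coded output path in the example could be wrapped in a small helper. The sketch below is one possible approach, not part of the original example: `output_path` is a hypothetical helper, and it assumes that a `VH_OUTPUTS_DIR` environment variable, when set, points at the execution's outputs directory. When the variable is absent it falls back to the `/valohai/outputs` path used above.

```python
import os
from typing import Mapping, Optional


def output_path(filename: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the full path for a file in the execution's outputs directory."""
    env = os.environ if env is None else env
    # Assumption: VH_OUTPUTS_DIR, when set, names the outputs directory;
    # otherwise use the path from the example above.
    base = env.get("VH_OUTPUTS_DIR", "/valohai/outputs")
    return os.path.join(base, filename)


# Passing an empty mapping forces the fallback path.
print(output_path("dump.csv", env={}))
```

With this helper, the last line of the example could become `df.to_csv(output_path("dump.csv"), index=False)`, where `index=False` leaves the DataFrame index out of the CSV.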