Why labels are essential for managing BigQuery at scale

BigQuery is a powerful data warehouse that can help you analyze large datasets quickly and efficiently. However, with so much data at your fingertips, it can be difficult to keep track of all your queries and ensure that they are performing as expected. This is where labels come in.

Labels are key-value pairs that you can assign to BigQuery resources, such as datasets, tables, and queries. Labels can be used to organize your resources, track costs, and monitor performance.

Add a label to your query

Adding a label to a query is straightforward. Here is an example in Node.js.

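// Import the BigQuery client library and create a client
const {BigQuery} = require('@google-cloud/bigquery')
const bigquery = new BigQuery()

// Run a query as a job with labels attached, and return its rows
async function runQuery(query) {
    const [job] = await bigquery.createQueryJob({
        query: query,
        location: 'US',
        dryRun: false,
        // Label keys and values: lowercase letters, digits, underscores, dashes
        labels: {
            author: 'me',
        },
    })

    // Wait for the job to finish, then fetch the result rows
    const [rows] = await job.getQueryResults()
    return rows
}

runQuery('SELECT "hello world"').then(console.log).catch(console.error)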

The label strategy

Now that you know how to add a label to a query, you can use labels to group your queries by page, feature, or even business unit. For example, with Airflow you can use its own identifiers as labels: dag_id, run_id, or task_id. This way, you can find the cost or total duration of a specific task in your service.
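
Identifiers such as an Airflow run_id often contain characters that BigQuery labels do not accept (uppercase letters, colons, plus signs), and label values are limited to 63 characters. Here is a minimal sketch of how such identifiers could be normalized before being attached to a query. The helper names toLabelValue and airflowLabels and the sample identifiers are hypothetical, not part of Airflow or the BigQuery client.

```javascript
// Hypothetical helper: normalize any string into a valid BigQuery label value.
// Lowercases the input, replaces unsupported characters with "_",
// and truncates to the 63-character limit.
function toLabelValue(value) {
    return String(value)
        .toLowerCase()
        .replace(/[^a-z0-9_-]/g, '_')
        .slice(0, 63)
}

// Hypothetical helper: build the labels object for a query run by an Airflow task.
function airflowLabels({dagId, runId, taskId}) {
    return {
        dag_id: toLabelValue(dagId),
        run_id: toLabelValue(runId),
        task_id: toLabelValue(taskId),
    }
}

// Sample identifiers (made up for illustration)
const labels = airflowLabels({
    dagId: 'Daily_Sales',
    runId: 'scheduled__2024-01-01T00:00:00+00:00',
    taskId: 'load.orders',
})
console.log(labels)
```

The resulting object can be passed as the labels option of createQueryJob. Note that this normalization is lossy: two different identifiers can map to the same label value, so it suits grouping and reporting rather than exact lookups.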

Use a label as a new dimension

Any label can serve as an extra dimension in your analysis: you can break down cost or duration by the values of that label, or filter on one specific value. Biq Blue can display this breakdown in a second.
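
If you want to try this breakdown yourself, one possible approach is to query the INFORMATION_SCHEMA.JOBS view, which exposes each job's labels as an array of key/value pairs. The sketch below only builds the query options. It assumes your jobs run in the US multi-region, that you have permission to read the view, and that a 7-day window is enough. The helper name usageByLabelOptions is hypothetical.

```javascript
// Hypothetical helper: build query options that aggregate the last 7 days
// of query jobs by the value of a single label key.
// Assumption: jobs ran in the US multi-region (`region-us`).
function usageByLabelOptions(labelKey) {
    const query = `
        SELECT
            (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = @label_key) AS label_value,
            COUNT(*) AS job_count,
            SUM(total_bytes_billed) AS total_bytes_billed,
            SUM(total_slot_ms) AS total_slot_ms
        FROM \`region-us\`.INFORMATION_SCHEMA.JOBS
        WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
            AND job_type = 'QUERY'
        GROUP BY label_value
        ORDER BY total_bytes_billed DESC`

    // The label key is passed as a named query parameter, not concatenated into the SQL
    return {query, location: 'US', params: {label_key: labelKey}}
}

// Usage with the client from the first example (needs credentials, so left as a comment):
//   const [rows] = await bigquery.query(usageByLabelOptions('author'))
console.log(usageByLabelOptions('author').params)
```

Jobs that do not carry the label end up in a row whose label_value is NULL, which is a quick way to spot queries that are not labeled yet.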

Conclusion

And that's it. You can now easily associate your BigQuery costs with your business units, teams, and features.

⭐ Additional tips