How to wisely choose the storage billing model on Bigquery?
BigQuery offers two storage billing models: logical bytes and physical bytes. The logical bytes model is the default and charges you based on the size of your data after it has been compressed. The physical bytes model charges you based on the actual physical storage space used by your data.
By switching certain datasets to physical, we've divided our storage costs by 2
Datasets dedicated to physical storage?
The storage billing model is applicable at the dataset level, but the physical model is only of interest when the data type is well compressed. Consequently, when you create a new table, it's best to place it in a dataset containing tables with the same level of compression.
Which dataset?
Here the trick, there is no simple way to get a clear overview of what cost all your datasets in all your organization with the GCP console. You can do it yourself using INFORMATION_SCHEMA.TABLE_STORAGE
but you have to do it for all your projects. A much simple way is to use Biq Blue, in just one click you get the list of datasets that should use the physical
storage billing model.
Change the storage billing model
It is that simple (careful read the official documentation first):
ALTER SCHEMA `PROJECT.DATASET` SET OPTIONS(storage_billing_model = "physical");
You should see the cost effect in the GCP billing console in 2 days
Conclusion
If your data is properly compressed, this trick can really reduce your storage costs on Bigquery.
⭐ Additional tips