
What is Snowplow and who is it for?

The Data Handbook

How to use data to improve your customer journey and get better business outcomes in digital sales. Interviews, use cases, and deep-dives.

Get the book

Elena Juslin



The web analytics field is experiencing considerable turbulence, driven by new legislation, Google Analytics' uncertain position in the market and the need for increasingly better business insights.

We at Columbia Road have been reviewing which tools best fit the needs of companies planning their holistic data analytics. In this blog post, we’ll introduce you to one of the more complex options: Snowplow.

What is Snowplow?

Snowplow describes itself as “the world’s largest developer-first engine for collecting behavioural data”, or, put more simply, a data collection tool. In essence, it can act as an alternative to common web analytics tools such as Google Analytics and Adobe Analytics.

Snowplow is highly customisable for each company’s specific needs. However, this makes the setup and management of analytics data much more complex than what many are used to with Google Analytics. That being said, we believe Snowplow is a more relevant alternative to Google Analytics 360 than to the basic Google Analytics.

There are two ways to run Snowplow: Snowplow Behavioral Data Platform (BDP) and Snowplow Open Source. Which one to choose depends on your use case and resources.

Snowplow BDP is the easier of the two, as it only requires setting up a cloud account, for example in Google Cloud Platform or AWS, where the Snowplow pipeline then runs. You can choose which data warehouse in your cloud environment holds your Snowplow-gathered data; options include BigQuery, Redshift and blob storage. Because you own the data, you can later access it from the warehouse in any way you choose.

The more technically demanding option is Snowplow Open Source, for which you also need to set up an Iglu server. This option definitely requires a data engineer for a safe setup.

Hosting Snowplow

The Snowplow BDP Cloud pipeline is hosted in a Snowplow-managed account, so you cannot choose where in the world it resides. Currently, it is in the AWS eu-central-1 region. Snowplow BDP Enterprise is hosted in your own cloud account, but Snowplow's data engineering team sets it up. This option includes the most support and functionality.

With the Snowplow Open Source option, you host and set up the pipeline yourself. This means you have to set up the services that make up the pipeline, as well as the data warehouse, in your own cloud account. You also control all of the configuration and naturally choose the hosting location. This requires more setup work, but it also offers full control.

How Snowplow works

Below is Snowplow’s depiction of their data creation pipeline. First, data is generated and collected as events on different platforms such as your website and apps.

Secondly, the data is unified and enhanced before it is sent to your data warehouse of choice. The data warehouse is where the modelling is done: raw data is transformed into a form that is more meaningful for people to use. For example, modelling allows you to look at the data at a user level instead of an individual event level.
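To make the jump from event level to user level concrete, here is a minimal sketch in Python. It is not Snowplow's own data model: the field names (`user_id`, `event_name`, `timestamp`) and the sample rows are assumptions made up for illustration, and in practice this kind of modelling would typically run as SQL inside the warehouse.

```python
from collections import defaultdict

# Hypothetical event-level rows; the field names are illustrative assumptions,
# not Snowplow's actual column names.
events = [
    {"user_id": "u1", "event_name": "page_view", "timestamp": "2023-05-01T10:00:00"},
    {"user_id": "u1", "event_name": "add_to_cart", "timestamp": "2023-05-01T10:02:00"},
    {"user_id": "u2", "event_name": "page_view", "timestamp": "2023-05-01T11:00:00"},
    {"user_id": "u1", "event_name": "purchase", "timestamp": "2023-05-02T09:30:00"},
]


def model_users(rows):
    """Roll individual events up into one summary row per user."""
    users = defaultdict(lambda: {
        "events": 0,
        "page_views": 0,
        "purchases": 0,
        "first_seen": None,
        "last_seen": None,
    })
    for row in rows:
        user = users[row["user_id"]]
        user["events"] += 1
        if row["event_name"] == "page_view":
            user["page_views"] += 1
        if row["event_name"] == "purchase":
            user["purchases"] += 1
        ts = row["timestamp"]  # ISO 8601 strings compare chronologically
        if user["first_seen"] is None or ts < user["first_seen"]:
            user["first_seen"] = ts
        if user["last_seen"] is None or ts > user["last_seen"]:
            user["last_seen"] = ts
    return dict(users)


user_table = model_users(events)
```

Each entry in `user_table` now answers user-level questions (how many purchases, when first seen) that would otherwise require scanning every raw event.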

The last part on the right of the image depicts how the data can be used. In this blog post, we focus on measuring and finding insights from the data, but with Snowplow you can also use the data to train AI models for predictive modelling.



[Figure: Snowplow’s depiction of its data creation pipeline. Image source: Snowplow.]

Overview of Snowplow setup options

Snowplow has two platforms meant for production traffic, Snowplow BDP and Snowplow Open Source, and their setups differ somewhat.

Snowplow BDP has two different set-up options: using a cloud of your choice (e.g. AWS or GCP) or using Snowplow’s cloud. If you have your own cloud, the web analytics set-up only consists of adding the basic JavaScript tracker to your website and setting up a project in GCP or AWS. Then you can query that data with, for instance, BigQuery. If you want visualisations, you can use a connector with Looker Studio, Tableau or another visualisation tool.

The Snowplow Open Source setup is more complex. It can also be set up on GCP or AWS. In addition to the same steps as for BDP, you’ll have to set up an Iglu server and a pipeline. The pipeline setup is offered as a Terraform template: once you have filled in the template with your own variables, it can be used to deploy several services to your cloud account, which together form the pipeline. The services include Snowplow apps, load balancers, Pub/Sub topics, databases and a data warehouse or lake.
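For context on why an Iglu server is part of the setup: Iglu is Snowplow's schema registry, and events are sent as self-describing JSON that names the schema they should be validated against. The sketch below, in Python, shows the general shape of such a payload and of a schema reference (`iglu:vendor/name/format/version`). The vendor `com.example`, the `add_to_cart` event and its fields are hypothetical, and the regular expression is a simplified assumption rather than Snowplow's own validation logic.

```python
import re

# Simplified pattern for a schema reference of the form
# iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9\-_.]+)/(?P<name>[a-zA-Z0-9\-_]+)/"
    r"(?P<format>[a-zA-Z0-9\-_]+)/(?P<version>\d+-\d+-\d+)$"
)


def parse_schema_uri(uri):
    """Split a schema reference into its parts, or raise ValueError."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not a valid schema reference: {uri!r}")
    return match.groupdict()


def self_describing(schema_uri, data):
    """Wrap event data together with the schema that describes it."""
    parse_schema_uri(schema_uri)  # fail early on a malformed reference
    return {"schema": schema_uri, "data": data}


# Hypothetical custom event; vendor, name and fields are made up.
event = self_describing(
    "iglu:com.example/add_to_cart/jsonschema/1-0-0",
    {"sku": "ABC-123", "quantity": 2},
)
parts = parse_schema_uri(event["schema"])
```

Because every event carries its own schema reference, the pipeline can look the schema up in the registry and reject events that do not match it.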

The pros and cons of Snowplow

Beginning with Snowplow’s strengths, it’s highly customisable and equipped to give you exactly the data you need. It integrates well with Snowflake, which helps if you’re already using it as a data cloud for other purposes such as your product recommendation algorithm. It can also connect online and offline purchase data. Even though Snowplow BDP is not free like GA, it’s more affordable than its true enterprise-tier competitors such as Google Analytics 360 and Adobe Analytics. Snowplow Open Source is free to use, but requires more developer resources than GA360 and Adobe Analytics. Moreover, with Snowplow Open Source you own the data and the data models, and there’s no vendor lock-in.

There are cons, however. As Snowplow says itself, it is a developer-first tool. Even Snowplow BDP Cloud requires a fairly technical setup, and that’s just the beginning. To make any data visible, you need to be able to write efficient SQL queries, first to get the raw data into a table and eventually to turn it into graphs and visualisations that make sense. Snowplow BDP Cloud is also missing some features, such as custom events, so it doesn’t allow for a fully customisable set-up.
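As a rough idea of the SQL involved, the sketch below builds a simple page report from raw events. It is an illustration under assumptions: an in-memory SQLite database stands in for the data warehouse, the table holds a handful of made-up rows, and the column names (`event_name`, `page_urlpath`, `domain_userid`, `collector_tstamp`) are modelled on Snowplow's event fields but should be checked against your own warehouse.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (BigQuery, Redshift, ...).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_name TEXT, page_urlpath TEXT, domain_userid TEXT, collector_tstamp TEXT)"
)

# Made-up sample rows for illustration only.
rows = [
    ("page_view", "/", "u1", "2023-05-01 10:00:00"),
    ("page_view", "/pricing", "u1", "2023-05-01 10:01:00"),
    ("page_view", "/", "u2", "2023-05-01 11:00:00"),
    ("page_ping", "/", "u2", "2023-05-01 11:00:10"),
    ("page_view", "/", "u1", "2023-05-02 09:00:00"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Page views and unique visitors per page, most viewed first.
query = """
SELECT page_urlpath,
       COUNT(*) AS page_views,
       COUNT(DISTINCT domain_userid) AS unique_visitors
FROM events
WHERE event_name = 'page_view'
GROUP BY page_urlpath
ORDER BY page_views DESC, page_urlpath
"""
report = conn.execute(query).fetchall()
```

A table like `report` is what a visualisation tool would then read to draw the kind of chart that Google Analytics shows out of the box.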

Who is Snowplow for?

Snowplow Open Source is certainly an enterprise solution. A team of data engineers and analysts is more or less a necessity for building the setup, and at least one data engineer is needed to maintain it. There are no out-of-the-box visualisations, so merely setting up the basics requires many more resources than with, for example, Google Analytics or Matomo, which automatically give you graphs and tables as soon as you place the script on the site.

Snowplow BDP might work as an alternative to the basic GA4 even without a team of data engineers. However, Snowplow BDP Cloud has a limit of 10 million events per month, which Snowplow BDP Enterprise does not have. Even with Snowplow BDP (as opposed to Snowplow Open Source), we would have a data engineer sitting close by to make things progress smoothly.
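A back-of-the-envelope check can show whether your traffic would fit under that limit. The Python sketch below only multiplies sessions by events per session; the traffic figures are hypothetical, and real event volume depends on how much you track (page pings, custom interactions and so on).

```python
# Monthly event limit for Snowplow BDP Cloud, as stated in this post.
BDP_CLOUD_MONTHLY_EVENT_LIMIT = 10_000_000


def estimate_monthly_events(sessions_per_month, events_per_session):
    """Rough estimate: every session produces the same number of events."""
    return sessions_per_month * events_per_session


def fits_bdp_cloud(sessions_per_month, events_per_session):
    """True if the estimated volume stays within the monthly limit."""
    estimate = estimate_monthly_events(sessions_per_month, events_per_session)
    return estimate <= BDP_CLOUD_MONTHLY_EVENT_LIMIT


# Hypothetical traffic figures, for illustration only.
small_site = fits_bdp_cloud(200_000, 25)    # 5,000,000 events per month
large_site = fits_bdp_cloud(1_500_000, 25)  # 37,500,000 events per month
```

With these assumed figures the smaller site stays under the limit, while the larger one would need BDP Enterprise or Open Source.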

This tool suits those who need full control over the data pipeline and the ability to customise their analytics setup. If you’re considering GA360 or Adobe Analytics, Snowplow could be a great alternative. If you’re looking for something similar to the free GA, Snowplow probably won’t be your first option. Whether you are setting up Snowplow Open Source or BDP, your ideal team would contain both data engineers and martech experts.

How to leverage Snowplow as more than a web analytics tool?

Snowplow shows its true power in advanced use cases that go beyond general web analytics. Examples include combining web analytics data with other company data sources, analysing product performance in a web store listing tens of thousands of articles or more, predictive analytics that could inform, for instance, resource planning, in-depth customer behaviour analysis for conversion optimisation, and web store customer segmentation, to name a few.

If you’re using several tools to handle these kinds of use cases, Snowplow could be a great tool to combine them into one. These use cases are usually more common in large companies, and they are the reason why we consider this tool to be most suitable for enterprises whose options are currently GA360 or Adobe Analytics.

We at Columbia Road are happy to help you choose your next web analytics tool and build your analytics capabilities and reports related to it. 

