Why Your Customer Data Platform (CDP) Should Be the Data Warehouse.
Off-the-shelf Customer Data Platforms have serious shortcomings. Consider a composable Customer Data Platform instead.
Tejas Manohar
June 8, 2022
16 minutes
There's been a lot of buzz around Customer Data Platforms (or CDPs) lately. Every vendor in Martech is trying to sell you on buying their SaaS to build a "single view of the customer". If you're not familiar with Customer Data Platforms, here's a common definition:
A customer data platform (CDP) is a collection of software that creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned and combined to create a single customer profile. This structured data is then made available to other marketing systems.
Sounds useful, right? It is! If you've ever had to deal with customer data at a company, regardless of size or your business department (sales, marketing, support, engineering, analytics, etc.), the value proposition of a CDP is clear. Building and operationalizing a single view of the customer is hard. In fact, it's only been getting harder.
As a (former) early employee at Segment and someone who has been straddling the line between the marketing and data communities for over five years, I find Customer Data Platforms nothing short of intriguing.
In the marketing community, an all-in-one platform to solve our countless data problems sounds like the holy grail. In the data community, on the other hand, we've been trying to do this all along using the data warehouse. Yet, most people don't make the connection between the two.
Before I make the case for why the data warehouse (and not an off-the-shelf CDP) should be your Customer Data Platform, I'd like to provide some context on the CDP market and its key players.
What Is a Customer Data Platform?
CDPs are all-in-one marketing and data platforms. They aim to serve as a database for all your customer information with a bundled activation layer to help you leverage the data for marketing automation.
All CDPs have a few common components:
- Data ingestion. Since CDPs are databases of customer data, they need a way to ingest data. Most CDPs achieve this via an API for developers to track traits about users and the events that they perform across your applications.
- Identity resolution. CDPs build and maintain graphs of user profiles so that all user identifiers (cookie, IDFA, device ID, etc.) can be mapped back to a "single user ID". Most CDPs implement a simple deterministic algorithm for identity resolution. These algorithms are functionally similar to the queries that your analyst team has already written to do marketing attribution in SQL, e.g. joining across "anonymous" and "known" user profiles using a handful of identifiers. Some CDPs, however, do identity resolution probabilistically, which is not straightforward to do in-house without a skilled data science team.
- Audience builder. This is perhaps the most necessary component of a CDP. Without an audience builder, a CDP is just "Customer Data Infrastructure". The audience builder is an interface for marketers to create customer segments without SQL and sync them to various marketing and advertising platforms to run targeted campaigns.
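To make the deterministic flavor of identity resolution concrete, here is a minimal Python sketch. It is not any vendor's actual implementation. It assumes you already have pairs of identifiers that were observed together on the same user (the `links` list and its identifier strings are made up for illustration) and merges them into profiles with a union-find structure.

```python
from collections import defaultdict


def resolve_identities(links):
    """Group transitively linked identifiers into profiles.

    `links` is an iterable of (identifier_a, identifier_b) pairs, each
    meaning "these two identifiers were seen on the same user".
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two profiles

    profiles = defaultdict(set)
    for identifier in parent:
        profiles[find(identifier)].add(identifier)
    return list(profiles.values())


# Hypothetical observations: an anonymous cookie and a device ID are
# both tied to the same known email address.
links = [
    ("cookie:abc", "email:ada@example.com"),
    ("device:123", "email:ada@example.com"),
    ("cookie:xyz", "email:bob@example.com"),
]
profiles = resolve_identities(links)
```

With these inputs, the cookie and the device ID that share an email end up in one profile, and the unrelated cookie ends up in another. In a warehouse, the same idea is usually expressed as joins across identifier tables.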
Outside of these core components, some CDPs have additional features for marketers, like cross-channel orchestration, predictive audiences, etc. Check out this guide that offers a complete overview of the available data integration technologies, including Customer Data Platforms.
Are Customer Data Platforms a Big Deal?
CDPs aren't just some new kids on the block and should be taken very seriously. As seen below, both the industry and the term have been growing consistently over the last 5 years.
Source: https://explodingtopics.com/topic/customer-data-platform
Buyer demand follows this trend too. The overall combined revenue of players in the CDP industry is north of $2 billion as of 2019.
Types of Customer Data Platforms
The CDP players can be broken down into a few major categories.
General purpose Customer Data Platforms
General purpose CDPs target a broad range of use cases. The leading players in the general CDP space today are Segment and mParticle. Then, there are a number of runners-up like Treasure Data, Simon Data, Lytics, Blueshift, and Redpoint Global.
Vertical Customer Data Platforms
These are CDPs that target a very specific type of company (industry, size, maturity, SaaS tools in place, etc.) and solve specific problems.
Conglomerate Customer Data Platforms
These are big companies like Adobe, Salesforce, and Microsoft that have started calling their existing CRMs or "marketing clouds" CDPs.
There are too many CDP vendors to count these days. This makes the space rather difficult to navigate as a newcomer. In the rest of this article, we'll focus on the general-purpose CDPs like Segment Personas, mParticle, and Treasure Data. They're what most people think of when they hear CDP, and by sheer count they have far more companies using them than the rest. As a (former) early employee at Segment, that's the space that I have the most experience with.
Wait, why should my Customer Data Platform be the data warehouse?
There are 5 key reasons why you should prefer your data warehouse to an off-the-shelf CDP.
- CDPs are not the single source of truth. The data warehouse has all your data.
- CDPs do not mesh with data teams. Marketing and data teams should work together.
- CDPs are not flexible. Every business has a unique data model.
- CDPs own your data. You're locked in.
- CDPs do not benefit from the data ecosystem. You're siloed.
Customer Data Platforms Are Not the Single Source of Truth
The data warehouse has all your data. Whether you're a D2C brand, B2B SaaS company, e-commerce marketplace, or even a massive bank like Capital One, chances are your customer data is already in a data warehouse. The number one reason that your CDP should be the data warehouse is that your data warehouse is already your CDP.
CDPs claim to be the single source of truth, but CDPs do not replace data warehouses. There's nothing about having separate databases of customer information for different departments that spells "single source of truth". Some CDPs support importing data from the data warehouse, but doing so results in additional data latency, and "data freshness" remains an unfulfilled promise of a customer data platform.
It's easier than ever to centralize all your data in a warehouse using SaaS platforms like Fivetran. Once your customer data is in the warehouse, data teams define core definitions, like what counts as an "active" user, in SQL. The data warehouse is the source of truth for your business's trusted definitions. It doesn't make sense to redefine these definitions in your sales, marketing, and other tools -- they should originate from your data warehouse.
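As an illustration of a warehouse-owned definition, here is a small sketch that encodes "active user" once as a SQL view. It uses Python's built-in `sqlite3` as a stand-in for a real warehouse. The `events` table, the `active_users` view, the 30-day window, and the fixed reporting date are all assumptions made for the example, not a recommended definition.

```python
import sqlite3

# In-memory SQLite stands in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id TEXT, event TEXT, occurred_at TEXT);
    INSERT INTO events VALUES
        ('u1', 'login',    '2022-06-01'),
        ('u2', 'login',    '2022-03-15'),
        ('u3', 'purchase', '2022-05-20');

    -- One shared definition: "active" means any event in the 30 days
    -- before a reporting date (hard-coded here so the example is
    -- repeatable; a real model would use the current date).
    CREATE VIEW active_users AS
    SELECT DISTINCT user_id
    FROM events
    WHERE occurred_at >= date('2022-06-08', '-30 days');
""")

# Any downstream tool reads the view instead of re-deriving the rule.
active = [
    row[0]
    for row in conn.execute("SELECT user_id FROM active_users ORDER BY user_id")
]
```

Here `u1` and `u3` qualify and `u2` does not. If the definition changes, only the view changes, and every consumer picks up the new rule.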
Most companies do not think of their data warehouse as being a platform for more than analytics, but companies with modern data stacks have been building operational data pipelines off of their warehouse for years. SaaS solutions like Hightouch let you easily push data and definitions to business tools from your warehouse with just SQL, no scripts.
Customer Data Platforms Do Not Mesh With Data Teams
CDPs target marketing teams and primarily sell to CMOs. Ultimately, marketers are not the right persona to solve the intricate data problems that CDPs address.
Self-service access and data democratization are important, but they're a cross-functional effort. Data teams should be responsible for understanding your company's data model and building clean data models for everyone else to consume. Marketing teams should be empowered to analyze customer behavior and iterate on customer segments for campaigns without being bottlenecked by data teams.
CDPs do not recognize this and instead, they give marketers immense capabilities without the process or guard rails of data and engineering teams. There can be a happy, productive balance between marketing and data teams, but the tools and processes your company adopts must understand the role and workflow of each team and facilitate collaboration between them. This is the whole thesis behind Hightouch Audiences -- it allows marketers to visually segment users based on data models that your data team assembles in SQL within your data warehouse.
Ultimately, there aren't engineering or marketing concerns -- there are business concerns. Marketing alone often does not have the technical capacity to evaluate deeply technical concerns, but its business outcomes are severely affected by them. Effective collaboration between teams is the foundation of a successful company.
Customer Data Platforms Are Not Flexible
CDPs are built around rigid data models. Segment Personas, as an example, offers only two core objects -- users and accounts. What's more, a user can only belong to a single account.
In reality, data models aren't so cookie-cutter. Users can be in multiple accounts and accounts can have sub-accounts, business units, etc. Apart from users and accounts, companies of the 21st century have their own proprietary objects and hierarchy.
- B2B companies like GitHub have organizations, repositories, issues, pull requests, etc. And that's just in their app, without considering Salesforce/CRM, Zendesk/support tools, etc.
- B2C companies like Amazon have users, carts, subscriptions (Prime, Audible, etc.), sellers, orders, returns, gift cards, search history, and global product inventory. The list goes on.
The CDP ecosystem's response to custom data is "events". CDPs allow you to send them a stream of custom events performed by your users. This sounds great in theory, but it's not always easy to answer the questions you need with just events. Data warehouses have what CDPs lack -- the ability to model and query arbitrary relational data.
When it comes to the limitations of the data models in CDPs, I think back to my time as an engineer at Segment building the Personas product. We were unable to effectively "dogfood" our own product due to shortcomings in the data model, like not being able to handle users in multiple workspaces (accounts). As a result, I would frequently have to write SQL against our data warehouse to query the state of a user or account at Segment.
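To show what "arbitrary relational data" buys you, here is a minimal sketch of a user who belongs to several accounts, which a one-account-per-user model cannot represent. It again uses Python's `sqlite3` as a stand-in for a warehouse. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.executescript("""
    CREATE TABLE users (user_id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, name TEXT, plan TEXT);
    -- Join table: a user may belong to any number of accounts.
    CREATE TABLE memberships (user_id TEXT, account_id TEXT);

    INSERT INTO users VALUES
        ('u1', 'ada@example.com'),
        ('u2', 'bob@example.com');
    INSERT INTO accounts VALUES
        ('a1', 'Acme', 'enterprise'),
        ('a2', 'Side Project', 'free');
    INSERT INTO memberships VALUES
        ('u1', 'a1'), ('u1', 'a2'), ('u2', 'a2');
""")

# Audience: users with at least one enterprise account, along with
# how many accounts each of them belongs to in total.
rows = conn.execute("""
    SELECT u.email, COUNT(*) AS account_count
    FROM users u
    JOIN memberships m ON m.user_id = u.user_id
    JOIN accounts a ON a.account_id = m.account_id
    GROUP BY u.user_id, u.email
    HAVING SUM(CASE WHEN a.plan = 'enterprise' THEN 1 ELSE 0 END) > 0
    ORDER BY u.email
""").fetchall()
```

Only the user with an enterprise membership is returned, together with a count that spans both of their accounts. Answering the same question from a flat event stream would mean reconstructing these relationships first.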
Customer Data Platforms Own Your Data
CDPs offer restricted access to your customer data, whereas data warehouses offer unrestricted access to your data. The best companies recognize that their ability to leverage customer data is a competitive advantage. Therefore, they should own their data.
CDPs only expose very specific actions on top of your customer data, generally purpose-built for marketing workflows. Since CDPs are all-in-one solutions, you're locked in and subject to the whims of your CDP vendor in terms of how you can use your customer data. There's no such thing as a smooth transition from one CDP to another. With the advent of the cloud, there's no reason that your company's business workflows should be tied to a vendor's data plane.
Source: https://twitter.com/sperand_io/status/1346251884739362816
And this is just from a functionality perspective. Regulation and concerns around data privacy (GDPR, CCPA, etc.), data residency (e.g. invalidation of Privacy Shield), and data security (SOC 2, ISO, HIPAA, etc.) are on the rise, yet there is no truly on-premise CDP offering.
CDPs Do Not Benefit From the Data Ecosystem
Since CDPs own your data, they own your ecosystem. Each CDP has to build its own independent âecosystemâ.
Because CDPs are built to play well with their proprietary ecosystem, every CDP has to independently address concerns like data quality via proprietary product features. As an example, if you send a bunch of bad events to a CDP, you're limited to the features they have available to clean your data set. The transformations you need to run often don't exist, so you have no choice but to file a support ticket. However, if your CDP is your data warehouse, you can use SQL to transform your data in any way you wish and tools like dbt on top to systematically encode and execute these transformations.
There are a number of concerns after data collection -- data QA, metadata/discovery, monitoring, observability, lineage, etc. No single vendor, not even a software giant like Salesforce or Adobe, is poised to build best-in-class software that addresses each of these concerns. In most cases, CDPs do not address all of these data concerns effectively, as they're focused on building features that appeal to marketers. Even in a perfect world where a CDP does address all of these concerns, you would have to use a separate set of tools to solve these concerns again for the data warehouse since CDPs do not replace data warehouses.
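Here is a rough sketch of the kind of cleanup that is easy when the data sits in a warehouse. It uses Python's `sqlite3` as a stand-in, and the `raw_events` table, the `clean_events` view, and the specific mess being cleaned (inconsistent event names and missing user IDs) are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.executescript("""
    CREATE TABLE raw_events (user_id TEXT, event TEXT);
    INSERT INTO raw_events VALUES
        ('u1', 'Order Completed'),
        ('u2', ' order_completed'),
        (NULL, 'Order Completed'),
        ('u3', 'ORDER COMPLETED ');

    -- Normalize event names to snake_case and drop rows that cannot
    -- be attributed to a user.
    CREATE VIEW clean_events AS
    SELECT user_id,
           lower(replace(trim(event), ' ', '_')) AS event
    FROM raw_events
    WHERE user_id IS NOT NULL;
""")

clean = conn.execute(
    "SELECT user_id, event FROM clean_events ORDER BY user_id"
).fetchall()
```

All three attributable rows come out as `order_completed`, and the row without a user ID is filtered out. A `SELECT` like the one inside the view is the sort of statement you could keep under version control as a dbt model rather than waiting on a vendor feature.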
Source: https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure/
On the other hand, the ecosystem around data warehouses is growing rapidly. Data warehouses are the standard that every vendor in SaaS is thinking about. Companies attacking these problems in a warehouse-first way are emerging left and right.
When Does an Off-the-Shelf CDP Make More Sense?
It would not be fair to CDPs if we didn't talk about when it does make sense to choose them. Although I don't believe CDPs are the "be-all and end-all" of customer data, there are cases where it does make sense to consider a CDP.
Vertical Customer Data Platforms
Vertical CDPs are CDPs built for a specific type of company, categorized by industry, size, purpose, etc. Contrary to general-purpose CDPs, I'm actually very bullish on vertical CDPs.
My two favorite examples of vertical CDPs are Amperity & Zaius.
- Amperity focuses on hard data science problems for traditional retail companies, like making a best guess of what a household is from disparate data sources.
- Zaius focuses on building off-the-shelf integrations for the mid-market e-commerce company using Shopify or Magento, supplemented by common SaaS services.
Companies using vertical CDPs still have data warehouses for analytics at a minimum. In fact, a number of large enterprises just use Amperity for the identity piece and build their own pipelines from the data warehouse to other tools for sales/marketing/support.
No-Code Capabilities of Customer Data Platforms
CDPs do give marketers new abilities. The average marketer isn't suited to solve problems like identity resolution unless they're well-versed in SQL. For the aforementioned reasons, we'd argue that this is okay, and that marketing and data teams just need a framework to collaborate.
That said, if you do not have access to someone with SQL skills to model your company's data, it might make sense to settle for an off-the-shelf CDP.
Customer Data Platforms Do Not Require a Modern Data Stack
If your company has a subpar data stack and you don't intend to improve it soon but you're severely bottlenecked on the marketing side, then it may make sense to use an off-the-shelf CDP as a stopgap. The warehouse-based approach is only as good as the data warehouse itself.
If human resources are the problem, we'd urge you to address that directly. Many companies fail to implement CDPs. Half of the challenge of adopting any software is human. It's very difficult to build a database of all your customer information without having someone on your team that can navigate the intricacies of your company's data. Services like Snowflake, Fivetran, and Hightouch have made building a modern data stack a breeze.
Real-Time Capabilities of Customer Data Platforms
Some CDPs do have real-time capabilities that are somewhere between "challenging" and "impossible" to achieve with data warehouses alone today.
In most business cases, true real-time capabilities frankly aren't necessary or helpful. That being said, there are certain use cases where executing operations in near real-time is valuable. For example, a transactional notification like "Thanks for making a purchase" when you check out at a Starbucks shouldn't be delivered an hour later. From our user research, a majority of legitimate real-time use cases are for core product flows like this, where engineering is involved, rather than use cases that marketing would drive autonomously.
If empowering marketing to drive these real-time use cases is crucial enough to your business to outweigh the downsides of giving up a consistent, sane data infrastructure, then it is justifiable to pursue an off-the-shelf CDP. The only thing I'd urge you to beware of is that even CDPs advertising real-time capabilities cannot always achieve them.
This is because behind the scenes, most CDPs leverage off-the-shelf data warehouses like Snowflake and BigQuery as a significant part of their internal architecture. Therefore, CDPs are ultimately bottlenecked by the same technological limitations that your data team faces.
There's no magic in magic, it's all in the details -- Walt Disney
Data warehouses are getting faster.
- JetBlue is running operational pipelines to predict flight delays with 2 minutes of end-to-end latency across Snowflake & dbt.
- Google BigQuery has streaming insert APIs.
Real-time capabilities are on the horizon. Snowflake, BigQuery, and Redshift all have beta features implementing incrementally computed SQL views, which are the basis of a real-time stream processing system. Materialize is building a real-time streaming SQL data warehouse from the ground up and gaining significant traction.
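For intuition on why incrementally computed views matter, here is a toy Python sketch. It is not how Snowflake, BigQuery, Redshift, or Materialize implement the feature. It only contrasts recomputing a `GROUP BY` count from scratch with updating the stored result as each change arrives. The class and function names are made up for the example.

```python
from collections import Counter


def full_recompute(user_ids):
    """Batch approach: rescan every event to rebuild the counts."""
    return Counter(user_ids)


class IncrementalEventCounts:
    """Keeps the result of
    `SELECT user_id, COUNT(*) FROM events GROUP BY user_id`
    up to date by applying one change at a time.
    """

    def __init__(self):
        self.counts = Counter()

    def apply(self, user_id, delta=1):
        # delta is +1 for an inserted event and -1 for a deleted one.
        self.counts[user_id] += delta
        if self.counts[user_id] == 0:
            del self.counts[user_id]  # the group is now empty


# Hypothetical stream of inserted events.
stream = ["u1", "u2", "u1", "u3", "u1"]

view = IncrementalEventCounts()
for user_id in stream:
    view.apply(user_id)      # constant work per event

view.apply("u3", delta=-1)   # a deletion removes the 'u3' group
```

Each update touches only the affected group, so the result stays fresh without rescanning history. That property is what makes this style of view a plausible foundation for low-latency pipelines.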
No one can predict the future with certainty, but the industry is pointing towards a modern data warehouse being the strongest bet as your core database for customer data.
How can my data warehouse be a Customer Data Platform?
You can use tools like Hightouch to turn your data warehouse into a customer data platform.
First, Hightouch allows you to sync any data from your data warehouse into sales, marketing, and support tools with just SQL, no scripts.
Sometimes, your marketing team needs to drill into customer data to build and distribute audiences from a centralized location. In addition to the primary SQL interface, Hightouch offers an audience builder directly on top of your data warehouse.
Hightouch's audience builder does not make any assumptions about your company's data model. Rather than forcing your company to mold its data model to that of a CDP, Hightouch molds itself to your company's data model. Hightouch's audience builder is powered by a schema modeling layer that allows you to encode your company's relational object hierarchy & events by labeling tables and views from your data warehouse that you'd like to expose to business users.
This is an example of how specific business processes can be enabled on top of your company's customer data without losing flexibility or control. Hightouch's audience builder is just one of many products to come that will be built directly on top of the data warehouse.
Hightouch brings you the best of both worlds. Your data team can focus on parts of the stack unique to your business -- modeling your company's data and answering challenging business questions. And, your marketing team can leverage customer data to run campaigns without being bottlenecked by data teams.
Curious to see Hightouch in action? Just book a demo -- we'd love to show you around.
Thanks to JJ Fliegelman, Arpit Choudhury, David Beyer, Charles Wang, Mike Boyarski, Preston Johnston, and Nancy Hung for giving feedback on this article.