A customer data platform (CDP) is a collection of software which creates a persistent, unified customer database that is accessible to other systems. Data is pulled from multiple sources, cleaned and combined to create a single customer profile. This structured data is then made available to other marketing systems.
Sounds useful, right? It does! If you’ve ever had to deal with customer data at a company, regardless of size or your business department (sales, marketing, support, engineering, analytics, etc.), the value proposition of a CDP is clear. Building and operationalizing a single view of the customer is hard. In fact, it’s only been getting harder.
As a (former) early employee at Segment and, more broadly, someone who has been straddling the line between the marketing and data communities for over half a decade, I find CDPs intriguing. In the marketing community, an all-in-one platform to solve our countless data problems sounds like the holy grail. In the data community, we’ve been trying to do this all along using the data warehouse. Yet, most people don’t make the connection between the two.
In this post, I’ll make my case for why your CDP should not be an off-the-shelf vendor but instead the technology your data team has been investing in all along: the data warehouse. Before that, it’s only fair to our fellow CDP vendors to provide a bit of context on the market and players. If you’re already familiar with the CDP space and are ready to hear our argument, feel free to skip ahead to the pros/cons.
What is a Customer Data Platform?
First off, CDPs aren’t just some new kids on the block. They should be taken very seriously. As seen below, both the industry and the term have been growing consistently over the last 5 years.
Buyer demand follows this trend, too. The overall combined revenue of players in the CDP industry is north of $2 billion as of 2019. 
Types of Customer Data Platforms
The CDP players can be broken down into a few major categories:
General purpose Customer Data Platforms
The leading players here today are Segment, mParticle, and Treasure Data. Next up, there are a number of runners-up “doing solid business” like Simon Data, Lytics, Blueshift, and Redpoint Global.
Vertical Customer Data Platforms
These are CDPs that target a very specific type of company (industry, size, maturity, SaaS tools in place, etc.) and solve specific problems.
Conglomerate Customer Data Platforms
These are big companies like Adobe, Salesforce, and Microsoft that have started calling their existing CRMs or “marketing clouds” CDPs.
Confusing Customer Data Platforms
And last but not least, confusing CDPs. These are companies that don’t look like the rest of the CDPs but read a Gartner report about how the CDP industry is exploding and lucrative… and just had to give it a shot.
Intercom? Really? Yes, even the customer communications (live chat) platform tries its hand at “CDP” from time to time.
The image above shows sponsors of the CDP Institute, a leading CDP research firm. Almost every one of them considers themselves a “Customer Data Platform” in one way or another. There are too many CDP vendors to count these days. This makes the space rather difficult to navigate as a newcomer.
In the rest of this article, we’ll focus on the general-purpose CDPs like Segment Personas, mParticle, and Treasure Data. They’re what most people think of when they hear CDP, and by sheer count they have far more companies using them than the rest of the categories. As a (former) early employee at Segment, that’s also the segment of the space that I have the most experience with.
What are the parts of a Customer Data Platform?
All CDPs have a few common components:
Data ingestion. Since CDPs are databases of customer data, they need a way to ingest data. Most CDPs achieve this via an API that developers use to track traits about users and the events those users perform across your applications.
Identity Resolution. CDPs build and maintain graphs of user profiles so that all user identifiers (cookie, IDFA, device ID, etc.) can be mapped back to a “single user ID”. Most CDPs implement a simple deterministic algorithm for identity resolution. These algorithms are functionally similar to the queries that your analyst team has already written to do marketing attribution in SQL, e.g. joining across “anonymous” and “known” user profiles using a handful of identifiers. Some CDPs, however, do identity resolution probabilistically, which is not straightforward to do in-house without a skilled data science team.
Audience builder. This is perhaps the most necessary component of a CDP. Without an audience builder, a CDP is just “Customer Data Infrastructure”. The audience builder is an interface for marketers to create customer segments without SQL and sync them to various marketing and advertising platforms to run targeted campaigns.
Outside of these core components, some CDPs have additional features for marketers, like cross-channel orchestration, predictive audiences, etc.
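To make the “simple deterministic algorithm” for identity resolution a bit more concrete, here’s a minimal sketch in Python. It is not any vendor’s actual implementation. It assumes each incoming event carries one or more identifiers, and that two identifiers seen on the same event belong to the same person. The function name, the event shape, and the sample identifiers are all made up for illustration.

```python
from collections import defaultdict


def resolve_identities(events):
    """Group identifiers that co-occur on the same event into one profile.

    Each event is a dict of identifier type -> value, for example
    {"anonymous_id": "anon-1", "user_id": "u-42"} (a hypothetical shape).
    """
    parent = {}  # union-find forest: identifier -> parent identifier

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event in events:
        identifiers = [(kind, value) for kind, value in event.items() if value]
        if not identifiers:
            continue
        first = identifiers[0]
        find(first)  # register identifiers that appear on their own
        for other in identifiers[1:]:
            union(first, other)

    # Collect every identifier under the root of its connected component.
    profiles = defaultdict(set)
    for node in list(parent):
        profiles[find(node)].add(node)
    return [frozenset(members) for members in profiles.values()]


events = [
    {"anonymous_id": "anon-1", "device_id": "ios-9"},
    {"anonymous_id": "anon-1", "user_id": "u-42"},  # anon-1 logs in
    {"anonymous_id": "anon-2", "user_id": "u-42"},  # same user, new browser
    {"anonymous_id": "anon-3"},                     # never identified
]
profiles = resolve_identities(events)
```

In this sample, the first three events collapse into one profile because they are linked through anon-1 and u-42, while anon-3 stays on its own. The same grouping can be expressed as joins in SQL, which is why analyst teams often already have something like it in the warehouse.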
Why should my Customer Data Platform be the data warehouse?
If you’re thinking “Wow! This all sounds pretty awesome. Why would I want to use a data warehouse, something built for analytics, as my CDP instead?”, this section is for you.
Thoroughness of Customer Data Platforms
The data warehouse has all your data. Whether you’re a D2C brand, B2B SaaS company, e-commerce marketplace, or even a massive bank like Capital One, chances are, your customer data is already in a data warehouse. It may sound like an oversimplification, but the #1 reason that your CDP should be the data warehouse is because it exists. If you’re reading this article, I’m assuming you haven’t yet purchased or implemented a CDP or your current CDP doesn’t get the job done.
It’s easier than ever to centralize all your data in a warehouse using SaaS platforms like Fivetran, Stitch, etc. and even direct integrations by vendors. Most companies do not think of their data warehouse as being a platform for more than analytics, but companies with modern data stacks have been building operational data pipelines off of their warehouse for years. With the latest generation of cloud data warehouses (e.g. Snowflake, BigQuery, Redshift), data warehouses are turning into “data clouds” and are thus poised to be the central nervous system of your customer data stack.
If you’re a marketer in Iterable, account manager in Salesforce, or success manager in Zendesk and are missing customer data you need in your tool, it’s probably in the warehouse. Reverse ETL solutions like Hightouch let you easily push data to business tools from your warehouse with just SQL, no scripts.
Data teams and Customer Data Platforms
CDPs target marketing teams and sell to CMOs. Ultimately, marketers are not the right persona to solve the data problems that CDPs address. Marketers generally do not have the necessary understanding of a company’s data model complexities or the data skills to reason through concepts like identity resolution or complex boolean logic. It’s all about perspective: what seems simple to a data person can be complex to a marketer.
Self-service access to data, and more broadly, data democratization are both important, but it’s a cross-functional effort. Data teams should be responsible for understanding your company’s data model and building clean data models for everyone else to consume. Marketing teams should be empowered to analyze customer behavior and iterate on customer segments for campaigns without being bottlenecked by data teams.
There can be a happy balance between marketing and data teams, but only by separating concerns in a way that lets each team do what they’re best at. CDPs and most other products that are “built for the technical marketer” turn data concerns into marketing concerns and marketing concerns into data concerns. This results in data integrity issues as well as an all-in-one solution that’s too complex for your everyday marketer yet too limited for your data team.
Flexibility of Customer Data Platforms
CDPs are built around rigid data models. Take Segment Personas as an example. There are only two core objects: users and accounts. And users can only belong to a single account.
In reality, companies’ data models are unique and not so cookie-cutter. Users can be in multiple accounts. Accounts can have sub-accounts, business units, etc. Apart from users and accounts, companies have custom objects with their own totally proprietary hierarchy.
B2B companies like GitHub have organizations, repositories, issues, pull requests, etc. And, that’s just in their app without considering Salesforce/CRM, Zendesk/support tools, etc.
B2C companies like Amazon have users, carts, subscriptions (Prime, Audible, etc.), sellers, orders, returns, gift cards, search history, and global product inventory. The list goes on…
When it comes to the limitations of CDPs, I think back to my time as an engineer at Segment building the Personas product. We were unable to effectively “dogfood” our own product due to shortcomings in the data model, like not being able to handle users in multiple workspaces (accounts). As a result, I would frequently have to reach for SQL to run queries based on the state of a user or account at Segment. This is because the data warehouse is a full-fledged relational database that is able to model complex hierarchies.
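Here’s a rough sketch of the kind of model and query I mean, using Python’s built-in sqlite3 module as a stand-in for a real data warehouse. The table names, columns, and sample rows are invented for illustration and are not Segment’s actual schema. The point is only that a join table handles “users in multiple workspaces” without any special support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE workspaces (id INTEGER PRIMARY KEY, name TEXT, plan TEXT);
-- A join table lets one user belong to any number of workspaces.
CREATE TABLE memberships (
    user_id INTEGER REFERENCES users(id),
    workspace_id INTEGER REFERENCES workspaces(id),
    role TEXT,
    PRIMARY KEY (user_id, workspace_id)
);
INSERT INTO users VALUES (1, 'ada@example.com'), (2, 'bob@example.com');
INSERT INTO workspaces VALUES
    (10, 'Acme', 'business'), (11, 'Side Project', 'free');
INSERT INTO memberships VALUES
    (1, 10, 'admin'), (1, 11, 'owner'), (2, 10, 'member');
""")

# Users who belong to a paid workspace AND also own a free one.
rows = conn.execute("""
SELECT u.email
FROM users u
WHERE EXISTS (
    SELECT 1 FROM memberships m
    JOIN workspaces w ON w.id = m.workspace_id
    WHERE m.user_id = u.id AND w.plan = 'business'
)
AND EXISTS (
    SELECT 1 FROM memberships m
    JOIN workspaces w ON w.id = m.workspace_id
    WHERE m.user_id = u.id AND w.plan = 'free' AND m.role = 'owner'
)
ORDER BY u.email
""").fetchall()
multi_workspace_emails = [email for (email,) in rows]
```

A question like this spans two workspaces for the same user, which is exactly what a “one user, one account” model cannot represent.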
Limitations of the Events Model of Customer Data Platforms
The CDP ecosystem’s response to custom data is “events”. CDPs allow you to send them a stream of custom events performed by your users. This sounds great in theory, but events are not so straightforward to query and often don’t tell the full picture. Here’s an example:
Let’s imagine I’m a marketer at GitHub.com and want to run a campaign about our new “GitHub Actions” product, targeting organizations that are using at least 3 webhook integrations.
It’s much easier to reason about a query that finds organizations with at least 3 webhook integrations where is_active = true, across all the repositories in an organization, than to try to determine how many webhook integrations are currently active from a stream of “Added Webhook Integration” and “Removed Webhook Integration” events emitted by various users within an account.
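To illustrate the difference, here’s a small sketch that answers the same question both ways. It uses Python’s built-in sqlite3 module in place of a warehouse, and every table, column, and sample row is a made-up assumption rather than GitHub’s real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repositories (id INTEGER PRIMARY KEY, organization_id INTEGER);
CREATE TABLE webhook_integrations (
    id INTEGER PRIMARY KEY, repository_id INTEGER, is_active INTEGER
);
INSERT INTO repositories VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO webhook_integrations VALUES
    (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 0),  -- org 100: 3 active
    (5, 3, 1), (6, 3, 0);                        -- org 200: 1 active
""")

# Approach 1: query the current state directly.
state_query = """
SELECT r.organization_id
FROM webhook_integrations w
JOIN repositories r ON r.id = w.repository_id
WHERE w.is_active = 1
GROUP BY r.organization_id
HAVING COUNT(*) >= 3
"""
orgs_from_state = [org for (org,) in conn.execute(state_query)]

# Approach 2: rebuild the same state by replaying an event stream.
events = [
    # (organization_id, webhook_id, event_name), in the order received
    (100, 1, "Added Webhook Integration"),
    (100, 2, "Added Webhook Integration"),
    (100, 3, "Added Webhook Integration"),
    (100, 4, "Added Webhook Integration"),
    (100, 4, "Removed Webhook Integration"),
    (200, 5, "Added Webhook Integration"),
    (200, 6, "Added Webhook Integration"),
    (200, 6, "Removed Webhook Integration"),
]


def active_webhooks_from_events(events):
    """Replay add/remove events to find each org's currently active webhooks."""
    active = {}
    for org_id, webhook_id, name in events:
        hooks = active.setdefault(org_id, set())
        if name == "Added Webhook Integration":
            hooks.add(webhook_id)
        elif name == "Removed Webhook Integration":
            hooks.discard(webhook_id)
    return active


orgs_from_events = sorted(
    org
    for org, hooks in active_webhooks_from_events(events).items()
    if len(hooks) >= 3
)
```

Both approaches land on the same organization here, but the replay is only correct if every event was tracked, none were dropped, and they arrive in order. The state query has no such dependency.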
Data ownership in Customer Data Platforms
CDPs offer restricted access to your customer data, whereas data warehouses offer unrestricted access to your data. The best companies recognize that their ability to leverage customer data is a competitive advantage. Therefore, they should own their data.
CDPs only expose very specific actions on top of your customer data, generally purpose-built for marketing workflows. Since CDPs are all-in-one solutions, you’re locked in and beholden to your CDP vendor in terms of how you can use your customer data. There’s no such thing as a smooth transition from one CDP to another. With the advent of the cloud, there’s no reason that your company’s business workflows should be tied to its data plane.
And this is just from a functionality perspective. Despite the rise of regulation and concerns around data privacy (GDPR, CCPA, etc.), data residency (e.g. Privacy Shield), and data security (SOC2, ISO, HIPAA, etc.), there is no truly on-premise CDP offering *.
*: There’s RudderStack, which is a solid event collection layer, but it lacks marketer features and is more of a developer tool than a CDP.
Ecosystem Lock-in of Customer Data Platforms
Because CDPs own your data, they own your ecosystem. Each CDP has to build its own independent “ecosystem”. After data collection, there are still a number of concerns remaining, like quality assurance, observability, transformations, discovery, etc. Because CDPs are all-in-one suites with their own proprietary ecosystems, every CDP has to independently address these concerns via in-house product features. In most cases, CDPs do not address all these data concerns effectively, as they’re focused on building features that appeal to the marketing department. Marketing alone does not have the technical capacity to evaluate these deeply technical concerns, but it is severely affected when they go unaddressed.
Even in a perfect world where a CDP does address all of these concerns, you would have to use a separate set of tools to solve these concerns again for the data warehouse since CDPs do not replace all use cases for data warehouses.
On the other hand, the ecosystem around data warehouses is growing rapidly. Data warehouses are the standard that every vendor in SaaS is thinking about. With the rise of cloud data warehouses, there’s been a rise of solutions to address various post-processing concerns.
Here are just a few examples:
Data integration platforms: Fivetran, Stitch, Xplenty, Matillion, etc.
Transformation: There are “new age” solutions like dbt and Dataform. Then, there are older generation solutions like the Hadoop and Spark stacks, Informatica, etc.
Observability/QA: Monte Carlo Data, Great Expectations
Metadata: Alation, Collibra, Lyft’s Amundsen, etc.
No single vendor, even a software giant like Salesforce or Adobe, is poised to build best-in-class software that addresses each of these concerns. Just compare the number of data sources supported by a leading CDP like Simon Data to that of Fivetran, the leading solution for replicating data from SaaS services and databases into your data warehouse.
What is the business impact of this?
Let’s take a fairly common example. What happens if you accidentally emit bad events to a CDP? CDPs have surface-level solutions like Segment Protocols to enforce fixed schemas around events. In a data warehouse, that sort of schema management is table stakes, but schema management isn’t everything. You can still send data that is semantically incorrect to a CDP. The events and properties that you’re collecting today may not make much sense tomorrow.
If you accidentally send a bunch of incorrect events to a CDP, it’s not easy to undo it. You often have no choice but to contact support to fix them. With a data warehouse, you own your data so you can always use raw SQL to UPDATE the data. You can also use tools like dbt to encode and execute these transformations systematically, and assertion frameworks like Great Expectations to ensure that there are no similar slip-ups in the future.
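As a rough illustration of that workflow, here’s a sketch that repairs a batch of bad events with a plain UPDATE and then runs a simple check to catch a repeat. It uses Python’s built-in sqlite3 module as a stand-in for the warehouse. The events table, the “revenue sent in cents” bug, and the check_revenue_in_range helper are all hypothetical, and the check is a hand-rolled assertion, not the dbt or Great Expectations API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY, user_id TEXT, name TEXT, revenue REAL
);
-- Hypothetical bug: one client release sent revenue in cents, not dollars.
INSERT INTO events VALUES
    (1, 'u-1', 'Order Completed', 19.99),
    (2, 'u-2', 'Order Completed', 4500),
    (3, 'u-3', 'Order Completed', 1250);
""")

# Fix the affected rows in place. The connection context manager commits
# on success and rolls back if the statement raises.
with conn:
    cursor = conn.execute(
        "UPDATE events SET revenue = revenue / 100.0 WHERE id IN (?, ?)",
        (2, 3),
    )
    rows_fixed = cursor.rowcount


def check_revenue_in_range(conn, max_revenue=1000.0):
    """Return ids of events whose revenue falls outside the expected range."""
    query = "SELECT id FROM events WHERE revenue < 0 OR revenue > ?"
    return [row_id for (row_id,) in conn.execute(query, (max_revenue,))]


violations = check_revenue_in_range(conn)
```

In a real stack, the fix would more likely live in a versioned transformation and the check would run on a schedule, but the underlying operations are the same SQL you see here.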
Data Integrity in Customer Data Platforms
CDPs claim to be the single source of truth, but CDPs do not replace data warehouses. There’s nothing about having separate databases of customer information for different departments that spells “single source of truth”. Some CDPs have features to import data from the data warehouse into a CDP, but then, you introduce more data latency and no longer have the same “data freshness” guarantees advertised by CDPs.
Even if that’s okay, you cannot reuse your queries or definitions between your data warehouse, which is the basis for your analytics function, and your marketing automation. In the optimistic case, this is okay. In practice, we’ve talked to a number of companies that have moved from CDPs to warehouse-based approaches due to rampant data inconsistencies.
When does an off-the-shelf CDP make more sense?
It would not be fair to CDPs if we didn’t talk about when it does make sense to choose them. Although I don’t believe CDPs are the “be-all and end-all” of customer data, there are cases where it does make sense to use a CDP.
Vertical Customer Data Platforms
Vertical CDPs are CDPs that are built for a specific type of company, categorized by industry, size, purpose, etc. Contrary to general-purpose CDPs, I’m actually very bullish on vertical CDPs.
My two favorite examples of vertical CDPs are Amperity & Zaius.
Amperity is a vertical CDP built for larger, traditional retail companies. They focus on first-party probabilistic identity resolution, aka consuming loads of data from disparate data sources (often from different businesses altogether) and making a “best educated guess” of what a single person is, what a household is, etc. That’s not at all an easy problem, and it’s very valuable to a certain type of company yet totally useless to a large majority of companies.
Zaius is a vertical CDP built for the mid-market e-commerce stack. Zaius can ingest data from the most common SaaS services used by e-commerce companies, starting with Shopify, and integrates with the most common destinations, namely Klaviyo and social ad networks.
Companies using vertical CDPs still have data warehouses for analytics. In fact, many companies using vertical CDPs still end up using warehouse-based solutions for certain data problems in their stack since as they scale, the data warehouse is the only place with all their data. A handful of large enterprises that we’ve talked to actually just use Amperity for the identity piece and build their own “activation layer” off the data warehouse.
No-code capabilities of Customer Data Platforms
Despite our belief that identity is better solved by data teams, CDPs do give marketers superpowers. The average marketer isn’t suited to solve problems like identity resolution unless they’re well-versed in SQL. I’d argue that this is a good thing, and that line should be respected.
That said, if you do not have access to someone with SQL skills to model your company’s data, it might make sense to settle for an off-the-shelf CDP.
Customer Data Platforms do not require a modern data stack
If your company has a subpar data stack and doesn’t intend to improve it in the near-term but is severely bottlenecked on the marketing side, then it may make sense to use an off-the-shelf CDP. The warehouse-based approach is only as good as the data warehouse itself.
If your data stack is decent but you do not have enough data resources on your team to set up a warehouse-based stack, I’d urge you to consider solving the resourcing problem. Lots of companies fail to implement CDPs. Half the challenge of adopting any software is human. It’s very difficult to build a database of all your customer information without having someone on your team who can navigate the intricacies of your company’s data model.
Realtime capabilities of Customer Data Platforms
Certain CDPs do have realtime capabilities that are somewhere between “challenging” and “impossible” to achieve with data warehouses alone today.
In most business cases, true realtime capabilities frankly aren’t necessary. That being said, there are certain use cases where executing operations in near real-time is valuable. For example, a transactional notification like “Thanks for making a purchase” when you check out at a Starbucks shouldn’t be delivered an hour later. From our user research, a strong majority of legitimate realtime use cases are for core product features, where engineering is involved to instrument them, rather than use cases that marketing would drive autonomously.
If empowering marketing to drive these use cases in realtime is crucial enough to your business to outweigh the downsides of giving up a consistent, sane data infrastructure, then it is justifiable to pursue an off-the-shelf CDP. The only thing I’d urge you to beware of is that even CDPs advertising realtime capabilities cannot always achieve them.
This is because behind the scenes, most CDPs leverage off-the-shelf data warehouses like Snowflake and BigQuery as a significant part of their internal architecture and are therefore bottlenecked by the same technological limitations that your data team faces.
TL;DR: no one can predict the future with certainty, but the industry is pointing towards a modern data warehouse being the strongest bet as your core database for customer data.
How can my data warehouse be a Customer Data Platform?
Platforms like Hightouch can turn your data warehouse into a customer data platform.
First, Hightouch allows you to sync any data from your data warehouse into sales, marketing, and support tools with just SQL, no scripts.
Sometimes, your marketing team needs to be able to drill into customer data to build and distribute customer segments without SQL. In these cases, Hightouch creates an audience builder directly on top of your data warehouse.
Hightouch’s audience builder is powered by our schema modeling layer that allows you to encode your company’s object hierarchy & events by labeling tables and views from your data warehouse that you’d like to expose to business users.
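To give a feel for the general idea of an audience builder sitting on top of warehouse tables, here’s a deliberately tiny sketch. It is not Hightouch’s implementation or API. The customers table, the list of exposed columns, the filter format, and the build_audience_query function are all assumptions made up for this example, with Python’s built-in sqlite3 module standing in for the warehouse.

```python
import sqlite3

# Columns the data team has chosen to expose to business users (hypothetical).
ALLOWED_COLUMNS = {"plan", "country", "lifetime_orders"}
OPERATORS = {"eq": "=", "gte": ">=", "lte": "<="}


def build_audience_query(filters):
    """Turn UI-style filters into a parameterized SQL query.

    Each filter is a (column, operator, value) tuple. Column and operator
    names are checked against allowlists, and values are passed as bound
    parameters rather than interpolated into the SQL string.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in OPERATORS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {OPERATORS[op]} ?")
        params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT email FROM customers WHERE {where} ORDER BY email"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    email TEXT, plan TEXT, country TEXT, lifetime_orders INTEGER
);
INSERT INTO customers VALUES
    ('ada@example.com', 'pro', 'US', 12),
    ('bob@example.com', 'free', 'US', 3),
    ('cy@example.com', 'pro', 'DE', 7),
    ('di@example.com', 'pro', 'US', 2);
""")

# "Pro customers with at least 5 orders", as a marketer might click it together.
sql, params = build_audience_query(
    [("plan", "eq", "pro"), ("lifetime_orders", "gte", 5)]
)
audience = [email for (email,) in conn.execute(sql, params)]
```

The marketer never writes SQL, and the data team stays in control of which tables and columns are exposed. That separation of concerns is the design choice this post argues for.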
This is an example of how specific business processes can be enabled on top of your company’s customer data without losing flexibility or control. Hightouch’s audience builder is just one of many products to come that will be built directly on top of the data warehouse.
Curious to see Hightouch in action? Just book a demo — we’d love to show you around.
This post is not meant to say that CDPs are on their way out. The CDP industry is estimated to be at $2.4 billion total revenue across all vendors, and we foresee this only growing. In fact, we suspect that cumulative revenue will surpass $10B in the next 5-7 years.
That being said, we fundamentally do not believe that CDPs are the long-term solution to the customer data problems that companies face at scale. We have strong conviction that the data warehouse and, even more so, emerging “data clouds” like Snowflake are and should be the central nervous system of customer data at your company.
If you’re interested in leveling up your customer data stack by operationalizing your data warehouse, check out Hightouch. We love talking about