May 2, 2026

Liam Weedon, founder of GTM Layer

Why We Built an Operational Data Store Instead of Making HubSpot Do Everything

Every team I work with starts the same way. HubSpot is the centre of everything. Deals, contacts, companies, tasks, emails, enrichment data, custom objects for tracking campaign performance, more custom objects for partner data, maybe another one for product usage events. The CRM becomes the database, the reporting layer, the integration hub, and the system of record all at once.

It works until it does not. And when it stops working, it stops all at once.

We hit this wall ourselves. Running enrichment data through HubSpot custom objects, storing webhook event logs in custom properties, building increasingly complex workflows to keep everything in sync. The API rate limits started biting. The custom object costs started climbing. And the data model became so tangled that changing one thing broke three others.

So we built an operational data store. A Postgres database that sits underneath everything, stores the data HubSpot should not be responsible for, and lets HubSpot do what it actually does well: manage relationships and run sales processes.

Here is why we made that call, what the architecture looks like, and what it changed.

What HubSpot is good at (and what it is not)

HubSpot is excellent CRM software. It handles contact and company records well. The deal pipeline is solid. Sequences and workflows automate sales and marketing processes reliably. The reporting, while limited compared to a proper BI tool, covers most day-to-day needs.

What HubSpot is not good at: being a database.

The moment you start storing operational data in HubSpot (data you need for enrichment, automation state, event logs, cached API responses), you are fighting the platform. Custom objects have row limits on lower tiers. Properties have type constraints that do not map cleanly to operational data. The API has rate limits that punish heavy read/write patterns. And the pricing model charges you more as your data volume grows, which is the opposite of what you want from a data store.

The specific problems we ran into:

Rate limits. HubSpot's API allows a set number of requests per 10-second window. When you are running enrichment workflows, syncing data from multiple sources, and querying the CRM for reporting, you hit that ceiling fast. We were batching and throttling API calls just to avoid errors, which slowed everything down.

Custom object costs. Storing enrichment cache data in HubSpot custom objects works technically, but the cost scales with volume. Every cached domain, every enrichment result, every webhook event log adds to your object count. On Enterprise tier, this gets expensive quickly.

Data model rigidity. HubSpot properties are typed (text, number, date, dropdown). Operational data is often JSON blobs, nested structures, or arrays. Storing a multi-provider enrichment result in a HubSpot text property means serialising it as a string and parsing it on every read. That is fragile and slow.

No query language. You cannot run SQL against HubSpot. You get the CRM search API, which is good for finding contacts and deals, and useless for analytical queries like "show me all domains enriched in the last 30 days where the employee count changed by more than 20%."
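
To make the contrast concrete, here is roughly what that last query looks like against Postgres. This is a minimal sketch using the postgres.js client for Node; the enrichment_cache table and its columns are illustrative (one row per enrichment run, with the provider payload in a jsonb column), not a prescribed schema.

```typescript
// Sketch of an analytical query HubSpot's search API cannot express.
// Assumes an illustrative enrichment_cache table:
// (domain text, result jsonb, enriched_at timestamptz), one row per enrichment run.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

// Domains enriched in the last 30 days whose employee count moved by more than 20%
// relative to the previous enrichment of the same domain.
const changedDomains = await sql`
  with runs as (
    select
      domain,
      (result ->> 'employee_count')::numeric as employee_count,
      lag((result ->> 'employee_count')::numeric)
        over (partition by domain order by enriched_at) as previous_count,
      enriched_at
    from enrichment_cache
  )
  select domain, previous_count, employee_count, enriched_at
  from runs
  where enriched_at > now() - interval '30 days'
    and previous_count > 0
    and abs(employee_count - previous_count) / previous_count > 0.2
  order by enriched_at desc
`;

console.log(changedDomains);

await sql.end();
```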

What an operational data store actually is

It is a database. That is it. Not a data warehouse, not a data lake, not a CDP. A managed Postgres instance that stores operational data your CRM should not be responsible for.

The specific things that live in ours (a rough schema sketch follows below):

Enrichment cache. Every domain and contact we enrich gets cached with the result, provider, timestamp, and TTL. This is the enrichment cache that saves 40-60% on lookup costs. It does not belong in HubSpot because it is high-volume, needs fast reads, and the data structure is too complex for CRM properties.

Webhook event logs. Every inbound webhook from HubSpot, Stripe, Clay, or any other tool gets logged as a JSON row with a timestamp. This gives us a complete audit trail and the ability to replay events if something breaks. Storing this in HubSpot is not practical.

Unified company and contact graph. A normalised view of every company and contact we track, regardless of which tool they came from. HubSpot has its version. Clay has its version. Our enrichment providers have their versions. The data store reconciles them into a single record per entity.

Automation state. When a multi-step workflow runs across several tools, the current state (what has been done, what is pending, what failed) lives in the data store. This means we can resume interrupted workflows, retry failed steps, and audit what happened without digging through multiple tool logs.
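
For orientation, here is a one-off setup script showing what those four kinds of data can look like as plain Postgres tables. Every table and column name is illustrative; the point is the shape: jsonb payloads and timestamps instead of serialised strings in CRM properties.

```typescript
// Illustrative setup script for the four kinds of operational data described above.
// Names and columns are assumptions, not the exact schema we run.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

await sql`
  create table if not exists enrichment_cache (
    id          bigint generated always as identity primary key,
    domain      text not null,
    provider    text not null,
    result      jsonb not null,           -- raw multi-provider payload, no string serialisation
    enriched_at timestamptz not null default now(),
    expires_at  timestamptz not null      -- TTL: rows past this point trigger a fresh lookup
  )`;

await sql`
  create table if not exists webhook_events (
    id           bigint generated always as identity primary key,
    source       text not null,           -- 'hubspot', 'stripe', 'clay', ...
    payload      jsonb not null,          -- raw event, kept for audit and replay
    received_at  timestamptz not null default now(),
    processed_at timestamptz              -- stays null until processing succeeds
  )`;

await sql`
  create table if not exists companies (   -- contacts follow the same pattern
    id          bigint generated always as identity primary key,
    domain      text unique not null,
    hubspot_id  text,
    clay_id     text,
    properties  jsonb not null default '{}'::jsonb,  -- reconciled view across tools
    updated_at  timestamptz not null default now()
  )`;

await sql`
  create table if not exists automation_runs (
    id         bigint generated always as identity primary key,
    workflow   text not null,
    step       text not null,
    status     text not null check (status in ('pending', 'done', 'failed')),
    context    jsonb not null default '{}'::jsonb,   -- whatever the next step needs to resume
    updated_at timestamptz not null default now()
  )`;

await sql.end();
```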

The architecture: HubSpot stays, it just stops being the centre

This is not about replacing HubSpot. It is about right-sizing its role.

Diagram: before and after, comparing the CRM-centric architecture with HubSpot at the centre against the ODS-centric architecture with a Supabase operational data store at the centre.

HubSpot remains the CRM. Sales reps work in HubSpot. Deals live in HubSpot. Sequences and workflows that touch the sales process run in HubSpot. Nothing changes for the end user.

What changes is what sits behind HubSpot. Instead of HubSpot connecting directly to every enrichment tool, every webhook source, and every reporting query, those connections go through the data store.

The flow looks like this:

Enrichment. Clay enriches a domain. The result writes to the data store. A sync process pushes the relevant fields (the ones sales reps actually need) to HubSpot contact and company properties. HubSpot gets clean, pre-processed data. The raw enrichment result, the cache metadata, and the provider attribution stay in the data store where they belong.
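
A minimal sketch of that sync step, assuming the illustrative tables above and HubSpot's CRM v3 objects API: read the latest reconciled record for a domain, pick out the rep-facing fields, and PATCH them onto the matching company record. The field mapping is an example, not a recommendation.

```typescript
// Data store -> HubSpot sync sketch for one domain. Only rep-facing fields are
// pushed; the raw enrichment payload never leaves Postgres. The property names are
// HubSpot's default company properties; the mapping itself is illustrative.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function syncCompanyToHubSpot(domain: string): Promise<void> {
  // Latest reconciled record for this domain (companies table from the schema sketch).
  const [company] = await sql`
    select hubspot_id, properties from companies where domain = ${domain}
  `;
  if (!company?.hubspot_id) return; // no matching HubSpot record yet

  // Pick out the handful of fields sales reps actually look at.
  const properties = {
    industry: company.properties.industry,
    numberofemployees: company.properties.employee_count,
    annualrevenue: company.properties.annual_revenue,
  };

  // HubSpot CRM v3 objects API: PATCH the company with clean, pre-processed values.
  const response = await fetch(
    `https://api.hubapi.com/crm/v3/objects/companies/${company.hubspot_id}`,
    {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${process.env.HUBSPOT_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ properties }),
    },
  );
  if (!response.ok) {
    throw new Error(`HubSpot sync failed for ${domain}: ${response.status}`);
  }
}
```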

Webhooks. An inbound event from Stripe (new subscription, payment failed, upgrade) hits a serverless function. The raw event logs to the data store. A processing function extracts the relevant data and updates the appropriate HubSpot records. If the processing fails, the raw event is still logged and can be reprocessed. No data loss.
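
A sketch of that order of operations, written as a generic fetch-style handler (the same shape works in most serverless runtimes). The Stripe event type and the processing step are placeholders; what matters is that the insert happens before any processing, so a failure is always replayable.

```typescript
// Webhook handler sketch: log the raw payload before doing anything with it, so a
// processing failure never loses data.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

export async function handleStripeWebhook(request: Request): Promise<Response> {
  const payload = await request.json();

  // 1. Log the raw event first. Whatever happens next, it can be replayed.
  const [event] = await sql`
    insert into webhook_events (source, payload)
    values ('stripe', ${sql.json(payload)})
    returning id
  `;

  try {
    // 2. Process: extract what matters and update the relevant HubSpot records.
    if (payload.type === "invoice.payment_failed") {
      // e.g. flag the company, notify the deal owner, open a task
    }
    await sql`update webhook_events set processed_at = now() where id = ${event.id}`;
  } catch (error) {
    // Leave processed_at null; a retry job can pick the event up from the table later.
    console.error(`processing failed for webhook event ${event.id}`, error);
  }

  // 3. Acknowledge quickly so the sender does not keep retrying.
  return new Response("ok", { status: 200 });
}
```

If processing grows beyond a few lines, it can move to a separate worker that periodically picks up rows where processed_at is still null.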

Reporting. Day-to-day CRM reporting stays in HubSpot. Operational reporting (enrichment cache hit rates, webhook processing times, data freshness metrics) runs against the data store using SQL. An AI agent can query both and combine the results into a single picture.
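
As an example of the kind of operational query involved, webhook processing latency by source over the last day is a single SQL statement against the illustrative webhook_events table from earlier, not a report-builder exercise.

```typescript
// Operational reporting sketch: webhook processing latency by source, last 24 hours.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

const latency = await sql`
  select
    source,
    count(*)                                     as events,
    count(*) filter (where processed_at is null) as still_pending,
    round(avg(extract(epoch from processed_at - received_at))::numeric, 1)
                                                 as avg_seconds_to_process
  from webhook_events
  where received_at > now() - interval '24 hours'
  group by source
  order by events desc
`;

console.table(latency);

await sql.end();
```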

AI agent access. This is the big one. The AI agent connects to the data store via MCP and can query anything. Enrichment history, event logs, automation state, the full company graph. It also connects to HubSpot for CRM data. The combination gives it a complete view that neither system provides alone.

What this actually costs

Less than you think.

A managed Postgres instance on a free tier handles a surprising amount of operational data for a small team. We ran on the free tier for months before needing to upgrade. Even on paid tiers, the cost is typically $25-50/month for the data volumes a small B2B team generates.

Compare that to the HubSpot costs we avoided. Custom object limits, API add-ons, and the general price escalation that comes with pushing more data into a CRM that charges by volume. The data store paid for itself in the first month just from the enrichment cache savings.

The setup cost is time, not money. If you are comfortable with managed database services, the initial setup takes an afternoon. Create the tables, set up the serverless functions for cache lookups and webhook processing, configure the sync to HubSpot. If you are not comfortable with databases, an AI agent can generate the schema, the functions, and the migration scripts. You review and deploy.
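
To give a feel for the size of that afternoon, the cache-lookup function is on the order of the sketch below: check for a fresh row, fall back to a paid lookup, cache the result with a TTL. The enrich callback stands in for whichever provider you call, and the table layout follows the earlier illustrative schema.

```typescript
// Cache-lookup sketch: serve a fresh cached result if one exists, otherwise pay for
// one enrichment call and cache it with a TTL.
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

type Enrichment = Record<string, unknown>;

export async function getEnrichment(
  domain: string,
  enrich: (domain: string) => Promise<Enrichment>, // provider call, e.g. via Clay
  ttlDays = 90,
): Promise<Enrichment> {
  // 1. Fresh cache hit: no provider spend.
  const [hit] = await sql`
    select result from enrichment_cache
    where domain = ${domain} and expires_at > now()
    order by enriched_at desc
    limit 1
  `;
  if (hit) return hit.result;

  // 2. Miss or expired: do the paid lookup once, then cache it for next time.
  const result = await enrich(domain);
  const expiresAt = new Date(Date.now() + ttlDays * 24 * 60 * 60 * 1000);
  await sql`
    insert into enrichment_cache (domain, provider, result, expires_at)
    values (${domain}, 'clay', ${sql.json(result)}, ${expiresAt})
  `;
  return result;
}
```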

What compounds and what does not

The data store gets more valuable over time for the same reason the enrichment cache does. Every piece of data you store increases the value of future queries. Every webhook you log improves your ability to debug and audit. Every enrichment result you cache saves money on the next lookup.

What does not compound: the sync logic between the data store and HubSpot. That stays roughly the same complexity regardless of data volume. You define which fields sync, in which direction, and how conflicts are resolved. It is a one-time setup that needs occasional maintenance, not a compounding asset.
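
One way to keep that sync logic from sprawling is to make it declarative: a single mapping that says which fields move, in which direction, and who wins on conflict, with a sync job that just walks the list. The shape below is an assumption for illustration, not a prescribed format.

```typescript
// Illustrative sync configuration: each entry names a data store field, the HubSpot
// property it maps to, the direction, and the conflict rule. Adding a field to the
// sync is a one-line change.
type SyncDirection = "to_hubspot" | "from_hubspot" | "both";
type ConflictRule = "data_store_wins" | "hubspot_wins" | "latest_wins";

interface FieldMapping {
  dataStoreField: string;   // column or jsonb key on the companies record
  hubspotProperty: string;  // HubSpot company property name
  direction: SyncDirection;
  onConflict: ConflictRule;
}

export const companySync: FieldMapping[] = [
  { dataStoreField: "employee_count",  hubspotProperty: "numberofemployees", direction: "to_hubspot",   onConflict: "data_store_wins" },
  { dataStoreField: "industry",        hubspotProperty: "industry",          direction: "to_hubspot",   onConflict: "data_store_wins" },
  { dataStoreField: "lifecycle_stage", hubspotProperty: "lifecyclestage",    direction: "from_hubspot", onConflict: "hubspot_wins" },
];
```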

The other thing that does not compound automatically is data quality. The data store stores what you put in it. If your enrichment providers return bad data, you now have bad data cached efficiently. Build in quality checks, confidence scores, and regular audits of what the cache is serving.

When to make the switch

You do not need an operational data store on day one. If your team is small, your enrichment volume is low, and HubSpot is handling everything comfortably, stay where you are.

The signals that it is time to build one:

You are hitting HubSpot API rate limits regularly. You are storing operational data in custom objects and the costs are climbing. You are serialising JSON into text properties because HubSpot does not support the data structure you need. You want to run analytical queries that HubSpot's reporting cannot handle. You are building an AI-first operations stack and need the agent to query operational data directly.

If any of those sound familiar, the data store is the next step. Start with the enrichment cache (immediate ROI), add webhook logging (audit trail and reliability), then expand to the full company graph and automation state as your needs grow.

The goal is not to replace your CRM. It is to let your CRM be a CRM, and put everything else somewhere purpose-built.

This post is part of a series on building AI-first operations. Related: Building a Clay Enrichment Cache That Saves 40-60% on Lookups, What MCP Actually Means for Business Operations.