Data Products as Modular Components - A Flexible Framework
Why I'm Not Pedantic About Data Product Definitions
A few weeks ago on LinkedIn I posted about how I think about data products as a container for a set of modularized components. In that post I shared this diagram 👇🏻

Not long after, an analyst on my team reached out, confused: “How can a data product be so many different things? Dashboards, chatbots, datasets…even APIs. What actually is a data product?”
It’s a fair question. Everyone’s talking about data products, but nobody seems to agree on what they actually are. My personal take is a pretty flexible framework - it’s not rigid, but I do find it useful.
Defining Data Products: My Framework Basics
When I talk about data products, I’m talking about them as containers for modularized components. A picture is worth a thousand words, so let’s refer back to that imperfect illustration of what I mean:
Grey boxes = The actual data products (the whole container)
🟨 Yellow boxes = What the business perceives as the data product - the usable final thing they interact with
🟦 Blue boxes = The components that help deliver that product - the behind-the-scenes infrastructure not exposed to the end user
Look, you can call the yellow box the “data product” if you want.
Or you can call the whole grey container the “data product.”
I wouldn’t call the blue boxes the data product; the user doesn’t even see them.
Others will disagree. I’m not going to argue about it, because the terminology matters way less than understanding the components and how they work together. Some people enjoy arguing about this. I’m with Joe Reis: the Pedantic Layer is a distraction.
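For readers who think in code, here is one rough way to picture the grey/yellow/blue idea. It is only a sketch: the class names, the `user_facing` flag, and the example product and team names are all made up for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One modular piece of a data product."""
    name: str
    owner: str         # team accountable for this piece (hypothetical names)
    user_facing: bool  # True = "yellow box", False = "blue box"


@dataclass
class DataProduct:
    """The "grey box": a container for user-facing and supporting components."""
    name: str
    components: list[Component] = field(default_factory=list)

    def interfaces(self) -> list[Component]:
        """What the business sees and interacts with (yellow boxes)."""
        return [c for c in self.components if c.user_facing]

    def supporting(self) -> list[Component]:
        """Behind-the-scenes pieces the user never sees (blue boxes)."""
        return [c for c in self.components if not c.user_facing]


# Hypothetical example product with three components.
churn = DataProduct(
    name="customer_churn",
    components=[
        Component("ingestion_pipeline", owner="data_eng", user_facing=False),
        Component("churn_model", owner="data_science", user_facing=False),
        Component("churn_dashboard", owner="analytics", user_facing=True),
    ],
)

print([c.name for c in churn.interfaces()])  # ['churn_dashboard']
print([c.name for c in churn.supporting()])  # ['ingestion_pipeline', 'churn_model']
```

Whether you point at `churn` or at `churn_dashboard` when you say "data product" is the naming debate. The structure is the same either way.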
First Things First: Finding the *Right* Work
So, how do you build things that get used and drive value? Like any product manager in digital SaaS, you talk to users, deeply understand their problems, and build up your roadmap to meet those needs. How exactly do you find the right ideas for your roadmap?
🙋🏻‍♀️ Sometimes stakeholders approach you with a clear idea of what they want.
🤷🏽‍♀️ Sometimes stakeholders approach you with a vague idea that they probably need data.
💡 As a PM you will see recurring patterns of data needs that could likely be solved with a unified data product upstream.
⬆️ Critically, you must understand company priorities and proactively investigate strategic workstreams to uncover opportunities to drive business outcomes with data through operational data flows, AI/ML, or analytics.
I’ll get into discovery in future posts, but for now keep this principle in mind:
All great data teams should be doing a mix of RESPONSIVE (ahem, reactive) work and STRATEGIC, proactively sourced work.
If the team has a product manager - you should own both and have methods for continuously discovering these opportunities.
Discovery: Refining Requirements
Whether an idea comes as an inbound request or from your own pattern recognition, every idea needs vetting before building begins.
No matter how the idea surfaces, you need to clearly understand:
🎯 the business outcomes where data can be a driver
👨🏽‍💼 the business context - the workflows that the data will be activated within (whether that’s an automation, a dashboard, an ad-hoc self-serve reporting cube, etc.)
when will they need this data?
how will they want to access it?
what’s the most intuitive way for the data to integrate into the business (alert, dashboard, chatbot)?
👩🏼 the user needs & skill levels
who are the actual people who will engage with this data?
how will you make this easy and even delightful for them to engage with?
will there need to be up-skilling to make this possible - if so, what’s the plan for that?
Product thinking means that before we jump to “How do we design and build this solution?”, we first ask:
“What PROBLEMS should we solve first?”
“HOW can data best help solve this problem?”
“WHAT exactly should we be building?”
Ownership: Ambiguity is the Enemy
Every business is unique, but what is common in all medium to large businesses is there are a lot of personas and teams involved in working with data. No matter your org structure, what matters most is being crystal clear on two things:
Which team(s) have the right skills to build what is needed
What is the right way to build (architecture) this so it is maintainable, cost effective, etc
Do you have an enterprise architecture group who should weigh in?
Do you have a data architecture group in the data team who should weigh in?
Is the pattern established already because we build things like this all the time (sometimes we don’t need architectural input)?
The truth is that for many problems, there are a lot of ways we could meet the criteria in the PRD (product requirements document), and often multiple teams with overlapping skill sets. If we aren’t clear on best practice and ownership, we get chaos in the data warehouse.
Our architectural principles, current tech stack, team skills, and engineering best practices often help us clarify who should own what in an ideal state. It’s easier for engineering (and data science and analytics) to make those ownership decisions if we are clear about the requirements. In my experience, factors like these often impact who should own delivery:
Latency & SLAs
Lower latency & tighter SLAs raise the stakes and narrow our technology options
This also can narrow which team members have the skills to build it
What level of incident response is needed? And how many teams would need to collaborate to resolve issues?
Architectural decisions are as much about decoupling cross-team dependencies to make maintenance easier as they are about technology
Does the criticality of the project require data quality testing and alerting?
Often implementing this needs to happen upstream of analytics, so data engineering will be needed
How will metrics be defined and synced between the various components? Should the definition be shifted left into a metrics layer, or is defining them in this report okay?
If it’s a narrower metric just for this report, it’s probably fine for analysts to do it in the BI layer
If it’s a central KPI that needs to be surfaced in a lot of reports, it might need to shift upstream to analytics eng
Does the proposed ownership model create single points of failure?
Some teams work by default with redundancy in expertise and shared ownership of code-bases
Others operate solo, may not even have code review - is that okay here?
Single points of failure are more problematic the more critical the project is
If you have a fully centralized team where all components are owned by the same group, you likely still have components of one product owned by different squads like data eng, analytics eng, data science, analyst.
If you have a hub-and-spoke model this is exacerbated as the teams operate even more independently.
Either way - what’s clear is that many people have to coordinate to design and maintain scalable data products.
Without clarity here, you get the worst kind of incidents - the ones where everyone assumes someone else is fixing it.
With clarity around the ownership of modular components we make dependencies explicit, decouple work across teams as much as possible, and create products that are well defined and resilient.
As the product manager you can play a key role confirming ownership alignment before you hand off the PRD for delivery.
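A quick ownership check like this can be run against a simple list of components before the PRD is handed off. The sketch below is illustrative only: the component names, the people, and the rule that "exactly one owner" counts as a single point of failure are assumptions, not a standard.

```python
# Hypothetical component -> owners mapping for one data product.
# An empty list means nobody has claimed the component yet.
ownership = {
    "ingestion_pipeline": ["alice", "bob"],
    "metrics_layer": ["carol"],
    "exec_dashboard": [],
}


def ownership_gaps(ownership: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return components with no owner and components with exactly one owner."""
    return {
        "unowned": sorted(c for c, owners in ownership.items() if not owners),
        "single_point_of_failure": sorted(
            c for c, owners in ownership.items() if len(owners) == 1
        ),
    }


gaps = ownership_gaps(ownership)
print(gaps)
# {'unowned': ['exec_dashboard'], 'single_point_of_failure': ['metrics_layer']}
```

How much a single owner matters depends on how critical the product is, so treat the output as a conversation starter rather than a gate.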
Data Products Evolve Over Time
To summarize what we’ve noted above:
Data Products are not just responsive - They are Strategic
We build data products in response to inbound business requests AND through impactful work we source directly in partnership with the business.
A data product has been through robust discovery
Someone has thought about whether this should exist and what the ideal form is to meet the need
A product has clearly defined ownership & robust engineering under it.
It’s not held together with duct tape and prayers.
There’s proper testing, version control, and observability, and we know who owns which elements - not just when it’s first constructed, but over the long term. It can handle production-level demands. You’re not terrified every time someone actually uses it. And when issues crop up (because data is unpredictable) we can manage our response with ease. Mostly.
So that’s what it takes to build a great initial data product. But another key factor for products is you don’t just launch them and move on. Which brings me to my final point:
A product should have strong product-market fit and be under continuous monitoring for usefulness and evolution.
You don’t just build it and walk away. You’re monitoring whether people are using it, whether it’s solving their problem, whether their needs are evolving.
If it’s not getting used right away, you talk to users again and investigate what the friction points are so you can give better training, change the final deliverable to make it easier to use, etc.
If it stops getting used and we talk to users and find out business needs have shifted, we either:
Iterate on the data product to find problem-solution fit again
Deprecate - man I love killing unused things, what a rush.
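If usage is logged somewhere, even a tiny check can flag which products need a follow-up conversation. This is a minimal sketch under stated assumptions: the weekly active-user counts, the four-week window, and the `min_users` threshold are all hypothetical and should be tuned to the team.

```python
def review_action(weekly_active_users: list[int], min_users: int = 3) -> str:
    """Suggest a next step from weekly active-user counts (oldest week first)."""
    # Never reached the adoption threshold in any week.
    if not weekly_active_users or max(weekly_active_users) < min_users:
        return "talk to users: never adopted"
    # Was adopted at some point, but the last four weeks are all below threshold.
    recent = weekly_active_users[-4:]
    if max(recent) < min_users:
        return "talk to users: usage dropped off"
    return "healthy: keep monitoring"


print(review_action([0, 1, 0, 2]))              # talk to users: never adopted
print(review_action([12, 10, 9, 1, 0, 0, 1]))   # talk to users: usage dropped off
print(review_action([5, 6, 8, 7]))              # healthy: keep monitoring
```

The function only decides when to go talk to users. Whether the outcome is iterate or deprecate still comes from that conversation.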
One final note:
Not every data deliverable is a product and that’s okay
Look, I’m a data product manager. My goal is that we build such great products that we meet most business needs. But there are always ad-hocs, one-offs, urgent things we need to tackle. The key is to not let those turn into long-term dependencies that never underwent proper discovery (what I call product debt).
Anyway, these are the things that make a data product different, to me, from random data curations, dashboards, or models…
It’s a different way of thinking about data than just building a point solution, one project after another.
Get off the data project hamster wheel.

In Conclusion: A Cake Based Analogy on Why Data Products Can Be So Many Different Things
I could leave things here, but I have an analogy that seems to really click for people. And it involves cake.
Remember my analyst who was confused about why data products can be so many different things? She was stuck on how dashboards, datasets, APIs, and ML models could all be “products.”
So I shared an analogy that really clicked for her. Maybe it will drive home these concepts for you too:
Imagine it’s your mom’s birthday, and she loves carrot cake with cream cheese frosting.
You head to the store with a clear need. First stop: the bakery section.
🎂 Stop 1: The Bakery
❌ No carrot cakes available - just chocolate, vanilla, and confetti cakes. My mom likes German chocolate cake, but it’s not her favorite. Maybe I can make that work?
These are like our curated dashboards. Beautiful, well-made products. Someone put real craft into them. But this time they don’t solve your specific problem. They’re finished products optimized for the most common use cases, but your use case isn’t common.
🛒 Stop 2: The Baking Aisle
⚠️ No carrot cake mix either, but there’s a spice cake mix that’s close
You could adapt this - add your own carrots, tweak the recipe. This is like taking an existing data product and customizing it for your use case. It’s not exactly what you need, but it gets you 70% of the way there, and you can handle the last 30% yourself.
🥕 Stop 3: Raw Ingredients
👨🏻‍🍳 Finally, you consider buying raw ingredients: flour, butter, milk, eggs, carrots (pre-shredded or whole?). Guaranteed to be just the cake she wants (if my cooking skills are up to the task…).
Maximum flexibility, but you’re building from scratch. This is like working with raw datasets and APIs to create exactly what you need. You have complete control, but you also have to know what you’re doing. And it takes time.
The Key Insight
Here’s the thing that clicked for her, and hopefully clicks for you too:
All of these could meet your needs. And all of them are “products” in their own right.
Certainly the manufacturers of them all thought of them as finished products. The bakery is proud of that chocolate cake. Betty Crocker stands behind that spice cake mix. The flour company has been making that flour for decades.
But from your current pain point - from where you’re standing with a specific need - they’re just different levels of:
Completeness (how close to your end goal is this?)
Customization required (how much work do you have to do?)
Time to value (how fast can you get to done?)
Skill needed to use (what capabilities do you need?)
Maybe you go with that German chocolate cake this time because you are literally on your way to your mom’s party and it’s your only option. It would make your mom just as happy - sometimes a “good enough” solution is the right product choice. The same applies to data products. Maybe next year you’ll plan better and make that from-scratch Food Network ⭐️⭐️⭐️⭐️⭐️ recipe?
An executive dashboard, a cleaned dataset, a real-time API, raw streaming data feeds - they can all be legitimate data products depending on who the user is and what problem they’re trying to solve.
This is why I’m not pedantic about the definition. Because the definition isn’t the point. Understanding what your user actually needs and using best practices to uncover, refine, and articulate that need so the team can build it - that’s the point.
Putting It All Together
So here’s what I want you to take away:
Data products aren’t one thing - they’re a spectrum.
Understanding the architecture (grey/yellow/blue boxes, components and containers, ownership and engineering practices) helps you build better.
Understanding the user need (the cake analogy, the spectrum from finished to raw) helps you build the RIGHT thing with the time and resources you’ve got.
Being pragmatic about definitions instead of pedantic helps you focus on what actually matters.
Most businesses have to make do with a mix of ready-made, generally appealing cakes, doctored-up cake mixes, and bespoke, chef’s-kiss beautiful cakes. All three together help you scale your data organization without drowning in one-off requests or building Rube Goldberg machines that nobody can maintain.
What type of “data product” are you building right now?
Are you building true products at all, or still stuck on that service desk hamster wheel?
Are you clear on whether your stakeholders need the bakery cake, the cake mix, or the raw ingredients? I’d love to hear how you’re thinking about this.
And if this framework was helpful, subscribe for more posts on navigating the intersection of data and product.
Fair warning: there will probably be more food analogies, AI generated images, and always always always - at least one typo.

I'll take a carrot cake, not too sweet please, with a nice coffee next to it
I really liked the "Ownership: Ambiguity is the Enemy" section; it's so true