The Build vs Buy vs Partner Decision for ML
A decision framework with concrete cost models for when to hire an ML team, buy SaaS, or bring in a consulting partner.
The Three Options
Sooner or later, most companies reach the same point: they need ML capabilities and have to decide whether to build an in-house team, buy a SaaS solution, or partner with a consulting firm.
The answer depends on three things: how core ML is to your competitive advantage, your budget and timeline, and the complexity of your problem. Here's the framework we use with our clients.
Option 1: Build In-House
What it looks like: Hire ML engineers, data engineers, and MLOps engineers. Build your own infrastructure. Own everything.
The Real Cost
Most companies dramatically underestimate the cost of building in-house ML capabilities.
Year 1 costs (US market):
- Senior ML Engineer: $180-250K (salary + benefits)
- ML Engineer: $140-190K
- Data Engineer: $150-200K
- MLOps Engineer: $160-210K
- Infrastructure (GPU compute, storage, tools): $50-150K
- Recruiting costs: $60-100K (20-25% of first-year salary per hire)
Minimum viable ML team: 3-4 engineers, which comes to roughly $500-800K/year before they ship anything.
Time to first production model: 6-12 months (including hiring, onboarding, infrastructure setup, data preparation, model development, deployment).
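As a rough check on these numbers, the sketch below sums the line items listed above. It assumes one hire for each of the four listed roles and treats the infrastructure and recruiting figures as team-wide totals. Both are illustrative assumptions, and the names `ROLES` and `year_one_cost` are made up for this example.

```python
# Year-1 cost ranges in $K as (low, high), copied from the list above.
ROLES = {
    "senior_ml_engineer": (180, 250),
    "ml_engineer": (140, 190),
    "data_engineer": (150, 200),
    "mlops_engineer": (160, 210),
}
INFRASTRUCTURE = (50, 150)  # assumed to be a team-wide total
RECRUITING = (60, 100)      # assumed to be a team-wide total


def year_one_cost(roles, include_overhead=True):
    """Sum the (low, high) cost ranges for the chosen roles, in $K."""
    low = sum(ROLES[r][0] for r in roles)
    high = sum(ROLES[r][1] for r in roles)
    if include_overhead:
        low += INFRASTRUCTURE[0] + RECRUITING[0]
        high += INFRASTRUCTURE[1] + RECRUITING[1]
    return low, high


full_team = list(ROLES)
print(year_one_cost(full_team, include_overhead=False))  # (630, 850)
print(year_one_cost(full_team))                          # (740, 1100)
```

With all four roles filled, compensation alone lands at the top of the $500-800K range, and infrastructure plus recruiting pushes the year-1 total higher. Dropping a role from the list shows what a three-person team would cost.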
When to Build
Build in-house when:
- ML is your core product or primary competitive advantage
- You have >$1M annual budget for ML
- Your timeline is 12+ months
- You need continuous model iteration and domain expertise accumulation
- You're willing to invest in infrastructure and tooling
When Not to Build
Don't build when:
- You need results in less than 6 months
- The ML component is supporting, not core
- You can't attract and retain ML talent in your location/compensation range
- You don't have the data infrastructure to support ML
Option 2: Buy SaaS
What it looks like: Subscribe to an ML-powered SaaS product. Someone else handles the models, infrastructure, and maintenance.
The Real Cost
SaaS ML products typically charge per prediction, per user, or per data volume:
- Document processing: $0.01-0.10 per page
- Image classification: $0.001-0.01 per image
- NLP APIs: $0.01-0.05 per 1K tokens
- Predictive analytics platforms: $2-10K/month
Year 1 cost: $24K-120K depending on volume and complexity.
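The year-1 range follows directly from the flat platform pricing ($2-10K/month over 12 months). For per-unit pricing, the annual figure depends entirely on your volume. A minimal sketch of both calculations, where the 50,000 pages per month workload is a hypothetical chosen only for illustration:

```python
def annual_flat_cost(monthly_fee):
    """Annual spend in dollars for a flat monthly subscription."""
    return monthly_fee * 12


def annual_usage_cost(units_per_month, price_per_unit):
    """Annual spend in dollars for per-unit pricing (pages, images, ...)."""
    return units_per_month * price_per_unit * 12


# Flat platform pricing from the list above: $2K-10K per month.
print(annual_flat_cost(2_000), annual_flat_cost(10_000))  # 24000 120000

# Hypothetical workload: 50,000 pages per month at $0.01-0.10 per page.
low = annual_usage_cost(50_000, 0.01)
high = annual_usage_cost(50_000, 0.10)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Plugging in your own monthly volume is usually the fastest way to see which pricing tier you would actually fall into.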
When to Buy
Buy SaaS when:
- The problem is well-commoditized (OCR, basic sentiment, standard object detection)
- Speed to market is critical (need results in weeks, not months)
- You don't need to own or customize the model
- The data can leave your infrastructure (no regulatory constraints)
- Volume economics make sense at your scale
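One way to test the volume-economics point is a break-even calculation: the annual volume at which per-unit SaaS spend equals the cost of an alternative. The sketch below is a simplification that ignores integration and maintenance on both sides. The $500K and $0.05 inputs are picked from the ranges in this article purely for illustration.

```python
def break_even_units_per_year(annual_alternative_cost, price_per_unit):
    """Units per year at which per-unit SaaS spend equals the alternative."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return annual_alternative_cost / price_per_unit


# Assumed comparison: a $500K/year in-house team vs. $0.05 per page.
pages = break_even_units_per_year(500_000, 0.05)
print(f"Break-even at about {pages:,.0f} pages per year")
```

If your expected volume sits far below the break-even figure, per-unit pricing is likely the cheaper route. If it sits near or above it, the other options deserve a closer look.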
When Not to Buy
Don't buy when:
- Your problem requires domain-specific training data or custom models
- You need to run inference on-premise or on edge hardware
- Vendor lock-in is unacceptable
- Your data can't leave your infrastructure (defense, healthcare, financial services)
- You need sub-10ms latency that cloud APIs can't provide
Option 3: Partner (Consulting)
What it looks like: Engage an ML consulting firm for scoped engagements. They bring the expertise, you retain the IP.
The Real Cost
ML consulting ranges widely:
- Freelancers: $100-200/hr ($16-32K for a 4-week project)
- Boutique firms: $150-400/hr ($24-64K for a 4-week project)
- Big 4 / enterprise consultancies: $300-600/hr ($48-96K for a 4-week project)
- Value-based pricing: $5-50K per project (our model)
Year 1 cost for ongoing engagement: $60-300K depending on scope and model.
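The 4-week project figures for hourly pricing follow from the rates if you assume one consultant working 40 hours a week. That assumption, and the names in the sketch below, are ours for illustration:

```python
HOURS_PER_WEEK = 40  # assumption: one full-time consultant

# Hourly rate ranges in dollars as (low, high), from the list above.
RATES = {
    "freelancer": (100, 200),
    "boutique": (150, 400),
    "enterprise": (300, 600),
}


def project_cost(rate_low, rate_high, weeks=4):
    """Return the (low, high) project cost in dollars for hourly billing."""
    hours = weeks * HOURS_PER_WEEK
    return rate_low * hours, rate_high * hours


for name, (rate_low, rate_high) in RATES.items():
    print(name, project_cost(rate_low, rate_high))
# freelancer (16000, 32000)
# boutique (24000, 64000)
# enterprise (48000, 96000)
```

Changing `weeks` or `HOURS_PER_WEEK` shows how quickly hourly engagements scale with duration and team size.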
When to Partner
Partner when:
- You need production ML but can't justify a full-time team
- The project requires specialized expertise you don't have
- You need results in 4-12 weeks
- You want to retain ownership of models and IP
- You're evaluating whether ML is worth deeper investment
When Not to Partner
Don't partner when:
- You need long-term, continuous model development (a retainer can work, but in-house is eventually better)
- The consulting firm wants to own the IP
- The engagement scope is unclear
- The consulting firm has no production deployment experience
The Decision Framework
Answer these five questions:
1. Is ML core to your competitive advantage?
- Yes → Lean toward Build (but maybe Partner first to accelerate)
- No → Buy or Partner
2. What's your timeline?
- < 3 months → Buy or Partner
- 3-12 months → Partner or Build
- 12+ months → Build
3. What's your annual ML budget?
- < $50K → Buy
- $50-500K → Partner
- $500K+ → Build or Partner + Build
4. Can your data leave your infrastructure?
- Yes → All options viable
- No → Build or Partner (no SaaS)
5. How custom is your problem?
- Standard (OCR, sentiment, basic classification) → Buy
- Domain-specific but well-defined → Partner
- Novel and continuously evolving → Build
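The five questions can be sketched as a small function. The framework above does not say how to combine the answers, so the vote tally here is our own simplification: each question votes for the options it lists, and the data question acts as a hard constraint that removes SaaS. The names `Situation` and `recommend` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Situation:
    ml_is_core: bool
    timeline_months: float
    annual_budget_usd: float
    data_can_leave: bool
    problem_type: str  # "standard", "domain_specific", or "novel"


def recommend(s: Situation) -> Counter:
    """Tally one vote per option named by each question's answer."""
    votes = Counter()
    # Q1: is ML core to your competitive advantage?
    votes.update(["build"] if s.ml_is_core else ["buy", "partner"])
    # Q2: timeline (exactly 12 months is treated as "12+").
    if s.timeline_months < 3:
        votes.update(["buy", "partner"])
    elif s.timeline_months < 12:
        votes.update(["partner", "build"])
    else:
        votes.update(["build"])
    # Q3: annual ML budget.
    if s.annual_budget_usd < 50_000:
        votes.update(["buy"])
    elif s.annual_budget_usd < 500_000:
        votes.update(["partner"])
    else:
        votes.update(["build", "partner"])
    # Q5: how custom is the problem?
    by_type = {"standard": "buy", "domain_specific": "partner", "novel": "build"}
    votes.update([by_type[s.problem_type]])
    # Q4: if data cannot leave your infrastructure, SaaS is ruled out.
    if not s.data_can_leave:
        votes.pop("buy", None)
    return votes


case = Situation(
    ml_is_core=False,
    timeline_months=2,
    annual_budget_usd=100_000,
    data_can_leave=False,
    problem_type="domain_specific",
)
print(recommend(case).most_common())  # [('partner', 4)]
```

Treat the tally as a conversation starter rather than a verdict. A close vote between two options is often a sign that a phased approach is worth considering.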
The Hybrid Approach
In practice, the best approach is often a combination:
Phase 1: Partner — Engage a consulting firm for a scoped sprint ($5-20K, 4-8 weeks). Get a production model deployed, learn what works, validate the business case.
Phase 2: Evaluate — With a working system, you now know the real requirements, challenges, and value. This informs the build vs buy decision with actual data, not assumptions.
Phase 3: Scale — Either grow the partnership into a retainer, transition to in-house by hiring (using the consulting firm's architecture and code as a foundation), or switch to a SaaS solution if one fits.
This is exactly how we structure our engagements at Opulion. Start with a low-risk sprint, prove value, then scale in the direction that makes sense.
The Hidden Costs to Watch
Regardless of which option you choose, watch for these hidden costs:
- Data preparation: often 60-80% of ML project effort, regardless of who does the work
- Integration: Connecting the ML system to your existing infrastructure
- Maintenance: Models degrade over time and need retraining
- Monitoring: You need to know when the model stops working
- Opportunity cost: Time spent on ML is time not spent on other priorities
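To see what the data preparation share does to a schedule, suppose an estimate covers only modeling and deployment. The sketch below scales it up to a full-project figure. It assumes the data preparation percentage applies to total effort, and the 4-week input is hypothetical.

```python
def total_effort_weeks(non_data_weeks, data_prep_pct):
    """Scale an estimate that excludes data prep up to the full project."""
    if not 0 <= data_prep_pct < 100:
        raise ValueError("data_prep_pct must be in [0, 100)")
    return non_data_weeks * 100 / (100 - data_prep_pct)


# Hypothetical: 4 weeks estimated for modeling and deployment only.
print(total_effort_weeks(4, 60), total_effort_weeks(4, 80))  # 10.0 20.0
```

Under these assumptions, a 4-week modeling estimate implies a 10 to 20 week project once data preparation is included.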
The cheapest option isn't always the best option. The best option is the one that gets a working, monitored, maintainable ML system into production fastest — because that's when you start learning whether ML actually moves the needle for your business.
Discussion (2)
We spent 18 months and $800K building an internal ML platform for medical image analysis. It works... in our dev environment. Getting it through FDA validation and into production is a completely different beast that our team isn't equipped for. The 'build trap' you describe is exactly where we are. Wish I'd read this 18 months ago.
The 'build trap' in regulated industries is particularly painful because the regulatory/compliance layer is 60% of the production work, not 10% like most teams budget for. The model is the easy part. FDA validation, audit trails, explainability requirements, data lineage — that's where internal teams get stuck because it's not ML work, it's systems engineering + compliance work. At this point, the most cost-effective path is usually to partner with someone who's navigated the regulatory deployment before, and focus your internal team on the domain-specific model improvement where their clinical knowledge is the real asset.