What is an ML System?
In 2016, Google replaced its phrase-based translation system with a neural machine translation model. Overnight, the quality of Google Translate improved more than it had in the previous ten years combined. It was a dramatic demonstration of what ML could do — but the model itself was only one piece of a much larger system.
Behind that model was a data pipeline ingesting billions of translated sentence pairs, a training infrastructure distributing computation across thousands of machines, a serving system handling millions of requests per second with strict latency requirements, and a monitoring system detecting when translation quality degraded.
This is the central insight of Designing Machine Learning Systems: the ML algorithm is a small part of the overall system. The data, infrastructure, deployment, monitoring, and human processes around it are where most of the complexity — and most of the failures — live.
ML Systems Overview
An ML system is much more than just the model. It includes data pipelines, feature engineering, training infrastructure, evaluation frameworks, deployment platforms, and monitoring systems. The model itself is often a small fraction of the overall codebase.
Explore the map below to see how all the components fit together and which chapters cover each one.
When to Use Machine Learning
ML is powerful but not always the right tool. Not every problem benefits from a learned model — some are better solved with simple rules, lookup tables, or human judgment. The key is knowing when each approach fits.
Chip Huyen identifies nine criteria for deciding whether a problem is a good fit for machine learning. Each criterion represents a condition that, when absent, makes ML either impossible, impractical, or unnecessary.
Explore each criterion below — click a tile to see a mini-demo that makes the reasoning tangible.
These nine criteria are not a checklist — they are a lens for thinking about whether ML will create value in your specific context. A problem that fails one criterion might still benefit from ML if the other conditions are strongly met.
Different Stakeholders and Requirements
Most ML projects involve multiple stakeholders with different — often conflicting — objectives. The ML engineer optimizes for model accuracy. Sales wants revenue. Product cares about user experience and latency. The platform team wants simplicity. The manager wants alignment.
Consider a food delivery app like RestaurantGo. The app charges a 10% service fee on each order — so recommending expensive restaurants means more revenue. But the most accurate recommendations might favor cheap, popular spots that users actually prefer.
Try the simulator below — pick a model and watch the stakeholders react.
As you can see, there is no single "best" model — only tradeoffs between stakeholders. This is why ML in production is fundamentally about negotiation, not optimization. The technical challenge of building a good model is only part of the picture. Deploying it requires aligning competing interests.
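One way to make this tradeoff concrete is to score each candidate model on the metrics the stakeholders care about, then let each stakeholder weight those metrics differently. The model names, scores, and weights below are purely illustrative (they are not from the book or the simulator), but the pattern shows why "best" depends on who is asking:

```python
# Hypothetical scores (0-1, higher is better) for three candidate
# recommendation models. All numbers here are made up for illustration.
models = {
    "accuracy_first": {"accuracy": 0.95, "revenue": 0.60, "latency": 0.50},
    "revenue_first":  {"accuracy": 0.70, "revenue": 0.95, "latency": 0.60},
    "fast_and_lean":  {"accuracy": 0.75, "revenue": 0.65, "latency": 0.95},
}

def best_model(weights):
    """Pick the model with the highest weighted score for one stakeholder."""
    def score(scores):
        return sum(weights[metric] * scores[metric] for metric in weights)
    return max(models, key=lambda name: score(models[name]))

# Each stakeholder weights the same three metrics differently.
ml_engineer = {"accuracy": 0.8, "revenue": 0.1, "latency": 0.1}
sales_team  = {"accuracy": 0.1, "revenue": 0.8, "latency": 0.1}
product     = {"accuracy": 0.3, "revenue": 0.2, "latency": 0.5}

print(best_model(ml_engineer))  # accuracy_first
print(best_model(sales_team))   # revenue_first
print(best_model(product))      # fast_and_lean
```

Each stakeholder's weighting picks a different winner, which is exactly the negotiation problem: no reweighting makes everyone's first choice the same model.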
Understanding Percentiles
When we talk about latency, we often report a single number: the average. But averages can be deeply misleading. A service with a 100ms average latency might have a p99 of 3000ms, meaning the slowest 1% of requests take at least 30 times the average.
This matters because your slowest users are often your most valuable: they're the ones making complex queries, browsing deeply, or processing large orders. Try the explorer below to see how averages lie.
This is why product teams specify SLOs in percentiles (p90, p95, p99) rather than averages. When you see a latency number, always ask: "Is this the average, or a percentile?" The answer changes everything.
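The gap between the average and the tail is easy to reproduce yourself. Here is a small sketch using a synthetic latency sample (the distribution and numbers are illustrative, not real measurements): most requests cluster near 90ms, but a 2% slow tail drags the average well above what a typical user experiences, while the median barely moves.

```python
import random

random.seed(0)

# Synthetic latency sample: 98% of requests cluster near 90 ms,
# 2% hit a slow tail between 1 and 5 seconds. Numbers are illustrative.
latencies = [random.gauss(90, 10) for _ in range(980)]
latencies += [random.uniform(1000, 5000) for _ in range(20)]

def percentile(values, p):
    """Nearest-rank percentile: the value at or below which p% of samples fall."""
    ordered = sorted(values)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

avg = sum(latencies) / len(latencies)
print(f"average: {avg:6.0f} ms")                        # pulled up by the tail
print(f"p50:     {percentile(latencies, 50):6.0f} ms")  # what a typical user sees
print(f"p99:     {percentile(latencies, 99):6.0f} ms")  # what the slowest 1% see
```

Note that the average lands above the p50: a tiny fraction of slow requests is enough to make the "typical" number untrustworthy, which is why SLOs are written against p95 or p99 instead.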
What's Next
In this chapter, you explored the big picture of ML systems: what they are, when to use them, how stakeholders collide, and why percentiles matter more than averages. You trained a real model and watched it behave differently in research and production. You picked a model and saw five stakeholders react.
These aren't abstract concepts — they're the daily reality of deploying ML. Every decision in the chapters ahead connects back to the tradeoffs you experienced here.
Chapter 2 will dive into how to scope ML projects and set up objectives that align with business goals. Coming soon.