Once the objective is settled, the next question is what the system around the model has to do. ML systems carry the same requirements as any production software — they have to keep working, they have to scale, someone has to be able to maintain them — plus one that traditional software mostly avoids: they have to adapt to a world that won't sit still. The data they were trained on stops looking like the data they see in production, and they have to keep up.
In Designing Machine Learning Systems, Chip Huyen organizes these into four requirements: reliability, scalability, maintainability, and adaptability. Each looks deceptively familiar, and each is harder for ML systems than for the traditional software engineering the terms are borrowed from.
Reliability
A reliable system keeps doing the right thing under load, under bad input, under partial failures, and under the slow accumulation of weird edge cases nobody anticipated. For traditional software this is hard but tractable — when something breaks, it breaks loudly. A service throws a 500. A test goes red. A null-pointer exception lands in the logs. You can alert on it.
ML systems break quietly. A miscalibrated recommender doesn't crash — it just starts recommending the same five products to everyone. A drifted classifier doesn't throw exceptions — it just labels more and more inputs incorrectly with high confidence. The model still runs. The endpoint still returns 200s. The dashboards still tick. And the business slowly bleeds revenue while everyone congratulates themselves on a stable deployment.
This is why ML reliability is fundamentally a measurement problem. You cannot rely on the system to tell you when it's broken. You have to build the instrumentation that asks the question.
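
One way to build that instrumentation is to track a health signal the serving stack will never raise on its own. The sketch below targets the collapsed-recommender failure described above: it measures what share of all recommendation slots goes to the k most frequent items. The function names, the choice of k, and the 0.8 alert threshold are illustrative assumptions, not values from the book; a real monitor would be tuned against the system's own historical baseline.

```python
from collections import Counter
from typing import Iterable, Sequence


def top_k_share(recommendation_lists: Iterable[Sequence[str]], k: int = 5) -> float:
    """Fraction of all recommended slots taken by the k most frequent items."""
    counts = Counter(item for recs in recommendation_lists for item in recs)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # nothing was served in this window
    top = sum(n for _, n in counts.most_common(k))
    return top / total


def is_collapsed(
    recommendation_lists: Iterable[Sequence[str]],
    k: int = 5,
    threshold: float = 0.8,  # assumed alert level, not a standard value
) -> bool:
    """True when the k most frequent items dominate what users are shown."""
    return top_k_share(recommendation_lists, k) >= threshold
```

Run over a window of recently served recommendations, a rising share is a reason to investigate even though every request returned a 200.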
Scalability
Scalability has two faces in ML. The first is the one every web service knows: traffic grows, and a single machine stops being enough. You shard, you replicate, you load-balance. ML adds wrinkles (every replica has to load hundreds of megabytes or more of model weights, and you cannot spin up another container without also paying for a GPU to host it), but the playbook is recognizable.
The second face is unique to ML: the model itself grows. A model that fit comfortably in 8 GB of GPU memory at launch may need to be split across multiple devices a year later, once it has been replaced by a larger architecture trained on ten times the data. The serving stack you designed for the small model won't survive the big one. Scalability for ML is not just "how do I handle more requests"; it's "how do I handle a request when the model has outgrown a single machine."
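
A back-of-the-envelope calculation shows how quickly a model can outgrow one device. The sketch below counts only the memory needed to hold the weights, so it gives a lower bound: activations, caches, and framework overhead are ignored. The parameter counts are hypothetical, chosen to match the 8 GB device in the example above, and the helper names are illustrative.

```python
import math

BYTES_PER_GB = 1e9  # decimal gigabytes, to keep the arithmetic readable


def weights_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed just to hold the weights (4 bytes per float32 parameter)."""
    return num_params * bytes_per_param / BYTES_PER_GB


def min_devices(num_params: int, device_gb: float, bytes_per_param: int = 4) -> int:
    """Lower bound on devices needed to hold the weights alone."""
    return math.ceil(weights_gb(num_params, bytes_per_param) / device_gb)


# Hypothetical launch model: 1.5 billion float32 parameters, 6 GB of weights.
launch = min_devices(1_500_000_000, device_gb=8)

# Hypothetical successor: 15 billion float32 parameters, 60 GB of weights.
successor = min_devices(15_000_000_000, device_gb=8)
```

Under these assumptions the launch model fits on one 8 GB device, while the successor needs at least eight. Storing weights in 2 bytes per parameter (`bytes_per_param=2`) would halve that, which is one reason lower precision matters for serving.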
Maintainability
An ML system is maintained by a wider cast than a regular service. Data engineers own the pipelines that feed it. ML engineers own the training and serving code. Subject-matter experts own the labels. DevOps owns the infrastructure. Product owns the objective. When any of them leaves and the model breaks, somebody has to figure out which of these surfaces is the problem — often without the original author around to explain.
Maintainability for ML is the discipline of leaving behind a system that someone else can understand, debug, and retrain six months from now. It is opposed by every shortcut that feels reasonable in the moment: the notebook with hardcoded paths, the dataset version that lives only on someone's laptop, the magic number in the loss function with no comment. These don't break the system today. They make it unmaintainable tomorrow.
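
A small habit that counters those shortcuts is to write down everything a training run depended on, next to the model it produced. The sketch below is one possible shape for such a record, not a prescribed format: the `TrainingConfig` fields, the manifest layout, and the 12-character fingerprint are all illustrative assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    dataset_uri: str             # shared storage location, not a laptop path
    dataset_version: str         # which snapshot of the data was used
    positive_class_weight: float # the former "magic number" in the loss
    positive_class_weight_reason: str  # why that value was chosen
    random_seed: int = 0


def to_manifest(config: TrainingConfig) -> str:
    """Serialize the config as stable, human-readable JSON."""
    return json.dumps(asdict(config), sort_keys=True, indent=2)


def fingerprint(config: TrainingConfig) -> str:
    """Short hash that changes whenever any recorded setting changes."""
    digest = hashlib.sha256(to_manifest(config).encode("utf-8")).hexdigest()
    return digest[:12]
```

Saving the manifest alongside the model artifact, and tagging the artifact with the fingerprint, gives whoever inherits the system a starting point for reproducing or debugging a run.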
Adaptability
This is the requirement traditional software engineering doesn't really have. A correctly written sorting function will sort correctly ten years from now. A correctly trained recommender, deployed a year ago, may already be wrong: user behavior has shifted, the catalog has changed, or a global event (a pandemic, a viral trend, a competitor's launch) has rewritten the patterns the model learned from.
Adaptability is the system's ability to notice this drift and respond to it: to detect that the world has moved, to retrain on fresher data, to roll the new model out without breaking anything, and to roll back if it's worse. It's the requirement that turns ML systems into living systems — they don't ship and stay shipped. They ship and keep shipping.
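
The first step of that loop, noticing that the world has moved, can be sketched with a drift score. The example below uses the Population Stability Index (PSI), which sums (current share - reference share) * ln(current share / reference share) over histogram bins. PSI is a swapped-in technique chosen for illustration, one option among several, and not something this section prescribes. The equal-width binning, the 0.25 threshold (a commonly quoted rule of thumb, treated here as an assumption), and the `next_step` gate with its return labels are all hypothetical.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against `reference`."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def next_step(drift_score: float, candidate_metric: float,
              production_metric: float, drift_threshold: float = 0.25) -> str:
    """Hypothetical gate: retrain only on drift, promote only if better."""
    if drift_score < drift_threshold:
        return "keep_serving"
    if candidate_metric > production_metric:
        return "promote_candidate"
    return "reject_candidate"
```

Here `reference` would be a feature or score sample from training time and `current` a recent production window. A score of zero means the two histograms match; the gate then encodes the rest of the loop, where a retrained model replaces the production one only when it measures better on fresh data.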
Most of the rest of Designing Machine Learning Systems is the engineering of these four requirements. Chapters 3 through 5 build the data plumbing that makes reliability and adaptability possible. Chapters 6 and 7 cover the model development and deployment patterns that scalability demands. Chapters 8 and 9 are about detecting drift and learning continually, which is adaptability in production. Chapter 10 covers the infrastructure that keeps any of this maintainable.
If the previous section gave you the "what" of the system (the business outcome), this section gave you the "how well" (the non-functional bar). The next section turns to the "how": the iterative process of actually building one.