This page summarizes practical guidance for microservices and event-driven architecture: service boundaries, event contracts, observability, deployment, and team ownership.
Foundations
A microservice is not just a small deployable. It encapsulates a business capability, data ownership, and operational responsibility.
Draw boundaries around business domains and clear responsibilities. Technical layers such as APIs, workers, and persistence usually belong together when they serve the same capability.
Each service owns its data. Other services should not read its database directly; they should use APIs, events, or replicated read models.
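As a minimal sketch of the replicated read model option: a consuming service keeps its own copy of the data it needs and updates it from events published by the owner. The event name CustomerAddressChanged, the version field, and the ShippingAddressReadModel class are hypothetical names chosen for illustration, and the in-memory dictionary stands in for the consumer's own database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerAddressChanged:
    """Hypothetical event published by the service that owns customer data."""
    customer_id: str
    address: str
    version: int  # assumed to increase with every change made by the owner


class ShippingAddressReadModel:
    """Local, read-only copy kept by a consuming service (illustrative)."""

    def __init__(self):
        self._addresses = {}  # customer_id -> (version, address)

    def apply(self, event: CustomerAddressChanged) -> None:
        current = self._addresses.get(event.customer_id)
        # Skip stale or duplicate events so replays do not overwrite newer data.
        if current is not None and current[0] >= event.version:
            return
        self._addresses[event.customer_id] = (event.version, event.address)

    def address_for(self, customer_id: str):
        entry = self._addresses.get(customer_id)
        return entry[1] if entry else None
```

The consumer never queries the owner's database. It only sees what the owner chose to publish, and it can be rebuilt by replaying the events.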
Use synchronous calls deliberately, define timeouts and fallbacks, and avoid publishing internal models as public contracts.
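One way to make the timeout and fallback explicit is a small wrapper around the remote call. The sketch below uses only the standard library. The function names call_with_timeout and slow_inventory_lookup are made up for this example, and a real HTTP client would normally be given its own connect and read timeouts in addition.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it is too slow or fails."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only helps if the call has not started yet;
        # a call that is already running continues in the background.
        future.cancel()
        return fallback
    except Exception:
        # Any error raised by the remote call is treated like an outage here.
        return fallback


def slow_inventory_lookup():
    """Hypothetical stand-in for a remote call that responds too slowly."""
    time.sleep(0.3)
    return {"in_stock": True, "source": "live"}


# The caller decides up front what a degraded answer looks like.
result = call_with_timeout(
    slow_inventory_lookup,
    timeout_s=0.05,
    fallback={"in_stock": None, "source": "fallback"},
)
```

The fallback value is part of the design: the caller has to know what it shows or does when the dependency is unavailable.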
It should be possible to test, deploy, scale, and roll back a service on its own. Keep shared libraries small and stable.
Event-driven architecture
Events decouple workflows, but they increase the need for discipline around contracts, idempotency, ordering, and error handling.
An event describes what happened: OrderPlaced, PaymentCaptured, or ContractCancelled. It is not a hidden remote procedure call.
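To illustrate the difference, an event can be modeled as an immutable record of a fact, named in the past tense. The fields below (order_id, customer_id, total_cents, event_id, occurred_at) are assumptions for this sketch, not a prescribed schema. A name such as SendInvoiceEmail, by contrast, would be a command addressed to a specific receiver.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a fact that has happened is not edited later
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int
    # Unique ID so consumers can detect duplicate deliveries.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # When the fact occurred, recorded in UTC.
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


event = OrderPlaced(order_id="o-1", customer_id="c-7", total_cents=1999)
```

The event says nothing about who should react or how. Billing, shipping, and analytics can each subscribe without the producer knowing about them.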
Schemas are production interfaces. Changes must be planned for compatibility, documented, and covered with consumer tests.
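A compatibility check can be automated in a build pipeline. The following is a simplified sketch with one possible rule set: removing a field, changing its type, or adding a new required field counts as breaking. The schema representation (a dictionary of field name to type and required flag) and the function name breaking_changes are assumptions made for this example. Real schema formats have richer rules.

```python
def breaking_changes(old_fields: dict, new_fields: dict) -> list:
    """Return reasons why new_fields would break users of old_fields.

    Both arguments map field name -> {"type": str, "required": bool}.
    """
    problems = []
    for name, old in old_fields.items():
        new = new_fields.get(name)
        if new is None:
            # Existing consumers may still read this field.
            problems.append(f"field removed: {name}")
        elif new["type"] != old["type"]:
            problems.append(f"type changed: {name}")
    for name, new in new_fields.items():
        # Older events, for example during a replay, would lack this field.
        if name not in old_fields and new["required"]:
            problems.append(f"new required field: {name}")
    return problems


v1 = {
    "order_id": {"type": "string", "required": True},
    "total_cents": {"type": "int", "required": True},
}
# Adding an optional field is treated as compatible under these rules.
v2 = dict(v1, currency={"type": "string", "required": False})
# Dropping a field is treated as breaking.
v3 = {"order_id": {"type": "string", "required": True}}
```

Here breaking_changes(v1, v2) returns an empty list, while breaking_changes(v1, v3) reports the removed field, so a pipeline could fail the build on any non-empty result.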
Consumers must tolerate duplicate delivery. Event IDs, aggregate IDs, inbox and outbox patterns, and traceable retry strategies are core building blocks.
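A minimal sketch of an idempotent consumer, assuming every event carries a unique event_id: the set of processed IDs stands in for an inbox table, and the class name IdempotentConsumer is illustrative. In a real service, recording the ID and applying the side effect would typically happen in the same database transaction.

```python
class IdempotentConsumer:
    """Runs the handler at most once per successfully processed event ID."""

    def __init__(self, handler):
        self._handler = handler
        self._processed_ids = set()  # stands in for a persistent inbox table

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._processed_ids:
            return False
        self._handler(event)
        # Record the ID only after success so a failed event can be retried.
        self._processed_ids.add(event_id)
        return True


captured = []
consumer = IdempotentConsumer(captured.append)
first = consumer.handle({"event_id": "e-1", "type": "PaymentCaptured"})
# The broker redelivers the same event; the handler is not called again.
second = consumer.handle({"event_id": "e-1", "type": "PaymentCaptured"})
```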
Eventual consistency is a design decision. User interfaces, SLAs, and support processes must make delays and intermediate states understandable.
Operations
Distributed systems require operational discipline. Without observability, clear ownership, and resilience, small services quickly become large dependencies.
Carry metrics, logs, traces, and correlation IDs across services and events. Dashboards should combine business and technical signals.
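Propagating a correlation ID can be as simple as reusing the incoming value, or creating one at the edge, and attaching it to every log line and outgoing event. The header name X-Correlation-ID and the helper names below are assumptions for this sketch. Tracing libraries usually handle this propagation themselves.

```python
import uuid

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


def ensure_correlation_id(headers: dict) -> dict:
    """Return a copy of headers that is guaranteed to carry a correlation ID."""
    out = dict(headers)
    out.setdefault(CORRELATION_HEADER, str(uuid.uuid4()))
    return out


def outgoing_event(event_type: str, payload: dict, incoming_headers: dict) -> dict:
    """Build an event that carries the correlation ID of the triggering request."""
    headers = ensure_correlation_id(incoming_headers)
    return {
        "type": event_type,
        "payload": payload,
        "correlation_id": headers[CORRELATION_HEADER],
    }


def log_line(message: str, correlation_id: str) -> str:
    """Format a log line so it can be found by correlation ID."""
    return f"correlation_id={correlation_id} {message}"


event = outgoing_event(
    "PaymentCaptured", {"order_id": "o-1"}, {CORRELATION_HEADER: "req-42"}
)
line = log_line("payment captured", event["correlation_id"])
```

With the same ID in logs, traces, and event metadata, a single search can reconstruct a workflow that crosses several services.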
Timeouts, circuit breakers, bulkheads, and dead-letter queues prevent cascading failures. Retries need limits, backoff, and clear alerting.
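The retry rules can be sketched as a bounded loop with exponential backoff that hands the message to a dead-letter store once the limit is reached. The function name process_with_retry, its default values, and the list used as a dead-letter queue are assumptions for illustration. Message brokers often provide their own retry and dead-letter configuration.

```python
import time


def process_with_retry(message, handler, dead_letters,
                       max_attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Try handler up to max_attempts times; dead-letter the message on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
            return True
        except Exception as exc:
            if attempt == max_attempts:
                # Limit reached: park the message for inspection and alerting.
                dead_letters.append(
                    {"message": message, "error": str(exc), "attempts": attempt}
                )
                return False
            # Exponential backoff: base, 2 * base, 4 * base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    return False


def always_failing(message):
    """Hypothetical handler that simulates a permanently broken dependency."""
    raise RuntimeError("downstream unavailable")


dead_letters = []
delays = []  # record the delays instead of actually sleeping
ok = process_with_retry(
    {"event_id": "e-1"}, always_failing, dead_letters,
    max_attempts=3, base_delay_s=1, sleep=delays.append,
)
```

The sleep function is injected so the backoff can be tested without waiting. The growth of the dead-letter store is a natural signal to alert on.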
Automated tests, contract checks, database migrations, and progressive releases reduce risk. Rollback should be a normal procedure.
Every service needs an owning team, a contact path, runbooks, service level objectives, and explicit rules for breaking changes.
Practical
These questions make architecture decisions concrete and keep risks visible early.