Service Mesh: definition and best practices

What a service mesh does

A service mesh is an infrastructure layer that handles network communication between services inside a Kubernetes cluster. In a microservices architecture, each service calls several others, and every call must handle authentication, encryption, retry on failure, timeout, distributed tracing, and latency metrics. Without a service mesh, every product team duplicates this code in every service, in every language used.

A service mesh externalises those concerns into sidecar proxies (typically Envoy) deployed alongside the application container in each pod. Application code makes plain calls to the target service; the sidecar transparently intercepts the traffic, enforces security policy, applies routing and retries, and emits metrics. Developers focus on business logic, while the platform team manages the service network.

Core features

Service-to-service security. Mutual TLS (mTLS) automatically encrypts traffic between meshed pods and authenticates both ends of each connection. It is a prerequisite for multi-tenant environments, regulated industries (finance, healthcare) and zero-trust architectures. Without a service mesh, rolling out service-to-service mTLS can take weeks of effort per service, since certificates must be issued, rotated and wired into every client and server.
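As an illustration of how little configuration mesh-level mTLS can require, here is a minimal sketch that builds an Istio PeerAuthentication resource in Python and prints it as JSON (which kubectl accepts alongside YAML). It assumes Istio is the mesh in use; the helper name strict_mtls_policy and the namespace "payments" are hypothetical, and Linkerd and Cilium expose mTLS through different mechanisms.

```python
import json


def strict_mtls_policy(namespace: str) -> dict:
    """Build an Istio PeerAuthentication that rejects plaintext traffic.

    Assumption: Istio is installed and the namespace is part of the mesh.
    """
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # Naming the policy "default" follows Istio's convention for a
        # namespace-wide policy.
        "metadata": {"name": "default", "namespace": namespace},
        # STRICT means workloads only accept mutual-TLS connections.
        "spec": {"mtls": {"mode": "STRICT"}},
    }


if __name__ == "__main__":
    # "payments" is a hypothetical namespace used for illustration.
    print(json.dumps(strict_mtls_policy("payments"), indent=2))
```

The printed document could be saved to a file and applied with kubectl apply -f. In practice a rollout would usually start in PERMISSIVE mode and switch to STRICT once all callers are meshed.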

Advanced routing. Canary releases, A/B testing, blue-green deployments, and traffic shadowing, which tests a new version on real traffic without user impact. All of it is configured through Kubernetes CRDs, without touching application code.
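A canary release expressed as a CRD might look like the following sketch, which generates an Istio VirtualService splitting traffic by weight. It assumes Istio; the function name, the service name "checkout" and the subset names "stable" and "canary" are hypothetical, and the subsets would have to be defined in a matching DestinationRule that is not shown here.

```python
import json


def canary_virtual_service(host: str, canary_percent: int) -> dict:
    """Build an Istio VirtualService sending canary_percent of traffic
    to the 'canary' subset and the rest to the 'stable' subset."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [
                        {
                            "destination": {"host": host, "subset": "stable"},
                            "weight": 100 - canary_percent,
                        },
                        {
                            "destination": {"host": host, "subset": "canary"},
                            "weight": canary_percent,
                        },
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical service name; 10% of requests go to the new version.
    print(json.dumps(canary_virtual_service("checkout", 10), indent=2))
```

Raising the canary share is then a matter of regenerating and reapplying the resource with a higher percentage, with no change to the application itself.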

Resilience. Automatic retry on transient errors, circuit breaking when an upstream service is degraded, per-caller rate limiting. These patterns become declarative rather than hand-coded in every service.
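To make concrete what "hand-coded in every service" means, here is a minimal sketch of the retry and circuit-breaker logic a team might otherwise write itself. It is an illustration only, not how any particular mesh implements these patterns; the class and function names, the thresholds, and the use of ConnectionError as a stand-in for "transient error" are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a degraded upstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream marked as degraded")
            # Cool-down elapsed: let one trial call through (half-open).
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result


def call_with_retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # never retry against an open circuit
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A caller would combine the two, for example call_with_retry(lambda: breaker.call(fetch_cart)) where fetch_cart is a hypothetical function performing the network call. With a mesh, equivalent behaviour is declared once in policy and applied by the proxy for every language in use.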

Automatic observability. RED metrics (Rate, Errors, Duration) for every service-to-service call, distributed tracing with spans generated by the proxies (applications still need to forward trace context headers for spans to be joined into a single trace), structured call logs. The service-graph map is rendered in real time.
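The RED metrics mentioned above can be sketched as a simple aggregation over call records. This is an illustration of the definitions only, not the way a mesh computes them (proxies typically export counters and histograms to a metrics backend); the Call record, its field names and the choice of a nearest-rank 95th percentile are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    source: str
    destination: str
    status: int         # HTTP status code
    duration_ms: float


def red_metrics(calls, window_seconds):
    """Aggregate Rate, Errors and Duration per (source, destination) edge."""
    edges = {}
    for c in calls:
        edges.setdefault((c.source, c.destination), []).append(c)

    report = {}
    for edge, items in edges.items():
        durations = sorted(c.duration_ms for c in items)
        errors = sum(1 for c in items if c.status >= 500)
        # Nearest-rank percentile: smallest value covering 95% of calls.
        rank = max(1, math.ceil(0.95 * len(durations)))
        report[edge] = {
            "rate_per_s": len(items) / window_seconds,
            "error_ratio": errors / len(items),
            "p95_ms": durations[rank - 1],
        }
    return report
```

The keys of the resulting report are the edges of the service graph, which is why the same data can drive both dashboards and the real-time topology map.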

The main service meshes

Istio (CNCF Graduated 2023). The most complete and most deployed. Steep learning curve: 50+ CRDs, verbose configuration. Suited to mature organisations with 30+ platform engineers.

Linkerd (CNCF Graduated 2021). The simplest. Minimalist architecture, custom proxy (not Envoy) written in Rust, small memory footprint. Suited to mid-sized teams that want mTLS without the operational overhead.

Cilium Service Mesh (Cilium: CNCF Graduated 2023). An alternative eBPF-based approach that removes the per-pod sidecar; L7 processing is handled by an Envoy proxy shared per node. Higher performance, but L7 features are still catching up to Istio.

When not to adopt a service mesh

A service mesh adds significant operational load: deployment, upgrades, sidecar debugging, team training. For an architecture with fewer than 10 services, that cost usually exceeds the benefits. Prefer a simpler approach: Ingress + cert-manager for edge TLS, OpenTelemetry instrumentation for tracing, and application-level retries with a library such as Resilience4j or Polly.

The practical threshold observed at Hidora clients: adopt a service mesh once the architecture reaches 15 to 20 microservices, or as soon as an explicit internal mTLS requirement applies (FINMA, healthcare).

Related Hidora services

Consulting: service-mesh relevance audit, selection between Istio, Linkerd and Cilium, progressive rollout strategy design.
Managed Services: 24/7 operation of service meshes in production with sidecar and policy monitoring.
Kubernetes, Cilium, Observability: associated technical building blocks.
