As development teams continue to embrace microservices and distributed systems, observability is becoming increasingly important for service management, troubleshooting and monitoring production environments. With the rise of cloud computing, the raw cost of collecting and storing monitoring data has never been lower. Yet commercial observability solutions tend to be expensive and can be difficult to set up. OpenSearch offers an open-source alternative that promises to make observability easier than ever.

What is observability?

Observability is the ability to understand what is happening inside your application from the data it emits. It encompasses the tools and techniques that allow you to gain visibility into your systems, understand their behaviour and identify anomalies.

In one sense, observability is a new term for something that has been around for a long time. Logging has existed for as long as computer systems have. But as technology has evolved from monolithic applications on physical hardware to distributed systems deployed as code on virtualized infrastructure, new challenges have emerged. Troubleshooting and debugging these modern systems requires purpose-built tools.

Observability rests on three fundamental pillars: logs, metrics and traces. Logs tell you what happened. Metrics give you a quantitative view of your systems' state. Traces let you follow a request's journey through the different services in your architecture. It's the combination of these three elements that provides a complete picture of your infrastructure's health.

At Hidora, observability is at the core of our approach. Our managed services include setting up a complete observability stack for every client.

Why observability matters more than ever

The shift from monolithic to distributed architectures has made troubleshooting fundamentally harder. When a user reports a slow page load on a traditional application, you check the application server and the database. With a microservices architecture, that same request might traverse an API gateway, an authentication service, a product catalogue, an inventory service, a pricing engine and a caching layer before a response is returned. Without proper observability, identifying the bottleneck in this chain becomes guesswork.
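To make the bottleneck problem concrete, here is a minimal Python sketch of what trace data gives you. The service names come from the example above; the timings, the `Span` class and the assumption that the calls run one after another are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One hop of a request, as a tracing system might record it (simplified)."""
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span that took the longest."""
    return max(spans, key=lambda s: s.duration_ms)


# Hypothetical timings for a single request, with calls made sequentially.
trace = [
    Span("api-gateway", 0, 12),
    Span("auth-service", 12, 30),
    Span("product-catalogue", 30, 75),
    Span("inventory-service", 75, 110),
    Span("pricing-engine", 110, 530),
    Span("cache", 530, 534),
]

bottleneck = slowest_span(trace)
print(f"{bottleneck.service}: {bottleneck.duration_ms} ms")
```

With per-service timings like these, the slow hop is a lookup rather than a guess. Real traces are nested and partly parallel, so production tooling works on a span tree rather than a flat list.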

For Swiss companies operating in regulated sectors like finance or healthcare, observability also serves a compliance function. Audit trails must demonstrate that systems behaved as expected. When regulators ask "what happened at 14:32 on March 5th," you need to produce an answer backed by data, not assumptions. OpenSearch's indexing and search capabilities make this type of forensic analysis fast and reliable.
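As an illustration, the regulator's question maps to a time-range query. The Python sketch below only builds the request body for the OpenSearch search API. The `@timestamp` field name, the year and the use of UTC are assumptions, since the example above specifies none of them.

```python
import json
from datetime import datetime, timedelta, timezone


def audit_window_query(moment, minutes=1, field="@timestamp"):
    """Build a search body for all events in the window starting at `moment`.

    `field` is an assumed timestamp field name; adapt it to your index mapping.
    """
    start = moment.replace(second=0, microsecond=0)
    end = start + timedelta(minutes=minutes)
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}
                ]
            }
        },
        # Oldest first, so the events read as a timeline.
        "sort": [{field: {"order": "asc"}}],
    }


# 14:32 on March 5th; the year and the UTC time zone are hypothetical.
query = audit_window_query(datetime(2024, 3, 5, 14, 32, tzinfo=timezone.utc))
print(json.dumps(query, indent=2))
```

The resulting body can be sent to the `_search` endpoint of the relevant index. Using a `filter` clause rather than a scored query is a deliberate choice here, because an audit lookup needs exact matching and not relevance ranking.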

The financial case for observability is equally compelling. According to industry studies, the average cost of IT downtime for mid-sized companies exceeds CHF 5,000 per hour. At that rate, a robust observability stack that detects anomalies before they become outages can deliver measurable ROI within months.
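A back-of-the-envelope calculation shows how the break-even point can be estimated. In this sketch, the hourly downtime cost is the figure quoted above, while the monthly stack cost is a hypothetical input that you should replace with your own.

```python
DOWNTIME_COST_CHF_PER_HOUR = 5_000  # figure quoted in the text


def breakeven_minutes(monthly_stack_cost_chf,
                      downtime_cost_chf_per_hour=DOWNTIME_COST_CHF_PER_HOUR):
    """Minutes of downtime that must be avoided each month to cover the stack cost."""
    return monthly_stack_cost_chf * 60 / downtime_cost_chf_per_hour


# Hypothetical monthly cost of running the observability stack.
minutes = breakeven_minutes(monthly_stack_cost_chf=2_000)
print(f"Break-even: {minutes:.0f} minutes of avoided downtime per month")
```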

Why choose OpenSearch?

OpenSearch is a fully open-source, community-driven search and analytics suite, released under the Apache 2.0 licence, that lets you store, search and visualize your observability data. It can also work alongside your existing SIEM and analytics tools, making it well suited to large organizations that need a fast way to ingest data from multiple sources.

For developers, OpenSearch exposes a REST API and can receive data from standard instrumentation such as OpenTelemetry (for example through Data Prepper), so telemetry can often be collected without writing custom integration code for each service. This means they spend less time on instrumentation and more time building features. That gain in developer productivity makes OpenSearch an attractive prospect for any organization, regardless of size.

Key advantages of OpenSearch include:

Open source and community-driven. No vendor lock-in, no expensive proprietary licences.
Compatible with the Elasticsearch ecosystem. OpenSearch was forked from Elasticsearch 7.10, so if you already use Elasticsearch, migration is simplified.
Extensible. Plugins, integrations and customizations let you adapt it to your needs.
Performant. Capable of handling massive data volumes in real time.

OpenSearch vs. proprietary alternatives

Many organizations default to proprietary observability platforms like Datadog, Splunk or New Relic. These tools are polished and feature-rich, but their pricing models can be punishing at scale. Datadog, for example, bills per host, per custom metric and per log volume. A mid-sized Swiss company with 50 servers, moderate log output and a handful of custom dashboards can easily face a bill exceeding CHF 3,000 per month.

OpenSearch eliminates the per-metric and per-host pricing entirely. You control the infrastructure, so your cost scales with compute and storage rather than with the vendor's pricing tiers. For organizations that value predictability in their IT budgets, this distinction is significant.

The trade-off is operational ownership. Running OpenSearch requires maintaining the cluster, managing upgrades and tuning performance. This is where a managed service provider like Hidora adds value: you get the cost benefits of open source without bearing the full operational burden.

Configuring OpenSearch

If you're deploying a new application and want it to be observable, we recommend configuring OpenSearch from the start. Setting up OpenSearch will allow you to ingest your application logs and explore them using search filters. This is essential for understanding what's happening with your application in real time, so you can react quickly when things go wrong.

If you have many different microservices running on multiple hosts, having each host send its data independently may not be practical. Centralizing log shipping through a single pipeline means all your data ends up in one place, saving you headaches later when you want to analyze it.

Log shippers such as Beats agents, Logstash or Fluentd can also be used alongside OpenSearch. A Jelastic certified template is available for the open-source stack described here (OpenSearch, OpenSearch Dashboards and Logstash). These three components are combined into a single self-clustering solution, which significantly simplifies deployment.

Production configuration tips

For clusters handling production workloads, a few configuration details make a significant difference in stability and performance. Set the JVM heap size to no more than half of available RAM, and keep it below roughly 32 GB, as crossing that threshold disables compressed ordinary object pointers (compressed oops) and can actually reduce performance. Enable shard allocation awareness based on availability zones so that a primary shard and its replicas are spread across different zones rather than sharing one. For write-heavy workloads typical of observability pipelines, increase index.refresh_interval from the default 1 second to 30 seconds. This reduces I/O pressure substantially without meaningfully affecting dashboard freshness.
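The tips above can be sketched in a few lines of Python. The heap helper applies the "half of RAM, capped" rule, with a 31 GB ceiling chosen here as a safety margin under the threshold (an assumption, not an official value). The two dictionaries are request bodies for the index settings and cluster settings APIs, and the `zone` attribute name is likewise an assumption that must match the attribute configured on your nodes.

```python
import json


def recommended_heap_gb(ram_gb, ceiling_gb=31):
    """Half of available RAM, capped just under the ~32 GB compressed-oops threshold."""
    return min(ram_gb / 2, ceiling_gb)


# Request body for PUT /<index>/_settings: refresh every 30 s instead of every 1 s.
index_settings = {"index": {"refresh_interval": "30s"}}

# Request body for PUT /_cluster/settings.
# "zone" is an assumed node attribute name (set on each node as node.attr.zone).
cluster_settings = {
    "persistent": {"cluster.routing.allocation.awareness.attributes": "zone"}
}

for ram in (16, 64, 128):
    print(f"{ram} GB RAM -> {recommended_heap_gb(ram):g} GB heap")
print(json.dumps(index_settings))
print(json.dumps(cluster_settings))
```

The heap value itself is applied through the JVM options (`-Xms` and `-Xmx`, set to the same value), not through the REST API.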

Ingesting your data

You can ingest data into OpenSearch with many tools, including Logstash. Logstash is an event and log processing pipeline. Although it is best known as part of the Elastic Stack, it works with many other products through plugins, including Apache Kafka, Amazon Kinesis and OpenSearch itself.

Logstash ingests data from almost any source using various methods, including TCP/UDP sockets, local files and object storage such as S3. Inside Logstash, you can parse, filter and enrich events. Once the data reaches OpenSearch, you can run simple or complex queries for better visibility into problems or trends in your application environment.

Logstash's flexibility is one of its greatest strengths. Whether your data comes from Docker containers, physical servers, cloud applications or IoT devices, Logstash can ingest and transform it before sending it to OpenSearch.
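Logstash pipelines are written in Logstash's own configuration language, so the Python sketch below is not a pipeline. It shows roughly what any shipper ends up sending to OpenSearch: a newline-delimited payload for the `_bulk` API. The index name `logs-app` and the event fields are hypothetical.

```python
import json


def to_bulk_ndjson(index, events):
    """Serialize events as a _bulk payload: one action line, then one document line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical events, already parsed and enriched by the shipper.
events = [
    {"@timestamp": "2024-03-05T14:32:01Z", "service": "auth-service",
     "level": "ERROR", "message": "token validation failed"},
    {"@timestamp": "2024-03-05T14:32:02Z", "service": "pricing-engine",
     "level": "INFO", "message": "price computed"},
]

payload = to_bulk_ndjson("logs-app", events)
print(payload, end="")
```

The payload would be sent with `POST /_bulk` and the `Content-Type: application/x-ndjson` header. Batching events this way is far cheaper than indexing them one request at a time.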

Creating your first dashboard

Create your first real-time dashboard on top of the data you have ingested into OpenSearch. Start by logging into OpenSearch Dashboards and creating an index pattern that matches the indices you want to explore. Then add a saved search or visualization for the application you're interested in, such as a workload running on Kubernetes.

For Kubernetes, add a clearly labelled visualization for each key metric collected, such as CPU and memory usage. OpenSearch Dashboards lets you create custom visualizations, trend charts and comprehensive dashboards that provide an instant overview of your infrastructure's health.
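Behind a trend chart sits an aggregation query. The following Python sketch builds one that averages CPU usage per pod, per minute, over the last hour. The field names `kubernetes.pod_name` and `cpu_usage` are assumptions and depend entirely on how your metrics are shipped and mapped.

```python
import json


def cpu_by_pod_query(pod_field="kubernetes.pod_name", cpu_field="cpu_usage",
                     since="now-1h", interval="1m", top_pods=10):
    """Search body: average CPU per pod in fixed time buckets (field names assumed)."""
    return {
        "size": 0,  # only the aggregations are needed, not the raw documents
        "query": {"range": {"@timestamp": {"gte": since}}},
        "aggs": {
            "per_pod": {
                "terms": {"field": pod_field, "size": top_pods},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "fixed_interval": interval,
                        },
                        "aggs": {"avg_cpu": {"avg": {"field": cpu_field}}},
                    }
                },
            }
        },
    }


query = cpu_by_pod_query()
print(json.dumps(query, indent=2))
```

A visualization in OpenSearch Dashboards generates a comparable query for you. Writing it out by hand is mainly useful for checking that the underlying fields exist and are mapped as expected.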

A well-designed dashboard should answer your team's most frequent questions at a glance. We recommend organizing dashboards into three tiers: an executive overview showing service health and SLA compliance, a team-level view displaying deployment frequency and error rates, and a detailed debugging view with log streams and trace waterfalls. This tiered approach ensures that each stakeholder, from the CTO to the on-call engineer, finds the information they need without wading through irrelevant data.

Adding alerts

You can easily add alerts to OpenSearch, allowing Ops teams to create notifications based on specific events. For example, if an application fails to start, an alert can be generated and sent via email or Slack.

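OpenSearch's Alerting plugin is built around monitors, which run a query on a schedule, and triggers, which define the condition that raises an alert. A simple monitor is enough to find failing instances quickly. To go further, you can configure a trigger so that if an instance fails multiple times within a given period (for example, three failures in 15 minutes), its action calls a webhook that starts a corrective action automatically. This way, your application keeps running with less manual intervention, which also reduces operational costs.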
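The "three failures in 15 minutes" rule is a sliding-window count. In OpenSearch it would be expressed as a monitor query plus a trigger condition; the Python sketch below shows only the underlying logic, with a hypothetical `FailureWindow` class and made-up timestamps. It assumes failures are recorded in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureWindow:
    """Tracks failures of one instance and flags when too many fall in the window."""

    def __init__(self, threshold=3, window=timedelta(minutes=15)):
        self.threshold = threshold
        self.window = window
        self._failures = deque()

    def record_failure(self, at):
        """Record a failure at time `at`; return True if corrective action is due."""
        self._failures.append(at)
        cutoff = at - self.window
        # Drop failures that are older than the window.
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


tracker = FailureWindow()
results = [
    tracker.record_failure(datetime(2024, 3, 5, 10, 0)),
    tracker.record_failure(datetime(2024, 3, 5, 10, 5)),
    tracker.record_failure(datetime(2024, 3, 5, 10, 12)),
]
print(results)
```

Only the third failure crosses the threshold. Failures spaced more widely, for example ten minutes apart, never accumulate to three within the window and do not trigger anything.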

Effective alerting follows a few key principles. First, avoid alert fatigue by only notifying on actionable conditions. An alert that fires 50 times a day and gets ignored is worse than no alert at all. Second, use severity levels to distinguish between warnings (investigate during business hours) and critical alerts (wake someone up). Third, include context in the alert itself: the affected service, the metric value that triggered it, a link to the relevant dashboard and a suggested first step for investigation. These practices reduce mean time to resolution (MTTR) significantly.
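As a sketch of the second and third principles, the Python example below models an alert that carries its own context and decides whether to page someone based on severity. The `Alert` class, its field names, the metric values and the dashboard URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    severity: str        # "warning" or "critical"
    service: str
    metric: str
    value: float
    threshold: float
    dashboard_url: str
    first_step: str


def should_page(alert):
    """Only critical alerts wake someone up; warnings wait for business hours."""
    return alert.severity == "critical"


def format_alert(alert):
    """Render a notification that includes everything needed to start investigating."""
    return "\n".join([
        f"[{alert.severity.upper()}] {alert.service}",
        f"{alert.metric} = {alert.value} (threshold {alert.threshold})",
        f"Dashboard: {alert.dashboard_url}",
        f"First step: {alert.first_step}",
    ])


alert = Alert(
    severity="critical",
    service="pricing-engine",
    metric="p95_latency_ms",
    value=1250,
    threshold=800,
    dashboard_url="https://dashboards.example.com/pricing-engine",  # placeholder URL
    first_step="Check recent deployments and database connection pool usage.",
)
message = format_alert(alert)
print(message)
```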

Deploying OpenSearch with Hidora

At Hidora, we've developed a PaaS template that lets you get up and running quickly. In just a few minutes, you can deploy your own fully functional OpenSearch instance. Our consulting team can also help you implement a comprehensive observability strategy tailored to your specific needs.

The deployment process is straightforward. From the Hidora dashboard, you select the OpenSearch template, choose your cluster topology (single node for development, multi-node for production) and configure resource limits. The platform handles the rest: provisioning, networking, security groups and initial configuration. For production deployments, we recommend at least three data nodes plus dedicated cluster manager nodes (formerly called master nodes) to ensure high availability and resilience against node failures.
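The recommendation to run several nodes rather than one or two follows from majority voting. Assuming the cluster elects its manager by simple majority among the eligible nodes, this short Python sketch shows how many node failures each cluster size can survive.

```python
def quorum(eligible_nodes):
    """Smallest number of nodes that forms a majority."""
    return eligible_nodes // 2 + 1


def tolerated_failures(eligible_nodes):
    """How many eligible nodes can fail while a majority still remains."""
    return eligible_nodes - quorum(eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"{n} eligible node(s): quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two nodes tolerate no more failures than one, which is why three is the usual minimum for a highly available cluster.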

Observability is not a luxury; it's a necessity for any organization operating production systems. With OpenSearch and Hidora's support, you can set up a robust, scalable and cost-effective solution.

OpenSearch: the secret to better observability