How to make APIs and microservices communicate safely

Most of the hotly debated use cases for data sharing concern external data sharing: sharing data with other organisations, companies, and individuals makes it available to new users and for new use cases. Ultimately, the assumption is that this approach will drive innovation and open new business opportunities wherever it is applied. It is for this reason that SCDS’ practice example repository focusses mainly on external use cases.

The crux is, however, that even though shared data may not be public, sharing it nevertheless requires some public endpoint, even if that endpoint is only accessible to registered users. In other words: shared data is not exposed, but the systems for sharing data, i.e. APIs and microservices, must have some exposure. This post therefore focusses on how you can secure access to a service via publicly accessible APIs.

The architecture of modern distributed systems and most cloud-based applications is built on microservices grouped in specific domains 1 . Naturally, the risk of external attacks makes security one of the most fundamental requirements for these systems. Each microservice with external access must be protected against vulnerabilities and malicious attacks from external users. Good engineering practices support this aim. These include the adoption of safe programming languages (e.g. Rust 2 ), the systematic testing of software against failures and misbehaviour, the use of encapsulated runtime systems such as containers, the deployment of sandboxes like WebAssembly 3 (Wasm), and regular security audits.

But this post focusses on a different, more widely applicable question: How can you protect the communication between microservices? Distributed and cloud-based applications using APIs rely on the use of interfaces to ensure operations across various services. Thus, for anyone employing such systems, the question of safeguarding communications should feature front and centre.

Figure 1: Communication between services inside a specific domain

Safe communication is tap-proof and tamper-proof, prevents unauthorised access, and ensures the availability of a service. To protect against eavesdropping and manipulation, e.g. by man-in-the-middle attacks that compromise integrity and confidentiality, communication must use encryption technologies like Transport Layer Security (TLS). Point-to-point connections between two services in particular should use mutual TLS authentication (mTLS). Inside a domain or between multiple domains, tap-proof, tamper-proof communication and its availability are typically provided by a service mesh, e.g. via Istio 4 , Linkerd 5 , Consul Connect 6 , and others (see figure 1). The service mesh optimises the routing and encrypts traffic within the domain.
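Where no service mesh handles TLS on behalf of the services, the mutual authentication described above could be configured along the following lines with Python's standard ssl module. This is a minimal sketch: the function names are hypothetical, and the file arguments are optional only so that the contexts can be inspected without certificate material. In a real deployment they would point at a certificate, a private key, and the CA certificate of your own certificate authority.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """TLS context for a service that only accepts clients with a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part of mTLS: the client must present a certificate, too.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)    # identity of this service
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)   # CA that signs client certificates
    return ctx


def mtls_client_context(cert_file=None, key_file=None, ca_file=None):
    """TLS context for the calling service: verifies the server and presents its own certificate."""
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)    # identity of the calling service
    return ctx
```

Both contexts can then be passed to the server and client sockets or HTTP libraries of the respective services.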

But the process and architecture for safely sharing data beyond a specific domain must look different from those used within a domain. Ideally, a safe process to access a service via an API should consist of six steps, as outlined in figure 2 below:

Figure 2: Securing a service via an API

  1. Before they can initiate a request to a service, users need an access token. For this, they have to authenticate with an authorisation server (identity provider), e.g. using an enterprise solution like Keycloak 7 or Gluu 8 or a SaaS offer from providers such as Okta 9 or Google. After authentication they can request an access token containing their claims. Ideally, this token should be a JSON Web Token 10 (JWT). The payload of the JWT contains claims, e.g. the roles belonging to a user id.
  2. If authentication succeeds, users can send their request to the service, passing the JWT in the HTTP header.
  3. Primarily, a firewall or service mesh should restrict which users have access to a service via an API. An additional line of defence is the Policy Enforcement Point (PEP) as described in IETF’s RFC 3838 11 , which should be part of the relevant service. Before a specific function of a service is executed, the PEP checks whether the request is allowed. For this, the PEP takes a set of properties from the request, the header, and the token and calls a Policy Decision Point (PDP). Further, the PEP must validate the given data against the specified value ranges for each property. The PEP must also limit the rate at which resources can be requested over the API. The rate limits could also be decided by the PDP.
  4. The PDP evaluates these properties against policy rules that determine whether a call is allowed or not. The policy rules must also validate the access token as a whole, e.g. by checking its signature.
  5. The properties and the decision should be logged in a dedicated audit log for the domain or specific application. This allows a security auditor to check later, on demand, whether suspicious or critical requests have occurred.
  6. Lastly, if the request is allowed, all parameters of the specific function also have to be validated against the defined value ranges to avoid vulnerable function calls, e.g. memory overflows that could result in security breaches.
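Steps 3 to 6 above can be sketched as a single enforcement function. This is an illustration under stated assumptions, not a reference implementation: all names, the limits, and the example parameter "age" are hypothetical, the PDP is passed in as a plain callable named decide, and the audit log is a simple list.

```python
import time
from collections import defaultdict, deque

ALLOWED_METHODS = {"GET", "PUT", "POST", "DELETE"}
MAX_PATH_LENGTH = 256          # assumed limit for the path property
MAX_REQUESTS_PER_MINUTE = 3    # assumed limit, deliberately small for the example

_recent_calls = defaultdict(deque)   # user -> timestamps of accepted requests


def within_rate_limit(user, now):
    """Sliding window: at most MAX_REQUESTS_PER_MINUTE requests per user."""
    calls = _recent_calls[user]
    while calls and now - calls[0] >= 60:
        calls.popleft()              # forget requests older than one minute
    if len(calls) >= MAX_REQUESTS_PER_MINUTE:
        return False
    calls.append(now)
    return True


def enforce(props, params, decide, handler, audit_log, now=None):
    """PEP: run steps 3 to 6 for one request and return (status, body)."""
    now = time.time() if now is None else now
    # Step 3: validate the request properties and apply the rate limit.
    if props.get("method") not in ALLOWED_METHODS:
        return 400, "invalid request properties"
    if len(props.get("path", "")) > MAX_PATH_LENGTH:
        return 400, "invalid request properties"
    if not within_rate_limit(props.get("user"), now):
        return 429, "rate limit exceeded"
    # Step 4: ask the PDP (here: any callable) for a decision.
    allowed = bool(decide(props))
    # Step 5: record the properties and the decision in the audit log.
    audit_log.append({"time": now, "input": props, "allowed": allowed})
    if not allowed:
        return 403, "forbidden"
    # Step 6: validate the function parameters against their value ranges.
    age = params.get("age")
    if not isinstance(age, int) or not 0 <= age <= 150:
        return 400, "parameter out of range"
    return 200, handler(params)
```

The order of the checks is a design choice: cheap syntactic checks and the rate limit run before the PDP is consulted, and the business function is only reached once every check has passed.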

Implementing security policies with the Open Policy Agent

The described uniform security flow for the policy admission control is the technical foundation for a common check of the security policy. This can be applied to all external requests for services across the whole system. The most interesting aspect is the implementation of the policy rules.

The Open Policy Agent 12 (OPA), a project hosted by the CNCF 13 , offers a flexible and powerful framework and ecosystem to implement security policies. The core of OPA is the evaluation of policy expressions written in the domain-specific Rego policy language 14 . Rego is inspired by the well-known declarative logic programming language Datalog. The policy rules consist of sets of logical expressions, including values, assignments, unifications, functions, universal quantifications, comprehensions, modules, and more.

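package application.authz

# Opt in to Rego v1 syntax (the `if` keyword) so the policy also runs on OPA 1.x
import rego.v1

# Deny by default
default allow := false

# A user can only update their own account info
allow if {
 input.method == "PUT"
 some userid
 # The path must consist of exactly "/account/<userid>"
 split(input.path, "/") = ["", "account", userid]
 input.user == userid
}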

Listing 1: Check the path to update account information
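For readers new to Rego, the same check can be mirrored in Python. This is only a reading aid, assuming the input carries the properties method, path, and user. It does not show how OPA evaluates the rule internally.

```python
def allow(input_doc):
    """Python mirror of the allow rule in listing 1."""
    if input_doc.get("method") != "PUT":
        return False
    parts = input_doc.get("path", "").split("/")
    # The unification in the Rego rule only succeeds for exactly three segments:
    # an empty string before the leading slash, "account", and the user id.
    if len(parts) != 3 or parts[0] != "" or parts[1] != "account":
        return False
    # The user id taken from the path must match the user making the request.
    return input_doc.get("user") == parts[2]
```

A request with any other method, a longer path, or a different user falls through to the default decision, which is false.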

The example in listing 1 above returns true as its decision if, for example, the path “/account/alice” and the user “alice” are given as input together with the method “PUT”. The OPA ecosystem also provides tools to test policies. With these, you can provide example data and validate the expected results. Listing 2 shows an example test set.

package application.authz

import rego.v1

# The owner of the account may update it
test_correct_user_allowed if {
 allow with input as {"path": "/account/alice", "method": "PUT", "user": "alice"}
}

# Any other user is denied
test_wrong_user_denied if {
 not allow with input as {"path": "/account/alice", "method": "PUT", "user": "bob"}
}

Listing 2: OPA policy tests

To keep the policy rules uniform, the input properties should be consistent across the whole application:

  • “path”: the absolute path of the requested URL
  • “method”: HTTP method from the request
  • “transaction”: transaction id from the request header
  • “sender”: remote address of the original client, probably from the X-Forwarded-For header of the request, if an intermediate proxy intercepted the connection
  • “token”: the bearer token from the request header, from which further properties can be extracted:
    • “iss”: issuer of the JWT token
    • “exp”: expiration time
    • “sub”: subject
    • “aud”: audience
    • “user”: user name or user id
    • “name”: full name of the user
    • “roles”: list of roles of a user
    • “scope”: desired scope
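How a PEP might assemble these properties from an incoming request is sketched below. Several points are assumptions made for illustration: the header name X-Transaction-Id is hypothetical, headers are taken to be a plain dictionary, and the claims are placed at the top level of the input next to the raw token. The payload is decoded without verifying the signature, so the result must not be trusted until the token has been validated.

```python
import base64
import json

CLAIM_NAMES = ("iss", "exp", "sub", "aud", "user", "name", "roles", "scope")


def decode_jwt_claims(token):
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs strip the base64 padding, so restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def build_input(method, path, headers, remote_addr):
    """Assemble the property set that the PEP hands to the PDP."""
    auth = headers.get("Authorization", "")
    raw_token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    claims = decode_jwt_claims(raw_token) if raw_token else {}
    # Behind a proxy, the first X-Forwarded-For entry is the original client.
    forwarded = headers.get("X-Forwarded-For")
    sender = forwarded.split(",")[0].strip() if forwarded else remote_addr
    input_doc = {
        "path": path,
        "method": method,
        "transaction": headers.get("X-Transaction-Id"),  # hypothetical header name
        "sender": sender,
        "token": raw_token,  # kept so that the policy can check the signature
    }
    for claim in CLAIM_NAMES:
        input_doc[claim] = claims.get(claim)
    return input_doc
```

Note that the X-Forwarded-For header can be set by any client, so the sender property is only as reliable as the proxies in front of the service.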

Importantly, with some experience, OPA’s Rego policy language is readable and understandable for security auditors, particularly those with a background in mathematics and IT. Furthermore, the policy tests make it possible to check the policies against representative examples.

How to integrate OPA in a service

There are several ways to integrate policy decision points in a service. The conventional and recommended way is to run OPA as a central PDP or, in Kubernetes 15 , as a sidecar inside the pod of your service. To ease the management of policy rules, you can package all rules in a policy bundle, i.e. a bundle.tar.gz archive. OPA can then download this bundle from a central repository at startup time. The PEP can query the Data API (REST interface) of OPA (the PDP) to check whether the input properties derived from a request to the specific service are valid. The result will be one of the two values true or false.
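A PEP written in Python could query the Data API roughly as follows. This is a minimal sketch: the address assumes an OPA instance listening locally on its default port 8181, the rule path corresponds to the package and rule from listing 1, and the function names are hypothetical. The sketch denies the request whenever no explicit true comes back.

```python
import json
import urllib.request

# Assumed address of a local OPA instance and the rule path from listing 1.
OPA_URL = "http://localhost:8181/v1/data/application/authz/allow"


def build_query(input_doc, url=OPA_URL):
    """Wrap the input properties in the request body expected by the Data API."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_decision(response_body):
    """Allow only if OPA explicitly returned true; a missing result means deny."""
    doc = json.loads(response_body)
    return isinstance(doc, dict) and doc.get("result") is True


def is_allowed(input_doc, url=OPA_URL, timeout=2.0):
    """Ask the PDP for a decision and fail closed on any error."""
    try:
        with urllib.request.urlopen(build_query(input_doc, url), timeout=timeout) as resp:
            return parse_decision(resp.read())
    except (OSError, ValueError):
        # PDP unreachable, HTTP error, or invalid JSON: deny the request.
        return False
```

Failing closed is a deliberate choice here: if the PDP cannot be reached, the service refuses requests rather than running them unchecked.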

Other service integration options are shown in figure 3. To integrate the PDP in the PEP directly, you can use the Rego library (unfortunately, though, it is currently only available for programs written in Go). Alternatively, you can compile the policies with the OPA tool into a Wasm module. The generated Wasm modules can be executed by a Wasm runtime integrated in the PEP of the specific service. This solution makes it possible to provide a policy module in a way that is performant and independent of the programming language. It is also a safe solution, as the Wasm runtime provides an isolated, restricted sandbox to execute the Wasm code.

Figure 3: Securing a service at the endpoint

Implementation principles

In general, the code for the PEP and an integrated PDP should be automatically generated, ideally together with the interface code from an OpenAPI 16 specification. This helps to ensure that the code is consistent, safe, and follows applicable coding policies, e.g. the clean code principles.
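The idea of deriving checks from the specification rather than writing them by hand can be illustrated with a small sketch. It is not a code generator: it builds validators at runtime from a hypothetical fragment of parameter schemas and supports only a small subset of the schema keywords (type, minimum, maximum, maxLength, enum). The parameter names are made up for the example.

```python
# Hypothetical parameter schemas, as they might appear in an OpenAPI document.
PARAMETER_SCHEMAS = {
    "userid": {"type": "string", "maxLength": 32},
    "age": {"type": "integer", "minimum": 0, "maximum": 150},
    "plan": {"type": "string", "enum": ["free", "pro"]},
}

PYTHON_TYPES = {"integer": int, "string": str}


def make_validator(schema):
    """Build a check function from one parameter schema (subset of keywords only)."""
    expected = PYTHON_TYPES[schema["type"]]

    def validate(value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
        if "minimum" in schema and value < schema["minimum"]:
            return False
        if "maximum" in schema and value > schema["maximum"]:
            return False
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            return False
        if "enum" in schema and value not in schema["enum"]:
            return False
        return True

    return validate


VALIDATORS = {name: make_validator(s) for name, s in PARAMETER_SCHEMAS.items()}


def validate_params(params):
    """Accept a call only if every parameter is known and within its value range."""
    return all(name in VALIDATORS and VALIDATORS[name](value)
               for name, value in params.items())
```

Because the validators are derived from the same schemas that describe the interface, a change to the specification changes the checks with it.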

Besides its primary task of validating input data from requests against given policy rules, OPA makes it possible to collect access logs in a central repository. These access logs should be used as a common security audit log for the whole application. However, they should be configured so that they do not gather any sensitive data. The log entries should contain the input properties of the requests passed to the PDP as well as the decision itself. The information from the audit log should be sufficient to reconstruct security incidents. In particular, if the service’s persistent data includes its history, one can reconstruct the changes an attacker applied and the information an attacker read.
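What such a log entry might look like is sketched below. This is only an illustration of the shape of an entry and does not reproduce OPA's own log format. The choice of which properties count as sensitive (here the raw token and the full name) is an assumption that each application has to make for itself.

```python
import json
import time

# Assumed: the raw bearer token and the full name must stay out of the log.
SENSITIVE_KEYS = {"token", "name"}


def audit_record(input_doc, allowed, now=None):
    """Serialise one PDP decision as a JSON line, dropping sensitive properties."""
    logged_input = {k: v for k, v in input_doc.items() if k not in SENSITIVE_KEYS}
    record = {
        "timestamp": time.time() if now is None else now,
        "input": logged_input,   # the properties the decision was based on
        "allowed": allowed,      # the decision itself
    }
    return json.dumps(record, sort_keys=True)
```

Keeping the transaction id and the sender in each entry makes it possible to correlate the decisions with the service's own data history when an incident is reconstructed.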

The described approach systematically integrates access control into an API. The externalisation of the policies supports uniform management and auditing of the authorisation for all services. This reduces policy-related access management efforts and should be less error-prone. Furthermore, this approach aims to address high-level vulnerabilities and security risks as published by the OWASP 17 , specifically “Broken Object Level Authorization”, “Broken User Authentication”, “Broken Function Level Authorization”, “Security Misconfiguration”, and “Insufficient Logging & Monitoring”.

Support Centre for Data Sharing

For questions and comments, please visit our forum on Futurium.