
API Guidance: An overview of API technologies Part 1

Application Programming Interfaces (APIs) are an integral part of modern distributed systems – and an essential link in the data supply chain for modern, data-driven organisations. They are thus indispensable for the data sharing ecosystem that the Support Centre for Data Sharing seeks to promote. In this three-part series – accompanied by eLearning modules – we want to provide concentrated and practical knowledge to those who are thinking about developing and deploying an API, and thus need to understand the basics of this technology, as well as to those who are looking for more information on what to consider in the implementation process.

This content will be published in two ways:

  • Here, as episodic website content, with a new instalment roughly every two weeks, and
  • after the series’ completion, as a single document that you can download and read offline.

The guide is available in English, French and German. Each instalment is followed by eLearning modules, covering their respective topics – and helping you to test your knowledge. 

The chapters in the first instalment are:

  • Introduction to APIs
  • Underlying API concepts
  • APIs and their architectural evolution

The second instalment will cover:

  • API types and their practical implications
  • Application scenarios
  • API Documentation

Stay tuned and don’t forget to discuss the materials by commenting on each instalment, or with your peers in our forum.  

 

1. The challenge of sharing data between machines

Data can be exchanged in many different ways. Where data is exchanged digitally, it is important to understand who it is exchanged with: Is the data exchange only between humans, between a machine and humans – or only between machines?

Naturally, the sharing of data between humans or between humans and machines has different requirements from the sharing of data between machines. Humans can often understand data irrespective of its presentation. Sometimes a bit of training is helpful, but, generally, humans can understand data in tables, graphics, full texts and other formats fairly easily. Notably, they can often do this even though the context of the data may be unknown.

Machines have far more limited capabilities: Without specific guidance, they cannot connect the data they receive with information that is not part of that data. Without prior instructions, they will also struggle to understand the data's context or even how to read it. In principle, machines thus need to be trained or instructed for each specific data file they process.

Hence, when it comes to the sharing of data between machines, both sides – client and server – need to essentially agree on a technical contract that regulates the specifications of the data as well as the conditions of how it is shared. Both sides need to know in advance how the data is structured and which semantic information lies behind it.

APIs can provide these functions and enable a seamless machine-to-machine data exchange. That is why they are increasingly adopted by all sorts of users who want to share data. Unfortunately, though, APIs do not come out of the box. To function well, they need to be written with due care by knowledgeable developers. This report aims to facilitate the proliferation of good quality APIs. In the following sections, it provides an introduction to web-based APIs and covers their underlying concepts.

eLearning module for Chapter 1

 

2. Underlying concepts

To understand modern API technologies, it is crucial to first understand some foundational concepts. APIs are an element of any modern distributed system. In short, they are interfaces that standardise and simplify the interaction between the different components of an overall system, such as different servers. APIs allow the components of a distributed system to request computations from each other in a client-server relationship. For each call, the requesting service is the client and the service providing the computation is the server. Going beyond this very basic principle, it is important to also understand the concepts of HTTP, URLs, data representation formats, and predefined communication methods.

2.1 HTTP

The “Hypertext Transfer Protocol” (HTTP) and web links via “Uniform Resource Locators”1 (URLs) are probably the most basic concepts underpinning the Web. HTTP is essential not just for browsing the Web, but for any messaging or transfer of data between Web-connected systems. As such, HTTP is also the basic transport layer for modern APIs. It is a text-based protocol that can easily be read and understood by humans. Furthermore, it not only controls a communication session, but allows a client and a server to exchange messages in a request/response cycle; each message carries meta information in its headers and the actual payload in its body. This payload can be data or executable code, or a combination of both.
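This request/response cycle can be sketched with Python's standard library alone. The echo endpoint and payload below are purely illustrative, not part of any real API: a tiny local server reads the request body and sends it back, and the client shows where headers (meta information) and body (payload) live in each message.

```python
import http.client
import http.server
import json
import threading

# A minimal echo server: it reads the request body and sends it back,
# illustrating the header/body split in an HTTP request/response cycle.
class EchoHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example's output clean

server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client request carries meta information in headers and the payload in the body.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
body = json.dumps({"message": {"text": "Hallo", "lang": "de"}})
conn.request("POST", "/echo", body=body,
             headers={"Content-Type": "application/json"})
response = conn.getresponse()
status = response.status                          # e.g. 200
content_type = response.getheader("Content-Type")  # a response header
echoed = json.loads(response.read().decode())      # the response body
server.shutdown()
```

Because both sides agree in advance on the method, the headers, and the payload format, the exchange works without any human interpretation in between.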

HTTP communication can be secured by encapsulating a communication session with a “Transport Layer Security” (TLS) mechanism. Today, TLS is usually deployed in combination with HTTP, forming HTTPS, the secure version of HTTP. HTTPS is crucial to ensure that no third party, whether external or internal, can intercept a communication.
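As a small illustration, Python's standard `ssl` module shows what a secure client context enforces by default. This is only a sketch of the client side of HTTPS (no actual connection is made): the default context requires a valid server certificate and checks that it matches the hostname, which is what protects the session against interception.

```python
import ssl

# A default client-side TLS context, as used when wrapping an HTTP
# connection into HTTPS. Out of the box it verifies the server's
# certificate chain and checks the certificate against the hostname.
context = ssl.create_default_context()
verify = context.verify_mode          # certificate verification policy
hostname_checked = context.check_hostname  # hostname matching enabled?
```

An API client would pass such a context to its HTTP library; weakening these defaults (e.g. disabling verification) reopens the session to interception.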

2.2 URL

Put simply, a URL is a universal form of an address on the Web, most commonly known for websites. In the context of modern APIs, though, a URL can also address entry points and functions of a server, or address a (multimedia) resource. WS-SOAP and GraphQL are examples of APIs that use URLs only to address an entry point. RESTful interfaces, by contrast, use URLs that contain basic parameters to trigger functions defined within an API, e.g. retrieving specific data from a database.
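A sketch of how such URLs are taken apart and reassembled, using Python's standard `urllib.parse`. The host `api.example.org`, the path and the parameters are invented for illustration; the point is that path segments address a resource while query parameters pass options to the function behind the endpoint.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

# A RESTful-style URL: path segments address a resource, query parameters
# pass basic options to the function defined behind the endpoint.
url = "https://api.example.org/v1/weather/berlin?units=metric&days=3"
parts = urlparse(url)
scheme = parts.scheme           # "https"
path = parts.path               # "/v1/weather/berlin"
params = parse_qs(parts.query)  # {"units": ["metric"], "days": ["3"]}

# Building a URL programmatically from its components:
query = urlencode({"units": "imperial", "days": 7})
rebuilt = urlunparse(("https", "api.example.org",
                      "/v1/weather/berlin", "", query, ""))
```

Constructing URLs from components like this, rather than by string concatenation, avoids encoding mistakes when parameter values contain special characters.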

2.3 Data representation format

In addition to the communication protocol, i.e. how a message is transferred between two or more entry points, it is also important to understand how the actual content of a message, the data payload, is represented. For relatively simple cases, users can rely on HTML forms for structured data, plain or Base64-encoded text, or binary data (Binary Large Objects, BLOBs). In practice, however, additional data representation formats are used, particularly for deeply structured data:

  • XML (Extensible Markup Language)2 is a strongly typed language to represent structured data using tags and attributes, e.g.:

<message lang="de">Hallo</message>

  • JSON (JavaScript Object Notation)3 is a lightweight object notation using nested braces and properties, e.g.:

{ "message": { "text": "Hallo", "lang": "de" } }

  • YAML (YAML Ain’t Markup Language)4 is a lightweight object notation using indented lines and properties, e.g.:

message:
  text: Hallo
  lang: de

XML is principally used by the WS-SOAP approach; the more lightweight JSON and YAML are instead used by WS-REST and GraphQL.
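The XML and JSON examples above can be read with Python's standard library as follows; YAML is omitted here only because the standard library ships no YAML parser (PyYAML is the usual third-party choice). Note how the same record is carried by an attribute plus tag content in XML, and by nested properties in JSON.

```python
import json
import xml.etree.ElementTree as ET

# The same record in two of the formats shown above.
xml_doc = '<message lang="de">Hallo</message>'
element = ET.fromstring(xml_doc)
xml_lang = element.get("lang")  # XML attribute  -> "de"
xml_text = element.text         # tag content    -> "Hallo"

json_doc = '{ "message": { "text": "Hallo", "lang": "de" } }'
data = json.loads(json_doc)
json_lang = data["message"]["lang"]
json_text = data["message"]["text"]
```

Whichever format an API chooses, both client and server must agree on it in advance – this is part of the technical contract discussed in Chapter 1.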

2.4 Predefined methods

The Web is designed to treat different resources, such as data items, multimedia objects, business objects, or plain text, in a uniform manner. To maintain this principle, the HTTP protocol has four predefined, resource-agnostic methods that can be applied to any object or resource; preferably, an API should be defined in terms of these:

  • GET – read a resource
  • POST – create a new resource
  • PUT – create or replace a resource
  • DELETE – remove a resource
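As an illustrative sketch (not a real framework), the four methods GET, POST, PUT and DELETE can be mapped onto read, create, replace and remove operations over a uniform in-memory resource store. All names and the dispatch structure here are hypothetical, chosen only to make the uniform-interface idea concrete.

```python
# A toy resource store: every kind of resource is handled through the
# same four methods, mirroring HTTP's uniform interface.
store = {}
next_id = 1

def handle(method, resource_id=None, payload=None):
    """Dispatch one of the four predefined methods against the store."""
    global next_id
    if method == "POST":                      # create a new resource
        store[next_id] = payload
        next_id += 1
        return next_id - 1                    # id of the created resource
    if method == "GET":                       # read a resource
        return store.get(resource_id)
    if method == "PUT":                       # create or replace a resource
        store[resource_id] = payload
        return payload
    if method == "DELETE":                    # remove a resource
        return store.pop(resource_id, None)

item = handle("POST", payload={"text": "Hallo"})
read_back = handle("GET", resource_id=item)
```

Real HTTP frameworks route these methods to handler functions in much the same way, which is why sticking to the predefined methods keeps an API predictable for clients.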

eLearning module for Chapter 2

 

3. APIs and their architectural evolution

Since software systems are growing increasingly complex, their architecture is constantly changing, too. Structuring software into manageable pieces is thus increasingly important. This section explores the interplay between the architectural evolution of software systems and APIs.

The basic programming units for structuring software are modules and libraries. When software systems changed from singular to distributed systems, loosely coupled components joined this list. Communication between the distributed components was handled via “remote procedure calls” (RPC).5 Essentially, an RPC is a protocol that allows programme “a” on one computer to send an executable message to another programme “b” located elsewhere on the same network. In this way, programme “a” (the client) can ask programme “b” (the server) to provide a certain service (or computation).
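Python's standard library includes a small XML-RPC implementation, which makes for a compact sketch of this client/server principle. The `add` function is purely illustrative: the point is that the client invokes what looks like a local function, while the computation actually runs in the server programme.

```python
import threading
import xmlrpc.client
import xmlrpc.server

# Programme "b" (the server) registers a function that remote callers may invoke.
server = xmlrpc.server.SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda x, y: x + y, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Programme "a" (the client) calls the remote function through a proxy object.
port = server.server_address[1]
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)  # looks like a local call, runs on the server
server.shutdown()
```

The proxy object hides the network round trip, which is exactly the convenience, and the hidden coupling, that early RPC frameworks such as Java RMI provided within a single programming environment.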

Initially, RPCs were mostly restricted to one homogeneous programming environment and a specific framework, e.g. “Java RMI”. But as software systems grew larger, the distributed components evolved into independent services and the “Service-oriented Architecture” (SOA) paradigm became popular. In the SOA paradigm, each service has a clear functional focus and responsibility. This focus became even more pronounced with the second generation of SOA, often referred to as “Domain Driven Design”6 (DDD) and “microservices”7. Here, the border of a service is defined primarily by its core domain, well-defined interfaces (APIs) and the organisational responsibility of a team that is fully accountable for the service.

Crucially, the evolution from software systems that are based on simple distributed components to software systems that are based on full-blown services greatly elevated the importance of APIs as connectors. Through this evolution, it also became clear that different types of APIs were needed for different technical tasks.

According to “Domain Driven Design”, one or more microservices form a domain as a “service”. These services have clear boundaries and can be bundled with the data they manage. A service usually provides one or more APIs that other services use to interoperate with it. It is also common for a service to be bundled with a micro frontend that provides a GUI to users. Together, such a bundle of logic, data and frontend is referred to as a “Self-contained System”8 (SCS). From an organisational point of view, a single dedicated team should be responsible for each SCS.

Figure 1 illustrates an SCS architecture. It shows self-contained systems as stacks of cubes. Each cube consists of the typical architectural layers: data, business logic, and front-end (view). The blocks are self-contained because they do not rely on external resources in order to function. The integration of all systems together can be seen as an application, e.g. a web shop.


Figure 1: Self-Contained Systems

Figure 1 also illustrates that services and SCSs can communicate with each other in different ways. For example, a weblink in the GUI can let users move to a new business transaction in the user interface, e.g. showing data or creating a new item. The classic approach is for services to communicate with each other synchronously using RESTful interfaces.9 However, there are also cases where an asynchronous communication mode is preferable. Asynchronous communication means that the sender of a query does not wait until the recipient has answered the call, e.g. when one system asks another system to write certain data into its database. Instead of blocking, i.e. scheduling no further work until a response is received (e.g. confirmation that the data has been written into the receiving system’s database), the querying system immediately continues to process other calls. Overall, this approach helps to save and better manage CPU capacity. But it also has trade-offs, e.g. additional latency before a query response is processed. This typically makes synchronous communication attractive where reliable, immediate reading of data from another system is required, for example for real-time data such as continuous weather updates.
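A minimal sketch of the asynchronous pattern using Python's `asyncio`; the "remote write" is simulated with a short sleep and all names are hypothetical. The caller fires the slow call, continues with other work immediately, and only later collects the response.

```python
import asyncio

log = []

async def remote_write(item):
    # Stands in for network and database latency on the receiving system.
    await asyncio.sleep(0.05)
    log.append(f"written:{item}")
    return True

async def main():
    # Fire the call without waiting for it to finish...
    task = asyncio.create_task(remote_write("order-1"))
    # ...and keep processing other work in the meantime.
    log.append("doing other work")
    # Collect the confirmation later, when it is actually needed.
    ok = await task
    return ok

ok = asyncio.run(main())
```

The log shows the caller's other work completing before the write confirmation arrives, which is precisely the CPU-saving behaviour described above; a synchronous caller would have idled through the sleep instead.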

Irrespective of their technical performance, it is crucial that APIs are well-defined so that they remain maintainable and reusable parts of a complex ecosystem. This means an API must:

  • have a formal specification as well as documentation that is complete, understandable, consistent, and correct.
  • have a complete set of written binding requirements that are unambiguous and do not leave the user in doubt about their meaning.

When computers and software systems first emerged, only a few programming languages and runtime environments existed. Nowadays, that environment has grown far more complex. One of the main drivers of this development is the dogma that the best tool should be used to solve a given problem. This drove a proliferation of tools and programming languages, and meant that the programming world had to find ways to handle multiple languages at once, i.e. it had to become polyglot.

What does this imply for the definition and implementation of an API? It is worth remembering that an API is effectively an interface designed to support the transfer of data between multiple systems or services, irrespective of the programming languages these systems use. Therefore, interoperability, i.e. the ability to work with a variety of systems and programming languages, should always rank highly. In practice, this means that well-designed APIs should allow the systems involved in the transfer of data to use different programming languages. This is best achieved by defining APIs in a neutral schema definition language, e.g. the Web Service Description Language10 (WSDL) for WS-SOAP-based APIs.

But while this approach is undoubtedly preferable and conceptually superior, it is also fraught with practical difficulties. In reality, most APIs are not defined according to the “API first” approach that designs APIs in agnostic terms. Instead, specifications and documentation are frequently derived from the source code of the framework used by the service or system to which the API is attached, e.g. a Java class definition. At first, this may be a quick and convenient way to define a suitable API for the originating system. But ultimately it means that these APIs are programming-language- and framework-specific, and often poorly documented.

As a result of this unfortunate status quo, there is a strong push towards the “API first”11 approach, on the assumption that it will help developers to use APIs more easily and, thus, more successfully. Following the “API first” approach means that the API is designed and documented first, using a rich schema definition language and specifications, such as OpenAPI,12 which can then be used to generate the required code (in the preferred language) and to write comprehensive documentation. The assumption is that following this principle will eventually lead to more interoperable and, thus, better APIs.
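As an illustration of what an “API first” artefact looks like, here is a minimal OpenAPI 3.0 description written as a Python dict; the endpoint, parameter and titles are invented for this example. In an “API first” workflow, a description like this (usually kept as a YAML or JSON file) is written before any server code exists and is then fed to generators that produce client/server stubs and documentation.

```python
# A minimal, hand-written OpenAPI 3.0 description. All names below are
# illustrative; a real specification would also define schemas, error
# responses, and security requirements.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Message API", "version": "1.0.0"},
    "paths": {
        "/messages/{id}": {
            "get": {
                "summary": "Read one message",
                "parameters": [{
                    "name": "id", "in": "path",
                    "required": True,
                    "schema": {"type": "integer"},
                }],
                "responses": {
                    "200": {"description": "The requested message"},
                },
            }
        }
    },
}
```

Because the description is language-neutral, a Java team and a Python team can each generate idiomatic code from the same contract, which is the interoperability benefit the “API first” approach aims for.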

eLearning module for Chapter 3

 

Don’t forget to discuss the materials by commenting on each instalment, or with your peers in our forum. Also, test your knowledge in the SCDS API Guidance eLearning modules: https://elearningcourses.eudatasharing.eu/en/apiguidance/#/

Image credit: Support Centre for Data Sharing