Behrooz Farkiani (A paper written under the guidance of Prof. Raj Jain)
The scalability and flexibility of microservice architecture have led to major changes in cloud-native application architectures. However, the complexity of managing thousands of small services written in different languages and handling the exchange of data between them has caused significant management challenges. Service mesh is a promising solution that could mitigate these problems by introducing an overlay layer on top of the services. In this paper, we first study the architecture and components of the service mesh. Then, we review two important service mesh implementations and discuss how the service mesh could be helpful in other areas, including 5G.
Service Mesh, Cloud Native Application, Container, Kubernetes, Istio, Linkerd, 5G.
A service mesh is a dedicated overlay layer on top of (micro)services that handles service-to-service communication. The main goal of the service mesh is the reliable delivery of requests through the topology of services. Although there is no official standard architecture for the service mesh concept and its components, researchers have defined and proposed components in both the control and data planes. There are also two important implementations: Istio and Linkerd. We discuss these implementations and how service mesh could benefit other computing areas.
The structure of this paper is as follows. Section 2 briefly reviews the general components of service mesh architecture. We also review how the service mesh could be utilized in edge computing and Fifth-generation cellular technology (5G). Then, we discuss two important service mesh implementations named Istio and Linkerd and compare them with other implementations in Section 3. Finally, Section 4 concludes this paper.
A cloud-native application might consist of several (micro)services that might be implemented in different programming languages, belong to different tenants, and have many short-lived service instances to support traffic demands. It is the job of the service orchestrator component to manage this dynamic environment, manage and debug the services' interactions and traffic flow, monitor their performance, and collect service-related statistics [Li19][RedHat]. However, with a large number of services, efficiently performing these tasks becomes challenging.
Service mesh was introduced to mitigate the difficulty of performing the aforementioned tasks. In general, a service mesh implementation should provide the following features [Khatri20] [Li19]:
In the rest of this section, we first review the background concepts. Then, we introduce a general service mesh architecture and continue to discuss the performance impact of implementing a service mesh. Finally, we discuss its business importance and the applications of service mesh in 5G and edge computing.
This section explains the important concepts that will be used in the rest of this paper. Here, we review the evolution of software architecture, cloud-native applications, and service mesh definition.
In recent years, we have witnessed a shift from monolithic application architecture to service-oriented and microservice architectures. In a monolithic architecture, all components of the application are tightly coupled. Alternatively, we could break down the entire application into a set of services, each providing a business function; one of the most important features of service-oriented architecture is the loose coupling between service consumers and providers. These services could be developed, deployed, scaled, and administered independently, and they have little or no knowledge of each other or any integration. Figure 1 and Figure 2 represent examples of monolithic and service-oriented architectures, respectively.
Microservice architecture is still a service-oriented architecture; it is made of reusable, loosely coupled (relatively small) components that work independently of each other. However, the main difference between these two architectures resides in their scopes: in service-oriented architecture, we focus on an enterprise scope, while in a microservice architecture, the focus is on the application level [IBM]. Figure 3 explains this difference.
Other differences stem from this fact. Service-oriented architecture has a higher level of reuse and a lower level of data synchronization. Indeed, service-oriented architecture needs reuse and component sharing to achieve its scalability and efficiency goals. In addition, data is usually accessed and modified at its main source, which reduces the need for synchronization. On the other hand, reuse in microservice architecture leads to some level of dependency, which reduces agility and resilience. Therefore, we witness duplication of services. More importantly, each microservice has its own local copy of the data it needs. Two main advantages of microservice architecture are rapid development and a higher level of scalability in comparison to service-oriented architecture. An example of microservice architecture is shown in Figure 4.
Another important concept is the cloud-native application, a term that has recently been used to describe applications in container-based environments. Cloud-native applications usually refer to applications in which software development is a relatively rapid process because scaling and deployment are automated. An example of cloud-native applications is shown in Figure 5 [Khatri20].
Cloud-native applications and microservice architecture usually benefit from containerized environments and run their services in containers. A container is a unit of software that packages up the code and all its dependencies. It enables us to move the application from one computing environment to another and to run the application quickly and reliably [Docker]. We could create a container image that includes the program and its dependencies. A container runtime, like Docker, provides an environment to execute containers on the host operating system. Containers are lighter than virtual machines but provide less isolation. In addition, in large production environments, we need a container orchestration platform to manage the life cycle of containers. Kubernetes [Kubernetes] is a very popular orchestration system that is widely used as the container orchestration platform in current implementations of service mesh architecture.
Envoy is a layer-7 proxy and communication bus designed for large modern service-oriented architectures. Envoy can shape, shift, split, route traffic, and collect telemetry for all service calls. Envoy proxy is transparent to applications and provides the following features [Khatri20] [Envoy] :
Service mesh is an application infrastructure layer on top of the microservice architecture that manages microservice-to-microservice communication. As a layer, it has both control and data planes. Service mesh is a concept, and there is no standard definition for it that specifies all its requirements and components. In this section, we present its components that are generally accepted and used in the research and software communities. Section 3 provides a detailed description of popular implementations.
Figure 6 illustrates different components of the service mesh. The following general components are suggested in each of the data and control planes.
In a service mesh, traffic passes through additional sidecar proxies. This adds end-to-end delay and reduces performance. The authors of [Zhu22] designed a decomposition approach and a tool named MeshInsight to measure service mesh overhead. They showed that implementing a service mesh could result in 185% higher latency and consume 92% more virtual cores.
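To make these percentages concrete, the following minimal sketch converts a relative overhead into absolute values. Only the 185% and 92% figures come from the text above; the baseline values (2.0 ms of latency and 4 virtual cores) and the function names are hypothetical and are not taken from [Zhu22].

```python
def apply_overhead(baseline: float, overhead_percent: float) -> float:
    """Return the value obtained after adding a relative overhead to a baseline."""
    return baseline * (1.0 + overhead_percent / 100.0)


def overhead_percent(baseline: float, with_mesh: float) -> float:
    """Return the relative overhead (in percent) of with_mesh over baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (with_mesh - baseline) / baseline * 100.0


# Hypothetical baseline measurements without a service mesh (assumed values).
baseline_latency_ms = 2.0
baseline_vcores = 4.0

mesh_latency_ms = apply_overhead(baseline_latency_ms, 185.0)  # 185% higher latency
mesh_vcores = apply_overhead(baseline_vcores, 92.0)           # 92% more virtual cores

print(f"latency: {baseline_latency_ms} ms -> {mesh_latency_ms:.2f} ms")
print(f"vcores:  {baseline_vcores} -> {mesh_vcores:.2f}")
```

Note that an overhead of 185% means the latency is multiplied by 2.85, not by 1.85.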
Figure 7 shows the data path for both inbound and outbound traffic. There are three separate connections: two between the sidecar proxies and their microservices and one between the sidecar proxies themselves. As a further source of overhead, each message must be copied from the message buffer into the proxy buffer and vice versa. The sidecars also issue additional system calls and may process the message at several protocol layers to perform their functions. The authors also mentioned that protocol parsing is a major source of overhead for HTTP and gRPC proxies.
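The overhead sources listed above can be sketched as a simple additive latency model. This is an illustrative assumption, not the decomposition used by MeshInsight: the class, the field names, and all microsecond costs below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SidecarCost:
    """Hypothetical per-message costs (in microseconds) added by one sidecar."""
    local_connection_us: float  # extra connection between a service and its sidecar
    buffer_copy_us: float       # copying the message into and out of the proxy buffer
    syscalls_us: float          # additional system calls
    protocol_parsing_us: float  # parsing message layers (e.g., HTTP or gRPC)

    def total_us(self) -> float:
        return (self.local_connection_us + self.buffer_copy_us
                + self.syscalls_us + self.protocol_parsing_us)


def one_way_latency_us(network_us: float, sidecars: list[SidecarCost]) -> float:
    """Latency of one request: the network hop plus every sidecar on the path."""
    return network_us + sum(s.total_us() for s in sidecars)


# Assumed numbers, chosen only to illustrate the arithmetic.
tcp_proxy = SidecarCost(20.0, 5.0, 10.0, 0.0)    # no application-layer parsing
http_proxy = SidecarCost(20.0, 5.0, 10.0, 60.0)  # parses HTTP

direct = one_way_latency_us(100.0, [])
via_tcp = one_way_latency_us(100.0, [tcp_proxy, tcp_proxy])
via_http = one_way_latency_us(100.0, [http_proxy, http_proxy])
print(direct, via_tcp, via_http)
```

Each request crosses two sidecars (one on the sender side and one on the receiver side), which is why every per-sidecar cost is counted twice in this model.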
The authors of [Ganguli21] measured the performance impact of deploying service mesh in edge environments. They deployed Kubernetes in a virtual machine environment instead of a containerized environment. The authors showed that using Istio could reduce HTTP throughput between virtual machines by up to 70% and could double the tail latency.
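Tail latency is commonly reported as a high percentile of the per-request latency distribution. The sketch below computes it with the nearest-rank method and also shows how a relative throughput reduction is calculated. The sample values, the choice of the 99th percentile, and the function names are assumptions for illustration; they are not data or methods from [Ganguli21].

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100.0)  # 1-based rank
    return ordered[rank - 1]


def throughput_reduction_percent(baseline: float, with_mesh: float) -> float:
    """Relative throughput loss (in percent) compared with the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - with_mesh) / baseline * 100.0


# Hypothetical latency samples in milliseconds: 98 fast requests and 2 slow ones.
without_mesh = [1.0] * 98 + [4.0, 5.0]
with_mesh = [1.5] * 98 + [8.0, 10.0]

p99_without = percentile_nearest_rank(without_mesh, 99)
p99_with = percentile_nearest_rank(with_mesh, 99)
print("p99 without mesh:", p99_without)
print("p99 with mesh:   ", p99_with)
print("throughput loss: ", throughput_reduction_percent(1000.0, 300.0), "%")
```

In this made-up data set the 99th-percentile latency doubles even though the typical request is only 0.5 ms slower, which is why tail latency is reported separately from the average.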
Gartner [Gartner] categorized service mesh as an adolescent technology with 1% to 5% market penetration. It also placed service mesh in the "Trough of Disillusionment" phase of the hype cycle. In this phase, producers of a technology either shake out or fail; to survive, they need to improve their products to satisfy early adopters.
This section reviews the application of service mesh in other areas. For example, [XIE20] proposed using Kubernetes and Istio for an on-demand image classification application to support load balancing and scheduling. Here, we discuss how we could employ service mesh to improve 5G network efficiency and how it could be used in resource-limited edge computing environments.
The authors of [Dab20] investigated the problem of steering traffic between microservice-based network functions in 5G architecture. One way to achieve the requirements of 5G networks while reducing the total operational costs is to use cloud-native applications. However, steering traffic between network functions is challenging, and service mesh could help us to tackle this issue. The authors proposed a cloud-native service function chaining framework based on Kubernetes and Network Service Mesh [NSM]. Then, the authors formulated the network-aware load-balancing optimization problem and proposed an algorithm to solve it.
[Wojciechowski21] proposed a scheduler for 5G networks by extending the Kubernetes scheduler and utilizing information gathered by the Istio service mesh. The authors aimed to improve service placement to reduce latency. Their scheduler uses two metrics that are gathered by the service mesh: the number of bytes transferred in requests and the number of bytes transferred in responses. The scheduler uses these metrics to calculate the average flow between applications. Then, it can identify the applications with the highest flow between them and collocate them on the same node.
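The flow-based idea above can be sketched as follows. This is a simplified illustration rather than the algorithm of [Wojciechowski21]: the record format, the application names, the measurement window, and the greedy choice of the single heaviest pair are all assumptions.

```python
from collections import defaultdict


def average_flows(records, window_seconds):
    """Aggregate request and response bytes into an average flow per application pair.

    records: iterable of (source_app, destination_app, request_bytes, response_bytes)
    Returns a dict mapping an unordered application pair to bytes per second.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    totals = defaultdict(int)
    for src, dst, req_bytes, resp_bytes in records:
        pair = tuple(sorted((src, dst)))  # both directions count toward the same pair
        totals[pair] += req_bytes + resp_bytes
    return {pair: total / window_seconds for pair, total in totals.items()}


def heaviest_pair(flows):
    """Return the application pair with the highest average flow."""
    if not flows:
        raise ValueError("flows must not be empty")
    return max(flows, key=flows.get)


# Hypothetical metrics reported by the mesh over a 10-second window.
records = [
    ("frontend", "cart", 2_000, 8_000),
    ("cart", "database", 50_000, 150_000),
    ("frontend", "catalog", 1_000, 30_000),
    ("database", "cart", 10_000, 10_000),
]
flows = average_flows(records, window_seconds=10)
print(flows)
print("collocation candidate:", heaviest_pair(flows))
```

A real scheduler would also have to respect node capacity and other placement constraints; this sketch only shows how the two byte counters can be turned into a ranking of application pairs.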
[Furusawa22] proposed a service mesh controller that balances the load between edge servers. Usually, the service mesh is used in cloud environments. However, we could benefit from deploying it in edge environments because of the limited computing resources of edge servers. Consider a set of edge servers hosting applications that serve cars. In the case of car accidents and traffic congestion, the requests to the servers that are located in the related geographical area increase, and these servers become overloaded. In such cases, cooperative load balancing, in which some requests are redirected to other nearby servers, could help avoid overloading edge servers. However, the current Kubernetes container execution implementation cannot use the geographical data that is essential to implement cooperative load balancing. Therefore, the authors utilized Istio and proposed a weight calculation algorithm that is used to forward traffic to other nearby edge servers. For a systematic review of the challenges of implementing service mesh in edge environments, readers can consult [Duque22].
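A minimal sketch of cooperative load balancing is shown below. It is not the weight calculation algorithm of [Furusawa22]; it assumes, purely for illustration, that each nearby server receives a weight proportional to its spare capacity divided by its distance from the overloaded server. The server names, capacities, loads, and distances are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeServer:
    name: str
    capacity_rps: float  # requests per second the server can handle
    load_rps: float      # requests per second it currently handles
    distance_km: float   # distance from the overloaded server (must be > 0)


def forwarding_weights(neighbors: list[EdgeServer]) -> dict[str, int]:
    """Return integer percentage weights (summing to 100) for nearby servers."""
    scores = {}
    for s in neighbors:
        if s.distance_km <= 0:
            raise ValueError("distance_km must be positive")
        spare = max(s.capacity_rps - s.load_rps, 0.0)
        scores[s.name] = spare / s.distance_km  # assumed scoring rule
    total = sum(scores.values())
    if total == 0:
        return {}  # no neighbor has spare capacity; keep the traffic local
    weights = {name: int(100 * score / total) for name, score in scores.items()}
    # Give the rounding remainder to the best-scoring neighbor so weights sum to 100.
    best = max(scores, key=scores.get)
    weights[best] += 100 - sum(weights.values())
    return weights


neighbors = [
    EdgeServer("edge-b", capacity_rps=100, load_rps=40, distance_km=2.0),
    EdgeServer("edge-c", capacity_rps=100, load_rps=80, distance_km=1.0),
    EdgeServer("edge-d", capacity_rps=100, load_rps=100, distance_km=0.5),
]
weights = forwarding_weights(neighbors)
print(weights)
```

The weights are expressed as integer percentages because service meshes typically configure traffic splits as percentages (see the "Percentage-based Traffic Splits" row in Table 1).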
We reviewed the shift in software architecture from the monolithic architecture to microservices. Then, we discussed the main features of the service mesh, its performance impact, and its business importance. While employing a service mesh could ease the management of microservices, it could also lead to significant performance degradation. As noted by [Sedghpour22], employing eBPF for root cause analysis, high-performance monitoring, and management could significantly improve the performance of service mesh implementations. We also reviewed the applications of service mesh in 5G and edge computing.
Two important and widely used implementations of service mesh are Istio and Linkerd. In this section, we discuss these implementations and their components. Then, we provide a table to compare different service mesh implementations.
Istio service mesh, started in May 2017, is one of the fastest-growing open-source service mesh projects. Istio extends Kubernetes and utilizes Envoy proxies to provide traffic management, telemetry, and security [Istio].
Istio has a centralized control plane and supports integration with virtual machines and service discovery through other third-party service catalogs. Istio uses Envoy as its sidecar proxy and extends the Kubernetes API server for configuration management and access control. It also uses Kubernetes' built-in datastore, called etcd, to store its state and configuration. A high-level view of Istio architecture is shown in Figure 8 [Istio] [Khatri20].
The Istio control plane has four main components:
The primary functions of the Linkerd control plane are telemetry data aggregation, service API calls, and enabling data access between the control plane and service proxies. The control plane has the following components:
We use the table presented in [serviceComp] to compare the features of Istio [Istio], Linkerd [Linkerd], Consul [Consul], Kuma [Kuma], and Open Service Mesh [OSM]. Table 1 describes these differences.
Feature \ Implementation | Istio | Linkerd | Consul | Kuma | Open Service Mesh |
---|---|---|---|---|---|
License | Apache License 2.0 | Apache License 2.0 | Mozilla License | Apache License 2.0 | Apache License 2.0 |
Service Proxy | Envoy, proxyless for gRPC (experimental) | Linkerd2-proxy | defaults to Envoy, exchangeable | Envoy | Envoy |
TCP | Yes | Yes | Yes | Yes | Yes |
HTTP/1.1+ | Yes | Yes | Yes | Yes | Yes |
HTTP/2 | Yes | Yes | Yes | Yes | Yes |
gRPC | Yes | Yes | Yes | Yes | Yes |
Automatic Sidecar Injection | Yes | Yes | Yes | Yes | Yes |
Platform | Kubernetes | Kubernetes | Kubernetes, Nomad, VMs, ECS, Lambda | Kubernetes, VMs, ECS | Kubernetes |
Extension of the Mesh by containers/VMs outside the cluster | Yes | No | Yes | Yes | No |
Control and observe multiple clusters | Yes | Yes | Yes | Yes | planned |
Traffic Access Control | Yes | No | Yes | No | Yes |
Traffic Split | Yes | Yes | No | No | Yes |
Traffic Metrics | Yes | Yes | No | No | Yes |
Service Log Collection | No | No | No | No | Yes, using Fluent Bit |
Access Log Generation | Yes | No (tap feature instead) | Yes | Yes | No |
Per-Route Metrics | experimental | Yes | depending on the proxy used | No | No |
Load Balancing | Yes (Round Robin, Random, Weighted, Least Request) | Yes (exponentially weighted moving average) | Yes (Round Robin, Random, Weighted, Least Request, Ring Hash, Maglev) | Yes (Round Robin, Least Request, Ring Hash, Random, Maglev) | Yes |
Percentage-based Traffic Splits | Yes | Yes | Yes | Yes | Yes |
Header- and Path-based Traffic Splits | Yes | planned | Yes | Yes | Header-based |
Circuit Breaking | Yes | No, planned for 2.12.0 | Yes | Yes | Yes |
mTLS | Yes | Yes, on by default | Yes | Yes | Yes |
mTLS Enforcement | Yes | Yes, via https://linkerd.io/2.11/features/server-policy/ | Yes | Yes | Yes |
mTLS Permissive Mode | Yes | Yes | No | Yes | Yes |
mTLS by default | Yes, permissive mode | Yes, permissive mode | Yes | No | Yes |
Service-to-Service Authorization Rules | Yes | Yes | Yes | Yes | Yes |
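
Table 1 lists several load-balancing policies. The sketch below illustrates three of them in simplified form: round robin, least request, and a selection based on an exponentially weighted moving average (EWMA) of observed latency. It is an illustration under stated assumptions, not the implementation used by Envoy or Linkerd2-proxy; the class names, the smoothing factor `alpha`, the endpoint names, and the latency observations are hypothetical.

```python
import itertools


class RoundRobin:
    """Cycle through the endpoints in a fixed order."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def pick(self):
        return next(self._cycle)


class LeastRequest:
    """Pick the endpoint with the fewest in-flight requests."""

    def __init__(self, endpoints):
        self.in_flight = {e: 0 for e in endpoints}

    def pick(self):
        choice = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[choice] += 1
        return choice

    def done(self, endpoint):
        self.in_flight[endpoint] -= 1


class EwmaLatency:
    """Pick the endpoint with the lowest smoothed latency estimate."""

    def __init__(self, endpoints, alpha=0.5):
        self.alpha = alpha  # assumed smoothing factor
        self.estimate_ms = {e: 0.0 for e in endpoints}

    def observe(self, endpoint, latency_ms):
        old = self.estimate_ms[endpoint]
        self.estimate_ms[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * old

    def pick(self):
        return min(self.estimate_ms, key=self.estimate_ms.get)


endpoints = ["pod-a", "pod-b", "pod-c"]

rr = RoundRobin(endpoints)
rr_picks = [rr.pick() for _ in range(4)]

lr = LeastRequest(endpoints)
lr_picks = [lr.pick() for _ in range(3)]  # no request finishes in between

ewma = EwmaLatency(endpoints)
observations = [("pod-a", 30.0), ("pod-b", 10.0), ("pod-c", 20.0), ("pod-b", 50.0)]
for endpoint, latency in observations:
    ewma.observe(endpoint, latency)
ewma_pick = ewma.pick()
print(rr_picks, lr_picks, ewma_pick)
```

Round robin ignores endpoint state, least request reacts to queue build-up, and the EWMA variant reacts to recent latency, so a single slow response (the 50 ms observation for `pod-b`) is enough to shift traffic away from that endpoint.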
In this section, we reviewed the popular service mesh implementations and their components. Then, we provided a table that compares the different features of the five most important implementations. It is clear that all of the reviewed service meshes support Kubernetes as an orchestration framework, although Consul and Kuma also run on other platforms. Therefore, any future implementation should also consider Kubernetes as one of the candidates for the orchestration system.
In this paper, we reviewed the service mesh concept, its features, and its popular implementations. We showed that implementing a service mesh could ease service management and policy enforcement and improve service observability. It also enables us to extend the capabilities of the orchestration framework without directly modifying its core code. However, it also results in performance degradation that needs to be addressed. We also discussed the benefits of using service mesh in 5G and edge environments.
It seems that current implementations of service mesh have reached a maturity level in terms of features. Therefore, future research should focus on how we could improve the performance of these implementations without limiting their functionality.
Acronym | Definition |
---|---|
5G | Fifth-generation technology standard for broadband cellular networks |
HTTP | Hypertext Transfer Protocol |
TLS | Transport Layer Security |
mTLS | Mutual TLS |
gRPC | Google Remote Procedure Call |
API | Application Programming Interface |
[Cha21] D. Cha and Y. Kim, "Service Mesh Based Distributed Tracing System," in International Conference on Information and Communication Technology Convergence (ICTC), Oct. 2021, pp. 1464-1466, https://ieeexplore.ieee.org/document/9620968
[Consul] "Multi-Platform Service Mesh with Consul." https://www.consul.io/use-cases/multi-platform-service-mesh [Consul service mesh].
[Dab20] B. Dab, I. Fajjari, M. Rohon, C. Auboin, and A. Diquelou, "Cloud-native Service Function Chaining for 5G based on Network Service Mesh," in IEEE International Conference on Communications (ICC), Jun. 2020, pp. 1-7, https://ieeexplore.ieee.org/document/9149045
[Docker] "Docker: Accelerated, Containerized Application Development," https://www.docker.com/ [Docker project website].
[Duque22] A. O. Duque, C. Klein, J. Feng, X. Cai, B. Skubic, and E. Elmroth, "A Qualitative Evaluation of Service Mesh-based Traffic Management for Mobile Edge Cloud," in 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2022, pp. 210-219, https://doi.ieeecomputersociety.org/10.1109/CCGrid54584.2022.00030
[Envoy] "What is Envoy," https://www.envoyproxy.io/docs/envoy/latest/intro/what_is_envoy [Description of Envoy's features].
[Furusawa22] T. Furusawa, H. Abe, K. Okada, and A. Nakao, "Service Mesh Controller for Cooperative Load Balancing among Neighboring Edge Servers," in IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), Jul. 2022, pp. 1-6, https://ieeexplore.ieee.org/document/9820181
[Ganguli21] M. Ganguli, S. Ranganath, S. Ravisundar, A. Layek, D. Ilangovan, and E. Verplanke, "Challenges and Opportunities in Performance Benchmarking of Service Mesh for the Edge," in IEEE International Conference on Edge Computing (EDGE), Sep. 2021, pp. 78-85, https://ieeexplore.ieee.org/document/9711981
[Gartner] "Gartner: Hype Cycle for Enterprise Networking, 2022," https://www.gartner.com/document/4016160 [Gartner Report on Enterprise Networking explains the importance of service mesh and its current maturity - requires login]
[IBM] "SOA vs. Microservices: What's the Difference?," https://www.ibm.com/cloud/blog/soa-vs-microservices [Explains the difference between service-oriented architecture and microservice architecture]
[Istio] "Istio," https://istio.io/latest/ [Istio service mesh].
[Khatri20] A. Khatri, V. Khatri, D. Nirmal, H. Pirahesh, and E. Herness, "Mastering Service Mesh: Enhance, secure, and observe cloud-native applications with Istio, Linkerd, and Consul," 1st edition. Packt Publishing, 2020, ISBN: 9781789615791
[Kubernetes] "Production-Grade Container Orchestration," https://kubernetes.io/ [Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications].
[Kuma] "Kuma," https://kuma.io/ [Kuma service mesh].
[Li19] W. Li, Y. Lemieux, J. Gao, Z. Zhao, and Y. Han, "Service Mesh: Challenges, State of the Art, and Future Research Opportunities," in IEEE International Conference on Service-Oriented System Engineering (SOSE), Apr. 2019, pp. 122-1225, https://ieeexplore.ieee.org/document/8705911
[Linkerd] "The world's lightest, fastest service mesh." https://linkerd.io/ [a popular implementation of service mesh that claims to be the fastest]
[NSM] "Network Service Mesh." https://networkservicemesh.io/ [Network Service Mesh is a Cloud Native Computing Foundation sandbox project that implements the service mesh concept].
[OSM] "Open Service Mesh." https://openservicemesh.io/ [Open Service Mesh].
[RedHat] "What's a service mesh?" https://www.redhat.com/en/topics/microservices/what-is-a-service-mesh [RedHat entry on service mesh explaining background concepts]
[Sedghpour22] M. R. S. Sedghpour and P. Townend, "Service Mesh and eBPF-Powered Microservices: A Survey and Future Directions," in IEEE International Conference on Service-Oriented System Engineering (SOSE), Aug. 2022, pp. 176-184, https://ieeexplore.ieee.org/document/9912629
[serviceComp] "servicemesh.es." https://servicemesh.es/ [This website compares different implementations of service mesh]
[Sharma19] R. Sharma and A. Singh, "Getting Started with Istio Service Mesh: Manage Microservices in Kubernetes," 1st edition. Apress, 2019, ISBN: 9781484254578
[Wojciechowski21] Ł. Wojciechowski, K. Opasiak, J. Latusek, M. Wereski, V. Morales, T. Kim, and M. Hong, "NetMARKS: Network Metrics-AwaRe Kubernetes Scheduler Powered by Service Mesh," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, May 2021, pp. 1-9, https://ieeexplore.ieee.org/document/9488670
[XIE20] X. XIE and S. S. Govardhan, "A Service Mesh-Based Load Balancing and Task Scheduling System for Deep Learning Applications," in 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), May 2020, pp. 843-849, https://ieeexplore.ieee.org/document/9139676
[Zhu22] X. Zhu, G. She, B. Xue, Y. Zhang, Y. Zhang, X. Zou, X. Duan, P. He, A. Krishnamurthy, M. Lentz, D. Zhou, and R. Mahajan, "Dissecting Service Mesh Overheads," arXiv, Jul. 02, 2022, https://arxiv.org/abs/2207.00592