The challenges of multi-cloud environments


When the IT revolution began, we started with a single computer the size of a room. Then we invented server rooms and started dividing servers into virtual machines, but apparently that wasn’t good enough. Then the cloud revolution came, and it has been a game changer ever since. With cloud computing we got self-service through API calls that let us create all kinds of resources in different parts of the world. What an excellent and convenient solution! Why would anyone want more?
It turns out there are a few reasons to go one level further: multi-cloud.

Reasons for going multi-cloud

There are many reasons; I have chosen the ones that carry the most weight in the decision-making process.

Costs

Although the most popular cloud services are priced comparably, there can be slight differences between providers, especially depending on the region. At a larger scale, virtual machines that are a few percent cheaper can add up to thousands of dollars in savings, which is worth considering.
Using multiple cloud providers also gives you leverage: since you are able to move your workload to a competitor, you are in a better position to negotiate your contract terms.
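To put the “few percent” argument into numbers, here is a back-of-the-envelope calculation. The fleet size, hourly price and discount are purely hypothetical assumptions, not quotes from any provider.

```python
# Back-of-the-envelope savings estimate.
# All inputs are hypothetical; substitute your own fleet size and prices.

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_savings(vm_count: int, hourly_price: float, discount: float) -> float:
    """Return monthly savings when every VM becomes `discount` (a fraction) cheaper."""
    monthly_cost = vm_count * hourly_price * HOURS_PER_MONTH
    return monthly_cost * discount


if __name__ == "__main__":
    # Hypothetical fleet: 500 VMs at $0.20/hour, 3% cheaper at another provider.
    saved = monthly_savings(vm_count=500, hourly_price=0.20, discount=0.03)
    print(f"monthly: ${saved:,.0f}, yearly: ${saved * 12:,.0f}")
```

With these made-up inputs, a 3% difference is worth about $2,190 a month, or roughly $26,280 a year.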

Vendor lock-in avoidance

Many companies want to avoid vendor lock-in. Even if they have a preferred cloud provider, a multi-cloud policy ensures that their services remain portable to another provider. This has a positive side effect: by using multiple cloud providers you can compare them not only on cost, but also on reliability (SLAs), stability, speed, and any other factors your organization finds important.

Broad SaaS offering

When you provide a product in a SaaS model, having an offering available on all major cloud platforms is practically mandatory. This is especially important if customers access your product directly from their own environments, with their networks linked to a dedicated environment that hosts your software. This model is fairly popular among companies offering latency-sensitive services such as databases.

Higher availability

No cloud provider can give you 100% availability, and how many “nines” an SLA offers depends on the service type and sometimes on its tier. To increase the availability of your service, you may need to use not only multiple regions but also multiple cloud providers. If you are tied to a particular geographic region, an outage of a crucial service there can take your services down with it. Because the major providers place their regions in similar locations, you can avoid this by spreading your services across different providers in the same area.
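A rough way to see why a second provider helps is to assume that deployments fail independently and that traffic can fail over instantly. Under those assumptions the service is down only when all deployments are down, so the combined availability is 1 − ∏(1 − aᵢ). Independence is a strong simplification, because shared DNS, shared dependencies and correlated regional events all break it. The SLA figures below are hypothetical.

```python
from math import prod


def combined_availability(availabilities: list[float]) -> float:
    """Availability of N deployments where any single one can serve traffic.

    Assumes failures are independent and failover is instantaneous,
    which real systems only approximate.
    """
    return 1 - prod(1 - a for a in availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per (non-leap) year for a given availability."""
    return (1 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    single = [0.999]       # hypothetical "three nines" at one provider
    dual = [0.999, 0.999]  # the same figure at two independent providers
    for label, setup in (("one provider", single), ("two providers", dual)):
        a = combined_availability(setup)
        print(f"{label}: {a:.6f} (~{downtime_minutes_per_year(a):.1f} min/year down)")
```

With these numbers, expected downtime drops from roughly 525 minutes a year to well under one minute. This shows why even modest per-provider SLAs combine well, and also why correlated failures, which the formula ignores, deserve attention.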

Leverage unique services

Not every cloud provider offers the same set of services, so you may want to use a particular service from another provider even if you run most of your applications on a single one. This requires setting up the secondary provider and learning how to manage it. Once you have come that far, it also opens up opportunities to test and compare other services.

Kubernetes as a multi-cloud service

Now I will focus on a small but crucial part of cloud services: Kubernetes services, with everything and every challenge behind them. This matters especially to organizations that run their own software or software provided by external vendors. Nowadays it’s not only software that is eating the world; containers have started taking a huge bite of it too.

When one is enough

There are cases where multi-cloud brings a lot of benefits, but there are dozens more where it makes little or no sense. This is especially true with Kubernetes, which is designed for distributed systems and can use the underlying cloud infrastructure to provide the following features:

  • easy scalability of applications running as containers
  • high availability and resilience thanks to the scheduler and its features (e.g. pod affinity and anti-affinity rules, node affinity rules, and automatic distribution across availability zones)
  • geographical distribution using multiple clusters connected by a service mesh (additionally, kubefed aims to provide a centralized control plane for those distributed clusters)
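To illustrate the scheduler features in the list above, here is a minimal sketch that builds a Deployment manifest for recent Kubernetes versions. It spreads replicas evenly across availability zones with `topologySpreadConstraints` and uses pod anti-affinity to prefer placing replicas on different nodes. The sketch is written in Python and emits JSON, which `kubectl apply -f` accepts. The app name, image and replica count are hypothetical.

```python
import json


def zone_spread_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment whose replicas are spread across availability zones."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Keep the number of replicas per zone within 1 of each other.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    # Prefer (but do not require) one replica per node.
                    "affinity": {"podAntiAffinity": {
                        "preferredDuringSchedulingIgnoredDuringExecution": [{
                            "weight": 100,
                            "podAffinityTerm": {
                                "labelSelector": {"matchLabels": labels},
                                "topologyKey": "kubernetes.io/hostname",
                            },
                        }],
                    }},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical app; pipe the output to `kubectl apply -f -`.
    print(json.dumps(zone_spread_deployment("web", "nginx:1.25", 6), indent=2))
```

Nothing in this manifest refers to a specific cloud. The zone label is set by each provider's nodes, which is what makes this part of Kubernetes portable.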

If only that solved everything. Some challenges are not easily addressed by Kubernetes itself, although they can be tackled in other ways.

Challenge 1 - storage

This is probably the biggest challenge of all: how do you provide consistent storage for these distributed systems that works not only across regions but also across different cloud providers?
Here’s the simplest solution: just don’t. If you can design your systems so that they do not require storage synchronization, you’ll save yourself a lot of time and the problems that would eventually come up.

If you really need everything in sync between all sites and regions, think it through once again. Then, if you really, I mean really, really want it, consider a storage solution that implements synchronization between multiple sites. One of the most interesting distributed database projects is Vitess, which originated at YouTube. It is built on MySQL, so if that works for you, you can start experimenting with it to create a multi-region, multi-cloud solution that spans multiple sites (e.g. multiple Kubernetes clusters). For Kubernetes there is even an operator that makes it quite easy to set up. Cassandra is a more mature alternative to Vitess. However, it is not MySQL-compatible and requires you to design your application specifically for that type of database. There are a couple of operators for it as well: this one and this one provided by DataStax.
It is also worth mentioning projects such as Rook and OpenEBS, which provide lower-level storage on which you can build something more universal.
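To see why synchronous consistency across sites is expensive, consider Cassandra’s consistency levels. For a keyspace replicated with `NetworkTopologyStrategy`, `QUORUM` needs a majority of all replicas across every data center. `LOCAL_QUORUM` needs a majority only within the coordinator’s data center, and `EACH_QUORUM` needs a majority in every data center. The sketch below assumes a hypothetical layout with one Cassandra data center per cloud and three replicas each. It checks which levels still succeed after an entire cloud is lost.

```python
def quorum(replicas: int) -> int:
    """Majority of `replicas`, as Cassandra computes it: floor(n / 2) + 1."""
    return replicas // 2 + 1


def surviving_levels(rf_per_dc: dict[str, int], lost_dc: str) -> dict[str, bool]:
    """Which consistency levels can still succeed after `lost_dc` goes down.

    Assumes every replica outside the lost data center is healthy and that the
    coordinator runs in the first surviving data center.
    """
    total = sum(rf_per_dc.values())
    alive = {dc: (0 if dc == lost_dc else rf) for dc, rf in rf_per_dc.items()}
    local_dc = next(dc for dc in rf_per_dc if dc != lost_dc)
    return {
        "QUORUM": sum(alive.values()) >= quorum(total),
        "LOCAL_QUORUM": alive[local_dc] >= quorum(rf_per_dc[local_dc]),
        "EACH_QUORUM": all(alive[dc] >= quorum(rf) for dc, rf in rf_per_dc.items()),
    }


if __name__ == "__main__":
    # Hypothetical data center names: one per cloud, replication factor 3 each.
    layout = {"aws": 3, "gcp": 3}
    print(quorum(sum(layout.values())))  # replicas that must answer for QUORUM
    print(surviving_levels(layout, lost_dc="aws"))
```

With two clouds, `QUORUM` needs 4 of 6 replicas, so strongly consistent operations stop when either cloud is lost. `LOCAL_QUORUM` keeps working, but only with eventual consistency between clouds. Adding a third site (3 + 3 + 3 replicas) lets `QUORUM` survive the loss of one cloud, at the cost of cross-cloud latency on every such request. This is exactly the kind of trade-off that makes this challenge hard.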

Challenge 2 - networking

With Kubernetes, placing your applications in multiple regions and cloud providers can in fact be quite easy; the next step is to send traffic to these environments. With a single site this is trivial: all you need are some DNS records pointing to your load balancers, and Kubernetes handles the rest.
For multi-cloud there are some caveats. First, avoid cloud-specific configuration, such as Ingress resources that rely on features of a particular cloud provider. This is where providers offer a long and compelling list of features, each configured in their own specific way, which makes it a perfect trap for vendor lock-in.
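One practical way to avoid this trap is to lint manifests for provider-specific annotations before they are merged. The sketch below flags a few annotation prefixes used by provider-specific controllers (the AWS Load Balancer Controller, GKE and the Azure Application Gateway ingress controller). The list is illustrative, not exhaustive, and should be adjusted to the controllers you actually run. Manifests are plain dicts here; in practice you would load them from YAML with a parser such as PyYAML.

```python
# Illustrative, non-exhaustive list of provider-specific annotation prefixes.
PROVIDER_PREFIXES = {
    "alb.ingress.kubernetes.io/": "AWS (Load Balancer Controller)",
    "service.beta.kubernetes.io/aws-load-balancer-": "AWS",
    "cloud.google.com/": "Google Cloud",
    "networking.gke.io/": "Google GKE",
    "appgw.ingress.kubernetes.io/": "Azure (Application Gateway)",
}


def provider_specific_annotations(manifest: dict) -> list[tuple[str, str]]:
    """Return (annotation, provider) pairs that tie a manifest to one cloud."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    findings = []
    for key in sorted(annotations):
        for prefix, provider in PROVIDER_PREFIXES.items():
            if key.startswith(prefix):
                findings.append((key, provider))
                break  # one match per annotation is enough
    return findings


if __name__ == "__main__":
    # Hypothetical Ingress that only works with the AWS Load Balancer Controller.
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": "web",
            "annotations": {
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                "nginx.ingress.kubernetes.io/rewrite-target": "/",
            },
        },
    }
    for key, provider in provider_specific_annotations(ingress):
        print(f"{key} ties this manifest to {provider}")
```

The ingress-nginx annotation is not flagged, because that controller can run on any cluster. Portability is about which controller interprets the annotation, not about annotations as such.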
Second, and in my opinion the most challenging part, is connectivity between your clusters. You need to make an important decision: treat your clusters independently or as a whole. If you choose the former, your life will be much easier. If you prefer the latter, you will need to invest a lot of time, but you’ll get one huge, distributed cluster. For independent clusters that are not interconnected, you need a way to route external traffic to them. To keep it short: choose AWS Route53, which in my view is the best way to load-balance traffic to the same application deployed on multiple clusters running in different clouds and regions. Its health checks automatically cut off misbehaving clusters. Currently I just don’t see a better way to handle this.
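The routing described above can be pictured as weighted DNS answers filtered by health checks. Each cluster’s load balancer gets a weighted record tied to a health check, and unhealthy clusters drop out of the answers. In Route53 this corresponds to weighted records, each with its own `SetIdentifier`, `Weight` and `HealthCheckId`. The Python below only simulates that selection logic locally and does not call the Route53 API. The cluster names, weights and health states are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    cluster: str   # hypothetical cluster identifier (Route53: SetIdentifier)
    weight: int    # relative share of traffic, must be > 0 (Route53: Weight)
    healthy: bool  # latest health-check result (Route53: HealthCheckId status)


def pick_endpoint(endpoints: list[Endpoint], rng: random.Random) -> Endpoint:
    """Choose one endpoint by weight, skipping clusters that fail health checks."""
    # Fail open: if every cluster looks unhealthy, answer with all of them
    # rather than with nothing (Route53 documents a similar rule).
    candidates = [e for e in endpoints if e.healthy] or endpoints
    return rng.choices(candidates, weights=[e.weight for e in candidates], k=1)[0]


if __name__ == "__main__":
    clusters = [
        Endpoint("aws-eu-west-1", weight=50, healthy=True),
        Endpoint("gcp-europe-west1", weight=50, healthy=False),  # failing health check
    ]
    rng = random.Random(42)
    picks = {pick_endpoint(clusters, rng).cluster for _ in range(1000)}
    print(picks)  # only the healthy cluster receives traffic
```

The clusters never talk to each other here. All coordination happens in DNS, which is what keeps the “independent clusters” option simple.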
When building a swarm-like configuration in which applications in different clusters communicate with each other, you need a way to connect all the clusters. Fortunately, there are a few methods you can use. I would consider either HashiCorp Consul or Istio. Both are service meshes built on Envoy, which they configure automatically to proxy traffic between applications. They also provision special proxies at the edge of each cluster that interconnect the clusters transparently. If you also need to manage those clusters from a single place, you can use the kubefed project, whose beta release is expected soon.

Challenge 3 - differences in Kubernetes services

Although Kubernetes is all about abstraction layers and portability, cloud providers are aware of its growing popularity. They want not only to attract more users to their cloud services but also to keep them there for a long time, and they use many ways to discourage you from migrating to, or adding, other cloud providers. That’s why there are a number of differences between their Kubernetes services. A managed Kubernetes service gives the provider control over all Kubernetes parameters, and there are plenty of them, along with feature gates that modify cluster behaviour.
One of the most evident examples is AWS EKS authentication. It is a Kubernetes cloud service where you cannot authenticate with client certificates, so you will most likely end up using IAM identities or service account tokens. Some features greatly improve performance, such as Network Endpoint Groups (NEGs) on Google GKE, and it is hard to switch to another, often worse, solution. The same applies to the availability of Istio and to the different approaches to logging and monitoring. These are hard choices you have to make if you really want portability. The ultimate solution to this challenge is to provision your clusters yourself on IaaS services, for example with OpenShift/OKD or any other installer that provisions a control plane with configurable masters.
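To illustrate what EKS uses instead of client certificates, consider the bearer token produced by `aws eks get-token` (or by aws-iam-authenticator). As far as I know, it is the prefix `k8s-aws-v1.` followed by the unpadded base64url encoding of a presigned STS `GetCallerIdentity` URL. The cluster’s authenticator calls that URL to learn the caller’s IAM identity. The sketch below only encodes and decodes this wrapper. Generating a real presigned URL requires AWS credentials, so the URL used here is a hypothetical placeholder.

```python
import base64

TOKEN_PREFIX = "k8s-aws-v1."


def wrap_presigned_url(presigned_url: str) -> str:
    """Turn a presigned STS GetCallerIdentity URL into an EKS-style bearer token."""
    encoded = base64.urlsafe_b64encode(presigned_url.encode()).decode()
    return TOKEN_PREFIX + encoded.rstrip("=")  # the token omits base64 padding


def unwrap_token(token: str) -> str:
    """Recover the presigned URL from a token, e.g. to inspect which call it signs."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not an EKS-style token")
    payload = token[len(TOKEN_PREFIX):]
    payload += "=" * (-len(payload) % 4)  # restore padding before decoding
    return base64.urlsafe_b64decode(payload).decode()


if __name__ == "__main__":
    # Hypothetical placeholder; a real URL also carries X-Amz-* signature parameters.
    url = "https://sts.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15"
    token = wrap_presigned_url(url)
    print(token)
    assert unwrap_token(token) == url
```

Because identity is tied to IAM rather than to a certificate, the same kubeconfig and access model cannot simply be carried over to GKE, AKS or a self-managed cluster. This is part of the lock-in described above.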

Conclusion

The multi-cloud approach is hard. Even with Kubernetes, it is sometimes easier to accept that both keeping the option of using multiple cloud providers open and actually implementing it require a lot more effort than sticking with one. Those who choose the harder path will get closer to an almost utopian vision. It is an interconnected swarm of containers running on different IaaS implementations in different regions, perhaps even operated in completely different ways underneath all the abstraction layers that Kubernetes creates.
Even if the multi-cloud concept seems abstract, or even unnecessary to some, hybrid solutions are a must for many organizations, and the same rules apply to building them. I expect this to be a hot topic in the coming months and years, and I can’t wait to see, and help implement, more of these setups. After all, with Kubernetes it has never been easier.
