Load balancing or balancing on the edge of a cliff?
As a backend developer/architect I have always been interested in this topic. When you need to scale big, load balancing is the first thing to start thinking about.
It doesn’t matter whether you are building a backend for one of your own apps or your product is a SaaS/PaaS used by third parties. Your backend should be rock solid, otherwise you will lose your user base fast. The loyalty of your users depends on the backend. If you build a SaaS, your platform API is going to be the foundation for many other applications, and their success will depend on your platform. So your backend should be reliable, stable and scalable to any amount of traffic.
Let’s imagine you’ve launched something in production:
- Your traffic will be growing. This is the indicator that you are on the right track to building something cool; people want to use your app. You want that too.
- If you are onto something big then your traffic will grow like crazy. People will want to use your app/service so badly! Make sure you can scale out your backend.
- You won’t be able to stop that traffic; this is something you will have to live with.
- Things will go bad on the backend side soon (sooner than you think!).
- Things will go bad on the backend side especially when your traffic spikes. Think about an article on some popular blog or website. Another example would be someone with a million followers tweeting about your service.
- You will have to change the backend, deploying fixes and features under heavy traffic. Think about heart surgery: this is exactly what you will be doing all the time.
Sounds scary enough, right? Well, if you still want to build your killer app and make this world a better place — keep on reading…
Load balancing to the rescue!
The good news is that load balancing will help you in most cases (if you use it properly). Let’s take a look at what load balancing is.
So, there is a load balancer box:
- it has a pool of backend servers behind it
- it distributes traffic to them in a reverse proxy manner
- it distributes an equal amount of traffic to all backend servers in the pool
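To make the “equal amount of traffic” idea concrete, here is a minimal sketch of round-robin distribution, which is one common way to spread requests evenly. The class name, the backend names and the choice of round-robin itself are assumptions for illustration; a real load balancer may use a different algorithm.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends from a fixed pool in strict rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("pool must contain at least one backend")
        # cycle() repeats the pool forever: a, b, c, a, b, c, ...
        self._rotation = cycle(list(backends))

    def pick(self):
        # Each call returns the next backend, wrapping around at the end.
        return next(self._rotation)


# Hypothetical backend names, used only for this example.
balancer = RoundRobinBalancer(["backend-a", "backend-b", "backend-c"])
assignments = [balancer.pick() for _ in range(6)]
```

After six requests each of the three backends has been picked exactly twice, which is the “equal share” behaviour described in the list above.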
The load balancer faces the real traffic first; this is where your request’s life cycle starts. It is important to have a “super high-availability” setup for the load balancer. If the load balancer dies, everything dies. Keep that in mind and make sure it will survive anything except an asteroid. There are many techniques for setting up a load balancer (they are out of scope of our discussion here). Take a look at AWS ELB; it does a great job. Also, consider DNS health checks and a multi-region setup on AWS. This might actually survive even an asteroid!
This layout allows you to solve many problems:
- horizontal up-scaling: traffic is growing, so just add more backend servers to the pool.
- horizontal down-scaling: traffic is dropping, so remove an appropriate number of backend servers from the pool.
- deploy fixes/features without downtime: now you can actually stop live traffic! You can set the load balancer to stop sending traffic to one of the backend servers and then upgrade it.
- health checks! The load balancer sends health check requests to all backend servers. If one of them doesn’t respond or responds too slowly, the load balancer removes this backend from the pool. Then the load balancer distributes the “released” traffic to all other backend servers in the pool.
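The health check behaviour from the last bullet can be sketched in a few lines. This is only an illustration: the `Pool` class, the backend names and the “two consecutive failures” threshold are assumptions of mine, not the way any particular load balancer is implemented.

```python
class Pool:
    """Tracks which backends are in rotation based on health check results."""

    def __init__(self, backends, unhealthy_threshold=2):
        # Assumed rule: a backend leaves the pool after this many
        # consecutive failed (or too slow) health checks.
        self.unhealthy_threshold = unhealthy_threshold
        self._failures = {name: 0 for name in backends}

    def record_check(self, name, ok):
        # A passing check resets the counter; a failing one increments it.
        self._failures[name] = 0 if ok else self._failures[name] + 1

    def healthy(self):
        return [n for n, f in self._failures.items() if f < self.unhealthy_threshold]

    def share_per_backend(self):
        # Fraction of total traffic each healthy backend gets
        # when traffic is distributed equally.
        alive = self.healthy()
        return 1 / len(alive) if alive else 0.0


pool = Pool(["a", "b", "c", "d"])
share_before = pool.share_per_backend()   # 4 backends, 25% each
pool.record_check("d", ok=False)
pool.record_check("d", ok=False)          # second failure: "d" is removed
share_after = pool.share_per_backend()    # 3 backends, about 33% each
```

Note how the “released” traffic does not disappear: every remaining backend silently gets a bigger share.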
Looks like all our problems are solved:
- we can scale out our backend for growing traffic
- we can react to traffic spikes and drops. We can even automate it (AWS CloudWatch and autoscaling to the rescue!)
- we can update the backend under heavy load without downtime. If one of our backend servers dies it can be replaced, again without downtime
- we can do experiments by deploying some change to only one server and sending a small percentage of traffic to it
Of course, there are still many things which could fail. Our backend servers will likely query some DBs/storage. Also, they will be sending requests to other internal/external services. So you will need to make sure that all those parts are scalable and reliable as well. For the sake of brevity, let’s say all our internal services are behind load balancers too. You can do this on AWS using internal ELBs in a VPC.
You might ask why this post has “balancing on the edge of a cliff” in its title? Everything seems to be good, right?
I’d like to say more about health checks, as this is the place where things might go bad. Let’s recap what happens when some of your backend servers fail:
- load balancer sends health check to some backend server
- backend is not responding to health check for some reason
- so, this backend server gets marked as “unhealthy”
- load balancer removes this unhealthy server from the pool and stops sending traffic to it
- load balancer re-distributes traffic to all servers left in the pool
- so, the amount of traffic which was going to the removed backend server now goes to the other servers
- if you have autoscaling set up, it will add a new backend server to the pool
- load balancer will start sending traffic to this new backend server once it is up and healthy
Ok, we need to answer several questions:
- are those servers ready to serve this chunk of “extra” traffic? what if they are not? seems like we need to be careful with our backend pool capacity, right?
- what if we get several unhealthy backend servers at once? this “extra traffic” will be bigger and the number of servers left in the pool will be smaller, right?
- can this “extra traffic” cause other servers to become unresponsive and marked as “unhealthy”?
- what if our autoscaling launches new servers slowly? looks like our “extra traffic” will grow like a rolling snowball, right?
The load balancing flow looked so smooth and reliable before… Now we have to think about lots of things.
If you don’t plan the computational capacity of your pool, you will get into trouble. The worst scenario is when the pool behind the load balancer ends up empty. The load balancer will mark all existing backend servers as “unhealthy”. The new backends will face the storm of existing traffic and will likely become unhealthy as well. Your pool of backend servers will keep trying to catch up but never make it. Don’t forget, your traffic is not going to stop; it will stay the same or even grow.
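Here is a toy simulation of that collapse. It is a deliberately crude model with assumptions of my own: traffic is constant, identical servers share it equally, any server pushed above its capacity fails its next health check, and autoscaling can launch only a limited number of servers per step. All numbers and names are made up for illustration.

```python
def simulate(total_rps, capacity_rps, healthy, target, launches_per_step, steps):
    """Return the number of servers in the pool after each step."""
    history = []
    for _ in range(steps):
        # Health check: with equal distribution every server sees the same
        # load, so if one is overloaded they all are and all get removed.
        if healthy and total_rps / healthy > capacity_rps:
            healthy = 0
        # Autoscaling: replace missing servers, limited by launch speed.
        healthy += min(launches_per_step, target - healthy)
        history.append(healthy)
    return history


# 900 rps needs 9 servers of 100 rps each; the pool had 10, two just failed.
slow = simulate(900, 100, healthy=8, target=10, launches_per_step=2, steps=4)
fast = simulate(900, 100, healthy=8, target=10, launches_per_step=10, steps=4)
```

With slow launches the pool is stuck at `[2, 2, 2, 2]`: every pair of fresh servers is hit by the full 900 rps and is removed again on the next check. With fast launches the result is `[10, 10, 10, 10]`: the whole pool comes back at once, each server gets 90 rps and stays healthy.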
So yeah — now it looks more like balancing on the edge of a cliff!
The good news: there is a list of simple things which can help to prevent this worst-case collapse scenario:
- Always keep your backend pool a little over-provisioned. You should be ready to handle at least 25% more traffic at any time, without any hiccups.
- Scale up fast when it is time! Make sure bootstrapping a new backend instance is blazing fast, or as fast as your cloud provider allows. It should not depend on any third parties, repos or compilation at boot time.
- Look at your existing traffic, find patterns and trends. This is valuable information. Say, every weekday you see traffic jump by 30% after 5pm. What is it? Kids coming back from school, right? Can we scale up 10 minutes before this happens? Yes, we can.
- Whatever happens, scale down sloooowly! If you don’t have as much traffic any more, that doesn’t mean you won’t have it again in the next couple of seconds. Also, traffic drops are not always real drops. What if one of your platform API users is having issues at the moment, so they stopped sending requests to you? You need to be careful with down-scaling.
- Make your backend servers as granular as possible. It is not a good idea to have 4 instances serving all traffic. If one fails, its 25% chunk of “extra traffic” gets redistributed to the 3 instances left in the pool, so each of them suddenly has to handle about 33% more load.
- Think about your backend servers as disposable resources. You should be able to stop them or launch new ones at any time. Don’t use sticky sessions. They can lead to lost user data or unbalanced load in your backend pool.
- Again, a bigger number of backend servers is better. They can be tiny ones, as each of them will be serving a small fraction of the traffic. If some of them fail, the others will still work.
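The arithmetic behind “more, smaller servers” is easy to check. The sketch below assumes equal traffic distribution; both function names are hypothetical helpers written for this post, not part of any library.

```python
def extra_load_per_survivor(pool_size, failed=1):
    """Relative load increase on each surviving server (0.25 means +25%)."""
    survivors = pool_size - failed
    if survivors <= 0:
        raise ValueError("no servers left in the pool")
    return pool_size / survivors - 1


def min_pool_size(headroom, failed=1):
    """Smallest pool that absorbs `failed` lost servers within the headroom."""
    size = failed + 1
    while extra_load_per_survivor(size, failed) > headroom:
        size += 1
    return size


four_servers = extra_load_per_survivor(4)      # about 0.33: +33% per survivor
twenty_servers = extra_load_per_survivor(20)   # about 0.05: +5% per survivor
needed_for_one_failure = min_pool_size(0.25)
needed_for_two_failures = min_pool_size(0.25, failed=2)
```

Under these assumptions, a 25% capacity reserve survives one failed server only with 5 or more servers in the pool, and two simultaneous failures only with 10 or more. A pool of 4 with a 25% reserve is already over the edge after a single failure.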
Let’s have a dream.
The last thing I wanted to mention is the thing I actually don’t like about load balancers. Load balancers don’t respect backend servers. They just send traffic to them and don’t care if a backend dies because of this traffic.
I wish that one day I had a load balancer which would take into account many backend metrics before sending traffic to a backend:
- desired throughput in terms of requests per second, declared by the backend
- current CPU utilization
- current RAM usage
- current network I/O throughput
- current disk I/O throughput
Say I am a smart load balancer: “I know everything about my backend servers. So, if I know that this backend is already 90% busy serving other requests, why would I send it more? I don’t want to kill it, so I won’t do that. If all my backend servers are busy, sorry. I’ll keep processing existing traffic, and all extra requests will just time out. Once I have backends that are less busy and ready to serve, I will make them busy. But again, respecting the metrics from those guys.”
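A rough sketch of how such a “respectful” load balancer could pick a backend is below. Everything here is hypothetical: the `Backend` type, the single combined `utilization` number (standing in for the CPU, RAM and I/O metrics listed above) and the 90% threshold taken from the monologue.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    utilization: float  # 0.0 to 1.0, assumed to be reported by the backend


def pick_backend(backends, busy_threshold=0.9):
    """Return the least busy backend below the threshold, or None if all are busy."""
    available = [b for b in backends if b.utilization < busy_threshold]
    if not available:
        # Nobody has room: the request has to wait or time out
        # instead of killing an already busy backend.
        return None
    return min(available, key=lambda b: b.utilization)


pool = [Backend("a", 0.95), Backend("b", 0.40), Backend("c", 0.70)]
chosen = pick_backend(pool)                               # "b" is the least busy
all_busy = pick_backend([Backend("a", 0.95), Backend("b", 0.90)])
```

The important design choice is the `None` branch: the balancer prefers to refuse extra work over pushing a backend past its limit.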
If you know about any solution like this — let me know, I will be more than happy to test it!
And the very last thing: let’s have a dream.
Imagine a world where the load balancer doesn’t send any requests to backend servers. Moreover, our load balancer is not a load balancer anymore, in the sense we know it. It is something like a storage for requests, where they wait for a chance to reach a backend. The backends become just workers. They pull requests from this storage, serve them and send the results back. Now our backend servers decide by themselves how much traffic they can serve. If you don’t have enough workers, some of your requests will expire and the user will receive a timeout error.
A good analogy would be a restaurant. “Customers” are our requests. The “restaurant” is our storage for requests. “Waiters” are our backend workers. Waiters work as fast as they can and serve as many customers as they can. Every waiter decides when he/she is ready to serve the next customer, right? The key point is that you will always have some number of your customers served and happy, guaranteed! Some of your waiters might fail, but the other waiters will have no idea about that; they will be busy serving their current customers. If you don’t have enough waiters, some of your customers will wait too long and leave the restaurant disappointed. Scale, hire more waiters!
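The “restaurant” can be sketched as a simple queue that workers pull from when they are ready. This is a minimal in-memory illustration with names I made up (`RequestStore`, `put`, `pull`); time is passed in explicitly so the example is deterministic, and sending results back to the client is left out.

```python
from collections import deque


class RequestStore:
    """Holds requests until a worker pulls them; stale requests expire."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._queue = deque()  # (enqueued_at, request) pairs, oldest first
        self.expired = []      # requests whose clients got a timeout error

    def put(self, request, now):
        self._queue.append((now, request))

    def pull(self, now):
        """Called by a worker when it is ready; returns a request or None."""
        while self._queue:
            enqueued_at, request = self._queue.popleft()
            if now - enqueued_at <= self.timeout:
                return request
            # Waited too long: the customer has already left the restaurant.
            self.expired.append(request)
        return None


store = RequestStore(timeout=5)
store.put("r1", now=0)
store.put("r2", now=4)
served = store.pull(now=6)      # "r1" waited 6 > 5 and expired; "r2" is served
nothing_left = store.pull(now=7)
```

No worker is ever handed more than it asked for. The price of too few workers shows up in `store.expired` instead of in dead backends.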
Does this sound interesting? This could be a good topic for the next post about load balancing.
If you are interested in more cool stuff about scalability and backend development, please follow me on the theinit.ai blog.
Also, check out the theinit.ai website if you are interested in conversational app development, NLP and AI.