Building Highly Scalable and Fault Tolerant Applications

Most of you are familiar with the idea that “time is money.” Every minute a product cannot be developed, promoted, or sold, the bottom line suffers. This idea matters more than ever in IT: every second a system is down means lost productivity and revenue. In fact, by one commonly cited estimate, 98% of businesses lose $100,000 or more for every hour of downtime. That is a significant amount of money.

With AWS, teams can build robust web and mobile applications that are extremely durable and almost never go down. To get there, they must make sure their systems are either highly available or fault tolerant, both of which are disaster recovery strategies.

It takes time and money to make your systems resilient enough. But that investment is small compared to the costs incurred through downtime. This is why it’s crucial to ensure that your system is highly available or fault tolerant. Let’s examine how the two approaches work and how they differ.

What is High Availability?

High availability means that a system is almost always up, although at times it may run in a degraded state. In the case of AWS, a system is considered highly available when it delivers 99.999 percent uptime, also known as “five nines”. To put that in perspective, the system would be offline for just five minutes and 15 seconds per year. This is achievable, and even normal, on AWS.
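The downtime figure follows directly from the uptime percentage. The short sketch below shows the arithmetic; the function name and the 365-day year are illustrative assumptions, not anything AWS defines.

```python
def yearly_downtime(availability_percent: float, days_per_year: float = 365.0):
    """Return the downtime allowed per year as (hours, minutes, seconds)."""
    down_fraction = 1.0 - availability_percent / 100.0
    total_seconds = down_fraction * days_per_year * 24 * 60 * 60
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(hours), int(minutes), seconds


# Compare a few common availability targets, including "five nines".
for target in (99.9, 99.95, 99.99, 99.999):
    h, m, s = yearly_downtime(target)
    print(f"{target}% uptime allows {h}h {m}m {s:.0f}s of downtime per year")
```

Running it for 99.999 percent gives the five minutes and 15 seconds mentioned above, and each extra "nine" cuts the allowed downtime by a factor of ten.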

High availability is built into a system by eliminating single points of failure through redundancy. For example, if your network had five computers all connected to a single server, that server would be a single point of failure. If the server room flooded and the server was destroyed, you would be out of luck.

To avoid this, a backup server could be brought online in an emergency. This adds redundancy and removes the single point of failure. In short, high availability eliminates single points of failure through redundancy.
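To get a rough feel for how much a backup server helps, the sketch below estimates the combined availability of several redundant copies. It rests on a simplifying assumption that is not in the article: the copies fail independently of each other, so the group is down only when every copy is down at once. A flood that destroys both servers in the same room would break that assumption.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    `single` is the availability of one component as a fraction (0.99 = 99%).
    Assumes failures are independent, which real deployments only approximate.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group fails only if all copies fail at the same time.
    return 1.0 - (1.0 - single) ** copies


for n in (1, 2, 3):
    print(f"{n} server(s) at 99% each: {combined_availability(0.99, n):.6f}")
```

Under that assumption, adding a second 99 percent server takes the pair to roughly 99.99 percent, which is why removing a single point of failure pays off so quickly.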

Before AWS, this was expensive and difficult to maintain. It was usually accomplished by configuring complex RAID arrays to keep databases redundant. In addition, the hardware had to be housed in temperature-controlled, bomb-shelter-like facilities that were costly to maintain.

The abundance of cloud-based services has made high availability affordable and feasible for nearly every company. Let’s take a deeper look at how we can use AWS to avoid single points of failure and increase the redundancy of our systems.

With millions of customers and a range of entry-level services that help businesses get on board with the cloud quickly, it’s not surprising that businesses are moving essential mobile applications and services onto Amazon’s cloud computing platform, Elastic Compute Cloud (EC2). According to AWS (Amazon Web Services), Amazon EC2 is a cloud computing service that provides compute resources, essentially server instances, that can be used to develop or host mobile applications. It is an ideal entry point into AWS for building your mobile applications, and you can create a robust, fault-tolerant system by using multiple EC2 instances together with ancillary services such as Auto Scaling and Elastic Load Balancing.

Fault Tolerance and Redundancy: The Same, Just Different

First things first: what is the difference between fault-tolerant and redundant solutions? While the two terms are related and are often used interchangeably, they are not identical. There is no absolute standard for defining them, but the most common answer is:

  • Components, such as racks, disks, or servers, are redundant.
  • Systems, such as disk arrays or cloud computing networks, are fault-tolerant.

In simple terms, redundancy means having more than one copy of something in case the first one stops working. Two drives in a system that are regularly backed up are redundant: if one fails, the other fills the gap. But if the system itself fails, both disks are useless. This is where fault tolerance comes in: it keeps the system as a whole functioning even when some of its parts fail.

As per AWS, “Fault-tolerance is the capability of a system to continue to function even when some of the components used in the creation of it are not able to function.” The AWS platform allows you to develop fault-tolerant systems with minimal human interaction and upfront financial investment. So how do you apply this in the context of EC2 and the Amazon cloud?

Saving Grace

For many businesses, cloud computing serves both as the home of their mobile applications and as a flexible DR option in case a local system fails. But what happens if the cloud itself fails? Like other cloud providers, Amazon has had outages caused by power failures, weather, and other disasters. Even though Amazon promises 99.95 percent availability for its compute service, that still amounts to more than four hours of downtime each year.

Using Amazon as a DR solution is now feasible and highly recommended, but it’s not perfect. To address this, EC2 comes with several tools that help businesses improve both their overall redundancy and their fault tolerance. The specific EC2 features that help here include availability zones, elastic IP addresses, and snapshots. These are features that any highly available system must use, and use properly.

Ramping Up AWS Redundancy

How can businesses address redundancy across their EC2 instances? The first step is availability zones (AZs). These zones are grouped by region. For example, if you’re on the West Coast of the United States, you’ll have a choice of several zones along the coast. Each zone is independently powered and cooled, and each has its own network and security systems.

Zones are insulated from failures in the other zones of their group, which makes them a simple route to redundancy. Replicating your EC2 instances across several AZs significantly reduces the chance of a complete outage. It’s worth noting that bandwidth between zones costs $0.01/GB. That is only a fraction of the price of Internet traffic, but it is still something to account for when calculating your cloud costs.
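A quick back-of-the-envelope estimate shows how that per-gigabyte charge adds up. The sketch below uses the $0.01/GB figure quoted above; the replication volume and the 30-day month are made-up inputs for illustration, and current AWS pricing should be checked before relying on the result.

```python
# Price quoted in this article; verify against current AWS pricing.
CROSS_AZ_PRICE_PER_GB = 0.01  # USD


def monthly_cross_az_cost(gb_per_day: float, days: int = 30,
                          price_per_gb: float = CROSS_AZ_PRICE_PER_GB) -> float:
    """Estimate the monthly cost in USD of replicating data between zones."""
    return gb_per_day * days * price_per_gb


# Hypothetical workload: 500 GB replicated between zones every day.
print(f"Estimated monthly cost: ${monthly_cross_az_cost(500):.2f}")
```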

It’s also important to remember that data transfer has an upper bound: the speed of light. This means that if you’re using two geographically distant AZs to host your EC2 instances, you may see some delay in the event of a failure. Amazon Web Services are offered in geographic Regions, each containing multiple Availability Zones (AZs) that provide fast connections between redundant deployment sites.
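The speed-of-light limit can be turned into a rough lower bound on latency. The sketch below assumes that signals in optical fiber travel at about two thirds of the speed of light in a vacuum, and it ignores routing, queuing, and processing delays, so real round trips will be slower. The 4,000 km distance is an arbitrary example.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km/s
FIBER_FACTOR = 2 / 3               # assumed approximate slowdown in optical fiber


def min_round_trip_ms(distance_km: float, medium_factor: float = FIBER_FACTOR) -> float:
    """Lower bound on round-trip time in milliseconds over the given distance."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * medium_factor)
    return 2 * one_way_seconds * 1000


# Two sites 4,000 km apart (hypothetical distance).
print(f"At least {min_round_trip_ms(4000):.1f} ms per round trip")
```

No amount of hardware removes this floor, which is why the distance between redundant sites matters when failover has to be fast.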

Finding Fault Tolerance

As noted in the AWS Reference Architecture for Fault Tolerance and High Availability, higher-level services such as Amazon Simple Storage Service (S3), Amazon SimpleDB, Simple Queue Service (SQS), and Elastic Load Balancing (ELB) are inherently fault-tolerant. EC2 instances, by contrast, come with a set of tools that must be used properly to achieve overall fault tolerance.

For example, ELB helps move workloads away from failed EC2 instances so that you don’t waste resources, and pairing an Auto Scaling group with an existing ELB load balancer lets you automatically terminate “unhealthy” instances and launch new ones.
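The interplay between the two services can be illustrated with a toy model. The sketch below is plain Python and does not call AWS at all; the class names, the round-robin routing, and the `reconcile` step are simplifications invented for illustration, not how ELB or Auto Scaling are implemented.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class ToyAutoScalingGroup:
    desired_capacity: int
    instances: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def launch(self) -> Instance:
        instance = Instance(f"i-{next(self._ids):04d}")
        self.instances.append(instance)
        return instance

    def reconcile(self) -> list:
        """Drop unhealthy instances, then launch replacements up to capacity.

        Returns the IDs of the instances that were terminated.
        """
        terminated = [i.instance_id for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        while len(self.instances) < self.desired_capacity:
            self.launch()
        return terminated


class ToyLoadBalancer:
    def __init__(self, group: ToyAutoScalingGroup):
        self.group = group
        self._next = 0

    def route(self) -> str:
        """Pick the next healthy instance in round-robin order."""
        healthy = [i for i in self.group.instances if i.healthy]
        if not healthy:
            raise RuntimeError("no healthy instances available")
        target = healthy[self._next % len(healthy)]
        self._next += 1
        return target.instance_id


group = ToyAutoScalingGroup(desired_capacity=3)
group.reconcile()                   # initial launch of three instances
balancer = ToyLoadBalancer(group)
group.instances[0].healthy = False  # simulate a failed health check
print("request routed to:", balancer.route())
print("terminated:", group.reconcile())
print("running:", [i.instance_id for i in group.instances])
```

The load balancer keeps traffic away from the failed instance immediately, while the scaling group restores capacity in the background. That division of labor is the point of combining the two.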

It is also important to make use of elastic IP addresses. These are public IP addresses that can be mapped to any EC2 instance in the same region, because they are associated with your AWS account rather than with a particular instance. If an EC2 instance fails unexpectedly, an elastic IP lets you redirect network requests and traffic in less than two minutes.
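As a minimal sketch of that remapping step, the function below moves an elastic IP to a standby instance using the EC2 `associate_address` call as exposed by boto3. The function name and the idea of a pre-provisioned standby instance are assumptions made for illustration. The client is passed in rather than created inside the function so the logic can be exercised without AWS credentials.

```python
def remap_elastic_ip(ec2_client, allocation_id: str, standby_instance_id: str) -> str:
    """Point an elastic IP at a standby instance and return the association ID.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=standby_instance_id,
        # Allow the move even though the address is still attached
        # to the failed instance.
        AllowReassociation=True,
    )
    return response["AssociationId"]
```

In practice you would pass `boto3.client("ec2")` together with your own allocation and instance IDs, and trigger the call from whatever health check detects the failure.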

It’s also recommended to use snapshots in combination with S3. By regularly taking point-in-time snapshots of your EC2 instances, saving them to S3, and replicating them across multiple AZs, you can minimize the impact of sudden or unexpected problems.
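A scheduled snapshot can be as small as the sketch below, which wraps the EC2 `create_snapshot` call as exposed by boto3. The function name and the description format are illustrative assumptions, and the scheduling itself (cron, a Lambda function, or similar) is left out. AWS stores EBS snapshots in S3 on your behalf, so no separate upload step appears here.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_volume(ec2_client, volume_id: str, now: Optional[datetime] = None) -> str:
    """Take a point-in-time snapshot of one EBS volume and return its ID.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    now = now or datetime.now(timezone.utc)
    response = ec2_client.create_snapshot(
        VolumeId=volume_id,
        # A timestamped description makes it easier to find the right restore point.
        Description=f"Scheduled backup of {volume_id} at {now:%Y-%m-%dT%H:%M:%SZ}",
    )
    return response["SnapshotId"]
```

Passing the client and the timestamp in as arguments keeps the function easy to test and lets the caller decide how often snapshots are taken.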

The Final Words

Werner Vogels, Amazon’s CTO, said “Failures are inevitable and all things will eventually fail with time.” The quote is striking in its simplicity. An effective solutions architect accepts that system failure is inevitable. Disaster recovery should therefore not be treated as a contingency plan; it should be treated as a matter of normal procedure.
