"Behind the Scenes"
December 2012 | The monthly newsletter by Felgall Pty Ltd
The computers running the internet each have only a certain capacity. The connections between the computers also have a fixed maximum capacity. As these maximums are approached the site hosted on that computer will slow down and eventually will not be available at all. You will end up with a blank screen, waiting for an indefinite period of time for the page to even attempt to download. When someone sets out to cause this to happen deliberately it is called a 'Denial of Service' or DoS attack (not to be confused with DOS, which is something completely different).
Where a web site goes down with this problem it means that the number of visitors that are trying to access the site has exceeded the capacity of the computer or the available bandwidth. It means that the site owner has not catered for that number of people all trying to visit their site at the same time. This might happen unexpectedly if someone starts promoting a page through social media and it 'goes viral' with everyone wanting to visit the page. This will happen rarely enough that there is little point in most sites trying to cater for it happening until it actually does.
The situation where a page or site is being created that is actually expected to receive lots of visitors is a completely different situation. Here there is an expectation that a certain number of visitors will try to visit at the same time and so plans should be made in advance to make sure that the hosting being used can cater for the volume of visitors that are expected.
So how does a site know how many visitors to expect so that they can select appropriate hosting? Well one way is to promote the page in advance of it going live and ask people to register to be reminded when the page is about to go live. A percentage of those who are interested will sign up and you can then estimate the actual number of visitors by estimating what percentage of them will have pre-registered.
Recently there was a web site that was set up in Australia with the intention of running a special 24 hour online sale which allowed customers to pre-register. They invited many Australian retailers to participate (including many with physical shops all over the country but little so far online) and charged each of the retailers to advertise their participation and discounted products on their site. Those setting up the site warned all of the retailers participating that they would need to upgrade their hosting to make sure that their web site would be able to handle the much larger number of visitors that they could expect on the day of the sale.
Since the event was a one day sale, estimating the expected visitors from the pre-registrations would be far easier as all of the visitors could be expected within a twenty four hour period rather than being spread out over a longer period - this meant that a lot more capacity would be needed than if the visitors were to be spread over a longer period. As there were about a million pre-registrations before the day of the sale it would be reasonable to expect that there would be a total of perhaps somewhere between three and four million visitors over the period of the sale with perhaps as many as half of those attempting to access the site within the first minute or two of the sale - at least those would be reasonable guesses on which to base the capacity planning.
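The arithmetic behind those guesses can be written out as a rough sketch. The function and parameter names here are my own invention for illustration, and the multipliers are simply the guesses given above (three to four visitors per pre-registration, half of them arriving in the first minute or two), not figures from the organisers.

```python
def estimate_peak_load(pre_registrations, visitors_per_registration,
                       peak_fraction, peak_window_seconds):
    """Rough capacity estimate from pre-registration numbers.

    All inputs other than pre_registrations are assumptions that
    the planner has to guess at.
    """
    # Only a fraction of visitors pre-register, so scale up.
    total_visitors = pre_registrations * visitors_per_registration
    # The share of visitors expected right at the start of the event.
    peak_visitors = total_visitors * peak_fraction
    # Average arrivals per second during the opening rush.
    peak_rate = peak_visitors / peak_window_seconds
    return total_visitors, peak_visitors, peak_rate


# Optimistic case: 3 visitors per registration, rush spread over 2 minutes.
low = estimate_peak_load(1_000_000, 3, 0.5, 120)
# Pessimistic case: 4 visitors per registration, rush within 1 minute.
high = estimate_peak_load(1_000_000, 4, 0.5, 60)

print(low)   # total, peak visitors, arrivals per second
print(high)
```

Even the optimistic case gives one and a half million people in the opening rush, or something over twelve thousand new arrivals every second, which is the sort of figure the hosting would have needed to be sized for.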
Despite the organisers of the sale being aware enough of the potential problems to warn the retailers to upgrade their hosting, and despite knowing the exact number of pre-registrations, the site crashed immediately as the sale started with about one and a half million people trying to access the site all at once. With that number of people trying to access the same computer at the same time they exceeded the bandwidth of the connection to the computer and that resulted in many of them not being able to access the site at all. I had their page open prior to the start of the sale because I wanted to see what would happen and as their countdown timer reached zero I hit the refresh button and was presented with an empty screen with 0% of the page downloaded. This remained unchanged for the several minutes that I left the page open.
When I tried to revisit the site an hour later they had either managed to get the bandwidth increased to handle more simultaneous visitors or, more likely, had enough people give up on the site that the number still trying to visit simultaneously was within the available bandwidth. They still didn't have enough capacity within the computer hosting the site to handle all the requests though and so had replaced the site with a static page apologising for being overloaded. Only after enough people got fed up with trying to access a site that wasn't working was the site able to actually start operating as originally intended. Apparently this was about three hours after the official start of the sale but I didn't see it because I had turned off the computer and gone to bed by then. I was eventually able to see what the site was supposed to look like about twelve hours after the sale started.
Because the organisers did not plan properly for the number of people that could be expected to visit, the site was down for a significant fraction of the period of the sale. Some of the retailers also had their own sites unavailable for part of the sale period because they had failed to upgrade their hosting sufficiently to cater for the number of additional visitors the sale sent their way. As the retailers paid a significant amount of money to participate in this event with their site and products being advertised through the sale site there was a considerable amount of upset over the fact that the site was down for so much of the sale - particularly with those retailers whose own sites did cope with the additional load. The limited timeframe meant that appropriate capacity planning was far more important for this situation than would normally apply where those who can't access something when it first becomes available can return days or weeks later to view it. That was the main justification for getting people to pre-register in the first place.
Unfortunately with this particular site the information about potential visitors based on the pre-registrations was not properly taken into account and the available bandwidth was exceeded for a period followed by another period where the computer capacity would have been exceeded had the actual site been accessible instead of a static page.
The larger popular sites can mostly avoid these sorts of problems. For example, Google has a very large number of separate computers to run their search engine with the computers spread across numerous locations. Because each computer only runs a part of that single site the processing that is required stays within the capacity of the computer and because only a portion of the search requests are sent to that computer it stays within the available bandwidth. If enough requests are sent to the one place to exceed the bandwidth then that only affects a small fraction of those trying to do searches and if they retry their search it will usually end up going to a different computer and so will not be blocked a second time. The only side effect of setting up multiple computers like this is that each has its own copy of all the data to retrieve the results from and they are not all updated simultaneously, which means that at any one time there may be two alternative sets of results available depending on whether the latest update has run on a given computer or not.
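The idea of spreading requests over several computers and having a retry land somewhere else can be illustrated with a toy sketch. This is only a simplified model under my own assumptions and says nothing about how Google or any other large site actually does it. The `Server` class, the `dispatch` function and the capacities are all hypothetical.

```python
import random


class Server:
    """A pretend computer that can only serve so many requests at once."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # assumed maximum simultaneous requests
        self.active = 0

    def try_accept(self):
        # Refuse the request if this computer is already full.
        if self.active >= self.capacity:
            return False
        self.active += 1
        return True


def dispatch(servers, rng, max_attempts=2):
    """Send one request to a randomly chosen server.

    If that server is full, retry on a server not yet tried.
    Returns the name of the server that accepted, or None.
    """
    tried = []
    for _ in range(max_attempts):
        candidates = [s for s in servers if s not in tried]
        if not candidates:
            break
        server = rng.choice(candidates)
        if server.try_accept():
            return server.name
        tried.append(server)
    return None


servers = [Server("a", 1), Server("b", 1)]
rng = random.Random()
print(dispatch(servers, rng))  # one of the two servers accepts
print(dispatch(servers, rng))  # the other one accepts, after a retry if needed
print(dispatch(servers, rng))  # None, as both are now full
```

With a single computer the second visitor would simply have been turned away. With two, a blocked request gets a second chance on the other computer, and only when every computer is full does anyone miss out.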
No web site can predict exactly how many people will try to visit at the same time but for most sites growth will be slow enough that the hosting can be easily upgraded in advance of the extra capacity being needed. Where it is important to make sure that a site with a limited timeframe event will have sufficient capacity, pre-registrations can be collected in order to get an estimate of the potential number of visitors. However, this information needs to be acted on appropriately as only a fraction of potential visitors will pre-register. There is no real excuse for a site that has gone to the trouble of getting people to pre-register to underestimate the required capacity. They just need to take into account all of those who don't pre-register and the timeframe of their event to come up with reasonable figures.