I will have a few thousand customers on my website purchasing all at once (about 4K in a few minutes). The last 3 times this happened we had issues with bugs or with the server. I believe we've handled the server issues: we use Rackspace and scaled up to more servers the last 2 times, and didn't see any server problems. But we had a few bugs each time that caused major issues. I've done testing myself and had 3 others review the site, and I've corrected the issues we found. The real problem is that we don't discover these bugs until we are hit with this much traffic all at once. I've done load testing using a third-party site, but that doesn't really show us bugs, as it's not real users using the site and performing the functions they normally would. We also don't have much money to throw at this. FYI, with our web app/site our customers send their customers to purchase goods from our site, which charges them a fee based on their number in line when they come to the site. This "number in line" model is what causes most of the complexity and problems. But we can't change it, as it's what our customers want.
I have been in this situation, as has almost every other web application developer, and I learned my lesson along the way.
You said you and 3 others tested, but I feel that 4 people may not be enough. I think at least 50 to 75 people is a far more adequate number, and here is how I do it with my own web apps:
I think that is what alpha and beta versions are for. If an app is released without a beta period, there will be problems. Labeling something a beta tells the public to expect bugs and invites them to help find those bugs, so releasing a beta version before your actual release date is one way to go.
Also, when testing for bugs before releasing to massive traffic, all scenarios need to be tested, and it is quite possible that 4 people will not test all of those scenarios or even think of them.
I suggest that companies releasing software and websites test across all platforms and conditions, including, but not limited to:
1. different web browsers, including IE, Firefox, Chrome, and Safari
2. different mobile devices, including smartphones and tablets
3. different operating systems
4. various times of day, when internet traffic may vary, and even different days of the week
5. different screen sizes and resolutions
Since everyone has a different combination of these factors, testing needs more than 4 people.
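One way to make sure a larger group of testers actually covers these combinations is to hand each tester a specific one. The snippet below is only a rough sketch: the function name `build_assignments`, the device categories, and the time slots are made up for illustration, and combinations that make no sense for your site would need to be filtered out by hand.

```python
import itertools

BROWSERS = ["IE", "Firefox", "Chrome", "Safari"]
# The two lists below are illustrative assumptions, not a recommended set.
DEVICES = ["desktop", "smartphone", "tablet"]
TIME_SLOTS = ["weekday daytime", "weekday evening", "weekend"]


def build_assignments(testers, *dimensions):
    """Spread every combination of the given dimensions across the testers."""
    combos = itertools.product(*dimensions)
    assignments = {tester: [] for tester in testers}
    # Deal combinations out round-robin, like dealing cards.
    for combo, tester in zip(combos, itertools.cycle(testers)):
        assignments[tester].append(combo)
    return assignments


if __name__ == "__main__":
    testers = [f"tester{i}" for i in range(1, 51)]
    plan = build_assignments(testers, BROWSERS, DEVICES, TIME_SLOTS)
    for tester, combos in plan.items():
        for browser, device, slot in combos:
            print(f"{tester}: {browser} on {device}, {slot}")
```

With 50 testers and these three short lists, each combination goes to exactly one tester and some testers are left free, so adding operating systems or screen sizes as further dimensions is cheap.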
I said that 50 to 75 people is sufficient for testing, and you may think that is overkill, but it really shows you bugs that you never would have thought of. To get that many people, give away something for free as an incentive, like a free product or ebook. You will find many people willing to help out just for the incentive. Works for me.
Bruce
The best thing to do is to dive into test automation and automate at least the basic scenarios using Selenium/WebDriver. But thinking long term, you have to invest in making your application testable at every application layer, not just the UI. If you want to check how your application behaves under high load, try JMeter. If you need more details, clarification, or guidance, just send me a call request.
P.S. If you are not ready to invest in test automation yet, hire manual testers on oDesk, uTest, or a similar freelance site.
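Testing below the UI can start very small. As a rough sketch (not a replacement for Selenium or JMeter), the script below fires many simultaneous "customers" at the code that hands out numbers in line and checks that nobody gets a duplicate or a skipped number. `LineCounter`, `simulate_rush`, and `check_numbers` are hypothetical names; in a real test you would pass in your own number-assigning function instead of the stand-in class.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LineCounter:
    """Hypothetical stand-in for the code that assigns a number in line."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def take_number(self):
        # The lock makes "read, add one, store" a single atomic step.
        with self._lock:
            self._last += 1
            return self._last


def simulate_rush(take_number, customers=4000, workers=64):
    """Call take_number once per customer from many threads at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: take_number(), range(customers)))


def check_numbers(numbers):
    """Return a list of problems found in the issued numbers."""
    problems = []
    if len(set(numbers)) != len(numbers):
        problems.append("duplicate numbers issued")
    if sorted(numbers) != list(range(1, len(numbers) + 1)):
        problems.append("numbers are not a gap-free 1..N sequence")
    return problems


if __name__ == "__main__":
    issued = simulate_rush(LineCounter().take_number)
    print(check_numbers(issued) or "no problems found")
```

The same harness can be pointed at a function that calls a staging server, which gets closer to the "everyone arrives in a few minutes" condition than four people clicking around by hand.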
Here is how I would build the "take a number and join the queue" feature:
1. Get a fast datastore that is central to your servers. All web servers have to look at the same central datastore, because this number has to be unique for each visitor. Redis would be perfect for this task.
2. You need a middleware layer in your Rails, Django, or PHP setup. All HTTP requests have to go through this middleware layer. It is responsible for getting the ticket from Redis and setting it as a cookie. If the browser has no cookie, it gets one. If you want to ensure that visitors can't fake the cookie, sign each cookie/ticket with a secret that only the server knows.
Doing this anywhere other than in middleware will result in bugs, because you would have to add the code in many different places.
3. Profit (as long as your caching etc. works and you don't get slashdotted)
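The steps above can be sketched roughly as follows. This is a framework-neutral illustration, not a drop-in middleware: `TicketCounter` is an in-memory stand-in for the central counter (with Redis, an atomic `INCR` on a single key would play this role), and `SECRET_KEY`, `sign_ticket`, `verify_ticket`, and `ensure_ticket` are hypothetical names. The "secret" here is an HMAC signature appended to the ticket number.

```python
import hashlib
import hmac
import threading

SECRET_KEY = b"change-me"  # hypothetical; load from server configuration


class TicketCounter:
    """In-memory stand-in for a central atomic counter such as Redis INCR."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def take(self):
        with self._lock:
            self._last += 1
            return self._last


def sign_ticket(number, secret=SECRET_KEY):
    """Return a cookie value of the form '<number>.<hex signature>'."""
    sig = hmac.new(secret, str(number).encode(), hashlib.sha256).hexdigest()
    return f"{number}.{sig}"


def verify_ticket(cookie_value, secret=SECRET_KEY):
    """Return the ticket number if the signature checks out, else None."""
    number, sep, sig = cookie_value.partition(".")
    if not sep or not (number.isascii() and number.isdigit()):
        return None
    expected = hmac.new(secret, number.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return int(number)


def ensure_ticket(cookies, counter):
    """Middleware-style step: reuse a valid ticket cookie or issue a new one."""
    existing = cookies.get("ticket")
    if existing is not None:
        number = verify_ticket(existing)
        if number is not None:
            return number, cookies
    number = counter.take()
    new_cookies = dict(cookies)
    new_cookies["ticket"] = sign_ticket(number)
    return number, new_cookies
```

A request without a cookie gets the next number, a request with a valid cookie keeps its place, and a forged or edited cookie is treated as if it were missing, so the visitor is sent to the back of the line.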
What you are after is quality assurance at every stage of development, ideally with automation at every level coupled with some manual checking. The upfront costs of automation are sometimes daunting, but it will save you money in the long run.
Breaking code and breaking software is how you should think about it. This puts the responsibility on every team member (designer, developer, etc.). The earlier the bugs are found, the greater the ROI on setting up a world-class quality assurance plan.