
Where've we been!?!

Crispin

Administrator
Staff member
Guru
Joined
Feb 24, 2010
Messages
6,052
Country Flag
great_britain
Not sure if you noticed but it's been a bit quiet 'round these parts? :whistle:

Short story: Server died. We're slowly getting it back to normal again.

Long story:

On Friday morning I awoke to around 300 emails from the server complaining that there was a hardware fault. The emails were coming through thick and fast.
We identified a failed hard drive in the server, which the hosting company then replaced for me.
At this point I had two options: rebuild the disk with the server offline or online. Offline would have taken around eight or nine hours to rebuild the data. Online was going to take a lot longer, and the server was going to be very busy and slow while it ran. I opted for online because having a slow server is better than no server.


The rebuild started and the server got really busy - it was ranging from 17,000 to 24,000% busy (so at its peak it was 240 times as busy as it could be). Things started going bad so I started turning off bits and pieces to try and sort the load out and keep it running (I have a number of other websites and email systems on the server which are business related).
Eventually brute-force turning off this site and the other websites allowed it to cope with email and rebuilding the data.

This morning things look like they have calmed down a lot, and while the server is still over 100% (hovering around 170%) it is functioning well and rebuilding the data at the same time.

I've now enabled the site for logged-in users so they can at least use it. Registrations are also closed (remember that we also get about 50k fake registration attempts a day from hackers) until things have settled down. At the moment we're 300GB into nearly a terabyte of data to be rebuilt. The ETA is 3-4 hours but it's been saying that for the last 3 hours :)

I continue to monitor it all and if things go the other way again then I'll have to disable the site to ease the load. The ability to rebuild the data correctly trumps all else.


I'll update as and when I can :)


Cheers,
Crispin
 
That explains the weird messages in tapatalk. Good luck.
 
Well done Crispin.
Of course in the meantime I have discovered my wife has green eyes and likes Jamie Oliver's cooking.
Shame she didn't tell me earlier, it's our 18th anniversary tomorrow and I could've got her a book or something if I'd known earlier.
Anyway, forum is back up and running so that's the main thing.
Good job mate.

:shifty:
 
Bravo Crispin, you've done me a favour. My internet has been patchy as hell since AOL sold out to TalkTalk, and the forum going down was the final straw for me, so I phoned TalkTalk's customer services in the forlorn hope that I might not have to change service provider, expecting to get the usual fob-off from someone in India.

A charming Indian lady is what I got, who seemed to take my complaint as a personal failure on her part before someone handed her the usual fob-off script to follow. A couple of speed tests later I was told everything is fine, but I can upgrade to fibre optic with much better speeds for a more expensive monthly contract.

I replied, "So you have ruined my internet connection, which has been fine for years, and you expect me to pay more for you to put it back the way it was? No thank you, I will phone BT instead."

The call ended with her begging me to monitor the situation for 24 hours before switching provider, and today my internet is as fast as I've ever seen it :icon-biggrin:
 
I have to admit that I was disappointed to see a broken Land Cruiser... what would have been more appropriate is a photo of a Cruiser rescuing a Land Rover...

However, thanks for your help and your efforts in fixing the site!

Ed
 
Not sure about all this techno babble, can we really go over 100% busy, not sure that is possible..... :whistle:

Good job getting it all back up.
 
and we're open for business as usual now :)
Server completed the rebuild about an hour ago and everything is now running normally. :dance:

I'm glad us being down has rekindled the spark in some marriages. Maybe I should venture out into something else...


Ed - I was trying to find a picture of legs sticking out from under a Land Rover but it's surprisingly hard. The (sad) picture of my 150 (yet again) on the back of a roll-back was simpler to find.


Tony - Was waiting for someone to ask that :) Linux shows load and backlog. See here for a simple explanation (has pictures ;) ): http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
We were hovering around 170.0 - 240.0. Current load, and this is normal daytime load, is 0.49 (so we're about 50% busy).
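
For anyone who wants to see how a load figure turns into a "% busy" number, here is a rough Python sketch. It assumes, as the figures above do, that a load of 1.0 means one fully busy CPU; the `load_to_percent` helper and the single-core default are illustrative assumptions and not how the server itself was monitored. On a multi-core box you would divide by the number of cores first.

```python
import os


def load_to_percent(load, cores=1):
    """Express a load average as a percentage of total CPU capacity.

    Hypothetical helper: a load of 1.0 per core is treated as 100% busy.
    Anything above that is work queued up waiting for a CPU.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores * 100.0


# Figures quoted in this thread, treated as a single-core machine (assumption).
for label, load in [
    ("normal daytime", 0.49),
    ("rebuild, low end", 170.0),
    ("rebuild, peak", 240.0),
]:
    print(f"{label}: load {load} is roughly {load_to_percent(load):.0f}% busy")

# On Linux and other Unix-like systems the live 1, 5 and 15 minute
# averages are available from the standard library.
if hasattr(os, "getloadavg"):
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
        cores = os.cpu_count() or 1
        print(f"this machine: 1 min load {one_min:.2f} on {cores} core(s), "
              f"roughly {load_to_percent(one_min, cores):.0f}% busy")
    except OSError:
        print("load average is not available on this system")
```

That is how a server can report well over 100%: the extra is the backlog of work waiting its turn.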


Anyway, thanks for the understanding chaps - all's well now :dance:
 
Not sure about all this techno babble, can we really go over 100% busy, not sure that is possible..... :whistle:

Good job getting it all back up.

Have a chat with my boss, he'll explain.. :|.:whistle:
 
Cheers Cris, appreciate the hours you've put in behind the scenes and over a weekend too.
 
I can understand a 100% increase but that is not 200% :eusa-naughty:
 