I got an email alert from SiteUptime to say:
Dear Peter Wilkinson,
This is an automated message from SiteUptime.
Alert Type: Site Not Available
Result: Failed
Time: September 12, 2005 09:09:43 PST
HostName: petersblog.org
Monitor Name: Peter's Blog
Service: http
There was another email half an hour later to say the site was back again (the polling period is half an hour).
Looking on the server itself:
$ uptime
09:01:45 up 21 days, 11:21, 1 user, load average: 0.00, 0.00, 0.00
so the server didn't reboot.
Grepping through the Apache access log for SiteUptime:
67.30.130.180 - - [12/Sep/2005:15:11:18 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:15:41:20 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:16:11:21 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:16:41:22 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:17:41:24 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:18:11:25 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:18:41:26 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:19:11:27 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
67.30.130.180 - - [12/Sep/2005:19:41:29 +0100] "HEAD / HTTP/1.0" 200 - "-" "SiteUptime.com"
Missing entry at 17:11.
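Spotting the gap by eye works for a handful of lines, but the same check can be scripted. The following is a minimal sketch, assuming the log uses the standard Apache timestamp format shown above and that the monitor polls every 30 minutes; the function name `find_gaps` and its parameters are my own invention for illustration, not part of any tool.

```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp in an Apache access log line,
# e.g. [12/Sep/2005:17:41:24 +0100]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def find_gaps(lines, agent="SiteUptime.com", period=timedelta(minutes=30)):
    """Return (previous, next) timestamp pairs where a poll seems to be missing.

    `agent` and `period` are assumptions taken from the log excerpt above.
    """
    times = []
    for line in lines:
        if agent not in line:
            continue  # only look at requests made by the monitor
        match = TIMESTAMP.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    # Allow half a period of slack so small drift between polls is not flagged.
    limit = period * 1.5
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > limit]
```

Called with the lines of the access log (for example `find_gaps(open("access.log"))`, where the file name is hypothetical), it should return one pair for the log above: the polls either side of the missing 17:11 entry.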
Looking at the entries around that time in the Apache access log:
80.88.204.40 - - [12/Sep/2005:17:04:09 +0100]
80.88.204.40 - - [12/Sep/2005:17:04:09 +0100]
194.244.83.8 - - [12/Sep/2005:17:41:16 +0100]
194.244.83.8 - - [12/Sep/2005:17:41:17 +0100]
No traffic between 17:04 and 17:41.
Nothing in the error log. None of the other logs show anything suspicious.
I think my conclusion here is that there was a loss of connectivity within oneandone and my server was temporarily disconnected from the internet. Now let's think: if they promise 99% uptime, does this mean the server is running, or that the server is running and connected to the internet?
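To put that 99% figure in perspective, here is a rough calculation of how much downtime it would permit. This is only a sketch: the 99% is the hypothetical figure from the question above rather than a quoted guarantee, the 30-day month is my assumption, and `downtime_budget_minutes` is a made-up helper name.

```python
def downtime_budget_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted over `days` at the given uptime percentage.

    The 30-day default is an assumption; a real agreement would define its own
    measurement window.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# 99% over a 30-day month leaves 432 minutes, i.e. 7 hours 12 minutes.
print(downtime_budget_minutes(99))
```

On those assumptions, the quiet period in the log (17:04 to 17:41, so about 37 minutes at most) would fit comfortably inside a 99% promise however the question is answered.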
Anyway, oneandone sent me this email this morning:
Dear Peter Wilkinson,
Please be advised that due to an upgrade of the 1&1 Data Centre, we will need to shut down your server for a short time while technicians perform an internal realignment of hardware.
This move will be made during the night from 18.09.2005 to 19.09.2005, between 11:00 PM and 06:00 AM.
Your data will not be affected by the move. However, as a precautionary measure, we recommend strongly that you first back up your data and server settings.
Best regards,
The 1&1 Team
so apparently they are shuffling things about, which could well be the cause.
The alert emails were forwarded to my Vodafone email account, but the filters on it stopped me getting notification text messages: until I had received a message, I didn't know what 'from' address to set the filter to. So I wasn't sent alarming and costly (2x10p!!) text messages.
Conclusion: I shouldn't panic. I haven't run a site monitor before, and this kind of thing probably happened all the time while I was on site5 and I was blissfully unaware.