Server Monitoring – Don’t Panic! We’ve got your back…
Here at Bigwetfish Hosting we have been working hard behind the scenes to develop a comprehensive server monitoring system for the benefit of our clients. We wanted to let you know what is available to you as a BWF Client and what we do 24/7 behind the scenes to ensure the smooth running and high performance of our servers.
Service Monitoring – Shared / Reseller Servers
Some call Nagios ‘The Industry Standard in IT Infrastructure Monitoring’, and this is the software we have chosen as the backbone of our server monitoring. With Chrome and Firefox plugins as well as iOS and Android apps, notifications of outages reach our staff right away, no matter where they are.
As our helpdesk is manned 24/7/365 by our support staff, it was the ideal place to send any notices.
All our Shared and Reseller servers have been added to the Nagios monitor, and if any of the following events occurs on any of them we get an alert instantly:
- A degraded RAID array
- Server load rises above a predefined level (the level depends on CPU power)
- Apache service fails
- IMAP service fails
- MySQL service fails
- Ping returns packet loss
- POP service fails
- SMTP service fails
- SSH service fails
- Mail queue grows large, which may indicate spamming
The alert is sent instantly by email to our helpdesk, where the technician on duty is instructed to check the server right away. Sometimes it is simply a matter of stopping some processes to reduce load; sometimes, as in the case of a degraded RAID array, we need to get our data centre partners to ‘hot swap’ a disk.
Management also have apps on their iPhones and receive the alerts as iOS push messages, so even outside office hours management sometimes know of an event before a technician on the helpdesk calls them.
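To illustrate what a service check of this kind involves, here is a minimal Python sketch that tries to open a TCP connection to each listed service and reports a result using the Nagios plugin exit-code convention (0 for OK, 2 for CRITICAL). It is not the actual Nagios configuration used at BWF: the function names are hypothetical, and the port numbers are the usual defaults, assumed here because the post does not state them.

```python
import socket

# Nagios plugin convention: exit code 0 = OK, 2 = CRITICAL.
OK, CRITICAL = 0, 2

# Usual default ports for the services in the list above.
# Assumption: a given server may well listen on different ports.
SERVICE_PORTS = {
    "Apache": 80,
    "IMAP": 143,
    "MySQL": 3306,
    "POP": 110,
    "SMTP": 25,
    "SSH": 22,
}


def check_tcp(host, port, timeout=5.0):
    """Return (status_code, message) for a single TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepted a connection"
    except OSError as exc:
        # Refused connections, timeouts and DNS failures all end up here.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def check_services(host, ports=SERVICE_PORTS):
    """Check every named service and return {name: (status_code, message)}."""
    return {name: check_tcp(host, port) for name, port in ports.items()}
```

Calling `check_services("server.example.com")` (a placeholder hostname) returns one status per service. A successful connection only shows that the port is accepting connections; it does not prove the service behind it is healthy.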
Service Monitoring – VPS Nodes / VPS Servers / Dedicated Clients
We use Nagios to monitor the health of the RAID arrays on our VPS Nodes as well as the server load.
Clients who have VPS servers with us will get their services monitored if they have bought the ‘Server Management and Backup’ add-on from our website: http://www.bigwetfish.co.uk/vps/server-management-options/
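As noted earlier, the load level that triggers an alert depends on the CPU power of the server. The Python sketch below shows one way such a threshold could be scaled by core count and mapped to the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The per-core multipliers and function names are assumptions for illustration only, not the values BWF uses.

```python
import os

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_load(load, cores, warn_per_core=1.0, crit_per_core=2.0):
    """Classify a load average against thresholds scaled by CPU core count.

    warn_per_core and crit_per_core are illustrative assumptions.
    """
    warn = warn_per_core * cores
    crit = crit_per_core * cores
    if load >= crit:
        return CRITICAL, f"CRITICAL - load {load:.2f} (limit {crit:.2f})"
    if load >= warn:
        return WARNING, f"WARNING - load {load:.2f} (limit {warn:.2f})"
    return OK, f"OK - load {load:.2f}"


def check_local_load():
    """Classify this machine's 5-minute load average (Unix only)."""
    _, five_min, _ = os.getloadavg()
    return classify_load(five_min, os.cpu_count() or 1)
```

With these assumed multipliers, a load of 3.0 is fine on a 4-core server but critical on a single-core one, which is why a fixed threshold across all servers would not work.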
Clients with Dedicated Servers get the monitoring as standard on their servers.
The most important part of hardware monitoring is drive health, because a failed drive or array can mean data loss. At BWF we monitor drive and array health in a number of ways:
- We use Nagios to monitor our RAID arrays for degraded disks, and we get instant alerts when there is a problem.
- We use SMART checks on our drives to assess general drive health. Specifically, we check for reallocated sectors, as these can indicate a drive that is about to go bad. Where we see problems we act quickly to replace the drive, ideally before it fails.
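The reallocated-sector check can be sketched in Python as follows. This assumes the smartmontools `smartctl` utility is installed and that the drive reports the classic ATA attribute table, in which attribute ID 5 is Reallocated_Sector_Ct and the raw value is the tenth column. The function names and the zero-tolerance threshold are hypothetical choices for illustration, not a description of BWF's own tooling.

```python
import subprocess


def read_smart_attributes(device):
    """Run `smartctl -A` on a device and return its text output.

    Assumption: smartmontools is installed and the caller has sufficient
    privileges. The exit status is not checked because smartctl uses it
    as a bitmask of drive conditions rather than a plain success flag.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if not reported."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; ID 5 is Reallocated_Sector_Ct.
        if len(fields) >= 10 and fields[0] == "5":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def drive_needs_attention(smartctl_output, max_reallocated=0):
    """True if the reallocated sector count exceeds the (assumed) threshold."""
    count = reallocated_sectors(smartctl_output)
    return count is not None and count > max_reallocated
```

A call such as `drive_needs_attention(read_smart_attributes("/dev/sda"))` (the device path is a placeholder) would flag a drive as soon as any sector has been reallocated. Drives behind some hardware RAID controllers need extra `smartctl` options that this sketch does not cover.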
Seeing what is happening at a glance – BWFMonitor Graph Portal
We also have a fully featured graphing portal available to all VPS and Dedicated server clients, and our technicians have full access to the same graphs for our shared and reseller servers. These graphs allow us to see ‘at a glance’ what is happening on a server. Examples of what we have detected with them include:
- A high rate of inbound traffic to a shared server caused the graph to spike. Technicians were able to quickly stop a small inbound DDoS against a specific website on that server by blocking the IP ranges responsible. Had we not seen this on our monitor, the problem would have lasted longer and more clients might have been affected.
- Our Nagios monitor indicated a high load on a server, and at the same time we noticed a spike in outbound traffic on a particular OpenVZ VPS Node. The server owner was running compressed cPanel backups, and the compression was causing the load issues. We worked with the client to help him implement an rsync backup solution that required no compression, which resolved the load issues.
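For readers curious what an rsync backup without compression might look like, here is a minimal Python sketch that builds and runs such a command. It is an illustration only and not the solution actually implemented for the client: the function names are hypothetical, and any source or destination paths are placeholders. The key point is that the command uses archive mode (`-a`) and deliberately omits rsync's `-z` compression flag, so the CPU is not spent compressing data.

```python
import subprocess


def build_rsync_command(source, destination, delete=False):
    """Build an rsync command that copies files without compression."""
    # -a (archive) preserves permissions, times and symlinks.
    # -z is intentionally left out so no CPU time goes on compression.
    command = ["rsync", "-a"]
    if delete:
        # Remove files from the destination that no longer exist at the source.
        command.append("--delete")
    command += [source, destination]
    return command


def run_backup(source, destination, delete=False):
    """Run the backup; raises CalledProcessError if rsync reports failure."""
    subprocess.run(build_rsync_command(source, destination, delete), check=True)
```

Because rsync only transfers files that have changed since the last run, repeated backups of this kind also tend to move far less data than a full compressed archive created each time.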
If you are a VPS or Dedicated Server client and do not have access to these graphs yet, just open a ticket and we can get you access right away.
Here you will see a number of example graphs taken from our BWF Monitor for three servers. All of them were taken around 4pm on Thursday 31 January 2013, and a brief explanation of each one follows. Many more graphs are available, such as the number of logged-in users and the number of running processes, and we can customize the graphs clients see on request.
We trust you now see a little more of what goes on behind the scenes to make BWF your number one choice for shared, reseller, VPS or Dedicated server hosting. We never take our clients for granted, and we wanted to give you a little flavour of what our technicians do on a daily basis to monitor the servers your websites are located on. We have seen a real increase in our server uptime as a direct result of monitoring things so closely, as it allows us to get to 99% of small issues before they become large ones.
Whilst comprehensive server monitoring will never guarantee there will be no outages or downtime, we firmly believe our proactive monitoring of all critical services for our clients helps immensely in keeping things working as they should.
We also have the strong backing of our Hosting Partners (Hostdime) and their Data Centre techs from DIMEnoc to quickly react if we do experience any outages. The backing of a global company will give any of our clients confidence in the quality of our hardware solutions.
Trust our growing team of experts to manage your hosting account professionally.
We would like to thank Praveen, one of our Red Hat Linux Certified Level 3 techs, for taking this on as his project and implementing this complete solution for us. You can find out a little more about Praveen on the ‘About Us’ page on our website.