Recently in systems Category

It seems that the weather forecast for the Internet is a bit "cloudy" nowadays and it will stay this way for at least some time. "Clouds" are a very hot topic right now and more and more companies are trying to get on the bandwagon as soon as possible - some just run tests while others go into production. You can run "your own" cloud environment for peanuts; the costs are so marginal that I laughed when I got my last bill from Amazon AWS. Nevertheless, it doesn't always pay off to run your stuff on a commercial cloud, especially if you have hardware at hand. The DIY approach is easier than it seems. Here is how I've built my own small "cloud" to solve a problem I was facing at work. It's not rocket science and it's not a full-blown management system with hundreds of machines... it works for me, and I believe anyone can build a similar system - hopefully a much better one than mine.

Setting aside the debate over terms like HPC/cluster/cloud/grid and what each of them means, I use the term "cloud" because I think it's the closest to what I've got now in my prototype - it's still a work in progress and it may get even more "cloudy" or change shape altogether. There won't be any code this time - maybe when I finish it properly and have some proper performance stats. So far what I describe here is just a running and usable PoC :-)

Tuning Nagios for running off CF Card

As a follow-up to my previous post, which described the platform and what can be done with it, I've been running my Nagios installation on a Soekris net4801 with the advice from that post applied (focusing on slow I/O when writing to the CF card). The changes in system behavior are huge - in a positive way, of course.

First of all, the system is not so overloaded now, and I guess I could double the number of tests run on this platform without getting into trouble like before. At the moment this system is monitoring 36 machines with 86 services in total. Some time ago I had to stop adding tests and even remove some of the less important ones, because most of the time I was getting false positives - usually warnings, with a comment that the plugin had timed out. So how big is the difference?

Some time ago (a rather long, long time ago) we decided to purchase a small device and turn it into a very portable server that we could send to one of our friends to host. The whole purpose was to get Nagios on it and to monitor our sites from outside of our networks. To some people it may sound crazy, but it kind of makes sense - how many times have you heard someone say "it works on my computer"? Too many times?

The goal is to know when (and possibly why) my visitors/customers can't reach my servers, and to be able to diagnose whether the problem is local to some location or part of the network or whether it affects a wider audience. Up to a point, a remote sensor answers that question - at least from the perspective of its particular location.

After looking around the net we decided to get one of those famous Soekris kits.

[Image: Soekris net4801, front view]

Was it a good choice as a hardware platform? How will it scale when the number of monitored systems reaches a certain level? Let's see where it has got us so far, as the system has been live for about a year now.


This weblog is licensed under a Creative Commons License.