If you were to set up a basic monitoring system for a simple three-tier web application (frontend web, application, database), what are the key elements you would monitor?

What commands would you use to gather this information? How would you track or trend it over time (graphing)? What graphing tools have you used in the past? How would you weed out false alarms or duplicates?

    • Per node disk space usage
    • Per node CPU usage
    • Per process CPU usage for critical processes
    • Per process memory usage for critical processes
    • Per node swap usage
    • Per node / cluster disk I/O
    • Power/cooling traps on chassis/servers
    • Per node / interface NIC throughput
    • Number of active TCP connections
    • Basic network connectivity between tiers
    • Application specific connectivity between tiers
    • Database connections / pooling
    • Monitor the number of active queries on the DB
    • Monitor the average response time of queries on the DB
    • External ping / HTTP connectivity to frontend
    • External traceroute connectivity to frontend
    • Log parsing on webserver / app / database logs for strings like CRITICAL or ERROR
    • Etc.
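
A few of the checks above, plus one way to weed out false alarms and duplicates, could be sketched as follows. This is a minimal illustration using only the Python standard library, not a replacement for a real monitoring tool. All function and class names here (disk_used_percent, load_per_cpu, tcp_reachable, count_error_lines, AlertDebouncer) are hypothetical, and the threshold of three consecutive failures is an assumed value to tune per environment.

```python
import os
import re
import shutil
import socket
from collections import defaultdict


def disk_used_percent(path="/"):
    """Percentage of space used on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_cpu():
    """1-minute load average divided by the CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def tcp_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Case-sensitive match on the whole words CRITICAL or ERROR.
ERROR_PATTERN = re.compile(r"\b(CRITICAL|ERROR)\b")


def count_error_lines(lines):
    """Count log lines that contain CRITICAL or ERROR."""
    return sum(1 for line in lines if ERROR_PATTERN.search(line))


class AlertDebouncer:
    """Alert once after `threshold` consecutive failures, then stay quiet
    until the check recovers. This suppresses one-off blips (false alarms)
    and repeated alerts for the same ongoing problem (duplicates)."""

    def __init__(self, threshold=3):  # assumed default, tune as needed
        self.threshold = threshold
        self.failures = defaultdict(int)
        self.alerting = set()

    def observe(self, check_name, ok):
        """Record one check result; return "ALERT", "RECOVERED", or None."""
        if ok:
            self.failures[check_name] = 0
            if check_name in self.alerting:
                self.alerting.discard(check_name)
                return "RECOVERED"
            return None
        self.failures[check_name] += 1
        already_alerting = check_name in self.alerting
        if self.failures[check_name] >= self.threshold and not already_alerting:
            self.alerting.add(check_name)
            return "ALERT"
        return None
```

In practice each check would run on a schedule (for example from cron), its value would be written to a time-series store for graphing, and only the pass/fail result would be fed through something like AlertDebouncer.observe before anyone is paged.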