Stats

Dreamhost collects the access and error logs for the web site domains they host for me. The stats are crunched by Analog. The numbers are okay. I much prefer Google Analytics. (Even AWStats is better.) Analog is good enough.

While at Bbworld, Nicole asked me about the hits to her wedding web site. She made it sound like then she and Ashley had the data but just needed to know how to interpret the data? Now a couple days later they didn’t have the data. Instead, they ran into a password issue.

Shell / FTP:

What I had suggested to Nicole was Ashley could find the stats by going to the logs/william-nicole.com to find the data. (Actually it was logs/william-nicole.com/http/html)

Web:

Since, only Ashley’s user can access the stats through the shell / FTP route, I went into my admin panel to add Nicole and myself a user to access the stats. I erroneously assumed the user with access to manage the content (Ashley) would have access to the stats. Instead, Dreamhost only automatically grants the panel user (me) access to stats. Doh! So I ended up creating them both accounts.

Shameless Plugs:

Nicole’s site is http://william-nicole.com/.

Another site I am hosting for Shel is http://artistictraveler.nu/.

Better Way to Count

Our awesome sysadmins have put the user agent into our AWStats so we are tracking these numbers now. They discovered something I overlooked. Netscape 4.x is 10 times more used than 7.x or 8.x. Wowsers! Some people really do not give up on the past.

Back in the Netscape is dead post, I used this to count the Netscape 7 hits.

grep Netscape/7 webserver.log* | wc -l

Stupid! Stupid! Stupid! The above requires running for each version of Netscape. This is why I missed Netscape 4.

This is more convoluted, but I think it its a much better approach.

grep Netscape webserver.log* | awk -F\t ‘{print $11}’ | sort | uniq -c | sort -n

It looks uglier, but its much more elegant. Maybe I ought to make a resolution for 2008 to be elegant in all my shell commands.

This version first pulls any entries with Netscape in the line. Next, the awk piece reports only the user agent string. The first sort puts all the similar entries next to each other so the uniq will not accidentally duplicate. The -c in the uniq counts. The final sort with the -n orders them by the uniq’s count. The largest will end up at the bottom.

BbWorld Presentation Redux Part II – Monitoring

Much of what I might write in these posts about Vista is knowledge accumulated from the efforts of my coworkers.

This is part two in a series of blog posts on our presentation at BbWorld ’07, on the behalf of the Georgia VIEW project, Maintaining Large Vista Installations (2MB PPT).

Part one covered automation of Blackboard Vista 3 tasks. Next, let’s look at monitoring.

Several scripts we have written are in place to collect data. One of the special scripts connects to Weblogic on each node to capture data from several MBeans. Other scripts watch for problems with hardware, the operating system, database, and even login to Vista. Each server (node or database) has, I think, 30-40 monitors. A portion of items we monitor is in the presentation. Every level of our clusters are watched for issues. The data from these scripts are collected into two applications.

  1. Nagios sends us alerts when values from the monitoring scripts on specific criteria fall outside of our expectations. Green means good; yellow means warning; red means bad. Thankfully none in our group are colorblind. Nagios can also send email and pages for alerts. Finding the sweet spot where we get alerted for a problem but avoid false positives perhaps is the most difficult.
  2. An AJAX application two excellent members of our Systems group created called internallyl Stats creates graphs of the same monitored data. Nagios tells us a node failed a test. Stats tells us when the problem started, how long it lasted, and if others also displayed similar issues.We also can use stats to watch trends. For example, we know two peaks by watching WIO usage rise to a noonish peak slough by ~20% and peak again in the evening fairly consistently over weeks and months.

We also use AWStats to provide web server log summary data. Web server logs show activity of the users: where they go, how much, etc.

In summary, Nagios gives us a heads up there is a problem. Stats allows us to trend performance of nodes and databases. AWStats allows us to trend overall user activity.

Coradiant TrueSight was featured in the vendor area at BbWorld. This product looks promising for determining where users encounter issues. Blackboard is working with them, but I suspect its likely for Vista 4 and CE 6.

We have fantastic data. Unfortunately, interpreting the data proves more complex. Say the load on a server hosting a starts climbing, its the point we get pages and continues to climb. What does one do? Remove it from the cluster? Restart it? Restarting it will simply shift the work to another node in the cluster. Say the same happens with the database. Restarting the database will kick all the users out of Vista. Unfortunately, Blackboard does not provide a playbook on what to do with every support possibility. Also, if you ask three DBAs, then you will likely get three answers.
😀

Its important to balance the underreaction and overreaction. When things go wrong, people want us to fix the problem. Vista is capable of handling many faults and not handling very similar faults. The link example was a failed firewall upgrade. I took a similar tact with another firewall problem earlier this week. I ultimately had to restart the cluster that evening because it didn’t recover.

Part three will discuss the node types.