Conferences


Last year, we three DBAs submitted three proposals thinking one might be accepted. All three were. It’s daunting to think of something this year because we are behind the times. We run Vista 3.0.7 while almost everyone else is on 4.1.x or higher. Also, we ended up changing our presentations last year because we were not doing the things we thought we would be doing. Ugh.

Presenting at BbWorld or Blackboard Developers Conference is a great professional development opportunity and fabulous way to share your knowledge with your peers. BbWorld® ‘08 Deadline for Proposal Submission: February 22, 2008

Maybe we could do one on:

  • Staying Beneath the Threshold of Doom: 6-8 vs. 40 clusters?
  • Planning the Largest Vista 3 to 4 Migration
  • API Logging: Users Connection to Vista Not in Your Logs
  • Creating an Audit of User Activity

Several of us saw a demo of Coradiant Truesight yesterday (first mentioned in the BbWorld Monitoring post). I spent most of the demo trying to come up with the name Jeff Goldblum, as one of the team giving the demo had the voice and mannerisms of the actor’s characters. Had he mentioned a butterfly, I definitely would have clapped. The other presenter reminded me of John Hodgman.

Something I had not noticed at the time: a recurring selling point of Truesight is being able to tell our users, “Here is evidence the problem is on your end and not ours.” This assumes the users are rational or will even believe the evidence. Their first preference is that the problem never occurred; a resolution comes second. Preventing every problem, especially issues outside our domain, is probably outside the scope of the budget we receive. So, we are left with resolving the issues. Especially scary are the users who take evidence that the problem is on their end or their ISP’s end to mean, “This is all your fault.”

Resolutions we can offer are:

  1. Hardware change – We can replace or alter the configuration of the hardware components of the network, storage, database, or application.
  2. Software change – We can alter the configuration of the software components of the network, storage, database, or application.
  3. Request a code change from a vendor – We can work with our vendors to get a code change. These take forever to implement.
  4. Suggest a user resolve the issue -
    1. We can provide a work around (grudgingly accepted, remember the preferred wish is the problem never occurred).
    2. We suggest configuration changes the user can make to resolve the problem.

Truesight provides us information to help us try to resolve issues. Describing the information provided as “facts” was a nice touch. At Valdosta State, I gave up on users reporting their browsers accurately and captured the information from the User-Agent header. Similarly, at the USG, I’ve found that what users report disagrees with the User-Agent string about the browser version ~30% of the time. Heck, they have errors in the name of the class ~40% of the time. My favorite is the report that something took 15 minutes when all I could find was that it took four minutes. Ugh. Because Truesight captures the header info, it ought to be much easier to confirm what users were doing and where problems occurred, and more accurately than the users can describe.

After receiving all the “facts”, we still have to determine the cause. Truesight helps us understand the scope of the problem: how many users, how many web servers, and how many pages are affected by slowness, and to what degree. As a DBA and administrator, my job of identifying the cause ought to be easier, though how much easier is difficult to quantify.

Part of why: (Mostly speculation.) Problems identified as a spike in anything other than “Host” are external causes. These are causes in front of the device. Causes behind the device are “Host”. If these were more narrowly broken down, then maybe we could better determine the cause. That would require information web browsers typically would not know, like the server processing time, query processing time, or even the health of the servers.


Index of posts:

  1. RE 2007: GeorgiaVIEW Meeting (Pre-Conference)
  2. RE 2007: Birds of a Feather: GeorgiaVIEW Vista
  3. RE 2007: Top Ten Disruptive Trends
  4. RE 2007: Birds of a Feather: Luminis
  5. RE 2007: Administering Sakai
  6. RE 2007: GeorgiaVIEW Vista File and Content Sharing
  7. RE 2007: USG Digital Content Repositories: Resources to Share

After this point, I got wrapped up in other things, moderating, fireworks, a Texas Hold ‘Em tournament, and dealing with tickets. The above are all sessions which affect my area even tangentially. Hope you enjoy.

I am blogging from the pre-conference GeorgiaVIEW meeting @ Rock Eagle yesterday afternoon and this morning. I enjoy connecting with people around the state of Georgia who use our Vista system. Most of them do not make it to BbWorld. Some hot topics:

  • Alternatives to Blackboard Vista
  • Training
    • Content repository
  • Returning Reports and Tracking to instructors.
    • Some reports still failing. One approach may be to remove tracking data from Vista database and make it available elsewhere.
  • Upgrade to Vista 4. People want a timeline, access to a training instance ASAP, and please, no in-place upgrade.
    • Limited shelf life on internals of Vista 3 / 4.0 – 4.1.2
    • More customers have moved or are moving to Vista 4 / CE 6 than a year ago.
    • Can take advantage of new tools available in Vista 4.
    • Data retention – policy, responsibilities (faculty, campus, OIIT)
    • Phased approach – parallel environments, at some point Vista 3 goes away and no longer available.
    • End of Fall 2008 or Spring 2009.
  • People are both quite happy we are going to Vista 4 and disconcerted at the prospect of having to move to Vista 4, even though it is over a year from now (at the worst by April 2009).
    • Export / import of non-SIS created users.
    • Training

Lovely (yeah a real person and she is) says Lovely Freelove would be one of the best names ever.

At the BbWorld Developers’ Conference (Thursday afternoon and Friday morning after BbWorld), there was a session by John Fontaine called What the Heck is a Hotfix? (PPT, audio recording). I’d been meaning to go look for this at the Bb Connections web site where the conference presentations were uploaded. However, I found this through a Bb knowledge base link to eduGarage, which apparently is the new home of the Blackboard Developers Network.

  • Ad Hoc Patch – fixes a single issue
  • Hot Fix – Multiple usually related code changes (5-6 issues)
  • Service Pack – Many code changes (50-60 issues)
  • New Release (either Application Pack or new version number) – New features and large-scale code changes

Much of what I might write in these posts about Vista is knowledge accumulated from the efforts of my coworkers.

This is part two in a series of blog posts on our presentation at BbWorld ‘07, on the behalf of the Georgia VIEW project, Maintaining Large Vista Installations (2MB PPT).

Part one covered automation of Blackboard Vista 3 tasks. Next, let’s look at monitoring.

Several scripts we have written are in place to collect data. One of the special scripts connects to Weblogic on each node to capture data from several MBeans. Other scripts watch for problems with hardware, the operating system, database, and even login to Vista. Each server (node or database) has, I think, 30-40 monitors. A portion of the items we monitor is in the presentation. Every level of our clusters is watched for issues. The data from these scripts are collected into two applications.

  1. Nagios sends us alerts when values from the monitoring scripts fall outside of our expectations on specific criteria. Green means good; yellow means warning; red means bad. Thankfully none in our group are colorblind. Nagios can also send email and pages for alerts. Finding the sweet spot where we get alerted for a problem but avoid false positives is perhaps the most difficult part.
  2. Stats, as we call it internally, is an AJAX application created by two excellent members of our Systems group. It graphs the same monitored data. Nagios tells us a node failed a test. Stats tells us when the problem started, how long it lasted, and whether other nodes displayed similar issues. We can also use Stats to watch trends. For example, we know there are two daily peaks from watching WIO usage rise to a noonish peak, slough off by ~20%, and peak again in the evening, fairly consistently over weeks and months.
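
To make the "sweet spot" idea concrete, here is a minimal sketch in Python. It is not our monitoring code. The threshold values and the three-in-a-row paging rule are assumptions made up for illustration; the only borrowed fact is the usual Nagios plugin convention of exit codes 0, 1, and 2 for OK, WARNING, and CRITICAL.

```python
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "green", WARNING: "yellow", CRITICAL: "red"}


def classify(value, warn_at, crit_at):
    """Map a monitored value to a state; higher values are assumed worse."""
    if value >= crit_at:
        return CRITICAL
    if value >= warn_at:
        return WARNING
    return OK


def needs_page(states, consecutive=3):
    """Page only after `consecutive` CRITICAL results in a row.

    This rule is a hypothetical way to trade alert speed for fewer
    false positives; it is not taken from our configuration.
    """
    recent = states[-consecutive:]
    return len(recent) == consecutive and all(s == CRITICAL for s in recent)
```

Raising `consecutive` cuts down on false positives at the cost of hearing about a real problem a few checks later, which is exactly the trade-off that is hard to tune.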

We also use AWStats to provide web server log summary data. Web server logs show activity of the users: where they go, how much, etc.

In summary, Nagios gives us a heads up there is a problem. Stats allows us to trend performance of nodes and databases. AWStats allows us to trend overall user activity.

Coradiant TrueSight was featured in the vendor area at BbWorld. This product looks promising for determining where users encounter issues. Blackboard is working with them, but I suspect it’s likely for Vista 4 and CE 6.

We have fantastic data. Unfortunately, interpreting the data proves more complex. Say the load on a server hosting a node starts climbing, hits the point where we get paged, and continues to climb. What does one do? Remove it from the cluster? Restart it? Restarting it will simply shift the work to another node in the cluster. Say the same happens with the database. Restarting the database will kick all the users out of Vista. Unfortunately, Blackboard does not provide a playbook on what to do with every support possibility. Also, if you ask three DBAs, then you will likely get three answers. :D

It’s important to strike a balance between underreaction and overreaction. When things go wrong, people want us to fix the problem. Vista is capable of handling many faults and yet not handling very similar faults. The linked example was a failed firewall upgrade. I took a similar tack with another firewall problem earlier this week. I ultimately had to restart the cluster that evening because it didn’t recover.

Part three will discuss the node types.

Much of what I might write in these posts about Vista is knowledge accumulated from the efforts of my coworkers.

I’ve decided to do a series of blog posts on our presentation at BbWorld ‘07, on the behalf of the Georgia VIEW project, Maintaining Large Vista Installations (2MB PPT). I wrote the bit about tracking files a while back in large part because of the blank looks we got when I mentioned in our presentation at BbWorld these files exist. For many unanticipated reasons, these may not be made part of the tracking data in the database.

Automation in this context essentially is the scheduling of tasks to run without a human needing to intercede. Humans should spend time on analysis not typing commands into a shell.

Rolling Restarts

This is our internal name for restarting a subset (consisting of nodes) of our clusters. The idea is to restart all managed nodes except the JMS node, usually one at a time. Such restarts are conducted for one of two reasons: 1) have the node pick up a setting or 2) have Java discard everything from memory. The latter is why we restart the nodes once weekly.

Like many, I was skeptical of the value of restarting the nodes in the cluster once weekly. Then, as part of the Daylight Saving Time patching, we handed our nodes over to our Systems folks (hardware and operating systems) and forgot to re-enable the Rolling Restarts for one batch. Those nodes started complaining about issues in the second week. Putting the Rolling Restarts back into place eliminated the issues. So… Now I am a believer!

One of my coworkers created a script which 1) detects whether or not Vista is running on the node, 2) shuts down the node only if Vista is running, 3) once down, starts up the node, and 4) finally checks that it is running. It’s pretty basic.
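
For the curious, the four steps can be sketched in Python roughly as follows. This is not my coworker's script. The `is_running`, `stop`, and `start` callables are hypothetical stand-ins for however you check and control Vista on a node, and the timeout values are made up.

```python
import time


def _wait_until(condition, timeout, poll, sleep):
    """Poll `condition` until it is true or `timeout` seconds have passed."""
    waited = 0.0
    while not condition():
        if waited >= timeout:
            raise TimeoutError("node did not reach the expected state")
        sleep(poll)
        waited += poll


def rolling_restart(nodes, is_running, stop, start,
                    timeout=300.0, poll=5.0, sleep=time.sleep):
    """Restart nodes one at a time and return the ones actually restarted.

    The caller decides which nodes are in the list (for example, leaving
    out the JMS node). All three callables are assumptions of this sketch.
    """
    restarted = []
    for node in nodes:
        if not is_running(node):
            continue  # steps 1-2: only touch nodes where Vista is running
        stop(node)
        _wait_until(lambda: not is_running(node), timeout, poll, sleep)
        start(node)  # step 3: once down, start it back up
        _wait_until(lambda: is_running(node), timeout, poll, sleep)  # step 4
        restarted.append(node)
    return restarted
```

Waiting for each node to come back before moving on is what keeps the restart "rolling": only one node is out of the cluster at any moment.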

Log cleanup to preserve space

We operate on a relatively small space budget. Accumulating logs ad infinitum strikes us as unnecessary. So, we keep a month’s worth of logs for certain ones. Others are rolled by Log4j to keep a certain number. Certain activities can mean only a day’s worth are kept, so we have on occasion increased the number kept for diagnostics. Log4j is so easy and painless.

We use Unix’s find with mtime to look for files 30 days old with specific file names. We delete the ones which match the pattern.
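
We do this with find, but the same idea can be sketched in Python for anyone who wants to see the logic spelled out. The directory, the file name pattern, and the function names here are hypothetical, and the age test is a plain "modified more than N days ago" check rather than a copy of find's exact rounding.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_logs(directory, pattern, max_age_days=30, now=None):
    """Return files in `directory` matching `pattern` older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def delete_old_logs(directory, pattern, max_age_days=30, now=None):
    """Delete the matching old files and return the paths that were removed."""
    removed = find_old_logs(directory, pattern, max_age_days, now)
    for path in removed:
        path.unlink()
    return removed
```

Calling `find_old_logs` first and eyeballing the list is a cheap dry run before letting `delete_old_logs` loose on a real log directory.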

UPDATE 2007-SEP-18: The axis files in /var/tmp will go on this list, but we will delete any more than a day old.

Error reporting application, tracking, vulnerabilities

Any problems we have encountered, we expect to encounter again at some point. We send ourselves reports to stay on top of potentially escalating issues. Specifically, we monitor for the unmarshalled exception for WebLogic and for tracking files that failed to upload, and we used to collect instances of a known vulnerability in Vista. Now that it’s been patched, we are not looking for it anymore.

Thread dumps

Blackboard at some point will ask for thread dumps at the time the error occurred. Replicating a severe issue strikes us as bad for our users. We have the thread dumps running every 5 minutes and can collect them to provide Blackboard on demand. No messing with the users for us.
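
As an illustration only, here is a Python sketch of asking a JVM for thread dumps on a schedule. It relies on the common behavior that a Java process on Unix prints a thread dump to its standard output when sent SIGQUIT (what `kill -3` does). The function names are hypothetical, and in practice a scheduler like cron is the more likely way to get the five-minute interval.

```python
import os
import signal
import time


def request_thread_dump(pid, send=os.kill):
    """Ask the JVM with process id `pid` to print a thread dump (SIGQUIT)."""
    send(pid, signal.SIGQUIT)


def dump_periodically(pid, count, interval=300.0,
                      sleep=time.sleep, send=os.kill):
    """Request `count` thread dumps, `interval` seconds apart (300 = 5 minutes)."""
    for i in range(count):
        request_thread_dump(pid, send)
        if i < count - 1:
            sleep(interval)  # no need to wait after the last dump
```

The dumps land wherever the JVM's standard output is redirected, so collecting them for Blackboard means pulling the relevant stretch out of that log.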

Sync admin node with backup

We use rsync to keep a spare admin node in sync with the admin node for each production cluster. Should the admin node fail, we have a hot spare.
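
A minimal Python wrapper around that rsync call might look like the sketch below. The host name, the directories, and the choice of the `-a` and `--delete` options are assumptions for illustration, not a record of the options we actually run with.

```python
import subprocess


def rsync_command(source_dir, spare_host, dest_dir, dry_run=False):
    """Build an rsync command that mirrors `source_dir` onto the spare node."""
    cmd = ["rsync", "-a", "--delete"]  # archive mode; remove files gone from source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash tells rsync to copy the directory's contents.
    cmd += [source_dir.rstrip("/") + "/", f"{spare_host}:{dest_dir}"]
    return cmd


def sync_admin_node(source_dir, spare_host, dest_dir, dry_run=False):
    """Run the sync and return rsync's exit status (0 means success)."""
    return subprocess.run(rsync_command(source_dir, spare_host, dest_dir, dry_run),
                          check=False).returncode
```

Running it with `dry_run=True` first is a safe way to see what `--delete` would remove on the spare before trusting it.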

LDIS batch integration

Because we do not run a single cluster per school and the Luminis Data Integration Suite does not work with multiple schools for Vista 3 (rumor is Utah has it working for Vista 4), we have to import our Banner data in batches. The schools we host send the files, our expert reviews the files and puts them in place. A script finds the files and uploads each in turn. Our expert can sleep at night.
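
The "finds the files and uploads each in turn" part can be sketched in Python like this. It is a guess at the shape of such a script, not ours: the `upload` callable stands in for whatever actually loads a file into Vista, and the `*.xml` pattern and directory layout are assumptions.

```python
import shutil
from pathlib import Path


def process_batch(incoming, processed, upload, pattern="*.xml"):
    """Upload each waiting file in name order, then move it out of the way.

    If `upload` raises, processing stops and the remaining files stay in
    `incoming` so a human can look at them in the morning.
    """
    incoming, processed = Path(incoming), Path(processed)
    processed.mkdir(parents=True, exist_ok=True)
    done = []
    for path in sorted(incoming.glob(pattern)):
        upload(path)
        shutil.move(str(path), str(processed / path.name))
        done.append(path.name)
    return done
```

Moving each file only after its upload succeeds means a rerun never uploads the same file twice.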

Very soon, we will automate the running of the table analysis.

Anyone have ideas on what we should automate?

I am home from the Tennessee Baha’i School. I enjoyed the weekend.

Meeting new people is not something I’d normally place high on my list. However, I have yet to go to a Baha’i conference or weekend school where I did not come away feeling happy to have met all those I did. Naturally, since I am horrible with names, I don’t remember the names of 1/2 of them.

I can do better.


Banner, originally uploaded by Ezra F.

Presenters were encouraged to join a web site and post a picture. People recognized me.

Presentations:

  • 2007. Edwards, A., Hernandez, G., and Freelove, E. Capacity Planning and Predicting Growth for Vista.
  • 2007. Hernandez, G., Edwards, A., and Freelove, E. Database Administration for Vista: Lessons Learned.
  • 2007. Freelove, E., Edwards, A., and Hernandez, G. Maintaining Large Vista Installations (2MB PPT).

This latter one I composed for the conference. The first speaker in the others is the one who composed those.

It’s likely we will give the same presentations in October at Rock Eagle.

I’m featured in the banner picture.

In the last throes of the BbWorld ‘07 Developer’s Conference (the regular conference ended yesterday). Some pictures are in Flickr in my “BbWorld 2007” set. I’ll likely post the rest tonight. Our presentations should be posted soon on the conference site.

Some important ideas of keynotes:

EdVentures in Technology » Notes from BbWorld 2007 in Boston, MA:

Incentives matter
– Steven Levitt

Arkansawyer » 2007 » July » 11 (most comprehensive BbWorld ‘07 blog I’ve found):
Guy Kawasaki

  1. Make meaning.
  2. Make mantra.
  3. Jump to the next curve.
  4. Roll the DICEE.
  5. Don’t worry, be crappy.
  6. Polarize people.
  7. Let a hundred flowers blossom
  8. Churn, baby, churn
  9. Niche thyself
  10. Follow the 10/20/30 rule
  11. Don’t let the bozos grind you down

Talked to lots of Blackboard upper management. I haven’t drunk the Kool-Aid. :D
