Missing Shutdowns

A Weblogic managed node for a development cluster failed to shutdown when our shutdown script requested. The last managed node to shutdown becomes the JMS node and triggers a rewrite of the config.xml. We have scripts in place to check for the config.xml changing and alert us. Since I am the on call this week, I received the page.

I thought it would be good enough to copy the config.xml into place. Since it would be restarted that night by the usual shutdown script, the cluster would pick up the new config.xml and all would be well again. Ha! Normally a node has an entry in WebCTServer.99999999999.log stating something like “Server shutdown has been requested by system.” for each shutdown then the log ends. These are completely missing. That intrigued me as expected these to be present but some reason for why it failed. Instead, it was like no request was sent.

The shutdown script log showed this error where it ought to show the node was shutdown successfully. (Actual names replaced with CLUSTER_NAME and NODE.)

Error:CLUSTER_NAME:Name=NODE,Location=NODE,Type=ServerConfig.

The shutdown script call Blackboard’s stopWebCTServer.sh which just calls another script which takes various inputs to ultimately call:

java -classpath $CLASSPATH weblogic.Admin -url t3://$HOSTNAME:$PORT -username $WL_USER -password $WL_PASS SHUTDOWN

CLASSPATH= can be found in the start scripts and has multiple entries in setEnv.sh. Just run “sh setEnv.sh” to set this for your session.
HOSTNAME= server hostname
PORT= HTTP port where Weblogic listens
WL_USER= Weblogic user
WL_PASS= Weblogic user’s password

Instead of shutting down the node it just gave an irrelevant status. This one node gave a Weblogic command-line response which seems to mean a botched connection but one where something was listening. The not listening or wrong address error is:

Failed to connect to t3://$HOSTNAME:$PORT: Destination unreachable; nested exception is: java.net.ConnectException: Connection refused; No available router to destination.)

Yay, another case to figure out to correctly handle.

I killed the process. For safe measure I did a “touch REFRESH” in $WL_DOMAIN so the node would dump anything it had cached and download new these things. Since it started up this morning in the normal restart, I think it is fixed.

Other than JMS migrating and failing to do so, I don’t think this caused any problems for users. Just so very odd.

P.S. weblogic.Admin is deprecated in Weblogic 9, so it is interesting Blackboard still makes use of it.

xmllint

This Linux tool is my new best friend. We get thousands of XML files from our clients for loading user, class, and enrollment information. Some of these clients customize our software or write their own software for generating the XML.

This means we frequently get oddities in the files which cause problems. Thankfully I am not the person who has to verify these files are good. I just get to answer the questions that person has about why a particular file failed to load.

The CE/Vista import process will stop if its validator finds invalid XML. Unfortunately, the error “An exception occurred while obtaining error messages.  See webct.log” doesn’t sound like invalid XML.

Usage is pretty simple:

xmllint –valid /path/to/file.xml | head

  1. If the file is valid, then the whole file is in the output.
  2. If there are warnings, then they precede the whole file.
  3. If there are errors, then only the errors are displayed.

I use head here because our files can be up to 15MB, so this prevents the whole file from going on the screen for the first two situations.

I discovered this in researching how to handle the first situation below. It came up again today. So this has been useful to catch errors in the client supplied files where the file failed to load.

1: parser error : XML declaration allowed only at the start of the document
 <?xml version=”1.0″ encoding=”UTF-8″?>

162: parser error : EntityRef: expecting ‘;’
<long>College of Engineering &amp&#059; CIS</long>

(Bolded the errors.) The number before the colon is the line number. The carat it uses to indicate where on the line an error occurred isn’t accurate, so I ignore it.

My hope is to get this integrated into our processes to validate these files before they are loaded and save ourselves headaches the next morning.

Weblogic Diagnostics

I noticed one the nodes in a development cluster was down. So I started it again. The second start failed, so I ended up looking at logs to figure out why. The error in the WebCTServer.000000000.log said:

weblogic.diagnostics.lifecycle.DiagnosticComponentLifecycleException: weblogic.store.PersistentStoreException: java.io.IOException: [Store:280036]Missing the file store file “WLS_DIAGNOSTICS000001.DAT” in the directory “$VISTAHOME/./servers/$NODENAME/data/store/diagnostics”

So I looked to see if the file was there. It wasn’t.

I tried touching a file at the right location and starting it. Another failed start with a new error:

There was an error while reading from the log file.

So I tried copying to WLS_DIAGNOSTICS000002.DAT to WLS_DIAGNOSTICS000001.DAT and starting again. This got me a successful startup. Examination of the WLS files revealed the the 0 and 1 files have updated time stamps while the 2 file hasn’t changed since the first occurance of the error.

That suggests to me Weblogic is unaware of the 2 file and only aware of the 0 and 1 files. Weird.

At least I tricked the software into running again.

Some interesting discussion about these files.

  1. Apparently I could have just renamed the files. CONFIRMED
  2. The files capture JDBC diagnostic data. Maybe I need to look at the JDBC pool settings. DONE (See comment below)
  3. Apparently these files grow and add a new file when it reaches 2GB. Sounds to me like we should purge these files like we do logs. CONFIRMED
  4. There was a bug in a similar version causing these to be on by default.

Guess that gives me some work for tomorrow.
🙁

The Ares Imperative

The Ares ImperativeA friend of mine, Steve Ekstrom, is the writer of this comic which I enjoyed for the this first 8 pages. I’m looking forward to the next installments. Check out The Ares Imperative! (And vote for it if you like it. The winner gets published by DC Comics.)
Interview:

Synopsis:

It’s the early 21st Century and corporations continue to manipulate world governments as emerging quasi-religious science cults and techno-centric international terrorists are beginning to develop their own biological weapons mapped out in human genomes. Special Agent Adam Geist operates covertly within the framework of the ultra-classified PROJECT ARES division of the C.I.A. under the supervision of Deputy Director Ted Gerard and his assistant Maxwell Clearwater.

Geist does not fully comprehend the processes, which he has undergone as a part of PROJECT ARES but numerous studies have revealed that alien mitochondria have asserted control of his DNA—altering his higher intelligence functions and his nervous system receptor processing speed. He has become sensitive to electromagnetic fields and has developed heightened senses, which include something akin to Wi-Fi reception. His skin is capable of rapid, localized cellular density adaptation—making him virtually bulletproof.

Due to the secret nature of his existence and the fear that a “super-man” would create in light of the unstable relations between the U.S. and other world powers, Geist is under strict orders: he must eliminate anyone—friend or foe—who learns of his uncanny abilities. Sadly, as he grows in power, his own humanity diminishes from the actualization of his computer-like brain—and now, evidence is beginning to surface that his own strange biology may, in fact, be malevolent in nature…

Endings

George R. R. Martin ranting about bad endings seems odd. “C’mon. Writing 101.”

One of my bigger terrors is his end to A Song of Ice and Fire will be bad. A slightly bigger one is 3 years between books means the end is possibly a decade away and a sedentary lifestyle will prevent us from getting to read it.

Maybe writing 101 really means never put it down on paper?

Mail From Address

It appears CE/Vista has several locations for defining the email addresses it uses for SMTP.

  1. $WEBCTDOMAIN/config/config.xml:
    mail.from=
    From address for messages sent.
  2. $WEBCTDOMAIN/customconfig/startup.properties:
    WEBCT_ADMIN_EMAIL=
    Some internal errors have a mailto: prompt to contact the server administrator.
  3. $WEBCTDOMAIN/serverconfs/log4j.properties:
    log4j.appender.EMail.To=
    Report fatal errors.
  4. $WEBCTDOMAIN/serverconfs/log4jstartup.properties:
    log4j.appender.EMail.To=
    Report fatal errors.
  5. $WEBCTDOMAIN/webctInstalledServer.properties:
    WEBCT_ADMIN_EMAIL=
    Installer picks up this value for populating #2 and possibly #3 and #4.
  6. $WEBCTDOMAIN/webctInstalledServer.properties:
    MAIL_ORIGIN=
    Installer picks up this value for populating #1.

What really disturbs me is the Vista 8 installer created log4j properties files with the  SMTP server set up for miles.webct.com and sending from vista.monitor@webct.com? I cannot seem to find anything in the Vista 8 documentation or wiki or Google index about the “Vista Trap Notification” subject line, from address, or SMTP address which the log4j appender appears to be designed to send.

This Vista Trap Notification appears designed to send an email to the address any time a fatal error is encountered. That’s fine. Just use the smtp host and From address requested in the installer.

Don’t get me started about giving end users a mailto: prompt to report errors.

Rock Eagle Debrief

GeorgiaVIEW

  1. SMART (Section Migration Archive and Restore Tool) created for us by the Georgia Digital Innovation Group seemed well received. I’m glad. DIG worked tirelessly on it on an absurdly short schedule.
  2. Information is strewn about in too many places. There isn’t one place to go for information. Instead between Blackboard, VistaSWAT, and GeorgiaVIEW about 29. I amazed I do find information.
  3. Blackboard NG 9 is too tempting for some.
  4. Vista does DTD valdiation but not very well. We need to XML validation before our XML files are run. As we do not control the source of these files and errors by those creating the files cause problems, we run them in test before running in production. I am thinking of something along the lines of validating the file and finding the errors and reporting to the submitter the problems in the file. Also, it should do XML schema validation so we can ensure the data is as correct as possible before we load it.
Yaketystats
  1. If you run *nix servers, then you need Yaketystats. I have been using it for 2 years. It revolutionized how I go about solving problems. If you are familiar with my Monitoring post, then this is the #2 in that post.
That is all for now. I am sure I will post more later.

More IMS Import Headaches

I got this error while trying to run an XML IMS import using the WebCT / Blackboard CE/Vista siapi.sh script…

A unit of type Institution cannot have a unit of type Campus as a child.

Guess being on vacation last week spaced my neurons. Normally, I ignore Blackboard errors as meaningless. This time I listened to the error and sent myself off in the wrong direction trying to figure out what was wrong with the typevalue element of the XML was wrong. Typevalue defines the level of the context. So while my level was set to 30 in the XML, for some reason it must be ignoring it, right?

Wrong! The problem was really that the relationship.sourcedid.source was set to “DBA IMS IMPORT USG_COL” instead of “DBA SIAPI Import USG_COL”. So it was unable to find the parent object. So the error makes no sense to me now.

Can I have my two hours back?

Granting a Student (Not All) Access

Problem:

The “Restrict access after this date” function affects all students and auditors without individual exceptions. In reality, universities need to grant individual exceptions for Incomplete grades, grade challenges, or other special cases.

Solution:

A Section Instructor could select all the students except the one who needs access and Deny access to them. Then the restrict flag could be lifted, allowing just this one student into the section.

Do students still see the section and get a difficult to understand error message when they are denied access?

Please verify the admin user and password

So, I got this error…

Unknown WebCT username or invalid password. Please verify the admin user and password in the IMS Settings.

I assumed the username and password were probably right. I had to find my error somewhere else.

The error turned out to be one missing character out of the 57 character long glcid. Totally my fault.

I wonder how long I would have spent dorking around with the password trying to get it work and thinking I must be typing something wrong.