Rote Loading

With this specific application, we can import data, but there are limitations due to its 2000-era handling of XML files.

  1. HTML forms uploading files have to…
    1. Have all the packets be received by the server.
    2. Process the file with the browser connection still open.
    3. The server has to tell the browser everything was received and is done.
  2. All this has to happen as one action within a 5 minute window.

A better method would allow just uploading the files to a page. Background processes would monitor that location and process the files independent of the browser. Notifications can be sent to alert the user the processing is done.

Or… recognize the echo XML file because you took too long and prevent it from being loaded or remove the data.

I ended up figuring out that if we split the files at about 5,000 records, then each should take about half the 5 minute window. I am pleased that this has held for most of the files I have seen. About one in ten takes so much longer that, had I cheated and sized the files closer to the full 5 minutes, I would be deleting duplicates. (The variation is because some files are 50MB and others 100MB.)

The grumbling of this post is that I am on the 25th of 58 files. This is tedious. I am lamenting not creating a curl script to do this part for me. Automation is perfect for things like this.
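
A minimal sketch of what that curl script might look like. The upload URL, the form field name, and the cookie file are all hypothetical placeholders, since the real import form would have to be inspected first. The 300 second limit mirrors the 5 minute window, and the loop stops at the first failure so a slow file can be checked for duplicates before continuing.

```shell
#!/bin/sh
# Hypothetical values: adjust all three to match the real import form.
UPLOAD_URL="${UPLOAD_URL:-https://vista.example.edu/import}"  # assumed URL
FORM_FIELD="${FORM_FIELD:-file}"                              # assumed form field name
COOKIE_JAR="${COOKIE_JAR:-cookies.txt}"                       # assumed saved session cookies

# upload_all FILE...
# Uploads the files one at a time and stops at the first failure.
# Set DRY_RUN=1 to print what would be uploaded without contacting the server.
upload_all() {
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would upload $f to $UPLOAD_URL"
            continue
        fi
        echo "uploading $f"
        # --max-time 300 gives up when the 5 minute window has passed
        curl --fail --silent --show-error --max-time 300 \
             -b "$COOKIE_JAR" -F "$FORM_FIELD=@$f" "$UPLOAD_URL" >/dev/null || {
            echo "FAILED: $f" >&2
            return 1
        }
    done
}
```

Running it as `DRY_RUN=1 upload_all persons*.xml` first shows the order the files would go in.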

 

Another Way to Verify Cookie Domain

Just finished an Oracle WebLogic Server 11g: Administration Essentials class today. So there are lots of things floating about in my head I want to try. (Thankfully we have lots of development clusters for me to break beyond repair. Kidding. Sorta.)

One of the common support questions Blackboard asks those of us CE/Vista clients running a cluster is whether we have changed the cookie domain in weblogic.xml. This has to do with specifying where the JSESSIONIDVISTA cookie is valid. By default the value in the weblogic.xml file is set to .webct.com, which is not valid anywhere (not even Blackboard.com). One of the install steps, if one is running a cluster, is to go to the Weblogic domain directory on the administrator node and run some commands to extract the weblogic.xml, edit it, then run some more commands to add it back to the WAR file. Placing a “REFRESH” empty file on all the managed nodes deletes the staged and cached copies of the WAR.

No big deal and easy.

Except when it isn’t?

Occasionally someone will distrust your work and want you to verify the right setting is there. Normally they say to extract the weblogic.xml again and verify it is correct there. I had a thought. Why not verify in each managed node’s cache it has the correct value?

It is easier than it sounds. In the Weblogic domain directory (where setEnv.sh is located), change directories to

$WL_DOMAIN/servers/node_name/tmp/_WL_user/webct

(NOTE: Anything I put in bold means it is custom to you and not something I can anticipate what you would use there.)

Here I just used these greps to look for my domain. If I get results for the first one, then all is well. If I don’t get results for the first, then the second one should confirm the world is falling because we are missing the cookie domain.

grep ".domain.edu" */war/WEB-INF/weblogic.xml
grep ".webct.com" */war/WEB-INF/weblogic.xml

Since we use dsh for a lot of this kind of thing, I would use our regex for the node name and add on the path pieces in common. I have not yet studied the pieces between webct and war to know for certain how they are derived, except to say they appear to be 6 characters long and sufficiently random as to not repeat. Any [ejw]ar exploded into the cache appears to get a unique one. So this might work?

grep ".domain.edu" $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/??????/war/WEB-INF/weblogic.xml

If not, then try:

cd $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/ && pwd && grep ".domain.edu" */war/WEB-INF/weblogic.xml

I’m envisioning this method to verify a number of different things in the nodes. It especially confirms that the managed node received what I expected, rather than just that the admin node has the correct value.
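
The two greps can be wrapped into one check that reports per file. This is a sketch, not something from the install guide: the function name is made up, and it only assumes the domain string appears somewhere in each cached weblogic.xml.

```shell
#!/bin/sh
# check_cookie_domain DOMAIN FILE...
# Prints OK or MISSING for each cached weblogic.xml.
# Returns non-zero if any file lacks the domain.
check_cookie_domain() {
    domain=$1
    shift
    status=0
    for f in "$@"; do
        # -F matches the domain literally, so the dots are not regex wildcards
        if grep -q -F -- "$domain" "$f"; then
            echo "OK      $f"
        else
            echo "MISSING $f"
            status=1
        fi
    done
    return $status
}
```

From the webct cache directory of a node, it would be called as `check_cookie_domain ".domain.edu" */war/WEB-INF/weblogic.xml`.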

Import Errors

A couple issues I encountered yesterday loading XML files on Blackboard Vista 8.

siapi output says:

error invoking method in adapter, message is: cvc-complex-type.2.3: Element 'extension' cannot have character [children], because the type's content type is element-only.

This means the wrong import type (luminis versus ims) was used. I created a files.properties file which sets the import type based on the name of the file. For the luminis type, the persons records are in persons1.xml. For the ims type, the persons records are in ims_users1.xml.


webct.log says:

Message is : Authorization denied

User trying to import the file must be an institution administrator. I probably created the import user but did not enroll it as an admin. What I get for moving too fast.


The following were added 26-MAY-2010…
webct.log says:

error invoking method in adapter, message is: Deployable component is not enabled

This one is actually accurate. The Luminis Message Broker settings need to be changed so Enabled equals true. Probably the other settings are also back to defaults. This happens after most Service Pack upgrades.


webct.log says:

The learning context represented by the lc_id and lc_source parameters is not within or equal to the learning context represented by the glcid parameter. This may cause problems if the learning contexts in the XML do not specify a parent or cannot be imported directly under the instituion

The XML file only contained 1 person and 1 member record. So this must be about the sourcedid.source and sourcedid.id of that file. The student was enrolled in the section, so I guess maybe the error can be ignored?

xmllint

This Linux tool is my new best friend. We get thousands of XML files from our clients for loading user, class, and enrollment information. Some of these clients customize our software or write their own software for generating the XML.

This means we frequently get oddities in the files which cause problems. Thankfully I am not the person who has to verify these files are good. I just get to answer the questions that person has about why a particular file failed to load.

The CE/Vista import process will stop if its validator finds invalid XML. Unfortunately, the error “An exception occurred while obtaining error messages.  See webct.log” doesn’t sound like invalid XML.

Usage is pretty simple:

xmllint --valid /path/to/file.xml | head

  1. If the file is valid, then the whole file is in the output.
  2. If there are warnings, then they precede the whole file.
  3. If there are errors, then only the errors are displayed.

I use head here because our files can be up to 15MB, so this prevents the whole file from going on the screen for the first two situations.

I discovered this in researching how to handle the first situation below. It came up again today. So this has been useful to catch errors in the client supplied files where the file failed to load.

1: parser error : XML declaration allowed only at the start of the document
 <?xml version="1.0" encoding="UTF-8"?>

162: parser error : EntityRef: expecting ‘;’
<long>College of Engineering &amp&#059; CIS</long>

(Bolded the errors.) The number before the colon is the line number. The caret it uses to indicate where on the line an error occurred isn’t accurate, so I ignore it.

My hope is to get this integrated into our processes to validate these files before they are loaded and save ourselves headaches the next morning.
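
A rough sketch of what that pre-load check could look like. The function name and the PASS/FAIL report format are made up for illustration. It uses --noout so only the errors are printed, and it checks only that each file parses; adding --valid would also check against a DTD when the files reference one.

```shell
#!/bin/sh
# validate_xml FILE...
# Runs xmllint quietly on each file and reports which ones fail to parse.
# Returns non-zero if any file fails.
validate_xml() {
    bad=0
    for f in "$@"; do
        # --noout suppresses the echoed document, so only errors come back
        if errors=$(xmllint --noout "$f" 2>&1); then
            echo "PASS $f"
        else
            echo "FAIL $f"
            # Show just the first few error lines for the submitter
            printf '%s\n' "$errors" | head -n 5
            bad=1
        fi
    done
    return $bad
}
```

Called as `validate_xml /path/to/incoming/*.xml`, the non-zero return makes it easy to stop a load script before anything is imported.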

Useful User Agents

Rather than depend on end users to accurately report the browser used, I look for the user-agent in the web server logs. (Yes, I know it can be spoofed. Power users would be trying different things to resolve their own issues not coming to us.)

Followers of this blog may recall I changed the Weblogic config.xml to record user agents to the webserver.log.

One trick I use is the double quotes in awk to identify just the user agent. This information is then sorted by name to count (uniq -c) how many of each is present. Finally, I sort again by number with the largest at the top to see which are the most common.

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r

This is what I will use looking for a specific user. If I am looking at a wider range, such as the user agent for hits on a page, then I probably will use the head command to look at the top 20.
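
The same pipeline with head on the end can be kept as a small function. This is only a sketch; the function name is made up, and it assumes the user agent is the first double-quoted field on each log line.

```shell
#!/bin/sh
# top_agents TERM LOGFILE [COUNT]
# Counts user agents on lines matching TERM and prints the COUNT
# most common ones (default 20), largest first.
top_agents() {
    # awk splits on double quotes, so field 2 is the quoted user agent
    grep -- "$1" "$2" |
        awk -F'"' '{print $2}' |
        sort | uniq -c | sort -n -r |
        head -n "${3:-20}"
}
```

For example, `top_agents <term> webserver.log 20` lists the 20 most common agents for whatever the term matches.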

A “feature” of this is getting the build (Firefox 3.011) rather than just the version (Firefox 3). For getting the version, I tend to use something more like this to count the found version out of the log.

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c '<version>'

I have yet to see many CE/Vista URIs with the names of web browsers. So these are the most common versions one would likely find (what to grep – name – notes):

  1. MSIE # – Microsoft Internet Explorer – I’ve seen 5 through 8 in the last few months.
  2. Firefox # – Mozilla Firefox – I’ve seen 2 through 3.5. There is enough difference between 3 and 3.5 (also 2 and 2.5) I would count them separately.
  3. Safari – Apple/WebKit – In searching for this one, I would add to the search a ‘grep -v Chrome’ to eliminate Google Chrome user agents.
  4. Chrome # – Google Chrome – Only versions 1 and 2.
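
The list above can be turned into a quick per-family count. This is a sketch with a made-up function name. It counts families rather than versions, and it assumes the user agent is the first double-quoted field on each log line.

```shell
#!/bin/sh
# browser_counts LOGFILE
# Prints a rough hit count per browser family from the quoted user agent field.
browser_counts() {
    agents=$(awk -F'"' '{print $2}' "$1")
    for name in MSIE Firefox Chrome; do
        printf '%s %s\n' "$name" "$(printf '%s\n' "$agents" | grep -c "$name")"
    done
    # Chrome user agents also contain "Safari", so exclude them here
    printf 'Safari %s\n' "$(printf '%s\n' "$agents" | grep Safari | grep -v -c Chrome)"
}
```

Swapping a family name for something like 'MSIE 7' or 'Firefox/3.5' narrows the count to one version.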

Naturally there are many, many others. It surprised me to see iPhone and Android on the list.

LC Oddities

IMS XML for Blackboard Vista 8:

Say Division1 exists. We want to create Group1 inside Division1. Ignore that Division1 already exists and write XML to create it again. Create Group1 with relationship tag info for Division1.

Starting with Group1 doesn’t work unless a command-line option overrides the starting learning context to be Division instead of Group.

Luminis XML for Blackboard Vista 8:

Starting with Group1 is fine because divisions are unsupported.

Don’t ever use Luminis XML as a model for IMS. Ever!

Better CE/Vista Web Server Log

Some support tickets are more easily solved by knowing both user behavior and environment. An often helpful piece of information is what web browser they used. To add this, shut down the cluster, edit /VISTA_HOME/config/config.xml to include the cs(User-Agent), and start the cluster. This line will need to appear for every node. At startup, the nodes will download a new copy of the file.

<elf-fields>date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes cs(User-Agent) x-weblogic.servlet.logging.ELFWebCTExtras</elf-fields>

Command:
cp config.xml config.xml.bak
sed 's/bytes x-/bytes cs(User-Agent) x-/g' config.xml.bak > config.xml

Probably this could be edited in the Weblogic 9.2 console. I haven’t looked yet.
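
Since the line has to appear for every node, a quick check after the edit can catch any that were missed. This is a sketch with a made-up function name. It assumes each elf-fields element sits on a single line, as in the example above.

```shell
#!/bin/sh
# missing_user_agent CONFIG
# Prints the elf-fields lines that still lack cs(User-Agent).
# Returns 1 if any are found, 0 if every elf-fields line has it.
missing_user_agent() {
    # -F treats the parentheses literally; -v keeps only lines without the field
    if grep '<elf-fields>' "$1" | grep -v -F 'cs(User-Agent)'; then
        return 1
    fi
    return 0
}
```

Running `missing_user_agent config.xml` before starting the cluster should print nothing.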

Mail From Address

It appears CE/Vista has several locations for defining the email addresses it uses for SMTP.

  1. $WEBCTDOMAIN/config/config.xml:
    mail.from=
    From address for messages sent.
  2. $WEBCTDOMAIN/customconfig/startup.properties:
    WEBCT_ADMIN_EMAIL=
    Some internal errors have a mailto: prompt to contact the server administrator.
  3. $WEBCTDOMAIN/serverconfs/log4j.properties:
    log4j.appender.EMail.To=
    Report fatal errors.
  4. $WEBCTDOMAIN/serverconfs/log4jstartup.properties:
    log4j.appender.EMail.To=
    Report fatal errors.
  5. $WEBCTDOMAIN/webctInstalledServer.properties:
    WEBCT_ADMIN_EMAIL=
    Installer picks up this value for populating #2 and possibly #3 and #4.
  6. $WEBCTDOMAIN/webctInstalledServer.properties:
    MAIL_ORIGIN=
    Installer picks up this value for populating #1.
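
To see all six at once, the list above can be walked with grep. This is a sketch with a made-up function name. It takes the domain directory as its argument and matches the log4j keys by their common log4j.appender.EMail prefix, so any other settings under that prefix show up too.

```shell
#!/bin/sh
# show_mail_settings DOMAIN_DIR
# Prints each mail-related setting with its file name and line number.
show_mail_settings() {
    dir=$1
    for pair in \
        "config/config.xml:mail.from" \
        "customconfig/startup.properties:WEBCT_ADMIN_EMAIL" \
        "serverconfs/log4j.properties:log4j.appender.EMail" \
        "serverconfs/log4jstartup.properties:log4j.appender.EMail" \
        "webctInstalledServer.properties:WEBCT_ADMIN_EMAIL" \
        "webctInstalledServer.properties:MAIL_ORIGIN"
    do
        file=${pair%%:*}   # part before the first colon
        key=${pair#*:}     # part after the first colon
        if [ -f "$dir/$file" ]; then
            # -H prints the file name, -n the line number, -F matches the key literally
            grep -H -n -F -- "$key" "$dir/$file" || echo "$dir/$file: no $key line found"
        else
            echo "$dir/$file: file not found"
        fi
    done
}
```

It would be called as `show_mail_settings "$WEBCTDOMAIN"`.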

What really disturbs me is that the Vista 8 installer created log4j properties files with the SMTP server set to miles.webct.com and sending from vista.monitor@webct.com. I cannot seem to find anything in the Vista 8 documentation or wiki or Google index about the “Vista Trap Notification” subject line, from address, or SMTP address which the log4j appender appears to be designed to send.

This Vista Trap Notification appears designed to send an email to the address any time a fatal error is encountered. That’s fine. Just use the smtp host and From address requested in the installer.

Don’t get me started about giving end users a mailto: prompt to report errors.

Rock Eagle Debrief

GeorgiaVIEW

  1. SMART (Section Migration Archive and Restore Tool) created for us by the Georgia Digital Innovation Group seemed well received. I’m glad. DIG worked tirelessly on it on an absurdly short schedule.
  2. Information is strewn about in too many places. There isn’t one place to go for information. Instead, between Blackboard, VistaSWAT, and GeorgiaVIEW, there are about 29. I am amazed I do find information.
  3. Blackboard NG 9 is too tempting for some.
  4. Vista does DTD validation but not very well. We need to do XML validation before our XML files are run. As we do not control the source of these files and errors by those creating the files cause problems, we run them in test before running in production. I am thinking of something along the lines of validating the file, finding the errors, and reporting the problems in the file to the submitter. Also, it should do XML schema validation so we can ensure the data is as correct as possible before we load it.

Yaketystats

  1. If you run *nix servers, then you need Yaketystats. I have been using it for 2 years. It revolutionized how I go about solving problems. If you are familiar with my Monitoring post, then this is the #2 in that post.

That is all for now. I am sure I will post more later.

More IMS Import Headaches

I got this error while trying to run an XML IMS import using the WebCT / Blackboard CE/Vista siapi.sh script…

A unit of type Institution cannot have a unit of type Campus as a child.

Guess being on vacation last week spaced my neurons. Normally, I ignore Blackboard errors as meaningless. This time I listened to the error and sent myself off in the wrong direction trying to figure out what was wrong with the typevalue element of the XML. Typevalue defines the level of the context. So while my level was set to 30 in the XML, for some reason it must be ignoring it, right?

Wrong! The problem was really that the relationship.sourcedid.source was set to “DBA IMS IMPORT USG_COL” instead of “DBA SIAPI Import USG_COL”. So it was unable to find the parent object. So the error makes no sense to me now.
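
A quick way to spot this kind of mismatch is to list every distinct source value in the file with a count. This is a sketch with a made-up function name. It assumes each source element sits on one line with no attributes.

```shell
#!/bin/sh
# list_sources FILE...
# Prints each distinct <source> value with the number of times it appears,
# most common first.
list_sources() {
    # -o prints only the matching element, -h leaves off the file name
    grep -h -o '<source>[^<]*</source>' "$@" |
        sed 's/<[^>]*>//g' |
        sort | uniq -c | sort -n -r
}
```

A value that appears only once or twice, or that differs only in case or wording from the common one, is worth a second look before blaming typevalue.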

Can I have my two hours back?