WebCT

You are currently browsing articles tagged WebCT.

The question of why we run ten clusters came up recently. Off the top of my head, the answer was okay. Here is my more thoughtful response.

Whenever I have been in a conversation with a BEA (more recently Oracle) person on Weblogic, the number of nodes we run has invariably surprised them. Major banks serve ten times the number simultaneous users we have on a half dozen managed nodes or less. We have 130 managed nodes for production. Overkill?

There are some advantages they have.

  1. Better control over the application. WebCT hacked together an install process very much counter to the way BEA would have done it. BEA would have had one install the database, the web servers, and then deploy the application using either the console or command-line. WebCT created an installer which does all this in the background out of sight and mind of the administrator. They also created start and stop scripts which do command-line interaction of Weblogic to start the application. Great for automation and making it simple for administrators. It also lobotomies the console making many advanced things one could normally do risky. So now the console is only useful for some minor configuration management and monitoring.
  2. Better control over the code. When there is a performance issue, they can find what is the cause and improve the efficiency of the code. The best I can do is point out the inefficiencies to a company who chose as a priority a completely different codebase. If you do not have control over the code, then you give the code more resources.
  3. As good as Weblogic is at juggling multiple managed nodes, more nodes does not always equal better. Every node has to keep track of the others. The heart beats communicate through multicast. Every node sends out its own and listens for the same from all the others. Around twenty nodes they would miss occasional beats on their own. Thrown in a heavy work load and an overwhelmed node can miss enough missed beats it becomes marked as unavailable by the others. Usually at this point is when the monitors started paging me about strange values in the diagnostics. Reducing the number of nodes helped.

More resources means more nodes. We had two clusters with about 22 nodes (44 total) each when we hit a major performance wall. They were split into four clusters with 15 nodes each (60 total). Eventually these grew to over 22 nodes each again. At this point upgrading was out of the question. A complete overhaul with all new databases and web servers meant we could do whatever we wished.

The ideal plan was a cluster per client. Licenses being so expensive scrapped that plan.

Ten clusters with 13 managed nodes each was a reasonable compromise. More nodes while also using smaller clusters achieved both needs well. Empty databases also gave us a better restarting point. The databases still have grown to the point certain transactions run slowly just for 4 terms later. (I was hoping for 6.) Surviving the next two years will be a challenge to say the least. I wish we got bonuses for averting disasters.

My web hosting service, Dreamhost, happens to have a one-click-installer for Moodle. So I installed one for my own personal sandbox. In looking at the available roles, it suddenly occurred to me…. Comparing any LMS to another is like comparing an apple to an orange. The industry is like the Tower of Babel. Each product has its own jargon covering much of the same ground in absurdly different ways. How could you have an Internet if all the servers talked to each other so differently? Yet in technology created for higher education every system has a different name for the person who teaches a class or even if the name is the same, the capabilities differ.

Sure, there are some commonalities, even apples and oranges are both fruit, but the developers had different conceptual models in mind. The same word meaning different things really is quite annoying. Another example is a course in Vista is a container for the type of place where learning takes place whereas in Learn a course is where teaching takes place. Teaching in Vista takes place in sections. So… Vista : section :: Learn : course.

These different conceptual models are why the faculty get so irate about change. It is hard enough to have to learn new places to click and how to accomplish what you used to accomplish. For some period of time they have to have two vocabularies and maybe even years later they still cannot call it the correct term. (WebCT CE/SE called where teaching takes place courses and both former and current coworkers 6 years later still call sections “courses”).

One would think standards organizations like the IMS Global Learning Consortium would help solve this. Every product adhering to a standard should end up adopting consistent terminology, conceiving of objects similarly, and conceiving of processes similarly. This make comparing the two easier. Except the standard adoptions appear to be in the integration components or database not the main product.

I really feel bad for the instructional technologist who has to support more than one learning management system.

Also, selecting a new LMS seems like an insanely difficult task when trying to learn a dozen vocabularies enough to ascertain whether it has what you need.

Just finished a Oracle WebLogic Server 11g: Administration Essentials class today. So there are lots of things floating about in my head I want try. (Thankfully we have lots of development clusters for me to break beyond repair. Kidding. Sorta.)

One of the common support questions Blackboard asks for those of us CE/Vista clients running a cluster is whether we have changed the cookie domain in weblogic.xml. This has to do with specifying where the JSESSIONIDVISTA cookie is valid. By default the value in the weblogic.xml file is set to .webct.com which is not valid anywhere (not even Blackboard.com). One of the install steps is if one is running a cluster, in the administrator node Weblogic Domain directory run some commands to extract the weblogic.xml, edit it, then run some commands to add it back to the WAR file. Placing a “REFRESH” empty file on all the managed nodes deletes the staged and cached copies of the WAR.

No big deal and easy.

Except when it isn’t?

Occasionally someone will distrust your work and want you to verify the right setting is there. Normally they say to extract the weblogic.xml again and verify it is correct there. I had a thought. Why not verify in each managed node’s cache it has the correct value?

It is easier than it sounds. In the Weblogic domain directory (where setEnv.sh is located), change directories to

$WL_DOMAIN/servers/node_name/tmp/_WL_user/webct

(NOTE: Anything I put in bold means it is custom to you and not something I can anticipate what you would use there.)

Here I just used these greps to look for my domain. If I get results for the first one, then all is well. If I don’t get results for the first, then the second one should confirm the world is falling because we are missing the cookie domain.

grep “.domain.edu” */war/WEB-INF/weblogic.xml
grep “.webct.com” */war/WEB-INF/weblogic.xml

Since we use dsh for a lot of this kind of thing, I would use our regex for the node name and add on the path pieces in common. I have not yet studied the pieces between webct and war to know for certain who they are derived except to say they appear to 6 characters long and sufficiently random as to not repeat. Any [ejw]ar exploded into the cache appears to get a unique one. So this might work?

grep “.domain.edu” $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/??????/war/WEB-INF/weblogic.xml

If not, then try:

cd $WL_DOMAIN/servers/node_name_regex/tmp/_WL_user/webct/
&& pwd && grep “.domain.edu” */war/WEB-INF/weblogic.xml

I’m envisioning this method to verify a number of different things in the nodes. It especially confirms the managed node received what I expected not that the admin node has the correct something.

This Linux tool is my new best friend. We get thousands of XML files from our clients for loading user, class, and enrollment information. Some of these clients customize our software or write their own software for generating the XML.

This means we frequently get oddities in the files which cause problems. Thankfully I am not the person who has to verify these files are good. I just get to answer the questions that person has about why a particular file failed to load.

The CE/Vista import process will stop if its validator finds invalid XML. Unfortunately, the error “An exception occurred while obtaining error messages.  See webct.log” doesn’t sound like invalid XML.

Usage is pretty simple:

xmllint –valid /path/to/file.xml | head

  1. If the file is valid, then the whole file is in the output.
  2. If there are warnings, then they precede the whole file.
  3. If there are errors, then only the errors are displayed.

I use head here because our files can be up to 15MB, so this prevents the whole file from going on the screen for the first two situations.

I discovered this in researching how to handle the first situation below. It came up again today. So this has been useful to catch errors in the client supplied files where the file failed to load.

1: parser error : XML declaration allowed only at the start of the document
 <?xml version=”1.0″ encoding=”UTF-8″?>

162: parser error : EntityRef: expecting ‘;’
<long>College of Engineering &amp&#059; CIS</long>

(Bolded the errors.) The number before the colon is the line number. The carat it uses to indicate where on the line an error occurred isn’t accurate, so I ignore it.

My hope is to get this integrated into our processes to validate these files before they are loaded and save ourselves headaches the next morning.

I noticed one the nodes in a development cluster was down. So I started it again. The second start failed, so I ended up looking at logs to figure out why. The error in the WebCTServer.000000000.log said:

weblogic.diagnostics.lifecycle.DiagnosticComponentLifecycleException: weblogic.store.PersistentStoreException: java.io.IOException: [Store:280036]Missing the file store file “WLS_DIAGNOSTICS000001.DAT” in the directory “$VISTAHOME/./servers/$NODENAME/data/store/diagnostics”

So I looked to see if the file was there. It wasn’t.

I tried touching a file at the right location and starting it. Another failed start with a new error:

There was an error while reading from the log file.

So I tried copying to WLS_DIAGNOSTICS000002.DAT to WLS_DIAGNOSTICS000001.DAT and starting again. This got me a successful startup. Examination of the WLS files revealed the the 0 and 1 files have updated time stamps while the 2 file hasn’t changed since the first occurance of the error.

That suggests to me Weblogic is unaware of the 2 file and only aware of the 0 and 1 files. Weird.

At least I tricked the software into running again.

Some interesting discussion about these files.

  1. Apparently I could have just renamed the files. CONFIRMED
  2. The files capture JDBC diagnostic data. Maybe I need to look at the JDBC pool settings. DONE (See comment below)
  3. Apparently these files grow and add a new file when it reaches 2GB. Sounds to me like we should purge these files like we do logs. CONFIRMED
  4. There was a bug in a similar version causing these to be on by default.

Guess that gives me some work for tomorrow.
:(

If Blackboard opens up the schema for Vista 8, then maybe I’ll feel more comfortable sharing the reporting SQL I use. Ron Santos has a good PowerPoint on SQL for reports at Simon Frasier University.

Every time a Vista 3 node is shut down without going through the initiated shut down process, there is a chance of incorrect data written to the tracking files (in NodeA/tracking/). Normally it leaves strange characters or partial lines at the end of the file. This is the first time I have seen it write the contents of another log instead of the tracking data.

click – 1.0 – 1244228052889 – 1135588340001 – “nova.view.usg.edu-1244227762853-6288″ – SSTU – discussion – “compiled-message-viewed” – “page name” – 558711383 -

click – 1.0 – 1244228052891 – 15.0; .NET CLR 1.1.4322)”

2009-04-23      20:58:35        0.0030  xxx.xxx.xxx.xxx    JxH1zg4fZT1LTGcpmyNW    200     GET     /webct/libraryjs.dowebct        locale=en_US    0       “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)”

Even better. The node went down on June 5th at around 3pm. The lines from the other log were from April 23rd at 8:58pm.

Why am I surprised to see new incorrect behavior? Especially when the node was really confused?

Do you run one of these versions of the former WebCT products?

  • CE4.x
  • CE6.x
  • CE8.x
  • Vista 3.x
  • Vista 4.x
  • Vista 8.x

If so, then you should join us for the next Vista SWAT web conference call Thursday, May 14th (and every other Thursday). We help each other solve issues and better understand how to use / run the product.

To be added to the Vista SWAT e-mail list, please e-mail jeff.longland who uses the uwo.ca domain. He graciously sends out the reminders.

I’m sure the Blackboard acquisition of ANGEL will get discussed.
:)  

Some former WebCT (bought by Blackboard) customers switched to ANGEL rather than move to Blackboard products. PDF Apr 14, 2009 Today, Blackboard announced it is buying ANGEL. You can run, but you cannot hide from Blackboard.

Some light reading for you…

  1. Learning, Together ANGEL Learning and Blackboard® have decided to join forces.
  2. Blackboard Plans to Buy Another Rival, Angel Learning | Chronicle.com
  3. Why HigherEd is rejecting Blackboard … | Laura Gekeler
  4. Open Thread on Blackboard/ANGEL Merger | mfeldstein.com

So the options left are…

  1. Blackboard-WebCT-ANGEL
  2. Moodle
  3. Desire2Learn (currently in patent troubles with Bb)
  4. Pearson eCollege
  5. Sakai

One of the clients we host complained about losing their session. Blackboard recommended we switch how our load balancer is handling the session persistence. Before agreeing to do that, we decided to use Blackboard’s script to determine if there is a problem before trying to fix something which may or may not exist.

An acceptable number of sessions showing on multiple nodes of a cluster is less than 5%. When I ran the test, I found 35.8% matched this criteria. But wait just a second, this seemed like an extraordinarily high number. I ran a second test for an identically configured cluster on the same hardware to find only 4.3%. Why are these so different?

Most cases of this “duplicated session” I spot checked were 1 hit for autosignon on another node. Blackboard confirmed these happen before the user has logged in, so they could appear on the other node. So I ran the test again ignoring these autosignon requests and found we were down to 7.2%. Close to acceptable but not quite.

 Similar to autosignon, the editonpro.js appeared in the majority of the cases I spot checked as the sole hit another node. Once, I removed those from the test, I was down to 0.7%. My control cluster was down to 1.4%. 

One would hope the the script used to determine the amount of duplicate sessions would ignore or remove from the data set the known false positive log entries. 

One would also hope the script instructions (requires login to Blackboard help site) would help users account for these false positives. I did leave a comment on the instructions to hopefully help the next person who has to do this.

« Older entries