Session Oddities

One of the clients we host complained about losing their session. Blackboard recommended we change how our load balancer handles session persistence. Before agreeing to that, we decided to use Blackboard’s script to determine whether there is actually a problem, rather than fix something which may or may not exist.

An acceptable number of sessions showing on multiple nodes of a cluster is less than 5%. When I ran the test, I found 35.8% met this criterion. But wait just a second, this seemed like an extraordinarily high number. I ran a second test against an identically configured cluster on the same hardware and found only 4.3%. Why are these so different?

Most cases of this “duplicated session” I spot checked were a single hit for autosignon on another node. Blackboard confirmed these requests happen before the user has logged in, so they can land on a different node. So I ran the test again ignoring these autosignon requests and found we were down to 7.2%. Close to acceptable, but not quite.

Similar to autosignon, editonpro.js appeared in the majority of the cases I spot checked as the sole hit on another node. Once I removed those from the test, I was down to 0.7%. My control cluster was down to 1.4%.

One would hope the script used to determine the number of duplicate sessions would ignore, or remove from the data set, the known false positive log entries.
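
The filtering I did by hand can be sketched roughly as follows. This is not Blackboard’s script. The (session, node, URL) tuple format, the sample paths, and the function name are all assumptions made up for illustration; only the two false positive markers come from the tests above.

```python
from collections import defaultdict

# The two known false positives named in this post.
KNOWN_FALSE_POSITIVES = ("autosignon", "editonpro.js")


def multi_node_session_pct(hits, ignore=()):
    """Return the percentage of sessions seen on more than one node.

    hits   -- iterable of (session_id, node, url) tuples (hypothetical format)
    ignore -- URL substrings whose hits should not count toward a node
    """
    all_sessions = set()
    nodes_by_session = defaultdict(set)
    for session_id, node, url in hits:
        # Every session stays in the denominator, even if all of its
        # hits end up being ignored.
        all_sessions.add(session_id)
        if any(marker in url for marker in ignore):
            continue
        nodes_by_session[session_id].add(node)
    if not all_sessions:
        return 0.0
    multi = sum(1 for nodes in nodes_by_session.values() if len(nodes) > 1)
    return 100.0 * multi / len(all_sessions)


# Made-up sample data: s1 and s2 only touch node2 for a known false positive.
sample_hits = [
    ("s1", "node1", "/example/home"),
    ("s1", "node2", "/example/autosignon"),
    ("s2", "node1", "/example/home"),
    ("s2", "node2", "/example/editonpro.js"),
    ("s3", "node1", "/example/home"),
    ("s3", "node2", "/example/mail"),
    ("s4", "node1", "/example/home"),
]

raw_pct = multi_node_session_pct(sample_hits)
filtered_pct = multi_node_session_pct(sample_hits, ignore=KNOWN_FALSE_POSITIVES)
print(raw_pct, filtered_pct)
```

In the sample, only s3 is a real duplicate, so the filtered number is the one worth comparing against the 5% threshold.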

One would also hope the script instructions (requires login to Blackboard help site) would help users account for these false positives. I did leave a comment on the instructions to hopefully help the next person who has to do this.

Access to Multiple Systems

For the term starting Jan 7, we have students who need to be directed to the new Vista system (v8). By standard practice, students also have access to the previous term a few weeks into the new term, let’s say Feb 23. So we’ll need to ensure some access to the old Vista system (v3).

There are multiple ways we can handle this access:

  1. School VIPs – We highly encourage users to bookmark, publish links to, and access the school VIPs instead of the actual address. The idea is that these addresses will always go to the right place. Other addresses could change and stop working. We even have a v8 version for pre-cutover access and a v3 version for post-cutover access. The problem seems to be that some campuses and users continue to use addresses other than the school VIPs (v8 will only show them the school VIP).
  2. Custom login page – We would place an HTML file on the v3 system explaining general access has moved. Some people would need to get past this page and into the v3 system. The questions here are:
    1. Can we just give the admins the link to bypass the custom login page? They could then manage who has access to the site. They might have to provide this “secret” to thousands of students.
    2. Do we dare publish the link on the page? Something like “To check Fall 2008 grades: click here.”
  3. Deny access – We would deny access to all users except those who hold the Institution Administrator role in v3. The holders of that role would then be responsible for granting access one-by-one to other users who need access to this old system.
  4. f5 iRule to 302 Redirect – We do host a school that uses autosignon. It is conceivable we could intercept attempts to log in and redirect them to the correct host. It would be much better for them just to use the School VIPs solution.
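
To make the fourth option concrete, here is a rough sketch of the decision such a rule would make. A real iRule would be written in Tcl on the f5; this is Python only to show the logic. The host names and the assumption that login attempts can be spotted by "autosignon" in the URL are hypothetical.

```python
# Hypothetical host names; the real ones would be the v3 and v8 addresses.
OLD_HOST = "vista3.example.edu"
NEW_HOST = "vista8.example.edu"

# Assumption: autosignon login attempts contain this string in the URL.
LOGIN_MARKER = "autosignon"


def redirect_login(host, path_and_query):
    """Return (302, location) for a login attempt on the old host, else None."""
    is_old_host = host.lower() == OLD_HOST
    is_login = LOGIN_MARKER in path_and_query.lower()
    if is_old_host and is_login:
        # Keep the original path and query so the autosignon parameters survive.
        return 302, "https://" + NEW_HOST + path_and_query
    # Anything else passes through untouched.
    return None


print(redirect_login("vista3.example.edu", "/example/autosignon?user=jdoe"))
print(redirect_login("vista3.example.edu", "/example/grades"))
```

Only login attempts are redirected here, so someone already inside v3 checking old grades is left alone.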

Anyone have a better solution?

Recap of Vista Stuff

It has been a hectic week. A recap…

Java certificate fix – Yesterday, August 23rd, the certificate distributed in various Java applets expired. The community discovered the issue and informed Blackboard, who put out a fix for the more current products on August 15th. Many customers are leery of having so little lead time to test, verify, and install a fix. Vista 3.0.7.17 was also reported to have the problem, but Blackboard didn’t provide a fix until the 20th, after I got my TSM to verify on the 18th that it really still is a problem. (The corrected 3.0.7.17.8 version was provided August 21st. Why is in the next paragraph.)

The fix for Vista 3 required us to be on 3.0.7.17.8 (hotfix 8, which we had not yet applied), had references to the “webctapp” directory (in Vista 3 it is applications), and distributed a webct.sh script to add updateWar, which didn’t work with Vista 3. FAIL. Thankfully we have modified WAR files in the past, so adding the updates was more work but was accomplished before Blackboard provided a corrected version.

To see the Java certificates in Windows: Control Panel > Java > Security > Certificates. The Blackboard ones are verified by Thawte (the Certificate Authority). The old one is issued to Blackboard. The new one is issued to dc.blackboard.com.

Vista 3.0.7.17.8 – This hotfix was released a couple of weeks ago. However, since the priority has been the migration to Vista 8, it was on hold. The previous problem made us step up and throw this into production. The testers made heroic efforts to get this and the certificate fix tested. Testing was mixed.

  1. Losing session cookie because of Office 2007 in Internet Explorer. Happened less often post fix, but still happens in some cases.
  2. Autosignon MAC2. The mode to allow the insecure MAC works, which gives the one school using it time to update their portal to use MAC2. Originally the plan was to let them work out MAC2 in test.

Slammed by our users…

  1. systemIntegrationApi.dowebct – The school using the autosignon wanted to have the correct consortiaId to create the MAC. Some time back in January they started calling this any time users tried to log in because a handful (guess was ~12) have had their username changed, which made the autosignon fail. Yes, they sent us 25,000 requests on a busy day (about 20% of the queues were working on these during the day) to handle a potential 12 problems in a term. FAIL.
  2. pmSelfRegister.dowebct – One of the clusters started to have issues. Two nodes went crappy. I looked at the WebLogic console and found all of the failing nodes had no free spots in the queues. 90% of the queues were working on these. Much of this is because the requests were hanging around for at least 4800 seconds (an hour is 3600 seconds). At about 6000 seconds the cluster recovered when the queues cleared. I think the queues cleared because I changed a couple of settings to false:
    • Allow users to register themselves as a Student in a section = false
    • Allow users to register themselves as an Auditor in a section = false

    As I recall, we only had about 22 queue spots open (out of 308) across the whole cluster. We got lucky.
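
For a sense of scale, the numbers above work out as follows. The figures are the ones I recalled above, so treat the results as approximate.

```python
# Figures recalled in the post; approximate.
TOTAL_QUEUE_SPOTS = 308
FREE_QUEUE_SPOTS = 22

busy_spots = TOTAL_QUEUE_SPOTS - FREE_QUEUE_SPOTS
busy_pct = round(100 * busy_spots / TOTAL_QUEUE_SPOTS, 1)


def seconds_to_minutes(seconds):
    """Convert a request age in seconds to minutes."""
    return seconds / 60


print(busy_spots, busy_pct)        # spots in use across the cluster
print(seconds_to_minutes(4800))    # age of the hung requests
print(seconds_to_minutes(6000))    # point at which the cluster recovered
```

So roughly 93% of the cluster was tied up, with requests sitting for 80 minutes and recovery coming at about 100 minutes.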