xmllint

This Linux tool is my new best friend. We get thousands of XML files from our clients for loading user, class, and enrollment information. Some of these clients customize our software or write their own software for generating the XML.

This means we frequently get oddities in the files which cause problems. Thankfully I am not the person who has to verify these files are good. I just get to answer the questions that person has about why a particular file failed to load.

The CE/Vista import process will stop if its validator finds invalid XML. Unfortunately, the error “An exception occurred while obtaining error messages.  See webct.log” doesn’t sound like invalid XML.

Usage is pretty simple:

xmllint --valid /path/to/file.xml | head

  1. If the file is valid, then the whole file is in the output.
  2. If there are warnings, then they precede the whole file.
  3. If there are errors, then only the errors are displayed.

I use head here because our files can be up to 15MB, so this prevents the whole file from going on the screen for the first two situations.

I discovered this while researching how to handle the first situation below, and it came up again today. It has been useful for catching errors in client-supplied files that failed to load.

1: parser error : XML declaration allowed only at the start of the document
 <?xml version="1.0" encoding="UTF-8"?>

162: parser error : EntityRef: expecting ‘;’
<long>College of Engineering &amp&#059; CIS</long>

The number before the colon is the line number. The caret xmllint uses to indicate where on the line an error occurred isn’t accurate, so I ignore it.

My hope is to get this integrated into our processes to validate these files before they are loaded and save ourselves headaches the next morning.
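A pre-load check along these lines could be sketched in Python. This is only a rough sketch under assumptions: it uses the standard library parser rather than xmllint, so it checks well-formedness only (not DTD validity the way --valid does), and the function name check_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data):
    """Return None if the XML bytes parse cleanly, else 'line N: message'.

    Hypothetical helper: it only checks well-formedness and does not
    validate against a DTD.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the parser.
        line, _column = err.position
        return f"line {line}: {err}"
    return None
```

Calling it on the contents of each incoming file (read in binary mode) and holding back anything that does not return None would catch both of the errors shown above before the nightly load.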

Email Harvesters

I missed the story about brothers convicted of harvesting emails the first time. Well, I noticed a follow-up.

Back around 2001, the CIO received complaints about the web server’s performance. So, I went log trolling to see what the web server was doing. A single IP dominated the HTTP requests. This one IP passed various last names into the email directory. Some quick research revealed Apache could block requests from that IP. That calmed things down enough for me to identify the owner of the IP. The CIO then bullied the ISP into providing contact information for the company involved.

Previous little adventures like this landed me a permanent job, so I jumped at similar challenges.

Well, a few years later, it happened again. This time my boss had made me develop a script for the dissemination of the anti-virus software package to home users. Basically, it used email authentication to verify whether someone could get the download link. So, I applied the same technique to the email directory. Well, this upset some people who legitimately needed email addresses. So the human workers would provide email addresses to people with a legitimate need.

I’m glad that, since I left, VSU no longer looks up email addresses for people. (I thought some of the requests questionable.) Also, my little email authentication script was written before LDAP was available to the university. I think the new solution is much better.

One of the more vocal complainers about my having stopped non-VSU access to the email directory was my current employer. We apparently list email addresses for employees freely. Which makes me wonder how much of the spam we get is due to the brothers described at the beginning of this story? Or other email harvesters? Just hitting the send button potentially exposes the email address.

No worries. I’m sure Glenn is protecting me. 🙂

Relative Truth

Found an interesting comment on an article about the state of Georgia observing Confederate Memorial Day….

The truth of history means very little to those who are dead set against learning anything from it. No matter what the history books used in our public school system say, most will never believe anything other than their own opinion about the Civil War. History revisionist are the celebs of the day. As long as people like Rev. Wright, and David Duke exist, history’s truth will be filtered through lies and distortions.

Few observe Confederate Memorial Day: UGA to display original constitution; state offices closed

Truth may very well be completely relative. Back during the US Presidential election, I ran across an interesting article in the Washington Post discussing research John Bullock did on the effects of misinformation and its ties to ideological bias. I used to think it had to do with a handful of people stuck in their green, Second Amendment, pro-life, pro-choice, capitalist, or regulation views. My favorite pastime in college was assuming positions contrary to others even when I agreed with them.

I doubt the effect applies solely to conservatives, as was proposed in the article. More likely everyone has some blind spots in determining truth from myth or fiction, kind of like optical illusions. (Yes, even myself.) We have to choose which information to believe any time we interact with information. Many of the rules in philosophy and science are built around combating the biases we have.

Rather than force ideas on others, I think we should be teaching children from an early age to recognize when others and most especially themselves are operating under a bias. It’s the only way to find detachment.

Session Oddities

One of the clients we host complained about losing their session. Blackboard recommended we switch how our load balancer is handling the session persistence. Before agreeing to do that, we decided to use Blackboard’s script to determine if there is a problem before trying to fix something which may or may not exist.

An acceptable number of sessions showing on multiple nodes of a cluster is less than 5%. When I ran the test, I found 35.8% matched this criterion. But wait just a second, this seemed like an extraordinarily high number. I ran a second test on an identically configured cluster on the same hardware and found only 4.3%. Why are these so different?

Most cases of this “duplicated session” I spot checked were 1 hit for autosignon on another node. Blackboard confirmed these happen before the user has logged in, so they could appear on the other node. So I ran the test again ignoring these autosignon requests and found we were down to 7.2%. Close to acceptable but not quite.

Similar to autosignon, the editonpro.js appeared in the majority of the cases I spot checked as the sole hit on another node. Once I removed those from the test, I was down to 0.7%. My control cluster was down to 1.4%.

One would hope the script used to determine the amount of duplicate sessions would ignore or remove from the data set the known false positive log entries.

One would also hope the script instructions (requires login to Blackboard help site) would help users account for these false positives. I did leave a comment on the instructions to hopefully help the next person who has to do this.
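For anyone curious what filtering out the false positives amounts to, here is a minimal Python sketch. It is not Blackboard’s script, which I can’t reproduce here. The input shape (session id, node, URL tuples), the function name, and the substring matching are all assumptions for illustration.

```python
from collections import defaultdict

# Assumed markers for the known false positives discussed above.
IGNORED_SUBSTRINGS = ("autosignon", "editonpro.js")


def duplicate_session_percent(hits, ignored=IGNORED_SUBSTRINGS):
    """Percentage of sessions seen on more than one node.

    hits is an iterable of (session_id, node, url) tuples. Hits whose URL
    contains an ignored substring are dropped before counting, so a session
    with only ignored hits does not count toward the total either.
    """
    nodes_by_session = defaultdict(set)
    for session_id, node, url in hits:
        if any(token in url for token in ignored):
            continue
        nodes_by_session[session_id].add(node)
    if not nodes_by_session:
        return 0.0
    duplicated = sum(1 for nodes in nodes_by_session.values() if len(nodes) > 1)
    return 100.0 * duplicated / len(nodes_by_session)
```

Passing ignored=() gives the unfiltered number, which makes it easy to compare the before and after percentages the way I did by hand.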

Blackboard iPhone App

People have been contacting me all day about the Blackboard iPhone App. Both Blackboard and the Chronicle of Higher Education posted blog entries about its release.

I find it interesting Jessica mentioned a Georgia student is the inspiration in the Bb blog post. There are over 200,000 students in Georgia who cannot use this application because it relies on Blackboard Sync which only operates for Academic Suite (Classic) products. Blackboard says the Sync product isn’t available to the CE/Vista products used by all but a few schools in the University System of Georgia.

The odds are good the poor student who needs the app can’t use it.

Also, the USG is exactly the kind of client who Blackboard says should wait and see before migrating to Learn.

How Not To Break a Frame

Correct:

<script language="Javascript" type="text/javascript">
if (top != self)
{
    top.location = window.location;
}
</script>

Incorrect:

<script language="Javascript" type="text/javascript">
if (top != self)
{
    top.location = "/webct/urw/lc18361011.tp0/logonDisplay.dowebct";
}
</script>

The problem with Incorrect is that the address used here is not the address in the location bar. The one in the location bar has the values required to log in. Instead I get something which leaves users unable to log in. Example: we send someone to http://westga.view.usg.edu. They get redirected to another address in which we provide the glicid, insId, and insName. Correct breaks the frame and gives the browser back the same address. Incorrect breaks the frame and gives the browser back a different, non-functional address. Bad. Bad. Bad.

WebCT Vista 3 used the Correct JavaScript, which just passes back the address used. Blackboard Vista 8 for some reason changed what worked to Incorrect.

Yay for first day of classes.
🙁

UPDATE 1:

It gets better… Bb Vista’s Custom Login and Institution List pages are unaffected (i.e., they use the Vista 3 style JS). Only going to the generated logon page, logonDisplay.dowebct, has the issue.

Forcing Weblogic’s Config.xml

Let’s never mind why I am working on this in the first place. Namely…

  1. the Blackboard Learning Environment Connector introduced the use of the hostname and port for applet URLs in Blackboard Vista 8,
  2. Blackboard dropped WebCT’s support for using a different port for an application when behind a load balancer.

So we found out we could set port 443 as the SSL listen port. Because we terminate SSL on the load balancer, Weblogic would not bind to port 443, but the Vista application would be tricked into displaying to the end user what we wish.

In the past week, we have put the correct config.xml in place multiple times and found it reverts back to an older version with the port we don’t want. The first time, I was lazy and did not shut down the Weblogic admin server because… well… that was the lazy practice I had used in Weblogic 8.1 and had not had a problem. My shell record shows it was correct then. Within hours it wasn’t correct anymore.

So, we found a few things…

  1. a copy of the config.xml is stored in WEBCTDOMAIN/servers/domain_bak/config_prev/,
  2. all files in WEBCTDOMAIN/config/ are pushed to the nodes,
  3. to change this value in the Weblogic console requires turning on a feature to bind to the SSL listen port.

Additionally, we think research into this would show Weblogic stores this information in memory. It will then write changes it makes to the file back to disk on the admin node (destroying our change). Managed nodes will then pick up the change.

The latest shot at this is to purge #1 and #2 on both the admin server and managed nodes, put the right file in place on the admin node, and see if it reverts again.

So now I’ve got to write a script to periodically check if the nodes have the wrong listen port and email us should it change.
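
The checking half of that script might look something like the Python sketch below. Treat it as a rough outline under assumptions: it assumes each server element in config.xml has a name child and an ssl child holding listen-port, the function names are invented here, and the emailing part is left out.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def ssl_listen_ports(config_xml_text):
    """Map server name -> SSL listen port found in config.xml text.

    Assumed layout: <server><name>..</name><ssl><listen-port>..</listen-port>
    </ssl></server>. Adjust if the real file differs.
    """
    ports = {}
    root = ET.fromstring(config_xml_text)
    for server in root.iter():
        if local_name(server.tag) != "server":
            continue
        name, port = None, None
        for child in server:
            if local_name(child.tag) == "name":
                name = (child.text or "").strip()
            elif local_name(child.tag) == "ssl":
                for item in child:
                    if local_name(item.tag) == "listen-port":
                        port = int(item.text)
        if name is not None and port is not None:
            ports[name] = port
    return ports


def servers_with_wrong_port(config_xml_text, expected_port=443):
    """Return sorted names of servers whose SSL listen port is not expected."""
    return sorted(
        name
        for name, port in ssl_listen_ports(config_xml_text).items()
        if port != expected_port
    )
```

Run from cron against the config.xml on each node, a non-empty result from servers_with_wrong_port would be the trigger to send the email.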

FBI Investigates Legal Activity Also

One of the reasons my photo sets are more full of flowers than buildings is people don’t call the FBI over pictures of flowers. While it is perfectly legal to take pictures of buildings from public spaces, it makes “victims” nervous. No one cares about flowers. I can take all the pictures I want without uncomfortable encounters.

Of course, unless my airline ticket is purchased by a government, I consistently get extra screening. It is a fact of life of looking neither African American, Native American, Caucasian, Asian, nor Hispanic. Because I look like an other, people put me on the extra screening list just in case.

A local student had to sit down with an FBI agent to “prove” he did not look Middle Eastern after photographing chicken rendering plants. Security of the plants called the local police who called the FBI. What would have happened to Jim if he had looked Middle Eastern? Would he have been arrested for doing something perfectly legal?

This is choice from the article:

Filson told Diffly that this is America and he should do what he wants, but when someone looks different in a post-Sept. 11, 2001 world, police may be called.

By the way, police officers arrest photographers who take pictures of them in the middle of an arrest.

Abuse?

EDIT: I almost forgot. A Georgia Tech student from Pakistan was detained for taking video of a building. This student also visited Pakistan and made statements which could easily sound threatening.

More Spiking CPU Over Assignments

More on concerns with editAssignmentSubmission.dowebct on Blackboard Vista nodes.

Found an error in the exception logs tied to one of the transactions:

Error occurred maintaining selective release status-Learning Object Id

It gets better…. The assignment in question? Not using selective release. Yeah. There is an error for an assessment. Hopefully it is just the assignments logic which causes hell on the nodes (database server seems unaffected). Well, the students are also affected as this seems to be the last thing they do in the session.

That selective release displays errors in the logic is quite awesome. I didn’t know the UI did this. However, to see it, I had to expand the right organizer page. Better usability would be to highlight all the errors to the designer. Heck, if there are errors, then it ought to be disabled. If the logic is known not to work, then why allow it to be active? You know… It might spin out of control. *headdesk*

User Interface v. SQL Reports and Tracking

Blackboard Vista tracks student activity. This tracking data is viewed as a critical feature of Vista. Our instructors depended on the information until we revoked their ability to run reports themselves due to performance issues. Campus administrators can still generate reports (though some still fail). We doubt the solution to this is Blackboard improving the queries to create the reports. We favor deleting tracking data (data preserved outside of Vista) to resolve the performance issues.

We developed SQL reports to look at the tracking data where the user in question was not a student. Yes, the data is limited, but knowing when and where a user was active can help determine where to look in the logs. When we hit the performance issues we started using these reports where the user interface reports failed to generate.

My understanding was the user interface and SQL reports on tracking were the same. Both looked at the same data. The user interface reports were just sexier, wrapped in HTML and using icons. I compared a user interface report to a SQL report. Just prior to doing this, I was thinking WebCT was stupid for not tracking when students look at the list of assessments. Turns out “Assessment list viewed” was tracked in the user interface all along but was missing in our sqlplus queries. WTF?

The data has to be there. The problem has to be that our approach in sqlplus is inadvertently excluding the information from the reports. Because these reports must be accurate, I’ll crack this nut… Or become nuts myself.

CRACKED THE NUT: So, part of the data WebCT collected was the name of pages. There is a page name table which was inner joined to the user action table. So pages without a name were not reported. George suggested an outer join. I placed it on the page name table which now lets us see the formerly missing tracked actions. For the specific case where I found this, I now get all the missing actions.

Considering a Blackboard (it’s their problem now) feature request to ensure every page in the application has a title. I consider it developer laziness (someone else said worthlessness) that some pages might not have something so core and simple.

ANOTHER TRICK: Oracle’s NVL function displays a piece of text instead of a null value. Awesome for the above.
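
To show the join difference in miniature, here is a small Python sketch using an in-memory SQLite database. The table and column names are invented for illustration and are not the real Vista schema, and COALESCE stands in for Oracle’s NVL because SQLite has no NVL.

```python
import sqlite3


def tracked_actions(outer_join):
    """Return (action, page_name) rows using an inner or left outer join.

    Hypothetical tables: user_action references page_name by page_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page_name (page_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_action (action TEXT, page_id INTEGER);
        INSERT INTO page_name VALUES (1, 'Assignments');
        INSERT INTO user_action VALUES ('Assignment viewed', 1);
        -- This action's page has no row in page_name.
        INSERT INTO user_action VALUES ('Assessment list viewed', 2);
        """
    )
    join = "LEFT OUTER JOIN" if outer_join else "INNER JOIN"
    rows = conn.execute(
        f"""
        SELECT ua.action, COALESCE(pn.name, '(no page name)')
        FROM user_action ua
        {join} page_name pn ON pn.page_id = ua.page_id
        ORDER BY ua.action
        """
    ).fetchall()
    conn.close()
    return rows
```

With outer_join=False the action whose page has no name silently disappears, which is the behavior we were seeing. With outer_join=True it comes back with placeholder text in the page name column.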