OrgCode Duplicate Filter

I was asked to work my “Unix magic”.

The problem? Duplicate courses were spooled and converted from the WebCT format to the Desire2Learn format. The conversion process creates an import file that uses the WebCT SourcedId.Id as the OrgCode. The first time an OrgCode is used, the import creates a course. Each subsequent time, it duplicates content. So these duplicate converted courses left us in a situation where we were screwed.

Fortunately our partners at Desire2Learn intercepted the problem before it got worse.

Out of 1,505 still to be imported, there were 468 duplicates. Yes, 31% duplicates.

D2L asked me to filter the imports to remove the duplicates. I said I was too much of a n00b with Windows to pull it off.

The reply was to use Unix.

Boy, do I love Bash shell scripting. I solved it in two hours. Still, after the high of cracking something I had no idea how to write this morning, I suspect there must be something wrong with it.

First, my general idea was to read the file line by line and write the lines whose OrgCodes do not yet exist to a filtered.csv file. I started by trying to duplicate my existing file exactly in another file, reading it line by line.

A while loop reads each line and stores the whole line in a variable:

INPUTFILE=/path/to/file.txt
while read LINE
do
     # stuff... (for example, write "$LINE" to the copy)
     echo "$LINE"
done < "$INPUTFILE"

I quickly discovered, though, that the backslashes in Windows paths foiled my attempt to write every line back out exactly. A backslash escapes the next character, and read without the -r option treats it exactly that way, so each lone backslash vanished before echo ever saw the line. Neither double nor single quotes helped the situation. Oops. So I decided to use sed to make a temporary copy with every backslash doubled. When the line is read, the first backslash escapes the next character, in this case the second backslash, so one backslash survives.

sed -e 's|\\|\\\\|g'
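A minimal demonstration of the backslash problem, assuming Bash (or another POSIX shell) and a made-up sample CSV line. The helper names `read_plain`, `read_raw`, and `double_backslashes` are hypothetical, invented only for this sketch. It compares a plain `read`, the sed doubling trick above, and `IFS= read -r`, a standard shell option that avoids the temporary copy altogether (it is not what the original script used).

```shell
read_plain() {
    # Plain `read`: a backslash escapes the next character and is dropped.
    local out
    read out
    printf '%s\n' "$out"
}

read_raw() {
    # `IFS= read -r`: backslashes and surrounding spaces are kept as-is.
    local out
    IFS= read -r out
    printf '%s\n' "$out"
}

double_backslashes() {
    # The sed workaround: turn every \ into \\.
    sed -e 's|\\|\\\\|g'
}

line='C:\courses\bio101,Biology 101'   # hypothetical sample line

printf '%s\n' "$line" | read_plain                      # C:coursesbio101,Biology 101
printf '%s\n' "$line" | double_backslashes | read_plain  # C:\courses\bio101,Biology 101
printf '%s\n' "$line" | read_raw                        # C:\courses\bio101,Biology 101
```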

As an error check, the last thing the script does is run diff -u to compare the source and new files. At this stage, no output means a perfect copy. I like -u because its results are easier to read.
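The copy-and-verify step might look like the sketch below. `copy_line_by_line` is a hypothetical name and the sample lines are invented; the loop deliberately keeps the plain `read` plus sed doubling described above, then lets `diff -u` judge the result.

```shell
# Sketch: rebuild a file line by line so that diff -u can confirm the copy is exact.
copy_line_by_line() {
    local src=$1 dest=$2 tmp line
    tmp=$(mktemp)
    sed -e 's|\\|\\\\|g' "$src" > "$tmp"   # double every backslash
    : > "$dest"                             # start with an empty copy
    # Plain `read`: it turns each doubled backslash back into one, but it
    # also trims leading/trailing whitespace, which diff would then flag.
    while read line; do
        printf '%s\n' "$line" >> "$dest"
    done < "$tmp"
    rm -f "$tmp"
}

# Demo with made-up lines; no output from diff -u means a perfect copy.
demo=$(mktemp)
printf '%s\n' 'CODE1,C:\webct\bio101' 'CODE2,C:\webct\chem200' > "$demo"
copy_line_by_line "$demo" "$demo.copy"
diff -u "$demo" "$demo.copy" && echo "identical"
rm -f "$demo" "$demo.copy"
```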

So I was able to get an exact copy of the original. All that remained was to get the OrgCode, check it against my filtered file, and, if it did not exist yet, add the line to the end of the filtered file.

ORGCODE=$(echo "$LINE" | awk -F, '{print $1}')
# Anchor to the line start and the comma so "123" does not match "1234"
IF_EXISTS=$(grep "^${ORGCODE}," "$FILTERFILE")
if [ -z "$IF_EXISTS" ] ; then
     echo "$LINE" >> "$FILTERFILE"
fi
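Putting the pieces together, here is a sketch of the whole filter, assuming the OrgCode is the first comma-separated field and that no field contains a quoted comma. `filter_duplicates` and its extra `seen` file of OrgCodes are illustrative additions of mine, not the original script: checking `grep -qxF` against a list of exact codes avoids both substring matches and regular-expression characters (such as a dot) inside an OrgCode.

```shell
# Sketch: keep the first line for each OrgCode (assumed to be the first
# comma-separated field), in the original order.
filter_duplicates() {
    local src=$1 dest=$2 tmp seen line orgcode
    tmp=$(mktemp)    # backslash-doubled copy of the source
    seen=$(mktemp)   # one OrgCode per line, for exact lookups
    sed -e 's|\\|\\\\|g' "$src" > "$tmp"
    : > "$dest"
    while read line; do
        orgcode=$(printf '%s\n' "$line" | awk -F, '{print $1}')
        # -x: whole-line match, -F: fixed string, so 123 is not mistaken
        # for 1234, and 1.3 does not match 1x3 as a regex would.
        if ! grep -qxF -- "$orgcode" "$seen"; then
            printf '%s\n' "$orgcode" >> "$seen"
            printf '%s\n' "$line" >> "$dest"
        fi
    done < "$tmp"
    rm -f "$tmp" "$seen"
}

# Demo with made-up OrgCodes.
src=$(mktemp)
printf '%s\n' '123,Biology' '1234,Chemistry' '123,Biology (dup)' 'A.1,Art' 'AX1,Art II' > "$src"
filter_duplicates "$src" "$src.filtered"
cat "$src.filtered"
rm -f "$src" "$src.filtered"
```

For comparison, awk can do the same job in one pass with `awk -F, '!seen[$1]++' input.csv > filtered.csv`, which keeps the first line for each first field and leaves backslashes alone. The loop version rescans the list of seen codes for every line, which is fine for about 1,500 lines but grows quadratically on much larger files.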

Easy. Too easy?

Checks against my work confirmed that it worked:

    1. Running a sorted version of the source through the script and comparing the result with diff -u consistently showed that the correct lines were excluded.
    2. The number of duplicates matched the number of lines missing from the filtered file.
    3. A check for duplicate OrgCodes in the filtered file returns nothing.
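Checks 2 and 3 can be scripted with standard tools. This is a sketch with hypothetical function names, again assuming the OrgCode is the first comma-separated field; check 1 is just diff -u and is left out.

```shell
# Print every OrgCode (first comma-separated field) that appears more than once.
duplicate_orgcodes() {
    cut -d, -f1 "$1" | sort | uniq -d
}

# Lines that should be removed: total lines minus distinct OrgCodes.
expected_removed() {
    local total unique
    total=$(wc -l < "$1")
    unique=$(cut -d, -f1 "$1" | sort -u | wc -l)
    echo $(( total - unique ))
}

# Lines actually removed: source line count minus filtered line count.
actual_removed() {
    echo $(( $(wc -l < "$1") - $(wc -l < "$2") ))
}
```

Usage against real files would be along the lines of `duplicate_orgcodes filtered.csv` (expect no output) and comparing `expected_removed source.csv` with `actual_removed source.csv filtered.csv`.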

Back Door Restore

Humans make mistakes. Our clients’ administrators sometimes do very bad things without malicious intent. The “Deny Access” button is too close to the “Delete” one. About 160 student accounts were deleted.

My hypothesis: sections keep a student’s data when the student is unenrolled, so maybe they also keep it when the student’s account is deleted. If I could trick the system into thinking the same student came back, then maybe it would relink the data. Everyone would be happy.

To test this hypothesis, I…

  • Exported a copy of the grade book for my test student account in a test CE/Vista 8.0.6 system, so that if the test went bad, I could at least restore the grades.
  • Copied the account’s user name, sourcedid.source, and sourcedid.id from its profile to a text file.
  • Created a new account and gave it the same user name, sourcedid.source, and sourcedid.id (and first name, last name, and password).
  • Enrolled the account into the original class as a student.

The grades were missing. Clearly my hypothesis was wrong. Data is not kept around for deleted students the way it is for unenrolled students. Which sucks.

In my retest, I…

  • Unenrolled the same account. The grade book showed the student’s data in red, meaning the account was unenrolled but the data was still there.
  • Deleted the same account. The grade book still showed the student’s data in red.
  • Created a new account with a 2 in the user name and added it to the section. The grade book showed the new account, not the one I deleted.

I hope this means I still saw the data after the delete only because of the caching services. Changing the enrollment changed what was stored in the cache, so the old account disappeared at that point. A couple more tries confirmed that a deleted student keeps appearing in the grade book after the delete.

Still, it is disconcerting that deleted users appear in the grade book.