TED Talk: The riddle of experience vs. memory

We tend to think of memory as an audio-visual recording of the events in our lives. Unfortunately, it is not. Memory captures snapshots, which influence what we recall later. So a relatively good experience with a particularly bad ending can bias memory into recalling the whole thing as bad.

If the video below does not display, then try Daniel Kahneman: The riddle of experience vs. memory.

Twexports

Data portability is good for both users and systems. But I like being able to export my data for another reason: search. Sometimes I want to build on an old conversation. It would be easier with an eidetic memory. Lacking that, knowing the terms I would have used, searching for them should yield that conversation. Except social media sites tend to suck at search. Twitter only goes back so far. Facebook searches contacts, pages, and so on, but not content like status updates. Even this WordPress site is far better at matching an entered term to the same term existing in the system.

Twitter intends to let us download a file with our tweets. I am excited because I can search it.

“We’re working on a tool to let users export all of their tweets,” Mr. Costolo said in a meeting with reporters and editors at The New York Times on Monday. “You’ll be able to download a file of them.”

It will probably disappoint. The main disappointment will be that replies from others will not be present. So I will see where I addressed something to someone else, but not what they said to prompt the response or others’ follow-ups. It will be like listening to someone have a conversation on a mobile phone where you get only half the conversation. At least, that was the disappointment I had when I looked at my earliest entries in Facebook’s archive file, back when Facebook operated like Twitter.

P.S. What a bad title, right?
🙂

Content Migration Progress Tracking

When moving hundreds of thousands of courses between WebCT Vista and Desire2Learn, keeping track of what made it through each stage seems like an obvious thing to do in hindsight. I add that last bit because we only started noticing the things that fell through the cracks once they began to pile up. The basic process…

    1. Through Oracle SQL we populate a middleware database with the courses that meet the acceptance criteria.
      CRACK: With the wrong criteria, courses are not even selected.
    2. Through Oracle SQL we generate XML listings of the courses in sets of 50.
      CRACK: Only a subset of the data loaded into the database may get extracted.
    3. A shell script automates running the WebCT command-line backup process to create a package for each course.
      CRACK: The command-line backup fails on some courses.
    4. Desire2Learn scripts pick up the files and convert WebCT-formatted packages to Desire2Learn ones.
      CRACKS: Packages that are too big fail. Paths that are too long fail. This step can also fail to create the CSV files needed for the next step.
    5. Converted packages are placed in a queue to be imported into Desire2Learn.
      CRACKS: WebCT Vista courses can have 1,000 characters in the name, and D2L fails if there are more than 50. A course named the same as a previously loaded one, but with a different file name, gets loaded into that same existing course.

So there are apparently five different stages and eight potential failures to track, and no end-to-end tracking to even know what is missing. Which means inventing something to check the logs for errors.

First thing, I tried writing SQL to create an ordered list of the courses that are available.
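In spirit it is just a sqlplus spool of one ordered column. Something like the following, where the connection details, the migration_courses table, and the course_name column are all stand-ins for the real middleware schema:

sqlplus -S "$DB_USER/$DB_PASS@$DB_SID" <<'EOF' > d2l_courses.txt
SET PAGESIZE 0
SET FEEDBACK OFF
SELECT course_name FROM migration_courses ORDER BY course_name;
EXIT;
EOF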

The WebCT backups were a little tougher to convert into a useful list. The file names follow the format of Course_Parent_1234567890.bak. They were also in various directories, so I ended up doing something like this to get a list of the files, strip off the parents & time stamps, strip off the directories, and order it.

ls */*.bak | awk -F_ '{print $1}' | awk -F\/ '{print $2}' | sort

So I have two ordered lists. Anything in the D2L one and not in the WebCT one ought to be the work of clients making their own stuff. Anything in the WebCT one and not in the D2L one ought to be my missing ones. Only, almost every line was a mismatch.

Visually comparing them, I realized the same courses in effect had different names: all the spaces had been converted to underscores. So I ended up adding a sed to convert spaces to underscores after the sort.
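Put together, the comparison ended up looking roughly like this; the list file names are just placeholders (d2l_courses.txt being the output of the earlier sketch):

# spaces become underscores so both lists spell course names the same way
sed -e 's/ /_/g' d2l_courses.txt | sort > d2l_list.txt
# course name only from each backup file name, sorted
ls */*.bak | awk -F_ '{print $1}' | awk -F\/ '{print $2}' | sort > webct_list.txt

# comm does the two-way diff of the sorted lists
comm -13 webct_list.txt d2l_list.txt    # only in D2L: clients making their own stuff
comm -23 webct_list.txt d2l_list.txt    # only in WebCT: my missing courses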

Then I wrote some more scripts.

    • Go through the logs from steps 4 and 5 to display the errors. With this I was able to compare my list of missing courses against the errors and confirm why they did not come through.
    • Some of the cracks can be addressed by making new import files. So I wrote a script to add those lines to a redo.csv file. Touch up the problems and go. (A rough sketch of both is below.)
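Stripped down, they look something like this, with the log locations, error strings, and file names all placeholders rather than the real paths:

# pull the failure lines out of the conversion (step 4) and import (step 5) logs
grep -iE 'error|fail' conversion_logs/*.log import_logs/*.log > errors.txt

# rebuild an import file for the fixable ones: copy their lines from the
# original import CSV into redo.csv, keyed on a list of OrgCodes to redo
while read -r ORGCODE
do
    grep "^$ORGCODE," original_import.csv >> redo.csv
done < redo_orgcodes.txt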

Basically, at this point it only covers steps 3-5. At some point I am going to have to check all of steps 1-5.

New World

Here is an attempt at a positive post about our new vendor, Desire2Learn.

In the Old World, WebCT/Blackboard, my role was to install the application and databases, monitor for problems, and automate systems to run without the need of humans, or at least make things simple for the humans who do run them.

In the New World, Desire2Learn, my role is to install the database software, monitor for problems, and automate systems to run without the need of humans, or at least make things simple for the humans who do run them. Desire2Learn installs the application. Temporarily, I am also doing content migration work.

In both worlds I do reverse engineering. An understanding of the principles behind the technology helps me determine when and where the problems are. We can then hopefully prevent them from happening, detect them early, or solve them quickly.

A web server is a web server is a web server, right? Both Weblogic and IIS listen on ports 80 and 443. Files sitting in directories are either served to web browsers or execute code whose results are then served. Both Oracle and SQL Server have tables, views, indexes, and data files. Beyond the basic principles, though, things get hairy. The details matter quite a bit.

Compiling Apache, PHP, and MySQL from source led to understanding the intimate details of how they worked. Automating Weblogic into silent installs, and later cloning the install, led to understanding the intimate details of how it works. With IIS delivered to me and Desire2Learn installing the application, I feel very lost.

Throw in that the operating system, database, programming languages, and scripting languages are all very new to me.

Basically, I feel like I am reverse engineering blindfolded while building the boat I am using to cross the ocean, and the world is flat. The whole time I wonder when I will sail over the edge.

Mail Delivery Background Jobs

Only 8 years into running this product and I still learn something new about it.

Monday there was an event. Two nodes became unresponsive at about the same time. The other ten nodes did their jobs, transferring session information to the nodes taking on the orphaned sessions. Most were so busy they did not respond to monitoring requests. There was a lot of slowness. But we did not lose sessions. Nor did we lose the cluster.

Somehow we did lose the Mail tool. (Think internal email, but it can forward messages to email.)

In WebCT Vista 3 we diagnosed this by going into Weblogic, finding the email queues, and restarting some things so that email would start flowing again. I was not able to find it that way this time. Apparently now we go to Background Jobs as a server administrator. The waiting mail jobs show up in the Pending Jobs view.

Once I restarted the cluster, the blocking mail job was changed to Retried as soon as the JMS node came online. Retried only shows up in the All Jobs view; none of the other views show it. Which makes sense, because each view shows only the status in the view name. So the Cancelled Jobs view only shows jobs with the Cancelled status. Any job with a Retried status should only show in the (non-existent) Retried Jobs and (existing) All Jobs views. It was a bad assumption on my part that every potential status has a view.

Hindsight being 20/20, what we need is a Nagios monitor to detect when Pending jobs exceed maybe 20-50. Normally this table appears empty, but I could see cases where it normally grows fast and then quickly clears.
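Something along these lines would probably do as the check; the thresholds are the 20-50 from above, and the SQL, table name included, is a placeholder for wherever the pending jobs actually live:

#!/bin/bash
# Hypothetical Nagios check: warn/critical when pending mail jobs pile up.
WARN=20
CRIT=50

# placeholder query; the real pending-jobs table would have to be dug out of the schema
PENDING=$(sqlplus -S "$DB_USER/$DB_PASS@$DB_SID" <<'EOF'
SET PAGESIZE 0
SET FEEDBACK OFF
SELECT COUNT(*) FROM background_jobs WHERE status = 'Pending';
EXIT;
EOF
)
PENDING=$(echo "$PENDING" | tr -d '[:space:]')

if [ "$PENDING" -ge "$CRIT" ]; then
    echo "CRITICAL - $PENDING pending jobs"
    exit 2
elif [ "$PENDING" -ge "$WARN" ]; then
    echo "WARNING - $PENDING pending jobs"
    exit 1
fi
echo "OK - $PENDING pending jobs"
exit 0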

But then again, we have less than a year on this product. What are the odds this will happen again?

Goodreads Context Menu Search

At some point when I am not too lazy, I should empower my future laziness. I really need an easier way to look up books on Goodreads when I see them mentioned on other pages.

For example, say I go to BBC’s Big Read and want to add some to my wish list. At present I would highlight the title, Ctrl+C to copy, Alt+Tab to a window/tab with goodreads.com, click in the search box, Ctrl+V to paste, and hit Enter. Probably would take no more than a couple seconds.

What I want is to be able to highlight the title, right-click to bring up the context menu, and click on search for the title on Goodreads. Probably would take less than a second.

Interestingly enough, with Google Chrome I can make Goodreads the default search engine. That achieves the above, with the drawback that all searches then go through it. So not a great solution, as I do love being able to search from the address bar. But it’ll work for now, as long as I remember to put it back.

Writing an extension appears to be the real solution. Which is where my laziness becomes the barrier.
😀

OrgCode Duplicate Filter

I was asked to work my “Unix magic”.

The problem? Duplicate courses were spooled and converted from the WebCT format to the Desire2Learn one. The conversion process creates an import file using the WebCT SourcedId.Id as the OrgCode. The first time an OrgCode is used, it creates a course. The next and subsequent times, it duplicates content into that course. So these duplicate converted courses gave us a situation where we were screwed.

Fortunately our partners at Desire2Learn intercepted the problem before it got worse.

Out of 1,505 still to be imported, there were 468 duplicates. Yes, 31% duplicates.

D2L asked me to filter the imports to remove the duplicates. I said I am too much of a n00b with Windows to pull it off.

The reply was to use Unix.

Boy, do I love Bash shell scripting. I solved it in two hours. Though, coming off the high of solving something in two hours that I had no idea how to write this morning, there must be something wrong with it.

First, my general idea was to read the file line by line and write out only those lines whose OrgCodes did not already exist in a filtered.csv file. I started out by trying to exactly duplicate my existing file into another file by reading it line by line.

A while loop reads each line and records the whole line in a variable:

INPUTFILE=/path/to/file.txt
exec<$INPUTFILE
while read LINE
do
     stuff...
done

I quickly discovered, though, that because Windows paths use the backslash, echo could not write every line back out exactly: the backslash escapes the character that follows it. Neither double nor single quotes helped the situation. Oops. So I decided to use sed to make a temporary copy with the backslashes doubled, so that the first backslash escapes the second.

sed -e 's|\\|\\\\|g'
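
In context that means making the escaped temporary copy first and looping over the copy instead of the original. Roughly (TMPFILE is just what I am calling it here):

TMPFILE=`mktemp`
sed -e 's|\\|\\\\|g' $INPUTFILE > $TMPFILE    # double every backslash
exec<$TMPFILE                                 # the loop reads the escaped copy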

As an error check, the last thing the script does is a diff -u to compare the source and new files. At this stage, no output means a perfect copy. I like -u because it gives me easier-to-read results.
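That check is just the last line of the script, against whatever the output copy is called; here I am calling it $OUTPUTFILE:

diff -u $INPUTFILE $OUTPUTFILE    # no output means the copy is exact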

So I was able to get an exact copy of the original. All that was left was to get the OrgCode, check it against my filtered file, and, if it did not already exist there, add the line to the end of the filtered file.

ORGCODE=`echo "$LINE" | awk -F, '{print $1}'`   # first CSV column is the OrgCode
IF_EXISTS=`grep "$ORGCODE" $FILTERFILE`         # already in the filtered file?
if [ -z "$IF_EXISTS" ] ; then
     echo "$LINE" >>  $FILTERFILE
fi

Easy. Too easy?

The checks against my work confirmed it worked.

    1. A sorted version of the source, run through the script and compared with diff -u, consistently showed that the correct lines were excluded.
    2. The count of duplicates matched the difference in line counts between the source and filtered files.
    3. A check for duplicate OrgCodes returns nothing on the filtered file (sketch below).
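That last check is a one-liner, roughly:

# any OrgCode appearing more than once in the filtered file prints here;
# empty output means no duplicates made it through
awk -F, '{print $1}' filtered.csv | sort | uniq -d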

Scrambls

Scrambls encrypts social media posts and lets users specify exactly who can see them, across all social media sites. The user can form groups from friends and family, going as broad as everyone with a Gmail account down to a specific colleague or even those who know a certain password. Everyone else (including the social media site itself) will only see a series of random numbers and symbols, keeping content private and secure.

From 7 steps to social media stardom.

The article points out one problem: the readers have to be users too. So enough people in each user’s social network need to join for it to become useful.

Far more serious potential problems…

    1. Should your encryption key(s) get deleted, corrupted, etc., will you lose access to your posts?
    2. When this service disappears through bankruptcy, a buy-out by AOL, or the FBI shutting down their servers because terrorists encrypt their posts, will every user lose access to their own and their friends’ posts?