tracking

You are currently browsing articles tagged tracking.

CE/Vista Reports and Tracking displays summaries of activity. If an instructor seeks to know who clicked on a specific file, then Reports and Tracking falls down on the job.

Course Instructor can produce a report of the raw tracking data. However, access to the role falls under the Administration tab so people running the system need to make a user specifically to enroll themselves at the course level to get the reports. (Annoying.)

Instead the administrators for my campuses pass up to my level of support requests to generate reports. For providing these I have SQL to produce a report. This example is for users who clicked on a specific file. Anything in bold is what the SQL composer will need to alter.

set lines 200 pages 9999
col user format a20
col action format a32
col pagename format a80

clear breaks computes
break on User skip 1
compute count of Action on User

select tp.user_name "User",ta.name "Action",
      to_char(tua.event_time,'MM/DD/RR HH24:MI:SS') "Time",
      NVL(tpg.name,'--') "PageName"
  from trk_person tp, trk_action ta, trk_user_action tua,
      trk_page tpg, learning_context lc
  where tp.id = tua.trk_person_id
    and ta.id = tua.trk_action_id
    and tua.trk_page_id = tpg.id (+)
    and tua.trk_learning_context_id = lc.id
    and lc.id = 1234567890
    and tpg.name like '%filename.doc%'
  order by tp.user_name,tua.event_time
/

Output

  • User aka tp.user_name – This is the student’s account.
  • Action aka ta.name – This is an artifact of the original script. You might drop it as meaningless from this report.
  • Time aka tua.event_time – Day and time the action took place.
  • PageName aka tpg.name – Confirmation of the file name. Keep if using like in a select on this.

Considerations

I use the learning context id (lc.id aka learning_context.id) because in my multi-institution environment, the same name of a section could be used in many places. This id ensures I data from multiple sections.

The tricky part is identifying the file name. HTML files generally will show up as the name of in the title tag (hope the instructor never updates it). Office documents generally will show as the file name. Here are a couple approaches to determining how to use tpg.name (aka trk_page.name).

  1. Look at the file in the user interface.
  2. Run the report without limiting results to any tpg.name. Identify out of the results the name you wish to search and use: tpg.name = ‘page name

Most tracked actions do have a page name. However, some actions do not. This SQL is designed to print a “–” in those cases.

Every time a Vista 3 node is shut down without going through the initiated shut down process, there is a chance of incorrect data written to the tracking files (in NodeA/tracking/). Normally it leaves strange characters or partial lines at the end of the file. This is the first time I have seen it write the contents of another log instead of the tracking data.

click – 1.0 – 1244228052889 – 1135588340001 – “nova.view.usg.edu-1244227762853-6288″ – SSTU – discussion – “compiled-message-viewed” – “page name” – 558711383 -

click – 1.0 – 1244228052891 – 15.0; .NET CLR 1.1.4322)”

2009-04-23      20:58:35        0.0030  xxx.xxx.xxx.xxx    JxH1zg4fZT1LTGcpmyNW    200     GET     /webct/libraryjs.dowebct        locale=en_US    0       “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)”

Even better. The node went down on June 5th at around 3pm. The lines from the other log were from April 23rd at 8:58pm.

Why am I surprised to see new incorrect behavior? Especially when the node was really confused?

From 2001 to 2006, Microsoft Outlook was the email client I used for work (and on my home computer to access work stuff). Back then, Exchange was not available, so a number of the features were more hacks than reality. However, it worked pretty well.

When I changed jobs, Netscape and Thunderbird were the pre-installed clients. I opted for Thunderbird. It worked pretty well for me. Calendaring was in MeetingMaker. Everything worked pretty well.

Recently work shifted to Exchange, so going back to Outlook made sense. Maybe because I have so much experience, the transition was not as bad as it might have been. Still… These are gotchas which have annoyed me lately:

  1. Editable subject usability: The emails from our client issue tracking system put the description where its hidden. I was really pissed that I could not edit the subject until I figured out unlike most software which changes the shading to show it is now editable, Outlook just lets me edit at any time. Also, editing the subject after it is used by something else like a task results in the change in the email but not the task. (The main reason I want to change them is so it appears correctly in the task list. ) Copying to a second email results in the same problem. Apparently I have to either create a new task and copy-n-paste the subject I want or forward the email to myself.
  2. Spacebar moves to next message instead of next new message: I really like the Thunderbird method of skipping to the next unread message when I hit the spacebar at the end of the current message. It even will find the next unread message in another folder. Outlook just advances to the next message.
  3. Boolean is more than OR: I had this fantastic Thunderbird filter which looked for user@ AND domain.tld. Outlook only honors OR. We have 15 admin nodes and databases which send up reports. Alerts and tickets come from a different source and unaffected by this.
  4. Search ignores special characters: I thought in the past I had sent email to abc-defghi@domain.tld. However, the message bounced, so I searched my email for part of the address “abc-defghi” as its not in the address book. I got results which match “abc” not “abc-defghi”. So it ignored the hyphen and everything after. FAIL!
  5. Send email as plain text or paste a plain text: Yes, I know lots of people have HTML capable clients. I hate Outlook puts my replies in a sickly blue font. When I copy and paste from the elsewhere in the message, it changes the font. So then I have to go and do formatting to have a presentable email. I just want to type and send. I don’t care about fonts, colors, etc. If I did, then I would create a web page. … (Added 2009-JUN-03)

That’s it for now.

On the WebCT Users email list (hosted by Blackboard) there is a discussion about a mysterious directory called unmarshall which suddenly appeared. We found it under similar circumstances as others by investigating why a node consumed so much disk space. Failed command-line restores end up in this unmarshall directory.

Unmarshalling in Java jargon means:

converting the byte-stream back to its original data or object 1

This suspiciously sounds like what a decryption process would use to convert a .bak file into a .zip so something can open the file.

This is fourth undocumented work space where failed files site for a while and cause problems and no forewarning from the vendor.

Previous ones are:

  1. Failed UI backups end up in the weblogic81 (Vista 3, does this still happen in Vista 8?) directory.
  2. Failed tracking data files end up in WEBCTDOMAIN/tracking (Vista 3, apparently no longer stored this way in Vista 4/8 according to CSU-Chico and Notre Dame)
  3. Web Services content ends up in /var/tmp/ and are named Axis####axis. These are caused by a bug in DIME (like MIME) for Apache Axis. No one is complaining about the content failing to arrive, so we presume the files just end up on the system.

#3 were the hardest to diagnose because of a lack of an ability to tie the data back to user activity.

Is this all there are? I need to do testing to see which of these I can cross off my list goring forward in Vista 8. Failed restores are on it indefinitely for now.
:(

References:

  1. http://www.jguru.com/faq/view.jsp?EID=560072

First Metricocracy measured hits. Pictures and other junk on pages inflated the results so Metricocracy decided on either unique visitors or page views. Now, the Metricocracy wants us to measure attention. Attention is engagement, how much time users spend on a page.

What do we really want to know? Really it is the potential value of the property. The assumption around attention is the longer someone spends on a web site, the more money that site gains in advertisement revenue. The rationale being users who barely glance at pages and spend little time on the site are not going to click ads. Does this really mean users who linger and spend large amounts of time on the site are going to click more ads?

This means to me attention is just another contrived metric which doesn’t measure what is really sought. I guess advertisement companies and the hosts brandishing them really do not want to report the click through rates?

My web browsing habits skew the attention metric way higher than it ought to be. First, I have a tendency to open several items in a window and leave them lingering. While my eyes spent a minute looking the content, the page spent minutes to hours in a window… waiting for the opportunity. Second, I actively block images from advertisement sources and block Flash except when required.

As a DBA, page views also has debatable usefulness. On the one hand we could use it because it represents a count of objects requiring calls to the database and rendering by application and web server code. Hits represent all requests for all content, simple or complex, so is more inclusive. Bandwidth throughput represents how much data is sucked out or pushed into the systems.

We DBAs also provide supporting information to the project leaders. Currently they look at the number of users or classrooms who have been active throughout the term. Attention could provide another perspective to enhance the overall picture of how much use our systems get.

Cat Finnegan, who conducts research with GeorgiaVIEW tracking data, measures learning effectiveness. To me, that is the ultimate point of this project. If students are learning with the system, then it is successful. If we can change how we do things to help them learn better, then we ought to make that change. If another product can help students learn better, then that is the system we ought to use.

Ultimately, I don’t think there is a single useful metric. Hits, unique users, page views, attention, bandwith, active users, etc., all provide a nuanced view of what is happening. I’ve used them all for different purposes.

Clusters can making finding where a user was working a clusterf***. Users end up on a node, but they don’t know which node. Heck, we are ahead of the curve to get user name, date, and time. Usually checking all the nodes in the past few days can net you the sessions. Capturing the session ids in the web server logs usually leads to finding an error in the webct logs. Though not always. Digging through the web server logs to find where the user was doing something similar to the appropriate activity consumes days.

Blackboard Vista captures node information for where the action took place. Reports against the tracking data provide more concise, more easily understood, and more quickly compiled. They are fantastic for getting a clear understanding of what steps a user took.

Web server logs contain every hit which includes every page view (well, almost, the gap is another post). Tracking data represents at best 25% of the page views. This problem is perhaps the only reason I favor logs over tracking data. More cryptic data usually means a slower resolution time not faster.

Another issue with tracking is the scope. When profiling student behavior, it is great. The problem is only okay data can be located for instructors while designers and administrators are almost totally under the radar. With the new outer join, what we can get for these oblivious roles has been greatly expanded.

Certainly, I try not to rely too much on a single source of data. Even I sometimes forget to do so.

Blackboard Vista tracks student activity. This tracking data is viewed as a critical feature of Vista. Our instructors depended on the information until we revoked their ability to run reports themselves due to performance issues. Campus administrators can still generate reports (though some still fail). We doubt the solution to this is Blackboard improving the queries to create the reports. We favor deleting tracking data (data preserved outside of Vista) to resolve the performance issues.

We developed SQL reports to look at the tracking data where the user in question was not a student. Yes, the data is limited, but in determining when and where a user was active, can help determine where to look in logs. When we hit the performance issues we started using these reports where the user interface reports failed to generate.

My understanding was the user interface and SQL reports on tracking were the same. Both looked at the same data. The user interface reports were just sexier wrapped in HTML and using icons. I compared a user interface report to a SQL report. Just prior to doing this, I was thinking, WebCT was stupid for not tracking when students look at the list of assessments. Turns out “Assessment list viewed” was tracked in the user interface all along but was missing in our sqlplus queries. WTF?

The data has to be there. The problem has to be our approach in sqlplus is inadvertently excluding the information from the reports. Because these reports must be accurate, I’ll crack this nut… Or become nuts myself.

CRACKED THE NUT: So, part of the data WebCT collected was the name of pages. There is a page name table which was inner joined to the user action table. So pages without a name were not reported. George suggested an outer join. I placed it on the page name table which now lets us see the formerly missing tracked actions. For the specific case where I found this, I now get all the missing actions.

Considering a Blackboard (it’s their problem now) feature request to ensure every page in the application has a title. I consider it developer laziness (someone else said worthlessness) that some pages might not have something so core and simple.

ANOTHER TRICK: Oracle’s NVL function displays a piece of text instead of a null value. Awesome for the above.

Better Way to Count

Our awesome sysadmins have put the user agent into our AWStats so we are tracking these numbers now. They discovered something I overlooked. Netscape 4.x is 10 times more used than 7.x or 8.x. Wowsers! Some people really do not give up on the past.

Back in the Netscape is dead post, I used this to count the Netscape 7 hits.

grep Netscape/7 webserver.log* | wc -l

Stupid! Stupid! Stupid! The above requires running for each version of Netscape. This is why I missed Netscape 4.

This is more convoluted, but I think it its a much better approach.

grep Netscape webserver.log* | awk -F\t ‘{print $11}’ | sort | uniq -c | sort -n

It looks uglier, but its much more elegant. Maybe I ought to make a resolution for 2008 to be elegant in all my shell commands.

This version first pulls any entries with Netscape in the line. Next, the awk piece reports only the user agent string. The first sort puts all the similar entries next to each other so the uniq will not accidentally duplicate. The -c in the uniq counts. The final sort with the -n orders them by the uniq’s count. The largest will end up at the bottom.

I am blogging from the pre-conference GeorgiaVIEW meeting @ Rock Eagle yesterday afternoon and this morning. I enjoy connecting with people around the state of Georgia who use our Vista system. Most of them do not make it to BbWorld. Some hot topics:

  • Alternatives to Blackboard Vista
  • Training
    • Content repository
  • Returning Reports and Tracking to instructors.
    • Some reports still failing. One approach may be to remove tracking data from Vista database and make it available elsewhere.
  • Upgrade to Vista 4. People want a timeline, access to a training instance ASAP, please not do an in-place upgrade.
    • Limited shelf life on internals of Vista 3 / 4.0 – 4.1.2
    • More of customers have moved or are moving to Vista 4 / CE 6 than a year ago.
    • Can take advantage of new tools available in Vista 4.
    • Data retention – policy, reponsibilities (faculty, campus, OIIT)
    • Phased approach – parallel environments, at some point Vista 3 goes away and no longer available.
    • End of Fall 2008 or Spring 2009.
  • People are both quite happy we are going to Vista 4 and disconcerted at the prospect of having to move to Vista 4 in even over a year from now (at the worst by April 2009).
    • Export / import of non-SIS created users.
    • Training

Lovely (yeah a real person and she is) says Lovely Freelove would be one of the best names ever.

Much of what I might write in these posts about Vista is knowledge accumulated from the efforts of my coworkers.

I’ve decided to do a series of blog posts on our presentation at BbWorld ‘07, on the behalf of the Georgia VIEW project, Maintaining Large Vista Installations (2MB PPT). I wrote the bit about tracking files a while back in large part because of the blank looks we got when I mentioned in our presentation at BbWorld these files exist. For many unanticipated reasons, these may not be made part of the tracking data in the database.

Automation in this context essentially is the scheduling of tasks to run without a human needing to intercede. Humans should spend time on analysis not typing commands into a shell.

Rolling Restarts

This is our internal name for restarting a subset (consisting of nodes) of our clusters. The idea is to restart all managed nodes except the JMS node, usually one at a time. Such restarts are conducted for one of two reasons: 1) have the node pick up a setting or 2) have Java discard from memory everything. The latter is why we restart the nodes once weekly.

Like many, I was skeptical of the value of restarting the nodes in the cluster once weekly. Until, as part of the Daylight Savings Time patching, we provided our nodes to our Systems folks (hardware and operating systems) and forgot to re-enable the Rolling Restarts for one batch. Those nodes starting complaining about issues into the second week. Putting back into place the Rolling Restarts eliminated the issues. So… Now I am a believer!

One of my coworkers created a script which 1) detects whether or not Vista is running on the node, 2) only if Vista is running does it shut down the node, 3) once down, it starts up the node, and 4) finally checks that it is running. Its pretty basic.

Log cleanup to preserve space

We operate on a relatively small space budget. Accumulating logs infinitum strikes us as unnecessary. So, we keep a months’ worth of logs for certain ones. Others are rolled by Log4j to keep a certain number. Certain activities can mean only a day’s worth are kept, so we have on occasion increased the number kept for diagnostics. Log4j is so easy and painless.

We use Unix’s find with mtime to look for files 30 days old with specific file names. We delete the ones which match the pattern.

UPDATE 2007-SEP-18: The axis files in /var/tmp will go on this list, but we will delete any more than a day old.

Error reporting application, tracking, vulnerabilities

Any problems we have encountered, we expect to encounter again at some point. We send ourselves reports to stay on top of potentially escalating issues. Specifically, we monitor for the unmarshalled exception for WebLogic, that tracking files failed to upload, and we used to collect instances of a known vulnerability in Vista. Now that its been patched, we are not looking for it anymore.

Thread dumps

Blackboard at some point will ask for thread dumps at the time the error occurred. Replicating a severe issue strikes us as bad for our users. We have the thread dumps running every 5 minutes and can collect them to provide Blackboard on demand. No messing with the users for us.

Sync admin node with backup

We use rsync to keep a spare admin node in sync with the admin node for each production cluster. Should the admin node fail, we have a hot spare.

LDIS batch integration

Because we do not run a single cluster per school and the Luminis Data Integration Suite does not work with multiple schools for Vista 3 (rumor is Utah has it working for Vista 4), we have to import our Banner data in batches. The schools we host send the files, our expert reviews the files and puts them in place. A script finds the files and uploads each in turn. Our expert can sleep at night.

Very soon, we will automate the running of the table analysis.

Anyone have ideas on what we should automate?

« Older entries