Web browsing history

The history of what I have looked at in my web browser should be a feature I like. I know I read something this weekend about work ever expanding to fill the time. Even as efficiencies make things easier, there are places where waste balloons to make people work more than they really need to. I eventually recalled that the example used was lawyers creating work for each other by overwhelming the opponent with so much information that they have to sift through more. It turns out that was correct.

In the middle of the week I ran across a couple of articles about how automation, while killing off some jobs, will create others. I wanted to include the article from the weekend, but finding it was a royal pain in the ass. About ten minutes in, I wished I had sent it to my boss like I thought I should, just so it would be easier to find.

Eventually I located it to include in yesterday’s blog post. All it took was finding the right keyword.

I hit so many web pages that search is really the only way to find something so specific. And even then, I have to use my library training to find what I want.

Bookmarks, Evernote, and read-it-later services are not that helpful because I have to have the forethought to save things. All too often the things I save are not what I need later, and the things I failed to save are what I do need.

I guess what I want is a smarter web browser history search which can figure out from my browser history what is related to a specific page.

Just Get Rid of Java

Apparently there are security flaws in the current version of Java that allow malicious software to be installed through web browsers without the user knowing. The known attacks using these flaws work on Windows, OS X, and Linux. According to Reuters:

Java was responsible for 50 percent of all cyber attacks last year in which hackers broke into computers by exploiting software bugs, according to Kaspersky. That was followed by Adobe Reader, which was involved in 28 percent of all incidents. Microsoft Windows and Internet Explorer were involved in about 3 percent of incidents, according to the survey.

The Department of Homeland Security recently said computer users should disable Java. At first this seems odd. The vulnerability in question is only in Java 7. So why not go back to Java 6? Well, Java 6 has vulnerabilities too, which is why DHS and others have recommended getting to 7. Also, starting in 7, the automatic upgrades are more aggressive. So going backwards is probably not a great idea. (It just so happens I had to go backwards to get a tool I needed working and forgot to go forward again.)

Also, in a similar situation back in August, the recommendation was to make the browser prompt before allowing Java to run. This time the strategy is to just stop Java entirely. Apple has removed Java browser plugins. That could work too, except for bad, bad software like ours (sorry, sarcasm if you could not tell), which makes use of a few applets. In the last week I have gotten a request to add another applet.

A fix for Java 7’s vulnerabilities should be available in a couple of days.

Textarea Backup

I am going through the software installed on my work computer in order to transfer it to a new one. This one came to my attention as something potentially relevant to others.

A common problem we hear about when running a web-based learning management system is that the web browser crashed before the user could submit a form. The complaints usually come from a lost assignment, so the student received a 0 for a major grade. The ones who managed to redo the assignment in time generally never reach us. Nor do the lost mail messages, discussion posts, or anything else not for a grade. The causes are many. Naturally, the blame lies with us for running such a crappy product. Smart applications like the WordPress post/page editor automatically save these boxes. Unfortunately, 99.99% of applications are not smart.

An interesting Greasemonkey script, Textarea Backup, will preserve information written into a textarea form element. When the browser restarts and returns to the page, the information written into the textarea will be there.

Google Chrome natively supports Greasemonkey scripts. Mozilla Firefox still requires the Greasemonkey add-on.

With Greasemonkey installed, one can just hit the install button on a script’s page at userscripts.org and click through the various prompts confirming one really wants to download or install it. Pretty simple to install.

Do colleges or universities actually recommend add-ons like Textarea Backup to students? Or are students left to figure out stuff like this on their own?

Smaller Java Cache

One of our campus Blackboard Learning System Vista Enterprise administrators reported reducing the number of Java cache-related issues (failed sessions) by changing the Disk Space Allotment from the 1,000 MB default down to 100 MB. This setting is found in Java Control Panel > General tab > Temporary Internet Files: Settings. Has anyone else found this to be the case?

The purpose of web browsers having a cache was to speed up use of a web site by not having to download content again. RAM is faster than disk, which is faster than the Internet. (This was especially true in the mid-1990s.) Take a look at this web site. There is the image at the top plus various CSS and JS files. It looks like there is a good 224 KB in CSS, JS, and their supporting images. Rather than download a significant amount of content again, with the appropriate settings a browser will check whether its copy has expired (really, whether it is stale) or has changed on the server, typically by comparing a modification date or ETag. If neither is true, it uses what it already has. This makes my web site load faster for the user. So caching is a very good thing.
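To see which caching headers a particular file actually sends, one option is to fetch just the response headers with curl and filter for the ones that drive this behavior. This is a minimal sketch, not anything from the original setup. The function name cache_headers is hypothetical, example.com is a placeholder, and what a server sends depends entirely on its configuration.

```shell
# Hypothetical helper: read HTTP response headers on stdin and keep only
# the ones a browser uses to decide whether its cached copy is still good.
#   Cache-Control / Expires  -> how long the copy stays fresh
#   Last-Modified / ETag     -> what gets sent back when revalidating
cache_headers() {
  # tr strips the CR from CRLF header lines; -i because header names
  # are case-insensitive
  tr -d '\r' | grep -i -E '^(cache-control|expires|last-modified|etag):'
}

# Usage (example.com is a placeholder URL):
#   curl -sI https://example.com/style.css | cache_headers
```

When the copy is stale, the browser revalidates by sending If-Modified-Since or If-None-Match. A 304 Not Modified answer to that request lets it keep the cached file instead of downloading it again.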

The Java Plug-in, the client that downloads and renders applets in a web browser, works similarly. It can keep a copy of the applet in a cache. Starting with Java 1.3, there are even parameters placed in the HTML for applet caching. It looks to me like the JavaScript that instantiates the HTML Creator applet (really edit-on(R) Pro by RealObjects) has settings which enable Java to keep it in its cache.

The default cache size of 1,000 MB sounded excessive at first. Do people really reach the point where the whole cache is used? Looking at mine, I have 4 items in Applications from running them on my desktop plus around 2,200 items in Resources. All this takes up only 155 MB. Most of them are tiny files. The largest ones in Resources are from the various Vista clusters I administer. Therefore setting this to 100 MB as recommended probably means these files get downloaded more often, with users waiting on 1 MB+ files to download. Glad we have a fast Internet connection at work. Sucks to be the students on DSL who follow this advice and use lots of Java-based applets.
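The following repeats this look at the cache on another machine. It counts the cached files, totals their size, and lists anything over 1 MB. It is a sketch built on assumptions. The function name java_cache_report is hypothetical. The default path ~/.java/deployment/cache is my understanding of the Linux location. Windows and OS X keep the cache elsewhere, so on those systems pass the directory shown in the Java Control Panel.

```shell
# Hypothetical helper: summarize a Java Plug-in cache directory.
# Assumption: the default path below is the Linux location; on other
# systems, pass the cache location shown in the Java Control Panel.
java_cache_report() {
  dir=${1:-"$HOME/.java/deployment/cache"}
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi

  # Number of cached files (tr drops the padding some wc versions print)
  printf 'files: %s\n' "$(find "$dir" -type f | wc -l | tr -d ' ')"

  # Total size on disk, human readable (-h is a GNU/BSD extension)
  printf 'size: %s\n' "$(du -sh "$dir" | cut -f1)"

  # Files over 1 MB, the ones that cost the most to download again
  echo 'over 1 MB:'
  find "$dir" -type f -size +1048576c
}

# Usage:
#   java_cache_report                      # Linux default location
#   java_cache_report /path/to/java/cache  # anywhere else
```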

If the Java Plug-in cache were buggy, then I could foresee problems with the display of applets: it should download the applet but does not, it should not download the applet but does, the wrong applet is used, or a corrupted applet is used. Instead, this seems to claim to solve an issue where the web browser lost the session cookie. It seems very unlikely to me that the Java Plug-in could cause a web browser to lose a session cookie, much less that changing the cache size would fix it.

IE and IQ

A friend posted the Internet Explorer users ‘have below-average IQ’ story on Google+. On the one hand, I love the idea of bashing IE users as incapable computer users who ought to get off the Internet. But then my Psychology background screams that this study is generally worthless. The lack of a statistical analysis ought to be another huge red flag.

I generally think an overall WAIS-IV score is mostly meaningless for something like this. IQ is a measure of skills, whereas the typical use of ‘intelligence’ means capacity. The skill set scores of the WAIS, like Similarities, Block Design, Sequencing, or Coding, would at least indicate where the differences are and give the results better meaning. My favorite part of What is Intelligence? covered which of these are improving and possibly why.

Age is an obvious factor for which they should have controlled. It was even data they collected. The Flynn Effect demonstrates there are IQ changes over time. If the rumors are true that older Internet users are the most likely to use a default web browser, then that could be a very important factor muddling these results. Correcting for age might dramatically change these results.

Location could also be very important. A work computer might be locked down so the user is not taking the test on their preferred browser.

Report Just Usernames

Occasionally I’ll want to see the usernames of those who share something like a user-agent property or were doing something during a range of time. Rather than report all the log lines and pick the names out of the data, I use the following, which relies on the username Blackboard (or maybe BEA) added to each log line.

Note: we’ve added user agents to the webserver.log. The double quote I use as my delimiter in the first awk comes from us adding the user agent to the webserver logs. If you have not set up your logs this way, then you’ll either need to do so or figure out which position is appropriate for you with a space delimiter. The colon in the second awk marks where, just after the username, the log records the reads and writes to the database.

grep <term> webserver.log | awk -F\" '{print $3}' | awk -F: '{print $1}' | sort | uniq

For example, a case was escalated to me where a student had trouble taking an assessment. That student was, of course, using Internet Explorer 7, a web browser which was supported prior to CE/Vista 8.0.4. Now it is not. (Quite possibly this kind of trouble is the reason Blackboard stopped supporting it.) So I was curious how many users are still trying to use this browser.
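Narrowing the pipeline to IE 7 answers that question directly, because IE 7 identifies itself with the string MSIE 7.0. This is a minimal sketch that assumes the log layout described above. The function name users_with_agent is hypothetical, and sort -u stands in for sort | uniq.

```shell
# Hypothetical helper: list distinct usernames whose user agent contains
# a given string. Assumes the log layout described above: the user agent
# is the first double-quoted field, and the username sits just before the
# first colon after it.
users_with_agent() {
  pattern=$1
  log=${2:-webserver.log}
  # Split on double quotes: $2 is the user agent, $3 the rest of the line.
  # index() does a plain substring match, so the dot in 7.0 is not a regex.
  awk -F'"' -v pat="$pattern" 'index($2, pat) { print $3 }' "$log" |
    awk -F: '{ sub(/^ +/, "", $1); if ($1 != "") print $1 }' |
    sort -u
}

# Usage: how many distinct users are still on IE 7?
#   users_with_agent 'MSIE 7.0' webserver.log | wc -l
```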

Useful User Agents

Rather than depend on end users to accurately report the browser used, I look for the user agent in the web server logs. (Yes, I know it can be spoofed. Power users who spoof it would be trying different things to resolve their own issues rather than coming to us.)

Followers of this blog may recall I changed the Weblogic config.xml to record user agents to the webserver.log.

One trick I use is splitting on double quotes in awk to isolate just the user agent. This information is then sorted by name so uniq -c can count how many of each is present. Finally, I sort again by number, with the largest at the top, to see which are the most common.

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r

This is what I use when looking for a specific user. If I am looking at a wider range, such as the user agents for hits on a page, then I will probably use the head command to look at the top 20.
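Wrapped up as a function, the counting pipeline plus head might look like this. It is a sketch. The name top_agents and its argument order are hypothetical. It assumes the user agent is the first double-quoted field, as in the pipeline above. Unlike the original pipeline, it passes -F to grep so the search term is matched literally.

```shell
# Hypothetical helper: count the user agents on log lines that contain a
# search term and show the most common first (defaults: webserver.log, top 20).
top_agents() {
  term=$1
  log=${2:-webserver.log}
  n=${3:-20}
  # grep -F matches the term literally; awk keeps field 2 (the quoted
  # user agent); uniq -c counts repeats; sort -n -r puts the largest
  # count first; head keeps the top n.
  grep -F -- "$term" "$log" |
    awk -F'"' '{ print $2 }' |
    sort | uniq -c |
    sort -n -r |
    head -n "$n"
}

# Usage:
#   top_agents '<term>' webserver.log 20
```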

A “feature” of this is getting the build (Firefox 3.0.11) rather than just the version (Firefox 3). To get the version, I tend to use something more like this to count how often a version appears in the log.

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c '<version>'

I have yet to see many CE/Vista URIs with the names of web browsers. So these are the most common versions one would likely find (what to grep – name – notes):

  1. MSIE # – Microsoft Internet Explorer – I’ve seen 5 through 8 in the last few months.
  2. Firefox # – Mozilla Firefox – I’ve seen 2 through 3.5. There is enough difference between 3 and 3.5 (also 2 and 2.5) that I would count them separately.
  3. Safari – Apple/WebKit – In searching for this one, I would add ‘grep -v Chrome’ to the search to eliminate Google Chrome user agents.
  4. Chrome # – Google Chrome – Only versions 1 and 2.

Naturally there are many, many others. It surprised me to see iPhone and Android on the list.
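Instead of running one grep -c per version, the list above can be folded into a single awk pass that reduces each user agent to a name and version. It checks Chrome before Safari, because Chrome's user agent also contains Safari. It keeps Firefox's minor version so 3.0 and 3.5 stay separate. This is only a sketch. The name browser_versions is hypothetical, the patterns cover just the four browsers listed, and everything else is tallied as Other.

```shell
# Hypothetical helper: tally browsers in a log by name and version, using
# the user agent in the first double-quoted field.
browser_versions() {
  awk -F'"' '
    {
      ua = $2
      # Order matters: Chrome first because its user agent also contains
      # Safari; Firefox keeps major.minor so 3.0 and 3.5 stay separate.
      if (match(ua, /Chrome\/[0-9]+/)) {
        b = substr(ua, RSTART, RLENGTH)
      } else if (match(ua, /Firefox\/[0-9]+\.[0-9]+/)) {
        b = substr(ua, RSTART, RLENGTH)
      } else if (match(ua, /MSIE [0-9]+/)) {
        b = substr(ua, RSTART, RLENGTH)
      } else if (ua ~ /Safari/) {
        b = "Safari"
      } else {
        b = "Other"
      }
      print b
    }' "${1:-webserver.log}" | sort | uniq -c | sort -n -r
}

# Usage:
#   browser_versions webserver.log
```

Each output line is a count followed by a label such as MSIE 7, Firefox/3.5, Chrome/2 or Safari.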

Better CE/Vista Web Server Log

Some support tickets are more easily solved by knowing both user behavior and environment. An often helpful piece of information is which web browser they used. To add this, shut down the cluster, edit /VISTA_HOME/config/config.xml to include cs(User-Agent) in the elf-fields, and start the cluster. This line will need to appear for every node. At startup, the nodes will download a new copy of the file.

<elf-fields>date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes cs(User-Agent) x-weblogic.servlet.logging.ELFWebCTExtras</elf-fields>

cp config.xml config.xml.bak
sed 's/bytes x-/bytes cs(User-Agent) x-/g' config.xml.bak > config.xml
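The field has to be added for every node. A quick check after the sed is to compare how many elf-fields lines exist with how many now mention cs(User-Agent). This is a sketch. The name check_elf_fields is hypothetical, and it assumes each node's <elf-fields> element sits on a single line, as in the example above.

```shell
# Hypothetical helper: confirm every <elf-fields> line in a config.xml
# includes cs(User-Agent). Returns non-zero if the file has no such lines
# or any of them is missing the field.
check_elf_fields() {
  cfg=${1:-config.xml}
  if [ ! -f "$cfg" ]; then
    echo "missing file: $cfg" >&2
    return 1
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps going
  total=$(grep -c '<elf-fields>' "$cfg" || true)
  with_ua=$(grep '<elf-fields>' "$cfg" | grep -c 'cs(User-Agent)' || true)
  echo "$with_ua of $total elf-fields lines include cs(User-Agent)"
  [ "$total" -gt 0 ] && [ "$with_ua" -eq "$total" ]
}

# Usage, after running the cp and sed above:
#   check_elf_fields config.xml
```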

Probably this could be edited in the Weblogic 9.2 console. I haven’t looked yet.

Upgrade, Upgrade, Upgrade

Be more secure! Upgrade today.

Want better functionality? Upgrade today.

Save a developer! Upgrade today.

The save a developer thing is the impetus for this post.

The upgrade today mantra annoys me.

  1. Software rarely spends enough time in alpha and beta cycles to identify all the issues.
  2. People have been so burned by using software in alpha and beta cycles, they are hesitant to try upgrades and help determine the issues.
  3. This lack of attention to the problems ensures versions 1.0, 2.0, n.0 typically have a ton of unknown problems, or are at times even less secure.

Unfortunately, the vendor who makes the application platform we run, Blackboard, has a philosophy of looking at new web browsers while they are in beta but not actually working toward fixes for them until after the products are released. With most releases of Java or supported web browsers (Internet Explorer or Mozilla Firefox), Blackboard heard the complaints from early adopters and, within a couple of months, released an update which resolved the reported issues.

The students and faculty members fail to understand the issue. I think I do. Blackboard (like WebCT before it) understands there are differences between beta and final. Some of us argue these differences are usually minor. However, all of this asks someone to predict the future, which we know is haphazard at best.

Long alpha and beta cycles allow more users to get involved, report problems back to the developers, and have them fixed before the version is released. Burning users with buggy software ensures their lack of faith.