March of the Machines (Automation)

Saw a tweet about an interesting piece in ABC News Australia, Digital disruption: How science and the human touch can help employees resist the march of the machines. Basically, many jobs are going away due to automation. WIRED has a similar story: Robots Will Steal Our Jobs, But They’ll Give Us New Ones.

One of the longest struggles of my career has been pushing for automation. My approach falls along the lines of: if something will be done more than once, or will take a really long time by hand, then it needs to be automated. This is hard to do. The temptation is to do the task by hand once, see how it went, then write a script for the next time. The trouble is that once the first run is complete, there is little incentive left to write the script. Best is to make the automation part of doing it the first time; the second time can include whatever remediation is necessary to improve it.

All this automation makes us more effective employees. My team of three managed hundreds of web servers and dozens of database servers for ten sites. Without automation that would have been a nightmare. The replacement product was more difficult to automate, so even with fewer servers we needed more people. Still, the drive toward better automation is making lives easier. (Technically I left that program about a year ago when my replacement was hired and took over my spot in the on-call rotation.)

A fear I hear about automation is that people will lose their jobs. It reminds me of globalization and manufacturing moving overseas to China. Highly repetitive, mind-numbing jobs were the most at risk, and as those workforces got better, what was at risk moved up the complexity ladder.

The fear of both globalization and automation led to books like A Whole New Mind. The idea is that if your job is highly repetitive or analytical, then it is at risk from these forces. Becoming the person who designs, describes, coordinates, or finds meaning in stuff (aka “right brain” activities) is the way to survive the coming storm. This book strongly influenced how I started thinking about my work.

Back in 2003, I automated everything I could because I was overwhelmed with work and had few resources beyond great computers and my own skill. My supervisees focused on meeting with clients to talk about the web sites they wanted and building them. I wrote code to report on or fix problems so people did not need to call or email about them.

Where I wish we would head is more like You Really Don’t Need To Work So Much. I meant to send this to my boss (maybe he’s reading this blog?). All our efficiencies should mean we have less to do, not more, so why do we work so hard?

The past fifty years have seen massive gains in productivity, the invention of countless labor-saving devices, and the mass entry of women into the formal workforce. If we assume that there is, to a certain degree, a fixed amount of work necessary for society to function, how can we at once be more productive, have more workers, and yet still be working more hours? Something else must be going on.

From my experience, the to-do list gets ever larger. Not because there is more to do, but because more is possible. I’d just rather spend more of my time solving hard problems than doing easy, repetitive tasks.

P.S. This post really only exists because I loved the phrase “March of the Machines” enough I wanted it as a title for something on this blog.

Convert Webserver.log to CSV

A security guy at a campus wanted our web server log file in the CSV format. The original file has lines which look something like:

machine.usg.edu: webserver.log13646:2010-11-30        11:08:32        0.0010  999.999.999.999    b7tPM1hTgGYMn90bLTM1    200     GET     /webct/urw/lc987189066271.tp1333853785371/blank.html    -       262     "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4" username:0:0

Turns out I only need three sed edits to make it look the way I want:

sed 's|:2010-|,2010-|g' testfile.txt | sed 's|\t|,|g' | sed 's|: |,|g'

The first converts the colon between the end of the file name and the year into a comma. The second converts all the tabs into commas, and the last changes the colon-space between the host name and webserver.log into a comma.
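
For what it’s worth, the same three edits can run in a single sed call with multiple -e expressions (GNU sed understands \t; webserver.csv is just my name for the output):

# all three substitutions in one pass over the file
sed -e 's|:2010-|,2010-|g' -e 's|\t|,|g' -e 's|: |,|g' testfile.txt > webserver.csv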

Easy enough. That line from the web server log now looks like:

machine.usg.edu,webserver.log13646,2010-11-30,11:08:32,0.0010,999.999.999.999,b7tPM1hTgGYMn90bLTM1,200,GET,/webct/urw/lc987189066271.tp1333853785371/blank.html,-,262,"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4",username:0:0

I love regular expressions.

I have a feeling I’ll need to make a primer for this guy too. 🙁

Hostname, Log Name, Date, Time, Seconds to Process, Load Balancer IP, Session ID, HTTP Response Code, HTTP Method, URI, URI Parameters, Bytes Returned, User Agent, Username:Transactions Read:Transactions Written

Failed Sessions

For exactly two months now I have been working on an issue, re-opened on Oct 7, 2009, where sessions appear to die in Blackboard Vista 8.0.2 hf1.

The first time this came up, Blackboard support wanted us to overhaul the session management. BIG-IP documentation saying this new method was a horrible idea kept us from ever getting on board. We agreed to conduct dupe.pl tests, which showed there wasn’t a problem with session spray, the issue their solution was designed to resolve. Stonewalled, we closed the ticket when the institution reporting it couldn’t provide any cases.

So our client with the issue asked us to resume work on it. The key information they provided me was that their users hit /webct/logonDisplay.dowebct. Since they use Single Sign-On (SSO) from a portal, no users should ever hit this page. From investigating these cases, I was able to find a number of users hitting /webct/displayAssessment.dowebct or /webct/displayAssessmentIntro.dowebct as the guest user.

See, the guest user exists at the domain learning context. Users appear as guest before they log in or after they log out. They should not appear as guest while taking a quiz.
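
Finding those cases in a log is a quick grep. A minimal sketch, assuming the format from my CSV post above, where each line ends with username:reads:writes:

# count assessment page hits attributed to the guest user
grep -E 'displayAssessment(Intro)?\.dowebct' webserver.log | grep -c 'guest:'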

So I provided this information to Blackboard along with the web server logs. They wanted more cases, so I provided more. More clients reported the issue, so I had plenty of sources. Plus it pointed to this problem affecting at least four clusters, if not all of them.

Next, our TSM left, so we were provided a new person unused to working with us. It took just the first note for a huge mistake: “Provide us all the logs from all the nodes.” At 5GB of logs times 14 nodes in a cluster, 70GB of information for an event which took up maybe 10KB seems like overkill. So… no. I like to think of myself as proficient at system administration, which means I can gather whatever logs you desire.

Now we come to the second mistake. Please refrain from asking me questions already answered in the ticket. Sure, the ticket has a large amount of information. However, if I can remember what is in the ticket, then so can the people working it.

Unfortunately I had to answer a question about replicating the problem with: it was based on my log trolling, not actual cases of students complaining. My mistake was not going to the clients for a description of the problem. Therefore, Blackboard wanted a WebEx so I could explain the same one sentence repetitively. *headdesk* We agreed on me getting a case where a user could explain the problem.

As luck would have it, I got just such a case a few days later. So I captured the web server log information and sent it along with the user’s description. My laziness resulted in me not trimming the log set down to the period of the error. Therefore, this log set showed a user1 login, a user2 login, then a user1 login again. Blackboard responded this might be a case of sporadic shifting users. Hello! I guess these folks are not used to seeing the SSO login, or they would know the session shifted to another user because… it… logged… in?

Even after I pulled the entries from the f5 log showing the client IP address, Blackboard now wants us to implement a configuration change to the f5 to reflect the browser’s IP in our web server log. Getting such a change isn’t easy for us. Don’t say this is the only way to get client IPs when I… have… sent… you… client IPs. We’ve been at this impasse for three weeks. So I get to have another WebEx where I explain the same thing I’ve already written. *headdesk*

Maybe it is finally time to ask these people if they are at all familiar with the known issue which sounds exactly like our issue?

VST-3898: When taking an assessment the session is not kept alive. The student’s session times out forcing the student to restart the assessment or makes them unable to complete the assessment.

We plan to implement the upgrade which resolves this issue next week, so I am hoping this does resolve it. Also, I am tempted to just close this ticket. Should the institutions find they are still having problems in January, when the students have had a few quizzes fail, then I might have forgotten how utterly, completely useless Blackboard has been on this issue.

All I ask is:

  1. Know the information in the ticket so I don’t have to copy and paste from the same ticket.
  2. Don’t ask for all the logs. Tell me what logs you want to view.
  3. Don’t tell me something is the only way when I’ve already shown you another way. I’m not an idiot.
  4. Don’t ask me if the f5 log has the cookie when the entries I’ve already sent you don’t have it.

🙁

Useful User Agents

Rather than depend on end users to accurately report the browser used, I look for the user-agent in the web server logs. (Yes, I know it can be spoofed. Power users would be trying different things to resolve their own issues, not coming to us.)

Followers of this blog may recall I changed the Weblogic config.xml to record user agents to the webserver.log.

One trick I use is splitting on the double quotes in awk to isolate just the user agent. This information is then sorted by name so uniq -c can count how many of each are present. Finally, I sort again by number, largest at the top, to see which are the most common.

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r

This is what I will use when looking for a specific user. If I am looking at a wider range, such as the user agents for hits on a page, then I will probably use the head command to look at the top 20.
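
That is just the same pipeline with head tacked on (keeping the <term> placeholder):

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r | head -20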

A “feature” of this is getting the full build (Firefox 3.0.11) rather than just the version (Firefox 3). For getting just the version, I tend to use something more like this to count occurrences of a given version in the log.

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c '<version>'

I have yet to see many CE/Vista URIs containing the names of web browsers, so grepping whole log lines for these names is safe. These are the most common versions one would likely find (what to grep – name – notes):

  1. MSIE # – Microsoft Internet Explorer – I’ve seen 5 through 8 in the last few months.
  2. Firefox # – Mozilla Firefox – I’ve seen 2 through 3.5. There is enough difference between 3 and 3.5 (also 2 and 2.5) I would count them separately.
  3. Safari – Apple/WebKit – In searching for this one, I would add a ‘grep -v Chrome’ to eliminate Google Chrome user agents, since Chrome’s user agent string also contains Safari.
  4. Chrome # – Google Chrome – Only versions 1 and 2.

Naturally there are many, many others. It surprised me to see iPhone and Android on the list.
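
To tally the four families from the list above in one pass, extract the agents once and count each (a sketch; agents.txt is a scratch file name I made up):

# pull just the user agent field out once
awk -F\" '{print $2}' webserver.log > agents.txt
# counts per family; Safari excludes Chrome per the note above
grep -c 'MSIE' agents.txt
grep -c 'Firefox' agents.txt
grep 'Safari' agents.txt | grep -vc 'Chrome'
grep -c 'Chrome' agents.txt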

Email Harvesters

Good sign: I missed the story about the brothers convicted of harvesting emails the first time around. Well, I noticed a followup.

Back around 2001, the CIO received complaints about performance for the web server. So, I went log trolling to see what the web server was doing. A single IP dominated the HTTP requests. This one IP passed various last names into the email directory. Some quick research revealed Apache could block requests from that IP. That calmed things down enough for me to identify the owner of the IP. The CIO then bullied the ISP to provide contact information for the company involved.
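
The block amounted to a couple of lines in the Apache configuration (or an .htaccess file). A sketch of the idea using Apache 2.2-era syntax and a documentation IP, not the exact directives or address from back then:

# allow everyone except the offending address
Order Allow,Deny
Allow from all
Deny from 192.0.2.1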

Previous little adventures like this landed me a permanent job, so I jumped at similar challenges.

Well, a few years later, it happened again. By this time my boss had had me develop a script for disseminating the anti-virus software package to home users. Basically, it used email authentication to verify whether someone could get the download link. So, I applied the same technique to the email directory. Well, this upset some people who legitimately needed email addresses. So the human workers would provide email addresses to people with a legitimate need.

I’m glad that, since I’ve left, VSU no longer looks up email addresses for people. (I thought some of the requests questionable.) Also, my little email authentication script predates LDAP being available to the university. I think the new solution is much better.

One of the more vocal complainers about my having stopped non-VSU access to the email directory was my current employer. We apparently list email addresses for employees freely. Which makes me wonder how much of the spam we get is due to the brothers described at the beginning of this story? Or other email harvesters? Just hitting the send button potentially exposes the email address.

No worries. I’m sure Glenn is protecting me. 🙂

Better CE/Vista Web Server Log

Some support tickets are more easily solved by knowing both user behavior and environment. An often helpful piece of information is which web browser was used. To add this, shut down the cluster, edit /VISTA_HOME/config/config.xml to include cs(User-Agent), and start the cluster. This line will need to appear for every node. At startup, the nodes will download a new copy of the file.

<elf-fields>date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes cs(User-Agent) x-weblogic.servlet.logging.ELFWebCTExtras</elf-fields>

Command:
cp config.xml config.xml.bak
sed 's/bytes x-/bytes cs(User-Agent) x-/g' config.xml.bak > config.xml
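
A quick sanity check that the edit took before starting the cluster back up:

# should print 1 (or however many elf-fields lines exist)
grep -c 'cs(User-Agent)' config.xml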

Probably this could be edited in the Weblogic 9.2 console. I haven’t looked yet.

Most Wired Teacher

“Who is the most wired teacher at your college?” (A Wired Way to Rate Professors—and to Connect Teachers)

Although the university runs workshops on how to use Blackboard, many professors are reluctant, or too busy, to sit through training sessions. Most would prefer to ask a colleague down the hall for help, said Mr. Fritz.

Professional support is too intimidating, cold, and careless. Support fixes the problems of others who created problems for themselves:

  • choices of software to use
  • configuration choices
  • mistakes in processing logic

The concept of identifying the professors who most use the system is a good one. We already track the amount of activity per college or university in the University System of Georgia. The amount of data (think hundreds of millions of rows across several tables) would make singling out the professors a very long-running query. That doesn’t mean it is a bad idea. Just don’t think it is something we would do with Vista 3. We probably could with Vista 8, which uses a clean database.

I’d like to see two numbers:

  1. Number of actions by the professor
  2. Number of actions by all the classes the professor teaches

Ah, well, there are lots of other reports which need to be done. Many more important than this one. 

Some questions from the article: “Will colleges begin to use technology to help them measure teaching? And should they?” At present, creating such reports requires IT staff with database reporting or web server skills. Alternatively, additional applications like Blackboard Outcomes System can provide the data. The real problem is the reliability and validity of the data. Can it really be trusted to make important decisions like which programs or employees are effective?

WordPress Error: This file cannot be used on its own.

When I posted a comment to a friend’s WordPress blog, it came up with the error:

Error: This file cannot be used on its own.

I was responding to a comment, so I doubted that he broke his blog between making a comment and my response. So I went looking through my own install. Essentially, at a shell I used

find . -type f -exec grep -l "This file cannot be used on its own." {} \;

to locate the file involved: wp-comments-popup.php. This file contains code which checks whether the HTTP_REFERER variable equals the path and file name of the comments page. If it does not, then it throws this error. The file mentioned in the error is wp-comments.php.

It seems that I had configured my web browser not to pass the HTTP referrer to web servers, so the check failed and threw this error.
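
This is easy to demonstrate from a shell with curl, which sends no Referer header unless given one (example.com stands in for the real blog):

# no referrer: the check fails and the error comes back
curl -s http://example.com/wp-comments-popup.php

# referrer matching what the check expects: the page renders
curl -s -e 'http://example.com/wp-comments-popup.php' http://example.com/wp-comments-popup.php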

Maybe the WordPress developer who designed this had no idea web browsers can be set not to send a referrer. Searching for the error on the WP site yielded nothing. Yet judging from the tons of comments about people hitting this error, lots of people turn off sending referrers.

Solution for those leaving comments: If you attempt to leave a comment and see this error, then enable referrers. WordPress actually has a decent article on enabling HTTP referrers for a number of different pieces of software.

A friendlier error for WP blog owners: Edit wp-comments-popup.php. Change

die ('Error: This file cannot be used on its own.');

to

die ('Learn how to <a href="http://codex.wordpress.org/Enable_Sending_Referrers">enable HTTP referrers</a> to fix this.');

Tale of Defeating the Crazy Woman

Cross-posted from Rants, Raves, and Rhetoric.

Babies are fascinated by me. When the two of us are in a room, they often find me the most interesting thing in the room. Usually, it is mutual.

So, a mutual friend of a friend, Mojan, has a fantastic blog. The past year or so has been about being pregnant and, most recently, figuring out how to be a parent for the first time. Well, a crazy woman set up a “blog” which hotlinks images from Mojan’s blog and falsely represents the child in the photos. Ick. I offered to help with this identity theft issue.

Once upon a time, I was annoyed with people taking images from my last employer’s web site. Since I was the campus web designer, I created an image which said, “All your image are belong to VSU.” Also, as the web server administrator, I figured out how to defeat hotlinking with .htaccess by using mod_rewrite to give them my annoyance rather than their content. For the next couple of days I watched the perpetrators try to figure out what was wrong. The hate mail I got was fantastic! I recommended Mojan do the same. When she agreed, I went researching to do what I did once upon a time. This is the .htaccess file I recommended she try.

# Basics
Options +FollowSymLinks
RewriteEngine On

# Conditions are true for any host other than yours (empty referrers pass through)
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mojansami\.com/ [NC]
# Skip the replacement image itself, or the rule below would loop
RewriteCond %{REQUEST_URI} !stolenpic\.jpg$ [NC]

# Send any gif, jpg, or png request to this target instead. In this case it does not exist.
RewriteRule .*\.(gif|jpg|png)$ http://mojansami.com/images/stolenpic.jpg [NC,L]
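
An easy way to test the block without waiting on the hotlinker is to fake a referrer with curl (hotlinker.example and photo.jpg are made-up names):

# a working block answers with a redirect to stolenpic.jpg
curl -sI -e 'http://hotlinker.example/' http://mojansami.com/images/photo.jpg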

My directions were not all that specific. So the next thing I know, her site is sporting an Internal Server Error. *headdesk* She had used Dreamweaver to create the .htaccess file and upload it to her site. She reported the file she uploaded had disappeared. Eventually, it did occur to me to look for the error.log and see what it said. The log complained about a DOCTYPE in the .htaccess file in the home directory, a file which did not show in the FTP listing. So, replacing the bad .htaccess file with a blank one fixed the Internal Server Error.

The .htaccess file in the right place, of course, resolved the issue with the crazy woman hotlinking.

Nothing can fix the pain of another person committing identity theft against you or your loved ones. I really hope Mojan doesn’t become discouraged and abandon blogging entirely. Between moderation and authentication she might find a better balance.

Do you have any stories of online identity theft?

UPDATE 2010-MAR-06: She pulled down the blog. Facebook is safer from crazy people.
