Useful User Agents

Rather than depend on end users to accurately report which browser they used, I look for the user agent in the web server logs. (Yes, I know it can be spoofed. The power users who would spoof it are busy trying things to resolve their own issues, not coming to us.)

Followers of this blog may recall I changed the WebLogic config.xml to record user agents in the webserver.log.

One trick I use is splitting on the double quotes in awk to isolate just the user agent. The output is then sorted by name so uniq -c can count how many of each is present. Finally, I sort again numerically, largest at the top, to see which are the most common.

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r

This is what I use when looking for a specific user. If I am looking at a wider range, such as the user agents for hits on a page, then I will probably pipe the output through head to look at the top 20.

A “feature” of this approach is that it returns the build (Firefox 3.0.11) rather than just the version (Firefox 3). To count hits for a particular version out of the log, I tend to use something more like this:

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c '<version>'
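For example, to count just the Firefox 3.5 hits (Firefox reports its build in the slash form, so this prefix also matches 3.5.1 and so on):

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c 'Firefox/3.5'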

I have yet to see many CE/Vista URIs containing the names of web browsers, so grepping for a browser name rarely produces false matches. These are the most common versions one would likely find (what to grep – name – notes):

  1. MSIE # – Microsoft Internet Explorer – I’ve seen 5 through 8 in the last few months.
  2. Firefox # – Mozilla Firefox – I’ve seen 2 through 3.5. There is enough difference between 3 and 3.5 (also 2 and 2.5) I would count them separately.
  3. Safari – Apple/WebKit – In searching for this one, I would add a ‘grep -v Chrome’ to eliminate Google Chrome user agents, since Chrome’s user agent string also contains Safari.
  4. Chrome # – Google Chrome – Only versions 1 and 2.

Naturally there are many, many others. It surprised me to see iPhone and Android on the list.
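Putting the list above together, here is a rough sketch for counting each browser family (the patterns are illustrative; adjust them to whatever versions appear in your own log):

for b in 'MSIE 6' 'MSIE 7' 'MSIE 8' 'Firefox/2' 'Firefox/3.0' 'Firefox/3.5' 'Chrome/1' 'Chrome/2'; do
  printf '%s: ' "$b"
  awk -F\" '{print $2}' webserver.log | grep -c "$b"
done
# Safari last, excluding Chrome per the note above
printf 'Safari: '
awk -F\" '{print $2}' webserver.log | grep -v Chrome | grep -c Safari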

DDoS of Social Media

Twitter, Facebook, LiveJournal, and other sites all admitted to suffering from a DDoS attack. It seems to me the purpose of a denial-of-service (DoS) attack against a web site is to flood it with so much traffic the site becomes unusable. A DDoS is where multiple other computers are coordinated into launching the attack.

All three of the above-mentioned sites have had recent issues keeping up with growing usage. The U.S. inauguration and the Iran demonstrations spiked traffic so much the sites seemed like they were suffering from a DoS. Already at the edge, an actual attack tipped these barely-coping social media sites over it. Some users abandon them for less popular (and therefore more stable) sites. Those who stick around suffer from learned helplessness.

Causing all this hullabaloo over a single user seems odd to me. I don’t speak Russian, so I don’t know if this guy from Georgia (the country) deserved it. Also, it is almost the one-year anniversary of Russia invading Georgia. During the invasion, DDoS attacks disabled Georgian web sites. So maybe this is to show Georgia the Russians are still capable of causing problems? This is why security evangelists want us to be able to deal with threats.

Various computer viruses over the years have turned millions of computers into zombies for botnets. So… if you are upset about your favorite social media site getting taken down, then maybe you should act to ensure your computer, and the others in your social network, have not been enlisted into a botnet.

Trusting Social Networks

Sunday at brunch we had an interesting conversation about Facebook.

Establishing the appropriate privacy levels so the various constituents see only appropriate material is hard. So hard it takes pages of text and screenshots just to paint a picture of what to review for the top 10 Facebook privacy settings.

We were discussing how to make the Facebook world we touched more private. How to keep those we supervise or those who supervise us at bay once accepted into our social circle. Few of us only post things our grandmothers would find acceptable, so how do we ensure grandma will never see that picture? This meant banning grandma from seeing the Wall or photo albums or tagged photos.

I had heard we would soon be able to change the privacy level of individual posts. This granularity comes at a price, according to the New York Times:

By default, all your messages on Facebook will soon be nakedly visible to the world. The company is starting by rolling out the feature to people who had already set their profiles as public, but it will come to everyone soon.

People like walled gardens. Taking a term from Seth Godin, interacting with just the handpicked few forms a tribe.

If sunlight is the best disinfectant, then social networking on Facebook will die should it be exposed to the world (or should remaining private become too hard). The most common criticism of blogging is that the whole world is in your business. People like the faux-protection of participating online where Google cannot archive it for posterity. This is why Facebook experienced such explosive growth.

Hopefully users will be able to deal with keeping everything as private as they like. Otherwise, we’ll be looking for another walled garden. Maybe I’ll even end up back on my private Twitter account?

Email Harvesters

I missed the story about the brothers convicted of harvesting emails the first time around. Well, I noticed a followup.

Back around 2001, the CIO received complaints about the performance of the web server. So, I went log trolling to see what the web server was doing. A single IP dominated the HTTP requests. This one IP passed various last names into the email directory. Some quick research revealed Apache could block requests from that IP. That calmed things down enough for me to identify the owner of the IP. The CIO then bullied the ISP into providing contact information for the company involved.
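For the curious, blocking a single address in the Apache of that era looked something like this (a sketch for the relevant Directory block or .htaccess file; the address shown is a documentation placeholder, not the actual harvester’s IP):

Order allow,deny
Allow from all
Deny from 192.0.2.1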

Previous little adventures like this landed me a permanent job, so I jumped at similar challenges.

Well, a few years later, it happened again. This time my boss had me develop a script for disseminating the anti-virus software package to home users. Basically, it used email authentication to verify whether someone was entitled to the download link. So, I applied the same technique to the email directory. Well, this upset some people who legitimately needed email addresses. So human workers would provide email addresses to those with a legitimate need.

I’m glad that, since I’ve left, VSU no longer looks up email addresses for people. (I thought some of the requests questionable.) Also, my little email authentication script predates LDAP being available at the university. I think the new solution is much better.

One of the more vocal complainers about my having stopped non-VSU access to the email directory was my current employer. We apparently list email addresses for employees freely. Which makes me wonder: how much of the spam we get is due to the brothers described at the beginning of this story? Or other email harvesters? Just hitting the send button potentially exposes the email address.

No worries. I’m sure Glenn is protecting me. 🙂

Worldwide Photo Walk

I’m looking forward to the Athens portion of the Worldwide Photo Walk in four weeks. I’m even more impressed it filled to the 50-person capacity. We have been having meetups for Athens Flickr users since September; I don’t think any have approached half that number. (There are only 32 members in the Flickr group.) I attribute this success to Steven Skelton’s efforts spreading the word.

Facebook Usernames

If you cannot find me, then you are not looking. If you search on Facebook for Ezra Freelove, then I am the only result at the moment. Maybe all you knew was Ezra and the city where I lived? Facebook search is not so great you could find me through my first name plus something else you knew about me (other than email or city). Probably this is for the best. We don’t want to make it too easy to stalk people, right?

Allowing users to make a username is a promotion. The blogosphere making a fuss over all this is Chicken Little-esque. Sure, MySpace, Twitter, and a number of other sites have addresses with usernames in them. No one is forcing people opposed to having one to make one. Only in the past month could one choose a username for one’s Google profile; prior to that, it was a hefty string of digits.

I think the reason some people prefer usernames comes down to elaborative encoding. To retain something in memory, we associate that something with existing items in memory. Short-term memory has only about seven slots, and each digit is a single item. Assuming IDs increment by one per account created and there are over 200 million users, a numeric ID means nine digits’ worth of numbers to memorize. A word occupies a single slot in short-term memory, which greatly simplifies remembering. Which would you rather try to remember: 46202460 or ezrasf?

An argument against usernames comes down to letting the Facebook database, or other computer memory, do the remembering. Computer memory is better than human memory for stuff like this.

All of these work and go to the same place:

  1. http://www.facebook.com/profile.php?id=46202460
  2. http://www.facebook.com/ezrasf
  3. http://www.ezrasf.com/fb

Pick your poison. Enjoy.

Expression Costs

(This started out as a blog comment for Sania’s post Facebook Killed Your Blog. I’m posting it here first.)

We share blogs with the whole world. So our blogs get lost in the noise, bolstering a whole industry devoted to optimizing how we get found in search engines. It’s a concerted effort just to get noticed. That’s because blog readers have to seek out blogs, subscribe to their feeds, and keep following them. Finding the best blogs to read is sometimes difficult, and happens more through word of mouth than anything search engines provide.

Blogs also tend to have a lot of information to digest. Social networks have just a line or two with maybe a link to more information. Blog readers typically are designed around the idea of collecting all the posts and letting the user pick which to read. Social networks typically are designed around the idea of just showing recent posts and letting the users choose how far back in time to read.

As technologies lower the cost of expressing ideas (that is, get easier), blogs will get left behind because they have become upside down in value. The costs of writing, reading, subscribing to, and commenting on blogs are high compared to micro-blogging or status updates.

Why blog when hanging out on social networks is so much easier? Blogs can only survive as long as they offer information worth the effort.

Why blog when readers are no longer reading? Cross-posting blog entries to social networks does help maintain traffic levels somewhat by getting exposure.

As bloggers providing valuable expression leave blogging, the value of blogs decreases. People will still blog. It just won’t be the popular thing to do.

The LMS is So Web 1.5

The claim that Blackboard’s Learn 9 provides a Web 2.0 experience has bothered me for a while now. First, there was the drag-n-drop. While cool, that isn’t Web 2.0 in my opinion. A little more on track is this claim:

The all-new Web 2.0 experience in Release 9 makes it easy to meaningfully combine information from different sources. (from The Challenges Are Real, But So Are the Solutions)

Integrating with a social network like Facebook is a start, but again, in my opinion, it still isn’t Web 2.0.

So, what is Web 2.0? I did some digging. I think the Tim O’Reilly approach meets my expectation best. He quotes Eric Schmidt’s “Don’t fight the Internet” as well as providing his own, more in-depth definition:

Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I’ve elsewhere called “harnessing collective intelligence.”) (from Web 2.0 Compact Definition: Trying Again)

Users expect a site on the Internet to meet their needs, or they eventually move on to a site which does. There are so many web sites out there providing features equivalent to those commonly found in an LMS; therein lies the danger of irrelevance. This is why every LMS company or group strives to continually add new features (a.k.a. innovating). The bar continually gets raised, so LMS software continually needs to meet this higher standard.

Tim additionally provides some other rules which you can see at the above link.

When an LMS reaches the point where the resources of the Internet help people learn, then it will be Web 2.0. As long as an expert or leader imparts knowledge to students, the LMS is still something different from Web 2.0. Sorry…. The irony? This is exactly what Michael Wesch and PLE advocates preach.

The Twitter Timesink

Glenn asked: “What is it about Twitter that makes it more of a time sink than Facebook?”

I consider a time sink something in which I invest a large amount of time for boring or poor returns.

My contacts mostly duplicate in Twitter what they post on Facebook. The time I spend reading Twitter posts I’ve already read on Facebook is wasted. My Twitter contacts respond about a fifth as much as Facebook users (it used to be higher on Twitter). So I get more out of Facebook.

Twitter Replies suck. The Replies system makes it look like my contacts reply much more to me than to others, which I find highly unlikely. More likely, the Replies implementation stifles conversation by requiring either everyone to be public, or all the participants to follow each other, for there to be one conversation. Instead it’s many different (sometimes hidden) duplicate conversations. Facebook comments are attached to the status update, so following a conversation is significantly easier.

Twitter Apps suck. Last Friday, I looked at Facebook Connect for AIR. My complaint about it was that my interactions with Facebook would be as limited as they are with Twitter. The promise of Twitter apps is to do more than the Twitter.com web UI provides. Many just provide easier ways to do the same thing: see your Twitter timeline. Others merely quantify your usage. Facebook apps, by contrast, provide access to content not within Facebook, so more of the web becomes part of my Facebook access and I can actually do more.

Except Socialthing and Tweetdeck. They are exemplary implementations of Twitter Apps. They extend the functionality beyond Twitter by itself and are the primary reasons I kept at it for so long. Socialthing unofficially died a while ago, and the official end of support was announced last week, when I was no longer using it. Tweetdeck will probably stick around for a while.

Twitter lacks granular privacy. In Twitter, you are either private or public, or you ban specific users. I’m torn between public and not. So I opted for private with sneezypb, where I mostly subscribe to friends. My other account, ezrasf, was where I subscribed to Blackboard community members, educational technologists, etc. Facebook could improve some in privacy as well, but compared to Twitter, Facebook makes a great attempt at granular privacy. Plurk, another microblogging / status update site, represents the privacy Holy Grail for me. It allows making specific posts public, private, or visible only to groups or individuals.

Session Oddities

One of the clients we host complained about losing their session. Blackboard recommended we switch how our load balancer handles session persistence. Before agreeing to do that, we decided to use Blackboard’s script to determine whether there actually is a problem, rather than fix something which may or may not exist.

An acceptable rate of sessions showing up on multiple nodes of a cluster is less than 5%. When I ran the test, I found 35.8% met this criterion. But wait just a second: this seemed like an extraordinarily high number. I ran a second test on an identically configured cluster on the same hardware and found only 4.3%. Why are these so different?

Most cases of this “duplicated session” I spot-checked were a single hit for autosignon on another node. Blackboard confirmed these happen before the user has logged in, which is why they could appear on the other node. So I ran the test again, ignoring these autosignon requests, and found we were down to 7.2%. Close to acceptable but not quite.

Similar to autosignon, editonpro.js appeared in the majority of the cases I spot-checked as the sole hit on another node. Once I removed those from the test, I was down to 0.7%. My control cluster was down to 1.4%.
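For anyone repeating this exercise, the pre-filtering amounted to something like the following sketch, assuming the request paths appear verbatim in the web server log (the file names are illustrative):

grep -v -e 'autosignon' -e 'editonpro\.js' webserver.log > webserver.filtered.log

Then point the duplicate-session script at the filtered file and compare percentages.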

One would hope the script used to determine the number of duplicate sessions would ignore, or remove from the data set, the known false-positive log entries.

One would also hope the script instructions (requires login to Blackboard help site) would help users account for these false positives. I did leave a comment on the instructions to hopefully help the next person who has to do this.