Curious Traffic Spike

I glanced at my Google Analytics stats for this site and noticed a huge traffic spike. Somehow my TED Talk: We Are All Cyborgs post landed Bing’s number two spot and Google’s number three spot for “ted talk we are all cyborgs” a couple days ago. Normal for a Tuesday is something like 650 visits. That Tuesday I got 2,578. It kind of reminds me of the Made Stumbleupon.com? post.

The actual We Are All Cyborgs talk was the number one spot for both search engines. Why would anyone come to my site for the same video?

(Glad I turned WP-Cache back on.)

Asterisks in the sky


Happy (Con)trails

Originally uploaded by Ezra S F

Flickr member Zack Sheppard did me a huge favor yesterday picking this picture for a Flickr blog about Asterisks in the sky. So in one day this picture was exposed to 5,931 people.

Several of those looked at the adjacent picture and others for a total of 10,640 hits yesterday. Lots of comments on many of my photos.

Wow. Just wow.

Useful User Agents

Rather than depend on end users to accurately report the browser used, I look for the user-agent in the web server logs. (Yes, I know it can be spoofed. Power users would be trying different things to resolve their own issues not coming to us.)

Followers of this blog may recall I changed the Weblogic config.xml to record user agents to the webserver.log.

One trick I use is setting the double quote as the awk field separator to isolate just the user agent. The results are then sorted by name so uniq -c can count how many of each are present. Finally, I sort again by number with the largest at the top to see which are the most common.

grep <term> webserver.log | awk -F\" '{print $2}' | sort | uniq -c | sort -n -r

This is what I use when looking for a specific user. If I am looking at a wider range, such as the user agents for hits on a page, then I will probably add the head command to look at the top 20.
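The top 20 variation could be wrapped up something like this. It is only a sketch: top_agents is a made-up function name, and it assumes the user agent is the first double-quoted field on each log line, as in the command above.

```shell
# Hypothetical helper: list the N most common user agents on lines matching a term.
# Assumes the user agent is the first double-quoted field of each log line.
top_agents() {
  term=$1          # text to grep for (a user, a page, etc.)
  log=$2           # log file to read
  count=${3:-20}   # how many rows to keep, default 20

  grep -- "$term" "$log" |
    awk -F'"' '{print $2}' |   # keep only the user agent
    sort |                     # group identical agents together
    uniq -c |                  # count each group
    sort -n -r |               # largest count first
    head -n "$count"           # keep the top N
}
```

Calling top_agents with a term and webserver.log gives the top 20. A third argument changes how many rows come back.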

A “feature” of this is getting the build (Firefox 3.0.11) rather than just the version (Firefox 3). To get the version, I tend to use something more like this to count the matching lines in the log.

grep <term> webserver.log | awk -F\" '{print $2}' | grep -c '<version>'

I have yet to see many CE/Vista URIs with the names of web browsers. So these are the most common versions one would likely find (what to grep – name – notes):

  1. MSIE # – Microsoft Internet Explorer – I’ve seen 5 through 8 in the last few months.
  2. Firefox # – Mozilla Firefox – I’ve seen 2 through 3.5. There is enough difference between 3 and 3.5 (also 2 and 2.5) I would count them separately.
  3. Safari – Apple/WebKit – In searching for this one, I would add a 'grep -v Chrome' to the search to eliminate Google Chrome user agents.
  4. Chrome # – Google Chrome – Only versions 1 and 2.

Naturally there are many, many others. It surprised me to see iPhone and Android on the list.
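A rough sketch of counting by browser family, including the Chrome exclusion for Safari. The function name count_family is made up for illustration, and it again assumes the user agent is the first double-quoted field in the log.

```shell
# Hypothetical helper: count log lines whose user agent matches a pattern,
# optionally dropping agents that match a second pattern first.
# Assumes the user agent is the first double-quoted field of each log line.
count_family() {
  log=$1            # log file to read
  want=$2           # pattern to count, e.g. 'MSIE 7' or 'Safari'
  skip=${3:-}       # optional pattern to exclude, e.g. 'Chrome'

  if [ -n "$skip" ]; then
    awk -F'"' '{print $2}' "$log" | grep -v -- "$skip" | grep -c -- "$want" || true
  else
    awk -F'"' '{print $2}' "$log" | grep -c -- "$want" || true
  fi
  # grep -c exits non-zero when the count is 0; "|| true" keeps that from
  # looking like a failure to the caller while the 0 is still printed.
}
```

For Safari I would pass 'Safari' as the pattern and 'Chrome' as the exclusion, since Chrome user agents also contain the word Safari.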

TED Talk: Picking apart the puzzle of racism in elections

By Nate Silver

A less than convincing point… The list of states with voters reporting a racial bias only matches the Obama-Clinton difference map well because Nate draws the audience to the states he’s picking on: Arkansas, Louisiana, Tennessee, Kentucky, and West Virginia (5 hits). He totally ignores that the strong race bias in South Carolina, Alaska, Missouri, and Indiana didn’t translate into more votes (4 false negatives). Also, Wyoming and Oklahoma both had no reported racial bias and voted more against Obama (2 false positives).

Stats

Dreamhost collects the access and error logs for the web site domains they host for me. The stats are crunched by Analog. The numbers are okay. I much prefer Google Analytics. (Even AWStats is better.) Analog is good enough.

While at Bbworld, Nicole asked me about the hits to her wedding web site. She made it sound like she and Ashley had the data and just needed to know how to interpret it. Now, a couple days later, it turns out they didn’t have the data. Instead, they ran into a password issue.

Shell / FTP:

What I had suggested to Nicole was Ashley could find the stats by going to the logs/william-nicole.com to find the data. (Actually it was logs/william-nicole.com/http/html)

Web:

Since only Ashley’s user can access the stats through the shell / FTP route, I went into my admin panel to add users for Nicole and myself to access the stats. I erroneously assumed the user with access to manage the content (Ashley) would have access to the stats. Instead, Dreamhost only automatically grants the panel user (me) access to stats. Doh! So I ended up creating accounts for them both.

Shameless Plugs:

Nicole’s site is http://william-nicole.com/.

Another site I am hosting for Shel is http://artistictraveler.nu/.

Page View Metric Dying

First Metricocracy measured hits. Pictures and other junk on pages inflated the results so Metricocracy decided on either unique visitors or page views. Now, the Metricocracy wants us to measure attention. Attention is engagement, how much time users spend on a page.

What do we really want to know? Really it is the potential value of the property. The assumption around attention is the longer someone spends on a web site, the more money that site gains in advertisement revenue. The rationale being users who barely glance at pages and spend little time on the site are not going to click ads. Does this really mean users who linger and spend large amounts of time on the site are going to click more ads?

This means to me attention is just another contrived metric which doesn’t measure what is really sought. I guess advertisement companies and the hosts brandishing them really do not want to report the click through rates?

My web browsing habits skew the attention metric way higher than it ought to be. First, I have a tendency to open several items in a window and leave them lingering. While my eyes spent a minute looking at the content, the page spent minutes to hours in a window… waiting for the opportunity. Second, I actively block images from advertisement sources and block Flash except when required.

To me as a DBA, page views also have debatable usefulness. On the one hand we could use them because they represent a count of objects requiring calls to the database and rendering by application and web server code. Hits represent all requests for all content, simple or complex, so they are more inclusive. Bandwidth throughput represents how much data is sucked out of or pushed into the systems.

We DBAs also provide supporting information to the project leaders. Currently they look at the number of users or classrooms who have been active throughout the term. Attention could provide another perspective to enhance the overall picture of how much use our systems get.

Cat Finnegan, who conducts research with GeorgiaVIEW tracking data, measures learning effectiveness. To me, that is the ultimate point of this project. If students are learning with the system, then it is successful. If we can change how we do things to help them learn better, then we ought to make that change. If another product can help students learn better, then that is the system we ought to use.

Ultimately, I don’t think there is a single useful metric. Hits, unique users, page views, attention, bandwidth, active users, etc., all provide a nuanced view of what is happening. I’ve used them all for different purposes.

Made Stumbleupon.com?

Traffic to this web site “spiked” yesterday. It only tripled to about 300 page views in a day. Nothing compared to what we get at work.
🙂

I was curious why the sudden burst almost exclusively to the Quotes to Make You Think page. The referrer for 108 of the 168 visitors that day was stumbleupon.com. Good visitors found other pages as they looked around a bit. Best I can figure, MochiMochii bookmarked my site and five others have indicated they like it.

Wow, if a single review and just a bookmark drives this much traffic, then maybe I am fortunate this page has not hit a top ranking? That could mean thousands of hits daily.

UPDATE 2007-JAN-27: Today, the traffic from these… uh… Stumblers… is over 600 page views and we have over 6 hours left in the day. I am impressed people are coming. This quotes page has always been the most popular since I created it back in 2000 or 2001. Will it hit 1,200 Monday, 20,000 Friday? Where is the ceiling? I should have remembered the principles from work…

UPDATE 2007-JAN-27 b: Ha… Topped out at 4,892. That’ll teach me to think maybe it will slow.

Dumbfounded By The Numbers

Chancellor Erroll B. Davis Jr. told the Georgia Board of Regents, “We grew essentially by a large university.” The USG gained 10,077 students (my alma mater has ~11,000) in a year. They calculate these fall term to fall term.

In the same fall term to fall term time period, in the same university system, GeorgiaVIEW gained about 59,000 students (assumes 1/10th of the 65,000 active user growth are instructors/designers). It’s only 9x the system growth rate. It actually reflects a slowing in the growth rate for GeorgiaVIEW. Partly this is because we are fast approaching the number of potential users. Market penetration becomes more difficult when most people are already using it.

Fortunately, users will become more intelligent in their use over time. So, even though the number of users may plateau, because each user will use the system more, the amount of use will continue to increase.

Unfortunately, another DBA and I consider the number of users a more or less uninformative statistic. It looks good in newspapers as it’s something the general public probably understands. Other numbers mean more for us:

  1. Hits – The count of items downloaded from the web servers. We often use hits as a measure of user activity. Unfortunately, we are only collecting this at the daily or monthly values.
  2. Who Is Online (Total / Active) – SQL pulls from the WIO table a count of all the rows (Total) and those whose time in the table is recent (Active). Both have issues… For example, users who fail to log out inflate the total. Active has weird spikes, which suggests to me these tables are reaped every 1/2 hour or so.
  3. Storage – Amount of information stored by the users. For example, our storage growth is 2.23 times the previous year (slowing down from 2.25). The number of new users has largely slowed, but the amount of storage staying fairly consistent means to me the users are doing more with the system.

Amy’s presentation at BbWorld 2007 on capacity planning is a much more authoritative approach than this blog post.
🙂

Better Way to Count

Our awesome sysadmins have put the user agent into our AWStats so we are tracking these numbers now. They discovered something I overlooked. Netscape 4.x is 10 times more used than 7.x or 8.x. Wowsers! Some people really do not give up on the past.

Back in the Netscape is dead post, I used this to count the Netscape 7 hits.

grep Netscape/7 webserver.log* | wc -l

Stupid! Stupid! Stupid! The above requires running for each version of Netscape. This is why I missed Netscape 4.

This is more convoluted, but I think it is a much better approach.

grep Netscape webserver.log* | awk -F'\t' '{print $11}' | sort | uniq -c | sort -n

It looks uglier, but it’s much more elegant. Maybe I ought to make a resolution for 2008 to be elegant in all my shell commands.

This version first pulls any entries with Netscape in the line. Next, the awk piece reports only the user agent string. The first sort puts all the similar entries next to each other so the uniq will not accidentally duplicate. The -c in the uniq counts. The final sort with the -n orders them by the uniq’s count. The largest will end up at the bottom.
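Here is the same pipeline as a small function, just as a sketch. The name netscape_versions is made up, it assumes tab-separated log lines with the user agent in field 11 as in the command above, and it adds -r to the last sort so the largest ends up at the top instead of the bottom.

```shell
# Hypothetical wrapper around the pipeline described above.
# Assumes tab-separated log lines with the user agent in field 11.
netscape_versions() {
  grep Netscape "$@" |            # any entry mentioning Netscape
    awk -F'\t' '{print $11}' |    # report only the user agent string
    sort |                        # put identical entries next to each other
    uniq -c |                     # collapse each run and prefix its count
    sort -n -r                    # order by count, largest first
}
```

Running netscape_versions webserver.log* covers every version in one pass, which is the whole point.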

Netscape to Die… Finally!

Just posted an internal email about what we ought to do about the End-of-Service announcement for Netscape. Usage of Netscape browsers has plummeted even as Firefox has increased. It’s finally hit the floor such that even AOL has given up on it. Why did they make NN 9? A snapshot of its use relative to total hits for the past ~30.5 days at two of the sites we run:

                   CVIEW             OVIEW
  Browser       Hits     %        Hits    %
  Netscape 7  108,739  0.18%    186,105  0.22%
   -- Mac       6,319  0.01%     33,249  0.04%
  Netscape 8   56,655  0.09%     85,817  0.10%
  Netscape 9        0  0.00%          0  0.00%

My first web browser was Netscape 1. Every version up to Netscape 7.0 was at one time my primary web browser until I finally switched to Mozilla Firefox in 2004. Browser crashes are not unknown in testing, and losing my place with other stuff (wikis, notes, documentation) frustrates even me, so I still use NN7.2 for testing.

There hasn’t been an update to NN 7.2 in 3 years, so EOS doesn’t really mean anything to those using it still. So, I don’t expect anyone to do anything. I haven’t heard demands that we provide support for NN8, so I doubt NN7 will be much different.

Too bad, it came in with a whimper and will go out with a whimper.