Why you need a pre-performance routine

As he moves toward the OR, McLaughlin is running through a precise series of thoughts and visualizations, which he calls the Five Ps. First is a Pause: He tries to forget what’s happened earlier in the day and focus only on the present. Next, he thinks deeply about the Patient. “This is a seventy-three-year-old man, and we need him to come out of this pain-free and able to walk more easily,” he says to himself. He reviews his Plan, mentally rehearsing the surgery step-by-step. Then he offers some Positive thoughts: “You were put on this Earth to do this operation,” he says. Finally, as he steps toward the table, he says a quick Prayer. “It’s very ritualistic, and I’m very focused,” he says.

Back when I had to do Friday night maintenance work as a GeorgiaVIEW database administrator, I had something like this. I would do the Pause to quiet my mind to become fully present. Then I would think about the systems involved. Then I would mentally step through the plan for the maintenance.

 

healthcare.gov

As an information technology professional, when a web site has performance problems, I sigh, gnash my teeth, and gripe just like everyone else. However, twenty minutes later I realize I have been there and feel bad for those having to deal with the mess. Also, should I feel hurt that I am not among the nation’s brightest IT minds since I was not asked to help?

GeorgiaVIEW, one of the projects on which I work, has about four thousand active users on average, topping out around five to six thousand on weekdays and eight thousand during an abnormal event. When users are having problems, they tend to come back, which gives them a new session while the old one has not yet expired, so the system deals with more and more sessions, compounding the performance problem. Some of the descriptions people gave of their problems with healthcare.gov sounded like they came back over and over trying to get in.
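A rough sketch of how that pile-up grows. Only the four thousand users comes from the numbers above; the five-minute retry interval, the sixty-minute session timeout, and the function name are made-up assumptions for illustration, not how any particular system behaves.

```python
def sessions_held(users, retry_every_min, timeout_min, elapsed_min):
    """Sessions the server still holds after elapsed_min minutes.

    Assumes every user opens a new session at minute 0 and again at each
    retry, never logs out, and that a session expires timeout_min minutes
    after it was opened. All of these are simplifying assumptions.
    """
    opened_at = range(0, elapsed_min + 1, retry_every_min)
    # A session is still alive if it was opened less than timeout_min ago.
    alive_per_user = sum(
        1 for start in opened_at if elapsed_min - start < timeout_min
    )
    return users * alive_per_user


if __name__ == "__main__":
    for minute in (0, 30, 120):
        print(minute, sessions_held(4000, 5, 60, minute))
```

Under those assumptions the same 4,000 people end up holding 48,000 sessions once the retries have gone on for an hour, which is why a slow system tends to get slower.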

The most annoying thing about the healthcare.gov problems, though, is the pundits. Early on, I heard they should have hired Silicon Valley companies to build the site, as though IT people only come from there. They specifically named companies famous for their own high-profile meltdowns as the experts in building huge sites without problems who should have built the health care exchange. Later came the small companies who build web sites for others, but not at this scale.

It is extremely difficult to build a site to the perfect scale. Overbuilding is expensive, so there is pressure to scale back. Business workflows are murky at best because until people use it, they really are unsure what it is they want. (They just know what was built is not right and why.)

Anatomy of a Mistaken Shut Down

We intensely monitor our servers. We want to know things before a work ticket reaches us.

So one morning a month ago I saw notifications that a couple of servers had failed login checks. (A process does a login and logout for each server multiple times an hour.) These checks go to the servers directly. Another check comes in the front door like a regular user. It was also failing, which is super bad.

My first instinct was to find out whether there was a running process for our shutdown script. There was, and I killed the process. Then I found the crontab entry that started it and removed it.

At this point there was a hard decision to make very fast:

    1. Recover this one.
    2. Make sure the other instances are not affected.

I ended up doing the latter. In retrospect, I guess I wanted to ensure I did not have multiple fires. If others were affected too, then I would ask coworkers to help. If just the one, then I could handle it. And it took only a couple of minutes to check the dates in the crontab entries for the shutdown script on certain hosts. Of the ten, this was the only one affected.

So I resumed the recovery. The first thing the shutdown script does is flip a flag in a file that tells the load balancer whether to allow traffic to the servers. I reversed that first. The half of the servers still running started picking up the traffic, which ended the outage. Then I started up the 5 of 10 servers that had shut down.

From start of the outage to when users were back in was about 14 minutes.

Usage was pretty light because the term ended a few days prior.

Probably this was a holdover from doing upgrades the year prior. A crontab entry has no year field, just minute, hour, day of month, month, and day of week. So we have to make sure we remove entries targeted at a specific day once they have run. (Or start using the at command more.)
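A minimal sketch of two ways to catch this kind of leftover, assuming standard five-field crontab lines. The function names and the idea of passing the intended year to the script are illustrative assumptions, not what we actually run.

```python
from datetime import date


def date_pinned_entries(crontab_text):
    """Return crontab lines that pin both a day of month and a month.

    Those are the entries that will quietly fire again next year.
    Comments, blank lines, @-style shortcuts, and anything that does not
    have five time fields plus a command are skipped.
    """
    pinned = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # e.g. MAILTO=... or a malformed line
        _minute, _hour, day_of_month, month, _day_of_week, _command = parts
        if day_of_month != "*" and month != "*":
            pinned.append(line)
    return pinned


def due_this_year(intended_year, today=None):
    """Guard for a one-off job: only proceed in the year it was meant for."""
    today = today or date.today()
    return today.year == intended_year
```

The first function is for auditing existing crontabs. The second is a guard a one-off script could call before doing anything destructive, since cron itself cannot express the year.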

One of Many

The Learning Management System (LMS) has been a despised technology by some ever since I started working with one, WebCT, in 1999. At the time it was deemed crappy technology that had to improve or die. So today in 2012, about 13 years later, I have to roll my eyes at the pundits writing about how the current technology has not significantly changed in a decade (really more than a decade) because it still offers the same tools and will die unless it adapts.

My first few years of working at GeorgiaVIEW, 2006-2010, our active user counts doubled every 1.5 years. We plateaued at around 290,000 and now grow by a few thousand a year. The number of actions in the system still doubles every 1.5 years. That is insane growth, and growth unlikely to be fueled by people who despise using the tool. Right now, we are getting pressure to migrate Summer 2012 content for the Fall 2012 start in Desire2Learn (see Note 1) because instructors roll over their classes from term to term. That speaks of long-term, consistent, loyal use, not occasional use of as little as they have to. For something on the verge of death, it is hard enough keeping the users happy.
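To put "doubles every 1.5 years" in more familiar terms, here is a small sketch of the arithmetic. It assumes smooth exponential growth, which real usage never quite follows, and the function names are just for illustration.

```python
def growth_factor(years, doubling_period_years=1.5):
    """How many times larger a quantity is after `years`,
    assuming it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


def annual_growth_rate(doubling_period_years=1.5):
    """Equivalent year-over-year growth as a fraction (0.5 means 50%)."""
    return growth_factor(1, doubling_period_years) - 1


if __name__ == "__main__":
    print(f"per year: {annual_growth_rate():.0%}")
    print(f"over four years: {growth_factor(4):.1f}x")
```

Under that assumption, a 1.5-year doubling time works out to roughly 59% growth per year, or a bit more than six times over a four-year span such as 2006-2010.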

I am a database administrator not a faculty member (or dean or vice president for academic affairs or provost). It seems to me though no one would say, “When you teach a class, the white board in the room is the only tool you can use.” Instead, the push would be to add to the available tools in a neverending pursuit of finding better ones. So we see pressures to integrate the LMS with a variety of similar specialized services. Many are textbook replacements or supplementary services designed specifically for student needs. Others are social media. More and more the LMS is just a portal: a place to organize where students really go to learn.

Also, as an IT guy, I think it is important to have a plan B. Things sometimes fail. As a student I was always annoyed when the instructor had to leave the room for 20% of the class to go track down a piece of chalk because the remaining ones were too small to write with. I applauded once in my junior year because the instructor happened to have a piece of chalk in her purse just for that contingency. Similarly, faculty members and even students should think about what to do when the LMS is not there. Heck, what should they do if everything the university IT runs, like the web sites, email, portal, and network, all disappears? It can happen.

When the university bureaucracy selects and administers a tool, it will adhere to university policy, which adheres to higher education laws. When a faculty member selects and administers a tool, they should do the same. Unfortunately, that means the faculty member must become familiar with policy and law. Another challenge is running into different interpretations. An example: a user following @VSUENGL1101 on Twitter could be reasonably expected to be a student at Valdosta State University enrolled in the subject English class 1101. Some say that violates the Family Educational Rights and Privacy Act. Some disagree, so it is being debated. The law is old and likely did not anticipate social media, so naturally there is movement towards an update.

I doubt the LMS will simply die because there is something better. Instead it will remain one of many tools for years to come. Like the land line, television, JavaScript, still camera, WiFi, non-smartphone, and (God forbid) pagers.

Note 1: Desire2Learn objects to their product being called an LMS. They prefer Learning Environment on the grounds it integrates with so many other tools.

P.S. This totally is from a sustaining technology perspective. Guess I should write this from a disruptive technology perspective.

Want to Work With Me?

There are a bunch of new positions which were just posted. We need analysts, database administrators, and an operating system / hardware specialist.

We have a great team. So you should come work with us.

Contact me if you are interested or want to know more. (See the staff directory and search for ezra.)

Curiosity

In Curiosity Is Critical to Academic Performance, curiosity was measured to be as strong a factor in academic success as conscientiousness and intelligence. Capacity and speed in acquiring information, staying on task, and motivation to work with information are all good things. At the end of the article, I found this interesting:

Employers may also want to take note: a curious person who likes to read books, travel the world, and go to museums may also enjoy and engage in learning new tasks on the job. “It’s easy to hire someone who has done the job before and hence, knows how to work the role,” von Stumm says. “But it’s far more interesting to identify those people who have the greatest potential for development, i.e. the curious ones.”

For the members of my team curiosity is critical. We get problems escalated from several tiers below. Every problem we get should be something others found too challenging to solve or requiring information not available to them. Plus every problem requires informed decisions, meaning gathering data and determining that results are accurate. Expectations of the near impossible become the new normal every time we succeed. Plus delivering the near impossible usually means learning something new. These same academic performance factors help in solving challenging problems.

Our interviews were designed to get a sense that candidates have enough relevant knowledge to be a foundation we can build upon and maybe some expertise to fill in our own gaps. Also, we ask questions about how someone worked on problems to get a sense that the candidate learned from past experiences and can find the information necessary to solve issues.

The technology landscape is constantly changing. Software upgrades mean things break or work in a new way. Leadership makes decisions which pull the rugs out from under us. Adapt or die. Curiosity is the only way to stay sane in such a world where what I know today may be irrelevant in a week.

Hack Education has a good post on the frustration of learning to code even with education startup Codecademy. Pretty sure I never would have learned to code without needing to accomplish something + curiosity. Of course, that is often the description of a geek.

Muzzled

For over a month now my team has been heads down to provide some sandbox environments for the University System of Georgia Learning Management System Transition Task Force. The evaluators are looking at a sandbox for each contender. Various technical teams are also determining how the product fits with our experience and our operations. Growing new skills and abilities is probably a good thing as long as it fits the organization.

My inclination is to blog about every discovery whether good, bad, or ugly. Yet this whole process is overshadowed by fear. Fear of the loser initiating a lawsuit, and of anything I write being taken out of context to support a case, is the main reason I have muzzled myself about it. Probably even private posts on a private blog could be requested by a subpoena.

I guess there will be plenty of time to gripe about it all when the decision is made and we surge towards meeting an absurdly short timeline to implement a production environment. That is a whole other blog post I probably should never write.

Aspiring to a Billion Pageviews

Read this in an article about Reddit:

But Reddit.com is still one of the internet’s most popular sites with over a billion pageviews a month.

I realize a billion is a big number, but I figured even GeorgiaVIEW could be getting half a billion pageviews a month. January 15th to February 14th (our peak 30 day period), we did about 774 million pageviews. In the last 30 days we did about 600 million while some kids are on spring break (sic?).

There must be online learning systems larger than us like University of Phoenix (500,000 students). If they are comparable to us in the amount of online usage they have, then they could be doing over a billion pageviews. UoP is about 60% larger than us, so they should cross over into the Reddit range.
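The back-of-the-envelope math, sketched out. It assumes pageviews scale in direct proportion to the size of the student body, which is a big assumption, and it takes the 60% figure and the two pageview counts above at face value. The function name is made up for the example.

```python
def scaled_pageviews(our_pageviews, percent_larger):
    """Estimate pageviews for a system that is `percent_larger` percent
    bigger than ours, assuming usage per student is the same."""
    return our_pageviews * (100 + percent_larger) // 100


if __name__ == "__main__":
    peak = scaled_pageviews(774_000_000, 60)     # our peak 30 days
    slow = scaled_pageviews(600_000_000, 60)     # our last 30 days
    print(f"peak estimate: {peak:,}")
    print(f"slow estimate: {slow:,}")
```

By this estimate a system 60% larger than ours would do about 1.24 billion pageviews in a month like our peak, and about 960 million in a month like our last 30 days, so only the busy months would clearly cross the billion mark.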

Surely software as a service online learning sites like Blackboard, Desire2Learn, Pearson Learning Studio, Unicon, and Moodlerooms push more than a billion pageviews?

Organization Relationships

A friend of mine who I used to work with once remarked (2007-ish) that the University System of Georgia does not really work like a system so much as a loose confederation fighting over money. Given I have no access to budgets, I would not know. GeorgiaVIEW works remarkably well given there are only a few people running the system and hordes of people administering it for their campus. There is a mostly correct mix of grassroots and top down pressure.

The Board of Regents Information Technology Services has fostered a culture of “help requests must go through the tickets”. Tickets allow the team to better triage issues. Tickets show leaders we are helpful. The unintended consequence is weakening the relationships we have. Tickets indicate we are too busy to be helpful. Relationships carry accountability: an individual shows vulnerability to me by admitting to not understanding something, breaking something, or having some other problem. My part of the relationship is to console, advise, or fix the problems. Tickets make all this harder because they are less personal.

When I talk with my coworkers, we covet the connections we hold across the system for they are the true value. How do we develop these relationships inside the formality of processes which fail to incentivise them?

We have email lists, instant messages, weekly Wimba sessions, etc., but there is obviously a problem when the same people who have these things only tell me about things when they see me in person. I’m reminded of the ITS CIO spending time going to campuses to talk to them about their needs. Maybe that should be something we do throughout the organization, especially at my level? Also, when I was at Valdosta State, my best information about the needs of faculty members and students came from visiting them, not from the technology I developed to encourage reporting issues.

Technology is not magic. It does not make those who are not communicating start. It just shifts the form and potentially makes it more difficult. Ideally the difficulty will be so slight no one will notice. One can make communication easier by going from a more difficult technology to an easier one. Still… It is not as good as being there with the person.

Night School

I noticed a couple weeks back there are interesting spikes in the evening hours of Sunday through Wednesday. Just like morning/afternoon usage, the evening spikes diminish but even more so by comparison.

As I recall for Monday through Wednesday, when I first started, the evening traffic almost flatlined at 5pm and then dropped off at 11pm. Over time the spike has grown to the point we have more users active in the evening than during “business hours”.

In this graph, the numbers across the bottom are the week of the year. The numbers along the left side are the number of users active within the last 5 minutes.

Yaketystats
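For anyone wondering what "users active within the last 5 minutes" means in practice, here is a minimal sketch of one way to compute it. The shape of the data (a mapping of user to last-activity time) and the function name are assumptions for illustration, not how the LMS or the graphing tool actually collects the number.

```python
from datetime import datetime, timedelta


def active_users(last_seen, now, window=timedelta(minutes=5)):
    """Count users whose most recent activity falls within `window` of `now`.

    `last_seen` maps a user identifier to the datetime of that user's
    latest action. Timestamps in the future are ignored.
    """
    return sum(
        1 for seen in last_seen.values()
        if timedelta(0) <= now - seen <= window
    )


if __name__ == "__main__":
    now = datetime(2012, 1, 23, 20, 0)
    sample = {
        "student_a": now - timedelta(minutes=1),
        "student_b": now - timedelta(minutes=5),
        "student_c": now - timedelta(minutes=6),
    }
    print(active_users(sample, now))
```

Sampling that count on a schedule and plotting it over the weeks gives the kind of curve described above.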

Really I have no data to explain the change in trend. (We are not 100% online, and the majority of the classes we host are supplemental to face-to-face, with hybrid and totally online fighting for second place.) I hope the days of instructors teaching in a computer lab and having students follow along died a hard painful death. If so, then the amount of activity during the day would lessen some. Students and faculty would still go online during the day between classes. However, more student access to broadband at home would empower them to go online more often in the evening and increase the difference between day and evening user activity.

Identifying where each individual IP resides is hard. Doing so for many is more time than I would want to invest in the question. Campus vs. residential vs. corporate is relatively easy. However, “home” for a student could be on campus or residential. Maybe someone else knows better than me.
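The "relatively easy" part could look something like the sketch below, using Python's standard ipaddress module. The network ranges are placeholders from the reserved documentation blocks, not real campus or ISP ranges, and the bucket names are assumptions. Real lists would have to come from each campus and from ISP allocation data.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), purely hypothetical.
NETWORKS = {
    "campus": [ipaddress.ip_network("192.0.2.0/24")],
    "residential": [ipaddress.ip_network("198.51.100.0/24")],
    "corporate": [ipaddress.ip_network("203.0.113.0/24")],
}


def classify(ip_text):
    """Return the first bucket whose networks contain the address,
    or "unknown" if none of them do."""
    address = ipaddress.ip_address(ip_text)
    for label, networks in NETWORKS.items():
        if any(address in network for network in networks):
            return label
    return "unknown"


if __name__ == "__main__":
    for ip in ("192.0.2.15", "198.51.100.7", "203.0.113.200", "10.1.2.3"):
        print(ip, classify(ip))
```

This still cannot tell a dorm from a classroom building unless the campus ranges are broken down further, which is the hard part mentioned above.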

I guess this means we really ought to look at our automated operations which kick off at 10pm. WebCT recommended they be run when user activity is light or they could impact performance.