Convert Little-endian UTF-16 to ASCII


I generated some text files working with Get-Acl in Powershell, but I did not know how to get Powershell to do some advanced things. (Basically, I wanted Select-String to include the next 2 lines and see whether a specific group was in that list. And maybe some exclusions.) So, I copied the files over to my Linux home to check there.

The most basic grep? Nothing.

I used ls -l and confirmed they have data. I used less to confirm I can see it.

I copied a string and did a grep for it. Nothing.

I did a dos2unix. That didn’t fix it. Finally, I did:

file filename.txt

That revealed the files had types of:

  1. Original: Little-endian UTF-16 Unicode text, with CRLF line terminators
  2. dos2unix converted: Little-endian UTF-16 Unicode text

Basically, this told me that the dos2unix fixed one problem but not both. The “with CRLF line terminators” means that Windows and Unix have philosophical differences in how to format text lines.

Little-endian is a geeky homage to Gulliver’s Travels. It has to do with the order in which the bytes of each character are stored. But it isn’t really the big problem here. UTF-16 is the problem: it stores plain English text with a NUL byte after every character, so a grep for an ordinary string never matches. Apparently, I need the file to be UTF-8 for grep to read it. So, the fix is to use an encoding converter:

iconv -f utf-16 -t utf-8 filename.txt > filename_new.txt
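
To see the whole problem and fix in one place, here is a minimal sketch. The sample file content and the temporary paths are made up for illustration, and I name the byte order explicitly (UTF-16LE) because the sample has no byte order mark.

```shell
# Build a small UTF-16LE sample with CRLF line endings (hypothetical content).
tmpdir=$(mktemp -d)
printf 'Path: C:\\share\r\nAccess: BUILTIN\\Administrators Allow\r\n' \
  | iconv -f UTF-8 -t UTF-16LE > "$tmpdir/acl_utf16.txt"

# grep on the raw file finds nothing: every character is followed by a NUL byte.
raw_hits=$(grep -c 'Administrators' "$tmpdir/acl_utf16.txt" || true)

# Convert the encoding first, then strip the carriage returns.
iconv -f UTF-16LE -t UTF-8 "$tmpdir/acl_utf16.txt" | tr -d '\r' > "$tmpdir/acl_utf8.txt"
utf8_hits=$(grep -c 'Administrators' "$tmpdir/acl_utf8.txt")

echo "raw=$raw_hits utf8=$utf8_hits"
```

Once the file is UTF-8, something like grep -A 2 'Path:' should give the "include the next 2 lines" behavior I was after with Select-String.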

Listing Lists

A mini project is to hand over the course packages for the prior product to each of our clients. A good idea was to include a list of the files so that down the road, if something is missing, we can point to the list in the ticket as the record of what they received.

So I wrote this shell script to make the lists for me. (Well, really the analyst doing the hard work wanted to know if he should make the list. I told him I could do it really easily in Linux.) This is because I am talking about 385,528 courses and 37 targets. The first step generates a list of the clients (schools) involved. Next, the path to where the files are stored has two subdirectories, so I pull them out of the path. The list is generated with a find command, stripping out the “./” at the beginning and writing the results to a file. Finally, I check the size and number of lines in the file.

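# Build the list of clients (schools) involved
SCHOOLLIST=$(find /${BASEDIR} -name bak)
# The current path looks like /BASEDIR/CLUSTER/SCHOOL, so fields 3 and 4 are what I need
SCHOOL=$(pwd | awk -F/ '{print $4}')
CLUSTER=$(pwd | awk -F/ '{print $3}')
# List every .bak file, strip the leading "./", and write the results to a file
find . -name "*.bak" | sed -e 's|^\./||' > /${BASEDIR}/${CLUSTER}/${SCHOOL}/course_list_${SCHOOL}.txt
head /${BASEDIR}/${CLUSTER}/${SCHOOL}/course_list_${SCHOOL}.txt
# Check the size and number of lines of each list
ls -lh /${BASEDIR}/*/*/course_list*
wc -l /${BASEDIR}/*/*/course_list*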

Since each course is on its own line, I can compare these numbers to other known numbers of courses.

So nice to get the computer to work for me. Purely by hand this would have taken days. It took about half an hour to craft the core and make sure it looked right. Then another half hour for the loop to work right.
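
The loop itself is not shown above, so here is a rough sketch of how it might look. The directory layout (base/cluster/school/bak), the sample file names, and a BASEDIR that already includes its leading slash are all assumptions for the sake of a runnable example, not what the real servers look like.

```shell
# Hypothetical layout: $BASEDIR/<cluster>/<school>/bak/*.bak
BASEDIR=$(mktemp -d)
mkdir -p "$BASEDIR/cluster1/schoolA/bak" "$BASEDIR/cluster1/schoolB/bak"
touch "$BASEDIR/cluster1/schoolA/bak/course1.bak" \
      "$BASEDIR/cluster1/schoolA/bak/course2.bak" \
      "$BASEDIR/cluster1/schoolB/bak/course3.bak"

for bakdir in "$BASEDIR"/*/*/bak; do
  schooldir=$(dirname "$bakdir")
  school=$(basename "$schooldir")
  # Run find from inside the school directory so the listed paths are relative.
  (cd "$schooldir" && find . -name '*.bak' | sed -e 's|^\./||' | sort \
    > "course_list_${school}.txt")
done

# One line per course, so these counts can be compared to known course counts.
wc -l "$BASEDIR"/*/*/course_list_*.txt
```

Running the find in a subshell keeps the cd from leaking into the rest of the loop.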

Of course, I need to figure out how to do this in Powershell. 🙂

Bad Guesses

Usually the best way to guess which technology product will be successful is to bet against the one I like. Betamax, Apple, Linux, Picasa.

So it surprises me Barnes & Noble are giving up on the Nook since I went Kindle. I’m not usually in this position.

Then again, I was on the brink of going webOS but went Android. Maybe though iOS will win out over Android and keep me on track.

Of course, this post is confirmation bias. I’m sure if I really thought about it, then I could come up with lots of technologies I like that dominate their markets over rivals.

Just Get Rid of Java

Apparently there are security flaws in the current version of Java allowing the installation of malicious software through web browsers unknown to the user. The known attacks using this flaw work on Windows, OSX, and Linux. According to Reuters:

Java was responsible for 50 percent of all cyber attacks last year in which hackers broke into computers by exploiting software bugs, according to Kaspersky. That was followed by Adobe Reader, which was involved in 28 percent of all incidents. Microsoft Windows and Internet Explorer were involved in about 3 percent of incidents, according to the survey.

The Department of Homeland Security recently said computer users should disable Java. At first this seems odd. The vulnerability in question is only in Java 7. So why not go back to Java 6? Well, Java 6 has vulnerabilities too, which is why DHS and others have recommended getting to 7. Also, starting in 7, the automatic upgrades are more aggressive. So going backwards is probably not a great idea. (It just happens I had to go backwards to get a tool I needed to work and forgot to go back forward.)

Also, for a similar situation back in August the recommendation was to make the browser prompt before allowing Java to run. Now the strategy is to just stop Java entirely. Apple has removed Java browser plugins. That could work too. Except for bad, bad software like ours (sorry, sarcasm if you could not tell) which makes use of a few applets. In the last week I have gotten a request to add another applet.

A fix to Java 7’s vulnerabilities should be available in a couple days.


Tweetdeck is my primary interaction with Twitter. Managing two Twitter accounts would be annoying via the web (two browsers given Prism is dead). At times I do accidentally post under the wrong one. Though I think the solution to that might be not having two blue profile icons. It is not Tweetdeck’s fault I fail to pay attention.

The dot indicating an unread post helps me keep track. I can also clear read posts. These keep me from wasting time re-reading posts. Twitter web presents a count of new unread posts and requires me to click it to present the new ones. Maybe if I were more of a dopamine addict, I would prefer it.

I follow the brand names of our clients and the relevant product names in my industry. Tweetdeck’s search columns make this easy.

In an ideal world, I would use the AIR app. However, it is no longer supported for the OS I use at home. Even then, there is a feature both the Chrome and Android apps have that it does not.

    1. AIR app
      1. PRO: edit the columns in the title bar. (Why I switched back to AIR from Chrome app on one computer.)
      2. PRO: separate application from the web browser.
      3. CON: Adobe AIR is no longer available for Linux, which is the OS I use for about 25 hours a week.
      4. CON: If someone I follow mentions another user and I want to look at this other user, then I end up opening a browser to see their profile. Very clunky compared to the Chrome app. There is a setting “Open profiles in web page (saves on API calls)” that was set.
    2. Chrome app
      1. PRO: If someone I follow mentions another user, then I can see it in a client profile.
      2. CON: In a browser window.
      3. CON: Cannot edit columns in the title bar. Have to recreate the column with correct values.
    3. Android app
      1. PRO: If someone I follow mentions another user, then I can see it in a client profile.
      2. CON: Columns available to use are those from the Twitter web.

Really I am happy enough.

Apple Trying To Poach IE6 Users

I attempted to watch the Transformers 3 trailer, but apparently Chrome on Linux was a no-go for the JavaScript which hides the web site and displays the trailer. Fancy but broken. So I thought I would look at the HTML and get the .mov file. I found this snippet of code in the HTML quite interesting.

<!--[if lt IE 7]>
<div id="ie6-message">
<h2>You are currently using an outdated browser.</h2>
<p>Please upgrade to a <a href="">modern browser</a> to fully experience this site.<p>

Where most places would have someone upgrade to a newer version of the software they are currently using, Apple is trying to poach Microsoft users. Bravo! Bravo!

Things We Can Live Without

Video Games ruined my life. Good thing I have two extra lives.

We have a tendency to exaggerate what will make us either happy or sad. That gadget, clothing, car, etc. probably will not turn the world into a Utopia. So I found this Profhacker post, “Things We Can Live Without,” interesting.

I don’t mean annoying things we could do without, like complaints about grades or being stuck in traffic. I mean things that we thought we couldn’t live without but which it turns out we can. I mean things that held such great promise for happiness, completion, or freedom but which turn out to be useless, disappointing, or even enslaving.

Here is my list:

  1. Hybrid car – I rationalized at the time that the battery life cycle was not well defined enough for me to believe going hybrid was that much better. My car gets pretty good miles per gallon. Enough so that on the same size tank of gasoline it can get from here to my hometown and a third of the way back, where I was used to having to fill up when I got home to even go visit a friend. However, I don’t care about the mpg as much as I thought I would while deciding on the purchase.
  2. iPod – Almost a year ago I bricked the Windows install on my home laptop and went Linux. My iTunes associated with the iPod was on that old Windows install. The hard drive still worked, so I tried moving the library to another Windows computer. Unfortunately the paths were different and Windows is not as good as Unix at symbolic links. I was still missing the only thing that mattered to me: ratings and play counts. So I tried to download the ratings off my iPod and fix it. That bricked the iPod. Rather than go out and buy another iPod, I made MP3 CDs to use in the car.
  3. eBook Reader – It seems so sexy to have my entire library on a tiny device. The reality is I have about 100 paper books to finish (growing faster than I read them) before I can consider switching over to another medium. Until all the bookstores switch to digital and prevent me from buying paper, the odds are low I’ll switch.
  4. Video game console – I used to play more video games than I watched television. However, for the past 4 years I have not had a console hooked up. (For a couple weeks in March-April 2006 I did have my Nintendo 64 hooked up, but I did not have cable or Internet at the time.) Ultimately, I no longer play as much as I used to play as the Internet, socializing, work, Netflix, and even television sap too much of my time.
  5. Home phone – When a company “had” to have my phone number they got my parents’ number even after I moved out on my own. Then I moved to another city. So I got my own home number thinking this is what I needed to interact with companies. Certainly I wanted to keep my cell phone pure and not get a dozen useless calls a week. Then along came Google Voice (especially the call screening specific numbers feature) to make it so much easier to appropriately handle corporations and political candidates who cannot email relevant information.

Then there is the list of things I probably should live without, or at least cut back on, despite feeling obligated to do them. Thing is… will I?

  1. Television and movies
  2. Facebook, especially apps and memes
  3. Restaurant eating
  4. Eating meat
  5. Time spent online


This Linux tool, xmllint, is my new best friend. We get thousands of XML files from our clients for loading user, class, and enrollment information. Some of these clients customize our software or write their own software for generating the XML.

This means we frequently get oddities in the files which cause problems. Thankfully I am not the person who has to verify these files are good. I just get to answer the questions that person has about why a particular file failed to load.

The CE/Vista import process will stop if its validator finds invalid XML. Unfortunately, the error “An exception occurred while obtaining error messages.  See webct.log” doesn’t sound like invalid XML.

Usage is pretty simple:

xmllint --valid /path/to/file.xml | head

  1. If the file is valid, then the whole file is in the output.
  2. If there are warnings, then they precede the whole file.
  3. If there are errors, then only the errors are displayed.

I use head here because our files can be up to 15MB, so this prevents the whole file from going on the screen for the first two situations.

I discovered this in researching how to handle the first situation below. It came up again today. So this has been useful to catch errors in the client supplied files where the file failed to load.

1: parser error : XML declaration allowed only at the start of the document
 <?xml version="1.0" encoding="UTF-8"?>

162: parser error : EntityRef: expecting ';'
<long>College of Engineering &amp&#059; CIS</long>

(Bolded the errors.) The number before the colon is the line number. The caret it uses to indicate where on the line an error occurred isn’t accurate, so I ignore it.

My hope is to get this integrated into our processes to validate these files before they are loaded and save ourselves headaches the next morning.
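
A first pass at that pre-load check might look something like the sketch below. The incoming directory and the two sample files are invented for the example. It also uses --noout, which only checks that each file is well-formed rather than validating it against a DTD the way --valid does, so treat it as a starting point and not our actual process.

```shell
# Hypothetical incoming directory with one good and one bad file.
incoming=$(mktemp -d)
printf '<?xml version="1.0" encoding="UTF-8"?>\n<long>College of Engineering &amp; CIS</long>\n' \
  > "$incoming/good.xml"
printf '<?xml version="1.0" encoding="UTF-8"?>\n<long>College of Engineering &amp CIS</long>\n' \
  > "$incoming/bad.xml"

failed=""
for f in "$incoming"/*.xml; do
  # --noout suppresses the echoed document; parser errors go to stderr.
  if ! xmllint --noout "$f" 2> "$f.err"; then
    failed="$failed $(basename "$f")"
    # Show only the first few lines of the error report.
    head -n 3 "$f.err"
  fi
done

echo "failed:$failed"
```

The exit status from xmllint is what decides whether a file gets flagged, so nothing has to parse the error text.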

Google Chrome on Linux

I was excited to read today that a Google Chrome Beta is now available on Linux. Gmail and Google Reader have weird font issues for me in Firefox on both Linux and Windows. So I tend to split my browser load based on where the sites work best for me.

Making the Linux switch meant leaving Chrome behind unless I went for the unstable version. I was willing to wait for a beta. I just expected to wait a few more months. Whew.

So far so good!


When photographers I know talk about processing their digital images, they generally talk about Adobe products like Photoshop or Lightroom. Some talk about Apple’s Aperture. Operating system only matters when it manages to make filters finish faster on the equivalent hardware.

But… I am cheap. Photoshop was in my tool set back when work paid for me to do web design. Aperture and Lightroom never entered it.

So I used Picasa as it did what I needed. Occasionally I used GIMP to perform more advanced edits. For example, I desaturated a custom area in the picture on the right to bring the attention back to who is important. Picasa can only do the same for a circle.

Considering GIMP is an image editor, it seemed quite concerning that it would fail to open Raw images. Surely Canon CR2 files from a 4 year old camera are supportable? Well, it turns out, GIMP needs help from a plugin.

  1. A dcraw-gimp plugin based on dcraw has very simple options for the profiles used to convert Raw to Portable Any Map for opening in GIMP.
  2. A ufraw-gimp plugin based on ufraw has many more cool tools for adjusting the levels prior to converting to Portable Any Map.

This morning I worked with F-Spot as my image manager and GIMP as the editor. This afternoon I switched to digiKam for the image manager and switched only to GIMP for things I could not manage.

I think I can use this workflow.