A security guy at a campus wanted our web server log file in CSV format. The original file has lines that look something like:
machine.usg.edu: webserver.log13646,2010-11-30 11:08:32 0.0010 999.999.999.999 b7tPM1hTgGYMn90bLTM1 200 GET /webct/urw/lc987189066271.tp1333853785371/blank.html - 262 "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4" username:0:0
Turns out I only need three sed edits to make it look the way I want:
sed 's|:2009-|,2009-|g' testfile.txt | sed 's|\t|,|g' | sed 's|: |,|g'
The first converts the colon between the end of the file name and the year into a comma. The second converts all the tabs into commas, and the last changes the colon-space between the host name and webserver.log into a comma.
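The three edits can also run in a single sed process by passing each expression with -e. The following is only a sketch: the function name log_to_csv and the shortened sample line are made up for illustration, and the year 2009 is taken from the command above. It also passes a literal tab to sed instead of \t, since \t in a sed pattern is a GNU extension and may not work in other sed implementations.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): applies the same three
# substitutions, in the same order, inside one sed process. Reads stdin.
log_to_csv() {
    tab=$(printf '\t')             # literal tab, so we do not rely on GNU sed's \t
    sed -e 's|:2009-|,2009-|g' \
        -e "s|${tab}|,|g" \
        -e 's|: |,|g'
}

# Made-up, shortened sample line with tab-separated fields.
printf 'machine.usg.edu: webserver.log:2009-11-30\t11:08:32\t200\tGET\n' | log_to_csv
# prints: machine.usg.edu,webserver.log,2009-11-30,11:08:32,200,GET
```

The colons inside the time stamp survive because the last expression only matches a colon followed by a space.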
Easy enough. That line from the web server log now looks like:
machine.usg.edu,webserver.log13646,2010-11-30,11:08:32,0.0010,999.999.999.999,b7tPM1hTgGYMn90bLTM1,200,GET,/webct/urw/lc987189066271.tp1333853785371/blank.html,-,262,"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4",username:0:0
I love regular expressions.
I have a feeling I’ll need to make a primer for this guy too. 🙁
Hostname, Log Name, Date, Time, Seconds to Process, Load Balancer IP, Session ID, HTTP Response Code, HTTP Method, URI, URI Parameters, Bytes Returned, User Agent, Username:Transactions Read:Transactions Written
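If the column names should travel with the data, one option is to write them as the first row of the CSV. This is a rough sketch, not something from the original workflow: the helper name with_header and the output file name weblog.csv are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: prints the column names as a CSV header row,
# then copies stdin through unchanged.
with_header() {
    printf '%s\n' 'Hostname,Log Name,Date,Time,Seconds to Process,Load Balancer IP,Session ID,HTTP Response Code,HTTP Method,URI,URI Parameters,Bytes Returned,User Agent,Username:Transactions Read:Transactions Written'
    cat
}

# Example use (weblog.csv is a made-up output name):
#   sed 's|:2009-|,2009-|g' testfile.txt | sed 's|\t|,|g' | sed 's|: |,|g' | with_header > weblog.csv
```

The last column keeps its colon-separated form, so the header names all three parts in one field.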