Wednesday, December 06, 2006

Download Free Awk Book

Here is a link to download a free copy of the Gawk - Effective Programming Language book.

Awk Executables for Windows

Here are some free downloads of AWK that you can install on a windows machine

AT&T Original Awk
Other Awk Downloads

What is AWK?

Now that I am writing about AWK I thought I should start off with a descriptive definition of this language. (See full description here):

awk is a simple and elegant pattern scanning and processing language. It was created in late 70th of the last century. The name was composed from the initial letters of three original authors Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. It is commonly used as a command-line filter in pipes to reformat the output of other commands. It's the precursor and the main inspiration of Perl. Although originated in Unix it is available and widely used in Windows environment too. It takes two inputs: data file and command file. The command file can be absent and necessary commands can be passed as augments. As Ronald P. Loui aptly noted awk is very underappreciated language:

what language we use in our undergraduate AI programming class. That's understandable. We use GAWK. GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old pattern scanning language isn't even viewed as a programming language by most people. Like PERL and TCL, most prefer to view it as a "scripting language." It has no objects; it is
not functional; it does no built-in logic programming. Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want; (b) with a single exception, the best work consistently results from those working in GAWK. (footnote: The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't
even been close (we have not seen work in PROLOG or JAVA).

The main advantage of AWK is that unlike Perl and other "scripting monsters" that it is very slim without feature creep so characteristic of Perl and thus it can be very efficiently used with pipes. Also it has rather simple, clean syntax and like much heavier TCL can be used with C for "dual-language" implementions.



How to parse a CSV file in AWK

Okay, so I am not new to Awk - been using it for 3 years. One delimeter type I tend to stay away from is .csv files. I deal with a lot of address data so if you try to do the obvious thing and set the FS = "," you are going to run into a lot of incorrectly parsed fields as address data sometimes includes commas in the street address.

So when I am dealing with a .csv file it is going to have quotes around each field and then a comma between each field.

So then you might say, why not set up the Field Separator like this: FS = "/",/"" which translates into the delimiter looking like ",".

Great, but the first field $1 will be left with a quote in the beginning as awk was not programmed to understand anything differently.

So I am looking for a simple way to parse .csv files using awk.

Here is all I've found so far:

Handling CSV data and a complicated example.
Google Group on AWK CSV parsing
Another Thread related to AWK CSV parsing

From all of this the most workable solution so far is:

function setcsv(str, sep) {
gsub(/[\\"]"/, "\035", str)
while (match(str, /"[^"]*"/)) {
middle = substr(str, RSTART+1, RLENGTH-2)
gsub(sep,"\036", middle)
str = substr(str, 1, RSTART-1) middle substr(str, RSTART+RLENGTH)
}
if (index(str, "\"")) {
if ((getline) <= 0)
return 0
setcsv(str "\n" $0, sep)
}
else {
gsub("\035", "\"", str)
gsub(sep, "\034", str)
gsub("\036", sep, str)
$0 = str
return 1
}

}

Introduction to Awk

I'll be posting a bit more on AWK as this is a language I use all the time and I am constantly referring back to my MKS Toolkit manual for that certain syntax or to see how exactly that index command works again. Below are some useful links of AWK manuals or tutorials to get started using AWK.

An Awk Primer

Getting Started with Awk

The GNU Awk User's Guide

Awk is your friend



I'll get more into the FS command and try to overcome some of the annoyances I've had with it in my next posts.





powered by performancing firefox

Tuesday, December 05, 2006

Scientology Disaster Response Team Asking for Volunteers

Typhoon Durian struck the coast of the Philippines this past Thursday and already as many as 1,000 are presumed dead in the aftermath. The Scientology Disaster Response Team is activating any and all volunteers who wish to help aid those in need in the Philippines.

Here is a link to the Scientology Disaster Response website which gives directions on how to help whether this is through donations or volunteering yourself.

Friday, December 01, 2006

SQL Plus Delimited Output File

Ad hoc queries anyone?

Something that has always been a pain is when you need a fast report generated from your latest SQL script. You probably just resort to executing the query in TOAD or another query tool, and then saving it off to a file. That is about 5 mouse clicks too many and you just wish you could click a button to run your script and dump out the results to a nicely formatted text file. Using SQL Plus and a batch script, you can set up an ad hoc query tool for yourself that dumps out the results to a delimited output. This is actually quite simple:

You need three files, a) the SQL query you want to run, b) the SQLPlus script that will run your query and c) a batch program to run your SQLPlus script.

Here is how I did it:

a) queryOut.sql - This contains a query set up for pipe-delimited output. You can change the delimiter to whatever you wish. Here is a sample query -

SELECT field1 || '|' ||
field2 || '|' ||
field3
|| '|' ||
field4
FROM {table}
WHERE {criteria};
b) queryToFile.sql - This contains the SQL Plus script including formatting options so you can dump out a file with your results and no other fluff. Here are the settings I used -
set termout off
set feedback off
set verify off
set echo off
set pagesize 0
set linesize 200
set trimspool on
column dt new_Value mydate noprint
select to_char(sysdate, 'YYYYMMDD') dt from dual;

spool rpt&mydate..txt
@queryOut.sql;
spool off
exit;
c) runQuery.bat - This is the simple batch program that calls SQLPLUS with your username and password and the script to run. Here is what this would look like -
SQLPLUS /@ @queryToFile.sql
I set up the above in a directory for these ad hoc queries and then edit my queryOut.sql file and click on the runQuery.bat which I added to a toolbar. Try it out and let me know if you found this useful or if you have any better ideas...