Sunday, December 31, 2006

Awk Tab Delimiter (FS="\t")

Quick guide to Awk Field Separators (FS):

Tab Delimited: FS="\t"
Comma Delimited: FS="," (Beware of CSV files. See earlier post on this.)
Pipe Delimited: FS="|"

Friday, December 29, 2006

CDI - Build or Buy?

Organizations with a central HQ and regional offices, must have some form of CDI in order to accurately understand their customer base. Otherwise you are left with disparate sets of disrelated data, unlinked and unconsolidated, making analysis and effective marketing nearly impossible. So once CDI as a topic is understood, the next question becomes - Build or Buy? After travelling down the "build" path for 2 years I wish I had researched and implemented a commercial product just to avoid the headaches alone...

However, with all that water under the bridge, it would be nice to be able to simply "plug-in" the missing parts to make my CDI solution complete.

I recommend buying a solution outright, even though the cost may be overwhelming at first, it will pay off if your organization is doing any volume of mailing - and will probably pay itself back in postage saved alone.

Here are the key questions that should be asked of a CDI vendor if you are looking to buy:

* Since systems rarely share identifiers, does the solution score and match data taking advantage of all attributes that aid in the matching process using likelihood statistical theory for the highest levels of accuracy? State-of-the-art systems use "probabilistic algorithms" to compare attributes one by one and produce a weighted score for each pair of records. The higher the score, the more likely the records represent the same person. This method improves matching accuracy by using weights that are specific to an organization's data.

* Does the vendor provide analysis based on your own real-world data? Premier providers conduct frequency-based file analysis to provide weighting and thresholds specific to a customer's data. By using your own data, a better match is possible than by using generic criteria and arbitrary weights for each attribute. The data can be studied and tuned to recognize that a match on John Smith in the Bronx, NY, contains a more trusted matching value than a match on John Smith in Minnesota.

* Does the provider offer a way to capture and maintain a complete history of changes to attribute values? Such a method of "complete versioning" improves accuracy and has the ability to make the correct decisions on linking data even as it constantly changes due to marriage, divorce, new address, new phone number, etc.

* Does the solution enable you to establish the right accuracy and identification levels for your application and budget? Managing data quality is a business decision, so consider whether you need automated identification of results to ensure faster customer service, for example, or a solution that allows manual review of data to capture fraudulent customer activity, such as duplicate insurance claims.

* Can the solution overcome multiple identifications and data discrepancies across sources? Common obstacles to customer identity include transpositions, misspellings, nicknames, aliases, address inconsistencies and identity misrepresentations, all of which can occur when customer data flows into the company through multiple touchpoints. A strong CDI product should be able to keep data up to date and synchronized across the enterprise.

* Can you use the same matching technology in order to maintain all your relationships: for example, individuals, households and companies?
(read rest of article here)


How to create that Golden Master

The concept of a centralized view or 360 degree view of a customer is often strived for but the "name, want and get" is not always figured out. Most companies will name what they are looking for, some fall short in getting the board to want the solution and many lose it on figuring out how to get it done.

VP of Siperian describes the need for assigning reliability factors to the different sources of data as some sources may have more accurate data:

In order to deliver a “golden” or master record for each customer and its various affiliations, a system must dynamically assess reliability across all data sources – based on user-defined parameters and intrinsic data properties – and ensure that only the most reliable content survives at the cell-level of the master record. For instance, if the call center begins collecting email addresses when confirming orders, this data attribute may be more reliable than the email addresses submitted by customers at the website. The ability to rapidly adjust the system to survive the call center email address over the website email address is a critical architectural component of any CDI system. Moreover, such cell-level survivorship for data reliability must be built into the core product architecture and should not be sacrificed as the customer hub scales to millions of customers. Ultimately, how well the end-users accept a customer data hub depends on sustaining high level of data reliability, even as the hub grows in volume or as new data sources are added.
In looking into different CDI solutions, you must look into how the data hub is structured, if they have configurable rules for loading and merging and if they have a robust Data Steward. Realize that data stewardship is the way all 'exceptions' get handled. If you do not separate out the confident updates from the questionable ones, you will have a poor foundation that will eventually lead to a polluted database.

A business cannot implement an operational customer hub in the absence of data stewardship ― as soon as data begins to flow through the supported business processes, exceptions and errors begin to flow as well. Therefore, any customer hub acting as a data integration platform must offer business capabilities to monitor and handle such exceptions either by business analysts and end-users or by anyone designated as a data steward.
...
Often managing such exceptions requires full user interfaces for complex data stewardship tasks such as investigating the history and lineage of certain master records in question. This may be the only way to ensure that the user acceptance and data hub reliability remains high. In other circumstances, an enterprise may choose to build a specific user interface against the programming interfaces of the master hub in order to suit its needs. In either case, an adaptive solution must deliver rules-based configurability with best-in-class data stewardship consoles as well as programming interfaces to handle all data exception and reliability needs.
I've worked on custom CDI projects before CDI was called CDI and can attest that the Data Stewardship interface will make or break your CDI project. This should not be underestimated.

The last point data export back to original sources. This is where you exchange your centralized view with the sources and provide them with your expanded view to assist them in better understanding your customers. If you have done a good job, they will love you, if you haven't you will be public enemy number one. They think their data is "better than yours", "why did you change my address", "you merged two customers into the same one!" and other rantings will be sent your way if you have not done a good job.

You should have a well thought out strategy including what data elements you will send back to sources and you should allow them to decide whether or not they want your data.

Often, certain data attributes (such as privacy preferences) need central control and exception handling whereas other attributes are best left under local management. In addition, security and access to data attributes in the hub will vary by individual roles within each organization and by organization at large. In fact, to support the broad range of business requirements across business lines, there may be multiple data governance regimes required for different data attributes, all within a single enterprise.

An adaptive approach must be based on a distributed architecture whereby multiple hubs can be deployed to integrate different data sources and support different processes, yet be able to share data across one another based on any number of hub-and-spoke or peer-to-peer governance regimes. This offers a line of business yet another dimension of flexibility to share some but not all data – each based on its own data reliability and governance policies. With full rules-based configurability and data stewardship interfaces, a broad range of data governance regimes can be supported.

In summary, data must be the central focus of any CDI project, and data reliability, stewardship and governance regime must never be an after-thought of a comprehensive CDI solution. Enterprises must closely review the details of a vendor’s offering in this area to reduce their risk of project failure and the total cost of CDI implementation.



Friday, December 22, 2006

Customer Data Integration (CDI) Can't Decipher It

I'm drowning in the sea of Customer Data Integration solutions, Data Librarians, Data Stewards, Registry hubs, Persistant hubs or Hybirds. Above is a good review of the basics of the problem domain. Key players are the Oracle Customer Data Hub, Siperian, Purisma and possibly the recent Sun aquisition - SeeBeyond.



"Oracle leads in customer data integration. Oracle has assembled the broadest set of components, from data quality, to matching, to customer key management, to process integration, to ongoing monitoring and management."—Forrester Research

Customer knowledge is the lifeblood of the enterprise. And customer knowledge begins with a unified, reconciled source of high-quality customer data—provided only by the Oracle Customer Data Hub.
Oracle Customer Data Hub is a fully integrated customer data management solution that centralizes, de-duplicates, and enriches your customer data, continuously synchronizing with all your data sources, to give you a single view of your customers. As new data comes in, reporting accuracy grows, analytics become more valuable, employee productivity increases, and day-to-day customer relationships improve.
Reading Jill Dyche's new book, Customer Data Integration is my life preserver. Talking to salespeople has led to further confusion and contradictory statements about each others products.

Gartner's findings of May 2006 on CDI may already be outdated with the fastly evolving market.


Thursday, December 21, 2006

World's Oldest Computer - Antikythera Mechanism


Ancient computer found off the coast of Antikythera, Greece is estimated to have been built over 2000 years ago. This makes it the world's oldest known computing device.

Above is a reconstruction of this device is believed to look like. There are over two thousand greek characters inscribed on this machine which helped archeologists decipher what it did and how it worked.

From a recent article in All Headlines News:

"...a device known as the Antikythera (an-tee-KITH-air-uh) mechanism, a device consisting of thirty bronze gears with approximately two-hundred-twenty-five teeth, was discovered off the Greek Island of Antikythera.

The Antikythera mechanism, which was made in Greece, is the earliest known computer, though research has shown that the mechanism was not unique for its time. Other instruments similar to the Antikythera have been mentioned in historical documents written in first-century B.C. by Cicero and Posidonius, the teacher of Cicero.

Scientists believe the Antikythera mechanism may have been used in the field of astronomy, showing the motion of the planets and computing eclipses of the sun and moon."

Here is a photo of what one of the pieces of this computer actually look like:


Other Antikythera links:

Reconstuction of Antikythera Mechanism
Antikythera Animation of gears
Pictures of mechanism
An Ancient Greek Computer
Oldest Known Computer

Monday, December 18, 2006

Oracle SQL Function - INSTR()

I found it. The use of INSTR() simulates the AWK Function INDEX() and does a search of the first argument to see if the second argument exists and then returns the position where this is first found. Here is the syntax:

INSTR(str, chars[,s[,n]])
Find the 'n'th occurence of 'chars' in 'str'
Starting at position 's'
n and s default to 1
This was decribed in the following page which lists out all the Oracle SQL Functions and is a very useful link:

Oracle SQL Functions

To see a full listing of all Oracle SQL Functions you can run the following query:
SELECT distinct object_name
FROM all_arguments
WHERE package_name = 'STANDARD';

Looking for Index() function in Oracle SQL

From my blog you probably can tell I have an infatuation with Awk. Well today I was trying to work out a problem in Oracle SQL where the index() function would have worked perfectly. I'm checking around to find if there is any built in function that searches an existing column for a string and returns the starting point if it exists. Oracle SQL does have the substr() function which I've found quite useful...

Wednesday, December 06, 2006

Download Free Awk Book

Here is a link to download a free copy of the Gawk - Effective Programming Language book.

Awk Executables for Windows

Here are some free downloads of AWK that you can install on a windows machine

AT&T Original Awk
Other Awk Downloads

What is AWK?

Now that I am writing about AWK I thought I should start off with a descriptive definition of this language. (See full description here):

awk is a simple and elegant pattern scanning and processing language. It was created in late 70th of the last century. The name was composed from the initial letters of three original authors Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. It is commonly used as a command-line filter in pipes to reformat the output of other commands. It's the precursor and the main inspiration of Perl. Although originated in Unix it is available and widely used in Windows environment too. It takes two inputs: data file and command file. The command file can be absent and necessary commands can be passed as augments. As Ronald P. Loui aptly noted awk is very underappreciated language:

what language we use in our undergraduate AI programming class. That's understandable. We use GAWK. GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old pattern scanning language isn't even viewed as a programming language by most people. Like PERL and TCL, most prefer to view it as a "scripting language." It has no objects; it is
not functional; it does no built-in logic programming. Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want; (b) with a single exception, the best work consistently results from those working in GAWK. (footnote: The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't
even been close (we have not seen work in PROLOG or JAVA).

The main advantage of AWK is that unlike Perl and other "scripting monsters" that it is very slim without feature creep so characteristic of Perl and thus it can be very efficiently used with pipes. Also it has rather simple, clean syntax and like much heavier TCL can be used with C for "dual-language" implementions.



How to parse a CSV file in AWK

Okay, so I am not new to Awk - been using it for 3 years. One delimeter type I tend to stay away from is .csv files. I deal with a lot of address data so if you try to do the obvious thing and set the FS = "," you are going to run into a lot of incorrectly parsed fields as address data sometimes includes commas in the street address.

So when I am dealing with a .csv file it is going to have quotes around each field and then a comma between each field.

So then you might say, why not set up the Field Separator like this: FS = "/",/"" which translates into the delimiter looking like ",".

Great, but the first field $1 will be left with a quote in the beginning as awk was not programmed to understand anything differently.

So I am looking for a simple way to parse .csv files using awk.

Here is all I've found so far:

Handling CSV data and a complicated example.
Google Group on AWK CSV parsing
Another Thread related to AWK CSV parsing

From all of this the most workable solution so far is:

function setcsv(str, sep) {
gsub(/[\\"]"/, "\035", str)
while (match(str, /"[^"]*"/)) {
middle = substr(str, RSTART+1, RLENGTH-2)
gsub(sep,"\036", middle)
str = substr(str, 1, RSTART-1) middle substr(str, RSTART+RLENGTH)
}
if (index(str, "\"")) {
if ((getline) <= 0)
return 0
setcsv(str "\n" $0, sep)
}
else {
gsub("\035", "\"", str)
gsub(sep, "\034", str)
gsub("\036", sep, str)
$0 = str
return 1
}

}

Introduction to Awk

I'll be posting a bit more on AWK as this is a language I use all the time and I am constantly referring back to my MKS Toolkit manual for that certain syntax or to see how exactly that index command works again. Below are some useful links of AWK manuals or tutorials to get started using AWK.

An Awk Primer

Getting Started with Awk

The GNU Awk User's Guide

Awk is your friend



I'll get more into the FS command and try to overcome some of the annoyances I've had with it in my next posts.





powered by performancing firefox

Tuesday, December 05, 2006

Scientology Disaster Response Team Asking for Volunteers

Typhoon Durian struck the coast of the Philippines this past Thursday and already as many as 1,000 are presumed dead in the aftermath. The Scientology Disaster Response Team is activating any and all volunteers who wish to help aid those in need in the Philippines.

Here is a link to the Scientology Disaster Response website which gives directions on how to help whether this is through donations or volunteering yourself.

Friday, December 01, 2006

SQL Plus Delimited Output File

Ad hoc queries anyone?

Something that has always been a pain is when you need a fast report generated from your latest SQL script. You probably just resort to executing the query in TOAD or another query tool, and then saving it off to a file. That is about 5 mouse clicks too many and you just wish you could click a button to run your script and dump out the results to a nicely formatted text file. Using SQL Plus and a batch script, you can set up an ad hoc query tool for yourself that dumps out the results to a delimited output. This is actually quite simple:

You need three files, a) the SQL query you want to run, b) the SQLPlus script that will run your query and c) a batch program to run your SQLPlus script.

Here is how I did it:

a) queryOut.sql - This contains a query set up for pipe-delimited output. You can change the delimiter to whatever you wish. Here is a sample query -

SELECT field1 || '|' ||
field2 || '|' ||
field3
|| '|' ||
field4
FROM {table}
WHERE {criteria};
b) queryToFile.sql - This contains the SQL Plus script including formatting options so you can dump out a file with your results and no other fluff. Here are the settings I used -
set termout off
set feedback off
set verify off
set echo off
set pagesize 0
set linesize 200
set trimspool on
column dt new_Value mydate noprint
select to_char(sysdate, 'YYYYMMDD') dt from dual;

spool rpt&mydate..txt
@queryOut.sql;
spool off
exit;
c) runQuery.bat - This is the simple batch program that calls SQLPLUS with your username and password and the script to run. Here is what this would look like -
SQLPLUS /@ @queryToFile.sql
I set up the above in a directory for these ad hoc queries and then edit my queryOut.sql file and click on the runQuery.bat which I added to a toolbar. Try it out and let me know if you found this useful or if you have any better ideas...

Friday, November 24, 2006

How to use Decode in Oracle SQL

The use of the decode function in Oracle SQL is pretty straightforward. You simply specify the column that you need conditional logic on and then specify the conditions as follows:

SELECT DECODE (value,
<if this value>, <return this value>,
< if this value>, <return this value>,
....)
FROM dual;


If you have a default value that you want to display if any of the conditions are not met, you can add another comma and specify the default value like this:

SELECT DECODE (value,
if this value, return this value,
if this value, return this value,
otherwise this value
FROM dual;



For a thorough description of all features of this DECODE function, see the Puget Sound Oracle User's Group description.


Here are some further examples from another post on my blog:

Most Oracle SQL users do not realize the power of this function. Hopefully this post will enlighten some on it.

Tuesday, November 21, 2006

What is DPV (Delivery Point Validation)?

Mailers today who do not use a DPV service are either really out of the loop or they don't really care whether their mail actually arrives or not.



I've been in the category of "really out of the loop" until recently when I became aware of the pitfalls of not using a DPV service to ensure your mailing address is deliverable.



Let's say you process your mailing list through a standardization service without DPV and then try to merge all duplicates together in your database. If you do not validate down to DPV level, you may be losing address data and names from your list . DPV validation will tell you that the mail will actually get delivered to a delivery POINT which other validation levels can not do. ZIP4 encoding can only give you the general range of zip codes and tell you whether or not these are valid ranges, but for all you know you may be mailing to an empty lot or a torn down building, not to mention the high rises that won't deliver your mail without that apartment number.



So mailers who are interested in keeping their lists in good shape should choose some form of DPV solution. I am looking into the PostalSoft validation which is described on their website:



Delivery Point Validation (DPV)

By adding delivery point
validation to your data cleansing process, you can bring data
validation to finer precision. A DPV solution:

  • Determines whether a particular address — a specific house number,
    apartment number, or suite number — is known to the USPS (United States
    Postal Service) as a valid point for delivering mail, which helps
    reduce mail-order fraud.
  • Identifies whether an address is a Commercial Mail Receiving Agency
    (CMRA), such as The UPS Store, which can reduce fraudulent credit card
    orders.
  • Increases the accuracy of matched records and helps create a more
    accurate view of each customer, resulting in more sophisticated
    marketing campaigns, such as loyalty programs, or more highly
    personalized offers.




powered by performancing firefox

ZIP Code Maps

Here is a basic ZIP Code map showing the different state ZIP codes and ranges. I've been looking for a good ZIP Code map for a while now and the above is pretty straightforward. While this doesn't give you much in terms of detail, it is a good overview.


















Next is a great link to a ZIP code distribution map which is an interactive Java applet where you can see the exact location of each ZIP Code range or down to the specific zip code itself.



Wednesday, November 15, 2006

Jigloo for Eclipse?

I've been using Eclipse now for a few months. Originally I started out with the Jigloo form builder tool which I found quite useful. From an earlier post:

Building GUIs for rich client apps in Eclipse is not the easiest thing in the world. I'm looking for windows builders comparable to the Net Beans GUI builder.

I am downloading the Jigloo GUI builder which sounds pretty good but that is coming from company who made it.

What about WindowsBuilderPro? This is a commercial product, not free like Jigloo. How is it better?

I'll try both and see for myself...

On the install, Jigloo is very simple and is just like any other Eclipse plugins. Just copy to the features and plugins directories and your done. WindowBuilder on the other hand has you walk through a install wizard which took only a couple minutes to run through. (Make sure you don't have Eclipse running during the install).

WindowsBuilder has several example apps you can try. In 5 minutes of testing this builder I found it very user friendly. I liked the "quick view" feature where without even compiling you can get a feel for how your app will look during runtime. Now on to Jigloo.

After the few minutes I found it easy to create a basic app. Nothing difficult. I liked the two screen (code/app) view that shows the two-way changes so if you edit the GUI the code changes or you edit the code and the GUI changes.

So far they are both comparable products and I'll spend a bit more time using each on an actual application to see which is the better product.
However, once I found my feet in making Swing applications, I found using a GUI builder to be more of a pain in the neck than a help. I see these GUI builders like training wheels when learning to ride your bike. It is really not that difficult once you get the hang of it, and those training wheels sure get annoying when trying to speed down that big hill...

My Scientology Wedding




I was glad to hear that Tom Cruise picked the same Scientology Wedding ceremony that my wife and I chose when we got married 5 years ago. This was the Double Ring ceremony and during this the minister holds up two rings and asks you to picture the ARC Triangle in the middle as this represents Affinity, Reality and Communication - a vital part of a lasting marraige. A secret to a succesful Scientology marraige is that you have continuously create it. From the Scientology website on weddings:

"When someone begins on that arrangement called marriage, he is getting into something which is, to say the least, adventurous. When a couple get married, they are doing something they know nothing about. And, from all indications, when they have tried it more than once, they know no more about it the second time than they did the first." ...
"Where people are having trouble with marriage, it is because they are expecting it to run on automatic. They think it will hang together through no effort of their own; unfortunately, it won’t. It has to be created."
One part of the wedding vows is that you agree not to let a day dawn if there is a break in that ARC, in other words, if you have an upset, don't let it sit, but instead bring it up with the spouse and resolve the break in Affinity, Reality or Communication before closing out the day.

"Scientology founder L. Ron Hubbard isolated the three elements that make up Understanding: A is Affinity—which is the degree of closeness, liking or affection one has for something or someone; R is Reality—which is what we agree on; and C is Communication—which is of course the interchange of ideas between two people.

If you increase any one of these points in your dealing with another person you will have greater understanding with that person. Understanding is, of course, everything in a successful relationship.

Thursday, November 09, 2006

NVL - Oracle SQL Null Field Handling

An often overlooked function that comes in very handy is the NVL function in Oracle. This allows you to replace a query result that returned NULL, with a value, such as 0. Here is the syntax:

SELECT NVL(fielda, 0) FROM table;

This will always give you either a value or a "0". This will never return a NULL.

Here is a link for further details on this function:

NVL Function

Oracle Wildcards

When searching using wildcards in Oracle SQL the most common use is the "%" character which searches for any number of characters. However a useful but less used wildcard is the "_" character which searches for any single character. These can be used in combination to form complex querying.

If you need to query for something that has a wildcard character in it, use the ESCAPE syntax as follows:

SELECT * FROM table WHERE field LIKE '%\%%' ESCAPE '\'

This will treat the '%' as a literal instead of as a wildcard.

Here is another good oracle sql resource:

Puget Sound Oracle Users Group

Monday, November 06, 2006

Blogger Sitemap for Google

If you are trying to create a sitemap of one of your blogs for google here is how you can do it:

Step # 6: You can add a Sitemap to your account to provide Google additional information about about your blog. Google will process your Sitemap and provide information on any errors in the Sitemaps tab as well your sitemap will be downloaded everyday to index your blog fast.
Select type as : Add General Web sitemap

Now you need to add Atom 0.3 feeds. Generally, you would use this format only if your site already has a syndication feed and this is the only way to add sitemap to your blogger.com account.

Paste url of your Atom feed: For example http://cyberciti.blogspot.com/atom.xml and click on Add Web sitemap:

Update: If you are using a new blogger beta system, use http://yourblogname.blogspot.com/rss.xml

sitemap

You will get confirmation:
You have added a Sitemap to http://cyberciti.blogspot.com/. Reports may take several hours to update. Thank you for your patience!

And you are done and your blog will be now index very fast (depend upon your posting and content).

Sunday, November 05, 2006

How to Rebuild Java from Class File? Use a Decompiler!

I ran into a situation where I couldn't find the latest source for a project I was working on. All I had was the executable jar file which was the latest version. I searched through my computer for the source code but couldn't find it anywhere. I must not have saved the last version or it get accidentally deleted. Then I discovered Java Decompilers and after searching for a few seconds on the web I found jad.exe - a simple decompiler that reads a compiled CLASS file and converts this into a java source file. Granted it is not perfect but through using this I was able to recover the work done and recreate my source file (through using jad as well as winmerge).

You can download jad here.
You can download winmerge here.

Friday, November 03, 2006

Grand Opening of Scientology London

A fellow coder from United Kingdom couldn't be reached last week as he was attending the grand opening of London's new Church of Scientology building. He is an old school programmer working mainly in Fox-Pro!?! He attended the opening and I heard it that even the rain couldn't stop the people from flooding into the place. See the photo below with the umbrella's!

Scientology restored two famous buildings in London recently and just held the grand opening of these two new beautiful buildings.

LONDON — More than 3,000 members and friends of the Church of Scientology of London packed Queen Victoria Street in the heart of London for the sensational grand opening of the new home of the UK’s oldest Scientology church.

Steps away from the Tate Gallery and St. Paul’s Cathedral, the beautifully restored historic building at 146 Queen Victoria Street now serves the growing membership of London-area Scientologists. Its opening marks the largest expansion for Scientology in the Church’s 50-year history in London.
...

“This new church packs the full body of L Ron Hubbard’s technologies as derived from the greater well of Scientology knowledge,” said Mr. Miscavige, “and, as such, it provides the full gamut of Scientology activities for the betterment of mankind.”

Fanfare and umbrellas at the opening of the new Church of Scientology of London.

- Photos of the Grand Opening on the Scientology London site
At the grand opening Mr. David Miscavige spoke and gave the opening address.
- Mr. David Miscavige, Official Biography

Thursday, November 02, 2006

How to Delete a Column from a Table in Oracle SQL

Here is the syntax to delete one column from an existing table in Oracle SQL. As this is not used very often I am noting it here for quick reference.

     ALTER TABLE employee DROP COLUMN vacationPay;
Pretty simple.

Wednesday, November 01, 2006

How to Select Top Results in a Group By

The other day I ran into an interesting problem with Oracle SQL. I used to use Microsoft SQL Server and last year switched over to Oracle and had to adjust to the peculiarities of Oracle SQL. I had a report that gave the top 10 results of an aggregate query. In SQL this was simple and looked something like this:

SET ROWCOUNT 10
SELECT field1, field2, COUNT(*)
FROM table
GROUP BY field1, field2
ORDER BY COUNT(*) DESC
But when trying to do this in Oracle SQL with no "ROWCOUNT" feature, I was stumped at first. Then I came up with the following which is a bit of a hack, but works:
SELECT field1, field2, CNT
FROM (
SELECT field1, field2, COUNT(*) AS CNT
FROM table
GROUP BY field1, field2
ORDER BY COUNT(*) DESC
)
WHERE ROWNUM <= 10;
If anyone knows of a more standard approach, let me know.

Thursday, October 26, 2006

Oracle SQL Tips

I often am in the middle of some off-the-cuff query and run into one of those technical "gotchas" like a failed compare between two text strings due to trailing spaces or that alter table syntax to modify a column name.

Instead of googling for the answer each time, I've consolidated the websites that I use to find the answers to these common SQL questions and have listed these below for my own and others reference.

Oracle FAQ
Oracle Cheat Sheet
Oracle User Guide

These are the most frequently used resources. I may add to this later but the above will solve most all of your Oracle SQL questions.

Wednesday, October 25, 2006

Formatting Query Results

I wanted to format a query result so it displayed as currency. To do this you need to use the TO_CHAR function with the following syntax:

SELECT TO_CHAR(A_NUMBER, 'L999,999,999.99')
FROM TABLE;

This gives you a formatted output in the local currency format.

How to Use Variables in PL/SQL or SQLPlus Scripts

I am researching how to set up my SQLPlus script to be able to output a text report using a generic script that reads in a variable.

I want to set variable on the command line and then have the script reference this throughout, giving me customized counts based on the variable entered. The results should output to a text file that can then be used as the actual report. I am familiar with how to do this in a PL/SQL script but not in a regular SQL script that I would use in SQL Plus.

The following link helped me with the setting up of my variable.

Detail on how to use variables in SQLPlus

Specifically here were the instructions:

9.1 Setting a Substitution Variable's Value

A substitution variable can be set in several ways. The common ways are given below.

1. The DEFINE command sets an explicit value:

define myv = 'King'

This creates a character variable "myv" set to the value "King".

2. The ACCEPT command:

accept myv char prompt 'Enter a last name: '

prompts you for a value and creates a character variable "myv" set to the text you enter.

3. Using "&&" before an undefined variable prompts you for a value and uses that value in the statement:

select first_name from employees where last_name = '&&myuser';

If the substitution variable "myuser" is not already defined then this statement creates "myuser" and sets it to the value you enter.

4. Using COLUMN NEW_VALUE to set a substitution variable to a value stored in the database:

column last_name new_value mynv
select last_name from employees where employee_id = 100;

This creates a substitution variable "mynv" set to the value in the "last_name" column.



When outputting to a file, I like to use include the date in the file name:


Using SYSDATE you can query the current date and put it in a substitution variable. The substitution variable can then be used in a SPOOL command:

column dcol new_value mydate noprint
select to_char(sysdate,'YYYYMMDD') dcol from dual;

spool &mydate.report.txt
-- my report goes here
select last_name from employees;
spool off


Be sure to use the double period after the variable name in your file name:

If SET CONCAT is a period (.) and you want to append a period immediately after a substitution variable then use two periods together. For example:

define mycity = Melbourne
spool &mycity..log

is the same as:

spool Melbourne.log


Use the variable that you set like this:

define dept = '60'
ttitle left 'Salaries for department &dept'
select last_name, salary from employees where department_id = &dept;


Or pass in the variable on the command line like this:

You can pass parameters on the command line to a SQL*Plus script:

sqlplus hr/my_password @myscript.html employees "De Haan"

They can be referenced in the script using "&1" and "&2". For example, myscript.sql could be:

set verify off
select employee_id from &1 where last_name = '&2';


That handled my main concern and gives me a nice text output file that uses my generic script with a variable.

The next point I sought to resolve was how to have my text output be in a delimited format as I plan to read this from an Excel file.

Here is what I found to do this:

--sets sqlplus so output will be tab-delimited
set colsep ' ' /* that's a tab */
SET LINESIZE 3000
SET PAGESIZE 50000
SET ECHO OFF
SET FEEDBACK OFF
--SET HEADING OFF
set trimspool on
set newpage none
set underline off


Here is someone else's point of view on how this should be done.
Digital Voice

Instead of either of the above a simple handling would be to hard code into the generic script "|" between the columns. This would give me a standard pipe-delimited output.

That sounds like the easiest method and is what I commonly use.

Tuesday, October 24, 2006

Scientology Public Service Announcements

There are a number of public service announcements that have recently been shown on TV stations around the world. These PSAs are about Human Rights and the basic rights that all men have as based on the International Declaration of Human Rights first put forth over 50 years ago. Watch these and see what your human rights are.

Friday, October 06, 2006

Area Code Maps

I have recently been looking into how to track phone numbers and the constantly changing area codes. A great resource was found called NANPA. Still not sure what this stands for but these guys are the ones who set the phone numbers, so I would say it is a reliable source. This is the best area code map site I have seen as well.

Area Code Map

Tuesday, October 03, 2006

Free Java Books

Here is a link to a massive resource of free on-line Java books.

Free On-Line Java Books

Monday, October 02, 2006

Eclipse Plugins

I am diving head first into the world of Eclipse plugins. The first breakthrough was to clear up the concept of what a Plug-in really was. I went through the cheat sheets in Eclipse to get somewhat familiar with them and then realized this is exactly what I need.

The Eclipse Plugin headquarters can be found here.

Friday, September 29, 2006

Hibernate Tutorials

In my first Java database application I used straight JDBC calls to a simple Access database. This was good for practice but now I need a more Object related database approach and Hibernate seems like the way to go.

Here are some recommended Hibernate Tutorials that explain the basics of setting up a Hibernate project.

Hibernate Introduction
Hibernate Tutorial
Hibernate Tips & Tricks
Getting Started with Hibernate

If you are subscribed with ACM (Association of Computing Machinery) or have a Safari account there are some good Hibernate books that can be read online as well.

Wednesday, September 27, 2006

Eclipse vs Netbeans?

NetBeans 6.0M3 vs. Eclipse 3.3M2 by ZDNet's Ed Burnette -- Last week Sun and the Eclipse Foundation both released milestone versions of their flagship Java IDE/rich client platforms. This planetary alignment provides a unique opportunity to compare the state of development of these two open source projects....

Velocity in Java

Velocity is a very useful template tool. I use it with my mailing list software to prepare the jobfiles based on user input through a GUI. Each of the variables are set through the GUI and then the text jobfile is created using Velocity.

Here are some good links to get going with Velocity:

Java World - Velocity
Apache Jakarta - Velocity

Tuesday, September 26, 2006

Rename Column in Oracle SQL / PLSQL

To rename a column in an existing table use the following:

ALTER TABLE tablename RENAME COLUMN oldcolumn TO newcolumn;

Monday, September 25, 2006

Java and Study Technology

Learning Java is easy. That is if you know the secret to learning. I did a course called the Student Hat which is a Scientology course that teaches you how to study and ever since then, I've been able to learn ANY subject.

I was reading up on the difference between Abstraction and Interfaces today and was having a difficult time grasping it. Then I realized I didn't know what the word "Abstraction" meant. So I looked up this word, used it in sentences until I felt good about it and then carried on reading. The rest of the material made total sense and was not difficult to grasp any longer.

The secret of learning any subject is found in the Three Barriers to Study. There are several courses that you can do at any Scientology organization, starting with the Basic Study Manual or Learning How to Learn, all the way up to the Student Hat course and Key To Life course.

Doing these courses you can arm yourself with the skills to learn any subject, literally.

Sunday, September 24, 2006

Swing Events - Introduction

Here is a brief overview of how events work in Swing. This is from the IBM Swing Beginner Tutorial and is an important fundamental when learning Swing:

Events

Finally, we get to one of the most important parts of Swing: dealing with events and reacting to interaction with the UI. Swing handles events by using the event/listener model. This model works by allowing certain classes to register for events from a component. This class that registers for events is called a listener, because it waits for events to occur from the component and then takes an action when that happens. The component itself knows how to "fire" events (that is, it knows the types of interaction it can generate and how to let the listeners know when that interaction happens). It communicates this interaction with events, classes that contain information about the interaction.

With the technical babble aside, let's look at some examples of how events work in Swing. I'll start with the simplest example, a JButton and printing out "Hello" on the console when it is pressed.

The JButton knows when it is pressed; this is handled internally, and there's no code needed to handle that. However, the listener needs to register to receive that event from the JButton so you can print out "Hello." The listener class does this by implementing the listener interface and then calling addActionListener() on the JButton.

JTable Defined

I am going through the beginner Swing tutorial put out by IBM and here is a good description of the JTable:

JTable

Think of an Excel spreadsheet when you think of a JTable and that should give you a clear picture of what the JTable does in Swing. It shares many of the same characteristics: cells, rows, columns, moving columns, and hiding columns. The JTable takes the idea of a JList a step further. Instead of displaying data in one column, it displays it in multiple columns. Let's use a person as an example. A JList would only be able to display one property of a person -- his or her name for instance. A JTable, however, would be able to display multiple properties -- a name, an age, an address, etc. The JTable is the Swing component that allows you to provide the most information about your data.

Unfortunately, as a trade-off, it is also notoriously the most difficult Swing component to tackle. Many UI developers have gotten headaches trying to learn every detail of a JTable. I hope to save you from that here, and just get the ball rolling with your JTable knowledge.

Many of the same concepts in JLists extend to JTables as well, including the idea of different selection intervals, for example. But the one-row idea of a JList changes to the cell structure of a JTable. This means you have different ways to make these selections in JTables, as columns, rows, or individual cells.

Blogger Categories and the lack thereof

I've been blogging for about a month now, not really gotten the hang of it. I've made many posts to my main blog "The Paradigm Shift" and received over 3,000 vistors so far. The only problem is that there are no categories in my blog and now it is unusable for me. I made the blog originally as a storage of helpful items and tips that I would need to refer back to.

I just ran across something now in Java that I worked out a solution for in the past, (How to check if a date is valid in Java) and posted to my blog, but now it is buried in archives somewhere. It took more than a few seconds to go back and find it.

Now Beta Blogger came out with the new blogger categories and I am trying this out with this new blog. I heard that soon Blogger will allow you to convert your existing blog over to the new blog format with categories, but I didn't want to wait so thus, this blog.