A More Useful Port of the Chinook Database to Postgresql

Image by Ingrid Talar | Some Rights Reserved

I use Chinook Database as the sample database for a lot of things, primarily because the same data set is available for most of the major database platforms, so it’s easy to port/compare etc. Chinook Database offers up a set of tables with enough sample data to satisfy most testing/demo needs, in a general enough schema that it is easy to get something up and running quickly.

However, the Chinook database as made available at the Chinook site maintains the same object naming conventions (proper-cased table and column names) across database platforms, and this I DON’T like. Also, the Postgres version of Chinook available as part of the Chinook package uses integer primary keys, but not auto-incrementing integers (this corresponds to the serial data type in Postgres).

Because I use Chinook fairly often with Postgres, I decided to make a few mods to the Chinook Postgres script to fix these things, and provide a more native PG experience when using this otherwise handy sample database.

Use Postgres-Idiomatic SQL

Postgresql (“Postgres” or “pg”) has its roots in the Unix world. Database object names are case-sensitive, and in fact the convention is to use lower-case names, and where needed, separate with underscores. It is possible to use proper-cased object names in Postges by escaping them with double-quotes. However, this makes for some atrocious-looking SQL.

A sample excerpted from the Chinook script Postgres perfectly illustrates this:

Excerpt from Standard Postgres Chinook Script:

CREATE TABLE "Album"
(
    "AlbumId" INT NOT NULL,
    "Title" VARCHAR(160) NOT NULL,
    "ArtistId" INT NOT NULL,
    CONSTRAINT "PK_Album" PRIMARY KEY  ("AlbumId")
);

The above forces the Album table to be created with proper-cased object names (the table name itself, as well as the columns). However, from this point forward you will need to use the double-quote escapes in any SQL you push in which requires the use of object names explicitly.

At a minimum, you would need to do THIS to pull all the records from the Album table:

Select all the Records from the Chinook DB Album Table:

SELECT * FROM "Album"

Note the quotes around the table name. If you don’t use those, you’ll get an error indicating that the table album doesn’t exist. This is because Postgres will automatically down-case incoming SQL before processing it.

Now let’s imagine you wanted to set some criteria on that SELECT statement:

Select some specific Records from the Chinook DB Album Table:

SELECT * FROM "Album" WHERE "ArtistId" = 2

Again with the double-quote escape sequence. Things continue to go downhill from there. Imagine a simple JOIN between two tables with a minimal set of columns returned:

Select Records from the Chinook DB Album Table using a JOIN to Artists:

SELECT 
  ar."Name", 
  al."Title"
FROM 
  "Album" AS al, 
  "Artist" AS ar
WHERE 
  al."ArtistId" = ar."ArtistId";

Yeah. Now things start to look ugly.

Composing SQL against this (and even worse, code which incorporates said SQL) gets painful fast.

Use Serial Integer Primary Keys

The maintainers of Chinook Database for some reason chose not to implement auto-incrementing integers for table primary keys in the Postgres version. Whatever the reason, I want auto-incrementing integer keys on the tables in my Chinook database.

Making this happen took a little doing, since the sample data itself establishes table relationships based on existing primary key data. How to solve this?

The Postgres serial type utilizes sequences to maintain the current value of a particular serial column. We can specify the starting value of a sequence, and the amount by which it is incremented, using the Postgres CREATE SEQUENCE command:

Create Sequence Example:

CREATE SEQUENCE "albums_id_seq"
 INCREMENT 1
 MINVALUE 1
 MAXVALUE 2147483647
 START 347
 CACHE 1;
SELECT setval('"public"."albums_id_seq"', 347, true);

But wait, John – you pluralized the name of the albums table there. Also, what’s with that setting the START value at 347?

Good question. We’ll get to the pluralized naming convention – I changed that for the entire Chinook script. As for the initial value, the Chinook sample data set contains an initial set of 347 album records. Therefore, we want to set an initial value for our table PK sequence at 347, so that the next ID provided will be 348.

Down-Casing Object Names, and Simplifying

I went through the Chinook script for Postgres, and modified the object names for all of the tables and columns, down-casing everything, and adding underscores as needed. I also pluralized the table names themselves. With that in mind, as well as the fact that we want to use a serial data type for our primary key columns, the create statement for our albums table might now look more like this:

Modified Create SQL for Chinook Album Table:

CREATE TABLE albums
(
    id int DEFAULT nextval('albums_id_seq'::regclass) NOT NULL,
    title VARCHAR(160) NOT NULL,
    artist_id INT NOT NULL,
    CONSTRAINT pk_albums PRIMARY KEY  (id)
);

Notice we have pluralized the table name, and also used a straight-forward id as the name for the primary key column, doing away with the problematic AlbumId (we could have used album_id, but why?). We set the default value for the id using the Postgres nextval() function, which will pull the next value based on the sequence we created unless a value is provided.

Get the Modified Chinook Script from Github

You can find the full, modified version of the Chinook database script at my Github repo. There are two versions, each on a separate branch.

Branch Master contains the full Chinook script, with down-cased/underscore-separated object names. However, table names have not been pluralized, and table primary key columns maintain the underscored version of the original column name. For example, AlbumID becomes simply album_id.

Branch pg_names contains the same database, but with table names pluralized, and table PK column names reduced to simply id. In other words, the table Album in the original Chinook DB is now albums, and the PK for albums is simply id.

See Something Amiss?

I am not a DBA, and it is possible I did something stupid in here. I like to think I know enough database to do those things I need to do as a developer. However, if you see something I did that might be accomplished better another way, please do sound off in the comments, shoot me a PR on Github, or shoot me an email at the address in the “About the Author” section.

John on Google CodeProject