Staged Data Comparison Scripts for ExecSQL

These execsql scripts generate several types of a SQL SELECT statement that compares data in a staging table to data in a corresponding base table of the database. The scripts use the information_schema views to identify primary key and non-key (attribute) columns of the base table, so the scripts will work on any table of any database without modification.

Staged data comparison scripts are available for the following DBMSs:

  • PostgreSQL

  • MariaDB/MySQL

  • SQL Server

Usage

The steps to follow when using these scripts are:

  1. Create and populate a staging table that has all of the primary key columns and at least one attribute column of the corresponding base table.

  2. Include the appropriate ‘compare’ script (e.g., pg_compare.sql, md_compare.sql, or ss_compare.sql) into the main script, e.g., using execsql’s IMPORT metacommand.

  3. Call the desired ‘compare’ script(s) using execsql’s EXECUTE SCRIPT metacommand, passing all required arguments and optionally including either the “include_cols” or “exclude_cols” argument. If the generated SQL is to be stored in a private substitution variable, rather than a public variable, the private variable must be initialized prior to calling the script.

  4. Use the SQL that is generated and stored in the specified substitution variable to summarize the data as desired.

The SQL statements that are generated by these scripts are not terminated with a semicolon. Consequently, they can be used in subqueries.

These scripts create temporary tables and/or views (depending on the DBMS). All of these temporary objects have the prefix cmp_. If the application that calls these scripts contains or uses tables or views with the same prefix, there is a potential for conflict. If the database’s tables or views use this prefix, there is a potential for data loss.

Because MariaDB and MySQL do not support temporary views, the ‘temporary’ views that the scripts create for these DBMSs are created as permanent database objects, and then are dropped when the scripts terminate normally. If the scripts terminate abnormally (e.g., because of a data type mismatch between base and staging tables), those ‘temporary’ views will remain in the database. They will be removed the next time that thte scripts terminate normally, however.

An Example

The following code illustrates the use of the comparison scripts using a table that contains the set of menu items offered by a restaurant. This code is written for Postgres.

The base and staging tables for the menu items are created with the same structure, but the staging table is created in a schema named “staging”.

create table public.menu_item (
        menu_item text not null,
        item_category text not null,
        item_price numeric(8,2) not null,
        constraint pk_menuitem primary key (menu_item)
        );

create table staging.menu_item (
        menu_item text not null,
        item_category text not null,
        item_price numeric(8,2) not null,
        constraint pk_menuitem primary key (menu_item)
        );

The base table has some existing entries:

insert into menu_item
        (menu_item, item_category, item_price)
values
        ('Coffee', 'Drinks', 1.50),
        ('Tea', 'Drinks', 1.50),
        ('Reuben', 'Entree', 7.99),
        ('Hamburger', 'Entree', 7.99),
        ('Cheesebuger', 'Entree', 8.49),
        ('House salad', 'Side', 3.99),
        ('Onion rings', 'Side', 3.99),
        ('Cole slaw', 'Side', 3.99),
        ('Cobb salad', 'Entree', 6.49),
        ('Shrimp cocktail', 'Appetizer', 5.99),
        ('Oysters (4)', 'Appetizer', 5.99),
        ('Shepherd''s pie', 'Dessert', 9.99),
        ('Fish and chips', 'Entree', 13.99),
        ('Salmon', 'Entree', 15.99),
        ('Mahi mahi', 'Entree', 18.99),
        ('Apple pie', 'Dessert', 4.50),
        ('Ice cream', 'Dessert', 3.99)
        ;

The changes to the set of menu items are in the staging table prior to being merged:

insert into staging.menu_item
        (menu_item, item_category, item_price)
values
        ('Shrimp cocktail', 'Appetizer', 6.49),
        ('Oysters (4)', 'Appetizer', 6.49),
        ('Fish and chips', 'Entree', 14.99),
        ('Salmon', 'Entree', 16.99),
        ('Mahi mahi', 'Entree', 20.99),
        ('Shepherd''s pie', 'Entree', 9.99),
        ('French dip', 'Entree', 11.49),
        ('Cheesebuger', 'Entree', 8.49),
        ('Bacon burger', 'Entree', 8.99),
        ('Philly cheese steak', 'Entree', 12.99),
        ('Ameglian ribeye', 'Entree', 42.00),
        ('Spaghetti Bolognese', 'Entree', 8.99),
        ('Gyro', 'Entree', 9.99)
        ;

To compare the data in the base and staging tables, the ‘compare’ script is included into the main script and then each of the comparison routines is run and the result of each is displayed in a dialog.

-- !x! include pg_compare.sql

-- COMPARE_COMMON
-- !x! execute script compare_common with (base_schema=public, staging=staging, table=menu_item, sql_var=cmn_sql)
create or replace temporary view cmn as !!cmn_sql!!;
-- !x! prompt message "COMPARE_COMMON" display cmn

-- COMPARE_CHANGES
-- !x! execute script compare_changes with (base_schema=public, staging=staging, table=menu_item, sql_var=changes_sql)
create or replace temporary view changes as !!changes_sql!!;
-- !x! prompt message "COMPARE_CHANGES" display changes

-- COMPARE_COMMON_VALS
-- !x! execute script compare_common_vals with (base_schema=public, staging=staging, table=menu_item, sql_var=vals_sql)
create or replace temporary view vals as !!vals_sql!!;
-- !x! prompt message "COMPARE_COMMON_VALS" display vals

-- COMPARE_CHANGED
-- !x! sub_empty ~changed_sql
-- !x! execute script compare_changed with (base_schema=public, staging=staging, table=menu_item, sql_var=+changed_sql)
create or replace temporary view changed as !!~changed_sql!!;
-- !x! prompt message "COMPARE_CHANGED" display changed

The four dialogs produced by these commands are shown below.

The COMPARE_COMMON script generates SQL that includes all values of the base table’s primary key that are present in both the base and staging tables, and columns for each attribute indicating whether the value for that attribute is the same or different in the two tables.

_images/compare_common.png

The COMPARE_CHANGES script generates SQL that includes all values of the base table’s primary key that are present in the staging table, and a column indicating whether any of the attributes in the staging table are different from those in the base table.

_images/compare_changes.png

The COMPARE_COMMON_VALS script generates SQL that includes all values of the base table’s primary key that are present in both the base and staging tables, and two columns for each attribute that show the old value (from the base table) and the new value (from the staging table).

_images/compare_common_vals.png

The COMPARE_CHANGED script generates SQL that includes all values of the base table’s primary key that are present in the base table, and a column indicating whether any of the attributes in the base table are different in the staging table. This example (in the code above) uses a local variable to capture the generated SQL.

_images/compare_changed.png

Limitations

Following are a few limitations on the conditions under which these scripts can be used.

  1. The staging table must have all of the primary key columns of the base table, and those columns must be populated.

  2. Table names and column names should not require double-quoting.

  3. The DBMS’s case-folding behavior should be respected when specifying table and column names because the scripts use the information_schema tables, and if the case of the names provided do not match the case as stored in the information_schema tables by the DBMS, the scripts will not work correctly.