Tag Archives: Case Sensitivity

Redshift Naming Standard and Case Sensitivity

Redshift Naming Standard and Case Sensitivity

Everyone knows that Amazon Redshift is based on PostgreSQL. Most people think Redshift and PostgreSQL have the same or similar naming case sensitivity of the identifiers (table names and column names).

But actually Redshift and PostgreSQL have a number of very important differences in the naming standard and case case sensitivity that you must be aware of as you work with Redshift.

Basically, in Redshift there are two types of Identifiers: Standard Identifiers and Double-quoted Identifiers. The identifier types are distinguished when they are created in the DDL script with or without the double-quote marks. The AWS document Names and Identifiers defines the naming rules for these two types of identifiers (table names and column names).

In general, in Redshift the UTF-8 multibyte characters two or four bytes long are allowed in the identifier names. In practice, let’s limit the identifier names to use single-byte ASCII characters only in this blog.

Standard Identifiers Rule

There are three simple rules for the Standard Identifiers:

  1. Begin with an alphabetic character or underscore, followed by any ASCII alphanumeric characters, underscores, or dollar signs.
  2. Not be a reserved SQL key word, like group, user.​
  3. Be between 1 and 127 characters in length.

The first rule implies that the identifier name should not contain any spaces or quotation marks.

Double-quoted Identifiers Rule

There are also three simple rules for the Double-quoted Identifiers:

  1. Begin and end with double-quote marks (“).
  2. The identifier can contain any ASCII printable characters, including the spaces and even the double-quote mark (“) itself.
  3. Be between 1 and 127 characters in length.

Notes:

  • You can use the reserved SQL key word like group, user, etc. as the table name or column name, but they have to be double-quoted.
  • You even can include the double-quote mark (“) itself in the table or column names. But it must be escaped by preceding with another double-quote mark (“).

Database Catalog and Reference of the Identifiers

After creating the objects with the standard Identifiers or double-quoted Identifiers or their mixture, how to use them?

Different databases have different ways to store the identifiers in the database catalog and reference them.

In Redshift, all ASCII letters in the identifiers (for both the standard and double-quoted identifiers) are case-insensitive and are folded to lowercase in the Redshift Catalog.

The identifiers can be referenced in the following simple and convenient ways:

  • For the Standard Identifiers, they can be referenced in any letter-case or mixture, with or without the double-quote marks.
  • For the Double-Quoted Identifiers,

    • If the quoted part meet the Standard Identifiers rules, then it can be referenced in any letter-case, with or without the paired double-quote marks. Just like the Standard Identifiers.
    • Otherwise, you must use the double quotation marks for every reference to that identifier, all ASCII letters can be any case, but the others should be exactly the same as it was defined.

Want to Return the Identifiers to Upper Case?

As said before, all ASCII letters in the identifiers (for both the standard and double-quoted identifiers) are are stored in the database catalog as lowercase . Therefore, in query results, all the column names are returned as lowercase by default. If you want to return all the column names in uppercase, set the describe_field_name_in_uppercase configuration parameter to true.

set describe_field_name_in_uppercase to on;
show describe_field_name_in_uppercase;

Then you will see all the return column names are in upper case. Remember, the column names in catalog are still in lowercase.

Examples

The following is a valid DDL to create a table

create table "TEst_TaBle1"(cOLuMN1 text, "coLuMn #2" text, "coLuMn ""3""" text);

Then table name and column names in the catalog will be

test_table1 (column1, column #2, column "3" )

The following queries are valid and will return the same result:

select   COLUMN1,  "COLumn #2",  "COLumn ""3""" from  TEst_TaBle1;
select "colUmn1",  "coLUmn #2",  "columN ""3""" from "teST_taBLE1";
select  column1,   "colUMN #2",  "COLumn ""3""" from "Test_taBLE1;