Working with Redshift Regular Expression Functions

The regular expression functions in Amazon Redshift offer many advantages such as identifying precise patterns of characters in the given string. You can use these functions to validate the input data. For e.g. validate if the input value is an integer. The regular expression can also help you to extract specific string or characters from input.

In the previous post, we have discussed Redshift NVL and NVL2 functions to deal with NULL values. In this article, we will check how to use Redshift RegEx functions in your SQL queries.

Redshift Regular Expression Functions

Following are the Amazon Redshift supported RegEx Functions:

REGEXP_COUNT
REGEXP_INSTR
REGEXP_REPLACE
REGEXP_SUBSTR

Now, lets us check these Redshift Regex functions with some examples.

Redshift REGEXP_COUNT Function

The Redshift REGEXP_COUNT function, searches a string for a regular expression pattern and returns an integer that indicates the number of times the pattern occurs in the string. The function will return 0 if no match found.

Following is the syntax of Redshift REGEXP_COUNT function.

REGEXP_COUNT ( source_string, pattern [, position [, parameters ] ] );

Where

source_string : A string expression, such as a column name, to be searched.
Pattern: A string literal that represents a SQL standard regular expression pattern.
Position: A positive integer that indicates the position within source_string to begin searching. It is number of characters
- The default is 1.
- If position is less than 1, the search begins at the first character of source_string.
- If position is greater than the number of characters in source_string, the result is 0.
Parameters: It is a string literals that indicate how the function matches the pattern. The possible values are the following:
- c – Perform case-sensitive matching. The default is to use case-sensitive matching.
- i – Perform case-insensitive matching.
- p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.

And following is the example.

select regexp_count('RedshiftdbcRedshiftabcRedshift', '[Redshift]{8}');
 regexp_count
--------------
            3
(1 row)

Redshift REGEXP_INSTR Function

The Redshift REGEXP_INSTR function searches a string for a regular expression pattern and returns an integer that indicates the starting position of the matched substring. This function returns 0 if no match found.

Following is the syntax of Redshift REGEXP_INSTR function.

REGEXP_INSTR ( source_string, pattern [, position [, occurrence] [, option [, parameters ] ] ] ] )

Where

source_string : A string expression, such as a column name, to be searched.
Pattern: Represents a SQL standard regular expression pattern.
Position: A positive integer that indicates the position within source_string to begin searching. It is number of characters
- The default is 1.
- If position is less than 1, the search begins at the first character of source_string.
- If position is greater than the number of characters in source_string, the result is 0.

Occurrence: It is a positive integer value that indicates which occurrence of the pattern to use. The default is 1. It will ignore and return 0 if value is less than or more then characters in source_string.
Option : Its value indicates how pattern position is returned.
- 0 return the position of the first character of the match. This is default value.
- 1 returns position of the first character following the end of the match.

Parameters: It is a string literals that indicate how the function matches the pattern. The possible values are the following:
- c – Perform case-sensitive matching. The default is to use case-sensitive matching.
- i – Perform case-insensitive matching.
- p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.
- e – Extract a substring using a subexpression.

And following is the example.

select regexp_instr('This is Redshift in-string example','Redshift', 1);
 regexp_instr
--------------
            9
(1 row)

Redshift REGEXP_REPLACE Function

The Redshift REGEXP_REPLACE function searches a string for a regular expression pattern and replaces every occurrence of the pattern with the specified replace_string.

Following is the syntax of Redshift REGEXP_REPLACE function.

REGEXP_REPLACE ( source_string, pattern [, replace_string [ , position [, parameters ] ] ] )

Where

source_string : A string expression, such as a column name, to be searched.
Pattern: A string literal that represents a SQL standard regular expression pattern.
replace_string : A string expression, that will replace each occurrence of pattern. The default is an empty string.
Position: A positive integer that indicates the position within source_string to begin searching. It is number of characters
- The default is 1.
- If position is less than 1, the search begins at the first character of source_string.
- If position is greater than the number of characters in source_string, the result is 0.

Parameters: It is a string literals that indicate how the function matches the pattern. The possible values are the following:
- c – Perform case-sensitive matching. The default is to use case-sensitive matching.
- i – Perform case-insensitive matching.
- p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.

And following is the example.

select regexp_replace( 'sample_email_id@gmail.com', '@.*\\.(com)');
 regexp_replace
-----------------
 sample_email_id
(1 row)

Redshift REGEXP_SUBSTR Function

The Redshift REGEXP_SUBSTR function returns the characters extracted from a string by searching for a regular expression pattern.

Following is the syntax of Redshift REGEXP_SUBSTR function.

REGEXP_SUBSTR ( source_string, pattern [, position [, occurrence [, parameters ] ] ] )

Where

source_string : A string expression, such as a column name, to be searched.
Pattern: Represents a SQL standard regular expression pattern.
Position: A positive integer that indicates the position within source_string to begin searching. It is number of characters
- The default is 1.
- If position is less than 1, the search begins at the first character of source_string.
- If position is greater than the number of characters in source_string, the result is empty string.

Occurrence: It is a positive integer value that indicates which occurrence of the pattern to use. The default is 1. It will ignore and empty string if value is less than or more then characters in source_string.

Parameters: It is a string literals that indicate how the function matches the pattern. The possible values are the following:
- c – Perform case-sensitive matching. The default is to use case-sensitive matching.
- i – Perform case-insensitive matching.
- p – Interpret the pattern with Perl Compatible Regular Expression (PCRE) dialect.
- e – Extract a substring using a subexpression.

And following is the example.

Select regexp_substr('Cake chocolate chip', 'ch(i|o)p', 1);
 regexp_substr
---------------
 chip
(1 row)