T-SQL to Enforce Proper Case Capitalization

Capitalize the First Letter of Each Word
Capitalize the First Letter of Each Word

This handy-dandy T-SQL query will take any input string and capitalize the start of each word within the string. It will also enforce proper use of lower-case throughout the string without affecting query performance too severely. This query assumes that you are working with titles, user names, or place names where you want to make sure that the first letter of each word is capitalized.

Overview of the Query

There is no in-built function in SQL Server to enforce proper case and capitalization in text that is queried. The best way to ensure capitalization of each word in a results string is to create your own user defined function. This holds as true in the latest versions of SQL Server as it did back in the days of SQL Server 2000.  In this example I have put together a simple Scalar-Valued function to help enforce basic capitalization rules are met. This should work just as well in SQL Server 2000 as it does in SQL Server 2016.

Since the lack of a system function to enforce proper capitalization has been a long standing issue, there are plenty of solutions out there. I wasn’t able to find one that was simple enough and performant enough, so for my own education and purposes, I put code together to be a quick-performing utility that follows a few simple rules:

  1. It will capitalize the start of each word that is preceded by a space, or non-letter character.
  2. It will make sure that the remainder of each word is in lower-case

This means that the code I’m giving below should properly capitalize words separated by spaces and also capitalize hyphenated words or words with apostrophes such as:

test words –> Test Words
jim o’mally –> Jim O’Mally
GUINEa-bissau –> Guinea-Bissau

Note that although I mainly tested this script with standard ASCII characters, in simple tests that I’ve run it does behave correctly with capitalizing various non-ASCII characters such as å or ü. In other words, this example should be robust enough to work even in real world situations with variable languages and symbols.

The Basic Premise of How it Works

The query is set up to be used as a SQL Server function that you can call as part of a select statement. You pass the column name of the VARCHAR type table column to the function and it will return each record formatted with the first letter of each word capitalized and the remainder of each word in lower case. There are a variety of different ways to do this, but the simplest is just to loop through each letter in the string of data that is to be parsed. As the code loops through the characters from left to right, it will look at the character in question and the character immediately preceding it.

If the preceding character is a non-letter character such as a space, hyphen, number, or single quote, then the query also has a look to check that the character under consideration is a letter character. If both these conditions are met, then I’d say 99% of the time it is justified to capitalize the letter under consideration. In the tests I’ve run this consistently holds true and the function is effective for setting proper capitalization for title text or proper names.

A Working Example

Here is an example script that you can paste into SSMS to kick the tires a bit. It is pre-set to changing a tricky test sentence:

tHE cat in the hat mET 1fRiend called jim o’mally in Guinea-Bissau

into a properly cased title:

The Cat In The Hat Met 1Friend Called Jim O’Mally In Guinea-Bissau

Here is the code that you can directly use in SSMS. You can change the value that the variable @InputString is getting set to in order to try input text of your own.

DECLARE @InputString VARCHAR (500)
DECLARE @OutputString VARCHAR (500)
DECLARE @Location INT
-- Test a sentence string in mixed case with a number, an apostrophe, and a hyphenated word
SET @InputString = 'tHE cat in the hat mET 1fRiend called jim o''mally in GUINEa-bissau'
SET @Location = 1
-- Pre-set to a character string if the input string exists and has a length. otherwise the out string remains a NULL
IF DATALENGTH(@InputString) > 0
BEGIN
SET @OutputString = ''
END

WHILE @Location <= DATALENGTH(@InputString)
BEGIN
DECLARE @CheckCharacter VARCHAR(1)
DECLARE @PrevCheckCharacter VARCHAR(1)
DECLARE @OutCharacter VARCHAR(1)

-- Set the current character to lower case in case a capitalization condition is not met.
SELECT @CheckCharacter = LOWER(SUBSTRING(@InputString, @Location, 1))
SELECT @PrevCheckCharacter = SUBSTRING(@InputString, @Location - 1, 1)

-- Set the current letter to uppercase if the preceeding letter is a non-letter character 
-- and the current character is a letter
IF @PrevCheckCharacter NOT LIKE '[a-z]' AND @CheckCharacter LIKE '[a-z]'
BEGIN
SELECT @OutCharacter = UPPER(@CheckCharacter)
END
ELSE
BEGIN
SELECT @OutCharacter = @CheckCharacter
END
SET @OutputString = @OutputString + @OutCharacter
SET @Location = @Location + 1
END
select @OutputString

The Code as a Working Function

Here is the script turned into a Scalar-Valued function called NormalizeCase. You can use it from a standard select statement in order to select a set of results with the first letter of each word capitalized:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION NormalizeCase 
(
@InputString VARCHAR (8000)
)
RETURNS varchar(8000)
AS
BEGIN
DECLARE @OutputString VARCHAR (500)
DECLARE @Location INT
SET @Location = 1
-- Pre-set to a character string if the input string exists and has a length. otherwise the out string remains a NULL
IF DATALENGTH(@InputString) > 0
BEGIN
SET @OutputString = ''
END

WHILE @Location <= DATALENGTH(@InputString)
BEGIN
DECLARE @CheckCharacter VARCHAR(1)
DECLARE @PrevCheckCharacter VARCHAR(1)
DECLARE @OutCharacter VARCHAR(1)

-- Set the current character to lower case in case a capitalization condition is not met.
SELECT @CheckCharacter = LOWER(SUBSTRING(@InputString, @Location, 1))
SELECT @PrevCheckCharacter = SUBSTRING(@InputString, @Location - 1, 1)

-- Set the current letter to uppercase if the preceeding letter is a non-letter character 
-- and the current character is a letter
IF @PrevCheckCharacter NOT LIKE '[a-z]' AND @CheckCharacter LIKE '[a-z]'
BEGIN
SELECT @OutCharacter = UPPER(@CheckCharacter)
END
ELSE
BEGIN
SELECT @OutCharacter = @CheckCharacter
END
SET @OutputString = @OutputString + @OutCharacter
SET @Location = @Location + 1
END

RETURN @OutputString

END
GO

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s