Regular Expressions in .NET to Surround but not Replace Text Matches

.NET, and therefore ASP.NET, offers strong support for regular expressions in the System.Text.RegularExpressions namespace. This is great, since regular expressions are essential if you want to do anything beyond basic text manipulation and searching.

You can read more about regular expression support in ASP.NET on the official MSDN page. The MSDN site also includes a few good getting started examples if you are interested.

The Problem

Recently I found myself facing an interesting task: I was building a simple search form which consisted of a text area and submit button on a Web page. The desired behavior was as follows:

  1. Users could enter a partial term into the text area and then click the submit button to search for that partial text match.
  2. The code-behind would search an MS SQL Server database to find the partial matches.
  3. The code-behind would render the search results into a GridView control, with the partial match text highlighted with a yellow background color.

In theory putting this together is quite simple. The catch was figuring out how to highlight the partial text that the user searched for in the database results without changing the capitalization or other formatting.

For example, if the user searched for the term EXAMPLe TexT (notice the crazy capitalization), the results should highlight the matching text using the capitalization and formatting found in the database. Put more simply: the highlighted text should not echo the user-entered search term, but should show the original text from the database surrounded by HTML markup tags. No matter what strange capitalization the user enters, the search results keep the database's capitalization and gain the highlight.

So if the user searched for the term: EXAMPLe TexT then I would want the database to do a case-insensitive search and I would expect search results along with HTML markup to appear as:

<p>Here is some <span style="background-color:yellow;">Example Text</span> displayed</p>

This is certainly possible using loops and substrings, but it is tedious. In fact, I have seen this exact problem handled that way in the past. The loop/substring approach takes a lot of time to code and test, and it is unlikely to give the best performance.

A well-formed regular expression is a better fit here, specifically one that uses the $& substitution in the replacement string.

Not surprisingly, the power of regular expressions in ASP.NET offers several alternate solutions for elegantly solving this problem.

Here is the first solution I was able to find after a bit of playing around with the Regex.Replace command:

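' Requires "Imports System.Text.RegularExpressions" at the top of the file
Dim strFieldFromDb As String = "Here is some Example Text displayed. And here is some more example text to check."
Dim strUserInputVal As String = "EXAMPLe TexT"
' Regex.Escape makes the user's input match as literal text instead of being parsed as a pattern
strFieldFromDb = Regex.Replace(strFieldFromDb, Regex.Escape(strUserInputVal), "<span style=""background-color:#FFE97F;"">$&</span>", RegexOptions.IgnoreCase)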

In the Regex.Replace statement above, $& is the key. In a replacement string, $& stands for the entire text of the current match, exactly as it appears in the input.

To put the Replace statement into English:

  1. A case-insensitive search is done in strFieldFromDb (which is text from the database, such as 'Here is some Example Text displayed. And here is some more example text to check.').
  2. The variable strUserInputVal is the partial text entered by the user, such as EXAMPLe TexT. It is the text to find, and each match should be kept rather than discarded.
  3. The final parameter contains the markup that we would like to surround the partial text match with. Strictly speaking, every match is still replaced, but the $& in the replacement string puts the originally matched text back between the opening and closing tags, so the text ends up surrounded rather than overwritten.
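The steps above can be wrapped in a small reusable function. The following is only a sketch: the module name HighlightDemo and the function name HighlightMatches are made up for illustration and are not from the original code. It assumes the search term should always be treated as literal text, which is why it passes the term through Regex.Escape before using it as a pattern.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HighlightDemo
    ' Hypothetical helper (not part of the original post): wraps every
    ' case-insensitive occurrence of searchTerm in a highlighting span.
    Function HighlightMatches(ByVal source As String, ByVal searchTerm As String) As String
        ' An empty pattern would match at every position, so bail out early.
        If String.IsNullOrEmpty(searchTerm) Then Return source

        ' Regex.Escape makes characters such as ( ) + . ? match literally,
        ' so a search for "C++" is not parsed as regex syntax.
        Dim pattern As String = Regex.Escape(searchTerm)

        ' $& re-inserts the matched text exactly as it appears in source.
        Return Regex.Replace(source, pattern, "<span style=""background-color:#FFE97F;"">$&</span>", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(HighlightMatches("Here is some Example Text displayed.", "EXAMPLe TexT"))
        Console.WriteLine(HighlightMatches("Notes on C++ and c++ templates.", "C++"))
    End Sub
End Module
```

In the second call, both C++ and c++ are wrapped in the span, each keeping its own capitalization.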

You can find some great documentation about regular expressions on my favorite regular expression reference site: regular-expressions.info. This site has been around for many years and the reference material just keeps getting better.

If you want to test your expression you can use either the power of Visual Studio itself, or a neat tool I have been using for the past ten+ years called EditPad Lite.

Alternate Syntax: Using $0

Aside from the syntax I used earlier, I was able to find several alternate regular expression operations to perform the function of surrounding key text with HTML markup.

If you take a look at the different ways of referring to the whole regular expression match, you will see that instead of using $&, you can also use $0 with .NET (Note: $1-$9 won't work the same way, since they refer to numbered capturing groups rather than to the whole match).

Here is the same example from above using $0:

Dim strFieldFromDb As String = "Here is some Example Text displayed. And here is some more example text to check."
Dim strUserInputVal As String = "EXAMPLe TexT"
strFieldFromDb = Regex.Replace(strFieldFromDb, Regex.Escape(strUserInputVal), "<span style=""background-color:#FFE97F;"">$0</span>", RegexOptions.IgnoreCase)

There is yet another option: using $+

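In a .NET replacement string, $+ stands for the last captured group. When the pattern contains no capturing groups, as is the case here, that is the whole match, so this syntax can also be used as shown here. Be aware that if the pattern does contain capturing groups, $+ inserts only the last one.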

Dim strFieldFromDb As String = "Here is some Example Text displayed. And here is some more example text to check."
Dim strUserInputVal As String = "EXAMPLe TexT"
strFieldFromDb = Regex.Replace(strFieldFromDb, Regex.Escape(strUserInputVal), "<span style=""background-color:#FFE97F;"">$+</span>", RegexOptions.IgnoreCase)
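The difference between $+ and $& only shows up once the pattern contains capturing groups. The sketch below uses a hand-written pattern with two groups, which is an assumption made for illustration (the search form above never builds such a pattern), and square brackets instead of the span markup to keep the output short.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LastGroupDemo
    Sub Main()
        Dim sample As String = "Here is some Example Text displayed."

        ' No capturing groups: $+ falls back to the whole match.
        Console.WriteLine(Regex.Replace(sample, "example text", "[$+]", RegexOptions.IgnoreCase))

        ' Two capturing groups: $+ inserts only the last group...
        Console.WriteLine(Regex.Replace(sample, "(example) (text)", "[$+]", RegexOptions.IgnoreCase))
        ' prints: Here is some [Text] displayed.

        ' ...while $& still inserts the whole match.
        Console.WriteLine(Regex.Replace(sample, "(example) (text)", "[$&]", RegexOptions.IgnoreCase))
        ' prints: Here is some [Example Text] displayed.
    End Sub
End Module
```

For highlighting a search term, $& or $0 is therefore the safer choice whenever the pattern might contain groups.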

Using $1 also works

If you prefer to use the numbered group syntax, you can use $1, but note that you will need to enclose the search pattern in parentheses so that it becomes capturing group 1. Here is an example:

Dim strFieldFromDb As String = "Here is some Example Text displayed."
Dim strUserInputVal As String = "EXAMPLe TexT"
strFieldFromDb = Regex.Replace(strFieldFromDb, "(" & Regex.Escape(strUserInputVal) & ")", "<span style=""background-color:#FFE97F;"">$1</span>", RegexOptions.IgnoreCase)
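The four variants shown so far can be compared side by side. This is a minimal sketch rather than a thorough test: the module and variable names are made up, and <b> tags stand in for the span markup to keep the lines short.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CompareSyntaxDemo
    Sub Main()
        Dim source As String = "Here is some Example Text displayed. And here is some more example text to check."
        ' Escape the term once so every variant searches for the same literal text.
        Dim term As String = Regex.Escape("EXAMPLe TexT")

        Dim viaAmp As String = Regex.Replace(source, term, "<b>$&</b>", RegexOptions.IgnoreCase)
        Dim viaZero As String = Regex.Replace(source, term, "<b>$0</b>", RegexOptions.IgnoreCase)
        Dim viaPlus As String = Regex.Replace(source, term, "<b>$+</b>", RegexOptions.IgnoreCase)
        ' $1 needs the pattern wrapped in parentheses to create group 1.
        Dim viaOne As String = Regex.Replace(source, "(" & term & ")", "<b>$1</b>", RegexOptions.IgnoreCase)

        ' Show one result, then whether all four strings are identical.
        Console.WriteLine(viaAmp)
        Console.WriteLine(viaAmp = viaZero AndAlso viaZero = viaPlus AndAlso viaPlus = viaOne)
    End Sub
End Module
```

Both occurrences in the sample sentence are wrapped, each with its own capitalization preserved.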

Summary

So to sum up, there are many ways to use the .NET System.Text.RegularExpressions Namespace to accomplish the goal of surrounding but not replacing text. Each of these ways works very well and I confirmed in my tests that each way will encapsulate all text matches in a string rather than just the first or last match.

So take your pick of the syntax you prefer most and enjoy the power of regular expressions!
