Wednesday, January 6, 2010

.NET Framework Regular Expression Object Model

The Regular Expression Engine

The regular expression engine in the .NET Framework is represented by the Regex class. The class provides both instance and static methods that serve the same purposes. The Regex class is discussed in more detail later.
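As a rough illustration of the static and instance styles, the sketch below calls IsMatch both ways on the sample input used later in this post. The variable names are only illustrative, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "abcabc abcabce";
string pattern = @"(abc){2}e?";

// Static style: the pattern is passed along with every call.
bool staticResult = Regex.IsMatch(input, pattern);

// Instance style: the pattern is fixed when the Regex object is constructed.
var regex = new Regex(pattern);
bool instanceResult = regex.IsMatch(input);

Console.WriteLine(staticResult);   // True
Console.WriteLine(instanceResult); // True
```

The instance style is usually the more convenient one when the same pattern is applied to many inputs.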

The regular expression object model includes several important classes. The Match class inherits from the Group class, which in turn inherits from the Capture class. Both Match and Group are therefore captures, but in different senses. Match.Value (Value is inherited from Capture) holds the whole matched string, while Group.Value holds the string captured by that group. Match.Groups[0] is a special group: it always exists and represents the entire matched string, so its value equals Match.Value. Other groups exist only if grouping is defined in the pattern. Each subsequent group (Match.Groups[n] with n > 0) contains only the string that matches the subpattern defined for that group. A group typically has more than one capture only when a quantifier is applied to it; in that case Group.Value reflects the last capture.
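The relationship between Match, Group, and Capture can be checked with a short sketch like the one below. It reuses the sample pattern from this post; the variable names are illustrative, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Inheritance chain: Match -> Group -> Capture.
bool matchIsGroup = typeof(Match).BaseType == typeof(Group);
bool groupIsCapture = typeof(Group).BaseType == typeof(Capture);

Match match = Regex.Match("abcabc abcabce", @"(abc){2}e?");

string whole = match.Value;            // the entire matched string
string group0 = match.Groups[0].Value; // same as match.Value

// Group 1 is quantified with {2}, so it records two captures;
// its Value is the last of them.
Group g1 = match.Groups[1];
int captureCount = g1.Captures.Count;
int firstCaptureIndex = g1.Captures[0].Index;
int secondCaptureIndex = g1.Captures[1].Index;

Console.WriteLine($"{whole} / {group0} / {g1.Value}");
Console.WriteLine($"{captureCount} captures at {firstCaptureIndex} and {secondCaptureIndex}");
```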

You can call the methods of the Regex class to perform the following operations:
  • Determine whether a string matches a regular expression pattern.
  • Extract a single match or the first match.
  • Extract all matches.
  • Replace a matched substring.
  • Split a single string into an array of strings.
Determine whether a string matches a regular expression pattern
Sample: Regex.IsMatch("abcabc abcabce", @"(abc){2}e?")
This returns true because at least one match is found in the input string.

Extract a single match or the first match
Sample: Match match = Regex.Match("abcabc abcabce", @"(abc){2}e?");
This returns the first match found in the input string. In the sample, match.Value returns "abcabc". In this case, match.Groups[0].Value is "abcabc" and match.Groups[1].Value is "abc".
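A minimal sketch of how the returned Match might be inspected follows. It also shows that Regex.Match does not return null when nothing matches; instead, the Success property is false. The second input string "xyz" is just an illustrative non-matching value.

```csharp
using System;
using System.Text.RegularExpressions;

Match first = Regex.Match("abcabc abcabce", @"(abc){2}e?");
if (first.Success)
{
    // Index and Length locate the match inside the input string.
    Console.WriteLine($"{first.Value} at index {first.Index}, length {first.Length}");
}

// No match: the result is an empty Match whose Success is false.
Match none = Regex.Match("xyz", @"(abc){2}e?");
Console.WriteLine(none.Success);
```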

Extract all matches
Sample: MatchCollection matches = Regex.Matches("abcabc abcabce", @"(abc){2}e?");
This returns all matches found in the input string. In the sample, matches[0].Value returns "abcabc" and matches[1].Value returns "abcabce".
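The collection can be walked with a foreach loop, as in this sketch (variable names are illustrative; top-level statements are assumed):

```csharp
using System;
using System.Text.RegularExpressions;

MatchCollection matches = Regex.Matches("abcabc abcabce", @"(abc){2}e?");
Console.WriteLine(matches.Count);

// Each element is a Match with its own Value, Index, and Groups.
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Value} at index {m.Index}");
}
```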


Replace a matched substring
Sample: Regex.Replace("abcabc abcabce", @"(abc){2}e?", @"$1xyz")
This returns the input string with each matched substring replaced by the replacement string, which in this case expands to "abcxyz". The result is "abcxyz abcxyz".
You can also use Match.Result("replacement pattern") to expand a replacement pattern for a single match. Regex.Match("abc", "(?<ab>ab)c").Result("${ab}de") returns "abde". Note that Result returns only the expanded replacement for that match, not the whole input string with the replacement applied.
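The difference between Regex.Replace and Match.Result is easiest to see when the match covers only part of the input. The sketch below uses a hypothetical input "xxabcyy" for that purpose, alongside the samples from this section.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace rewrites every match inside the input string.
string replaced = Regex.Replace("abcabc abcabce", @"(abc){2}e?", "$1xyz");

// Result expands the replacement pattern for one match only.
string expanded = Regex.Match("abc", "(?<ab>ab)c").Result("${ab}de");

// With surrounding text, Result still yields just the expansion,
// while Replace keeps the unmatched parts of the input.
string partialResult = Regex.Match("xxabcyy", "(?<ab>ab)c").Result("${ab}de");
string partialReplace = Regex.Replace("xxabcyy", "(?<ab>ab)c", "${ab}de");

Console.WriteLine(replaced);
Console.WriteLine(expanded);
Console.WriteLine(partialResult);
Console.WriteLine(partialReplace);
```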

Split a single string into an array of strings
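Sample: Regex.Matches("abcabc abcabce", @"[\w\s]")
This sample uses Regex.Matches rather than Regex.Split: it returns a MatchCollection of 14 single-character matches, essentially parsing the string into its characters. To split on a delimiter instead, use Regex.Split; for example, Regex.Split("abcabc abcabce", @"\s") returns the array { "abcabc", "abcabce" }.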
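For comparison, here is a small sketch that runs both approaches: Regex.Split on whitespace, and the Regex.Matches call from the sample copied into a char array. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "abcabc abcabce";

// Split on a whitespace delimiter: yields a string array.
string[] words = Regex.Split(input, @"\s");

// Match every word or whitespace character: yields a MatchCollection,
// which is copied into a char array here.
MatchCollection pieces = Regex.Matches(input, @"[\w\s]");
var chars = new char[pieces.Count];
for (int i = 0; i < pieces.Count; i++)
{
    chars[i] = pieces[i].Value[0];
}

Console.WriteLine(string.Join("|", words));
Console.WriteLine(chars.Length);
```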

2 comments:

  1. The relationships and nuances between matches, groups, and captures were among the things that really confused me when I first started dabbling in .NET regex.

    I recently wrote an online .net regex tester that specifically displays these things together, to help other programmers see the relationship. If you're interested, it's at http://regexstorm.net .

  2. Hi lonekorean,
    I built a local Windows Forms app with similar features for testing Regex. Yours looks cool and it sure comes in handy when I need it. Thanks for the link.
