Typing in a text box is the most common means by which programs collect information from a user. Whether collecting a date stamp, a product key, or a phone number, specific pieces of information often have restrictions on their form.
For example, a person typing in a text box may choose to enter their work phone in a number of different ways:
Same as Home
Each of these is meaningful to human eyes, but if your system needs to sort or search this information, or use it to power an auto-dialer, a consistent format is needed.
The most common way of enforcing a format in text is with post-input validation. That is, when the user finishes entering text, the system checks to see if the input conforms to an expected pattern. If it doesn't, the user is notified with a message resembling:
"Please use the following format to enter your number: (###) ###-####"
And this reactive approach works, technically speaking, but at the price of the user experience.
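To make the reactive approach concrete, here is a minimal sketch in Python. It is not part of the edit mask notation; the function name `validate_phone` and the regular expression are assumptions chosen to match the message shown above.

```python
import re
from typing import Optional

# Hypothetical rendering of the "(###) ###-####" format as a Python regex.
PHONE_FORMAT = re.compile(r"\(\d{3}\) \d{3}-\d{4}")

def validate_phone(text: str) -> Optional[str]:
    """Return an error message if text is malformed, or None if it is acceptable."""
    if PHONE_FORMAT.fullmatch(text):
        return None
    return "Please use the following format to enter your number: (###) ###-####"
```

The check only runs once the user has finished typing, which is exactly why input such as "Same as Home" gets all the way into the text box before being rejected.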
Edit mask controls allow the system to perform validation proactively, while the user is typing. If a keyed-in character doesn't comply with an expected value (the control's allowable "mask"), then it is ignored. In the previous phone number example, letters would be discarded so input such as "Same as Home" is never a possibility. As an additional advantage, edit mask controls are typically drop-and-go, requiring no additional coding. Instead, the input constraints they enforce are defined by a notation. So one control could enforce a phone number pattern, then enforce a product key pattern, then enforce a weather code pattern, all by changing its "edit mask" notation.
There are a number of edit mask controls out there and all vary in intricacy. Some are quite simplistic in nature, supporting a limited list of pre-defined masks, while others are considerably more capable. On a scale from basic to sophisticated, the notation utilized in Coding Monk edit masks leans more heavily toward the sophisticated side of the scale; however, this does not necessarily mean that the notation is more complex out of the gate. Simple needs mean simple notations; as edit mask needs grow in complexity, so do the edit mask patterns.
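The proactive behavior described above can be modeled with a short Python sketch. This is an illustration only, not Coding Monk's implementation: the mask is represented here as a hypothetical list of allowed-character sets (one per input position), and single-character sets stand in for fixed characters that the control fills in on the user's behalf.

```python
import string

DIGITS = set(string.digits)

# Hypothetical pre-parsed mask for "(###) ###-####": one set of allowed
# characters per position. A one-element set models a fixed character.
PHONE_MASK = (
    [{"("}] + [DIGITS] * 3 + [{")"}, {" "}]
    + [DIGITS] * 3 + [{"-"}] + [DIGITS] * 4
)

def type_keys(mask, keys):
    """Simulate typing: fixed characters are filled in, bad keys are ignored."""
    out = []

    def fill_fixed():
        # Auto-insert any run of positions that allow exactly one character.
        while len(out) < len(mask) and len(mask[len(out)]) == 1:
            out.append(next(iter(mask[len(out)])))

    fill_fixed()
    for key in keys:
        if len(out) == len(mask):
            break  # mask is full; further keys are ignored
        if key in mask[len(out)]:
            out.append(key)
            fill_fixed()
    return "".join(out)
```

Feeding this model the keystrokes `5551234567` yields `(555) 123-4567`, while the keystrokes of "Same as Home" are all discarded and only the leading parenthesis remains.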
This document serves the dual purposes of reference and tutorial, defining the core notation, explaining how to use it, then finally giving examples of common edit mask patterns.
This document is distributed in a variety of media formats. If you are not viewing this online from the CodingMonk site and want to ensure the latest version, it can be found at the following URL:
Quick Reference: Reserved Notation Characters
The following table lists the reserved characters used in Coding Monk's edit mask notation. In later sections, we'll explain each of these in more detail; however, after the nuances of the notation have been digested, this table can serve as a place to refresh your memory. Note that any character not listed here is interpreted as a literal.
#      A numeric digit.
<      Any letter converted to lowercase.
>      Any letter converted to uppercase.
&      Any letter, regardless of case.
@      Any alphanumeric character.
!      Any alphanumeric character, converted to lowercase.
^      Any alphanumeric character, converted to uppercase.
_      Any character at all.
[      Begins a grouping of mask characters.
[*     Begins a validation expression.
]      Closes the grouping or validation expression.
-      Used to define ranges of characters within groupings.
*      Used alone outside of a group to identify the start of a validation range.
\      Used to indicate that the next character should "escape" its reserved status and be interpreted as a literal character.
{n}    Where n is a number, this notation causes the preceding expression to be repeated, appearing a total of n times.
Edit Mask Construction
Edit mask expressions can be complex if the edit mask requirements are complex. That said, most uses of an edit mask are basic and the more complex masks are few and far between. By the end of this document, you'll have all the tools you need to make sophisticated masks. But first, we'll cover some basic concepts.
Literals and Constants
As mentioned above, any character that is not listed in the Reserved Notation Characters table is a literal. This just means that these characters have no special meaning to the edit mask. The letter "A" simply represents the letter "A". A literal character specified by itself, which is to say an "ungrouped" literal, defines the only possible character in a given position. We call this lone-literal a constant since it cannot be changed.
The following expression demonstrates seven constants. Notice that positioning literals next to each other does not group them. They are still constants, just sequential ones:
abcdefg
In the above mask notation, the first position allows only the lowercase letter 'a', the second position only the letter 'b', the third 'c', and so on. Since each of the character positions associated with this mask allows but one possible character, they are typically filled in and locked before input begins. The user cannot change these constants. Clearly, this expression serves no purpose but that of example. For true user input, we rely on groups and wildcards.
Groups and Ranges
Groups are characters delimited by brackets ([…]) and are used to define options against which a single character can be matched. To demonstrate, let us modify our previous example by wrapping it in brackets and making it a group:
[abcdefg]
Unlike our previous example which defined seven input positions accepting one value each, this example defines a single input position accepting one of seven possible values. That is, this grouping defines a character position into which the user may type one of the letters 'a', 'b', 'c', 'd', 'e', 'f', or 'g'.
As a convenience, within groups the hyphen (or "dash") also acts as a reserved character used to represent a range of sequential alternatives. Using this technique, our previous example could have been defined as:
[a-g]
It is also possible to intermingle ranges with other character options within brackets, so that:
which defines a single position allowing one of nine different characters. Note that the hyphen's special status as a reserved character is limited to groups. Outside of groupings, the hyphen is considered a constant.
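As a rough illustration of how a group body might be turned into a set of acceptable characters, consider the following Python sketch. The function `expand_group` is hypothetical, it ignores wildcards and escapes for brevity, and the body `a-gxy` used in the usage note is an invented example rather than one taken from this guide.

```python
def expand_group(body):
    """Expand the text between brackets (e.g. 'a-g') into a set of characters."""
    allowed = set()
    i = 0
    while i < len(body):
        # A hyphen flanked by two characters denotes an inclusive range.
        if i + 2 < len(body) and body[i + 1] == "-":
            first, last = body[i], body[i + 2]
            allowed.update(chr(c) for c in range(ord(first), ord(last) + 1))
            i += 3
        else:
            allowed.add(body[i])
            i += 1
    return allowed
```

Under this model, `expand_group("abcdefg")` and `expand_group("a-g")` produce the same seven letters, and a mixed body such as `a-gxy` yields nine characters.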
Now for a glimpse forward: Brackets can also define validations. Brackets with an asterisk as the first character in their contained sequence ([*…]) are validation patterns. These are wholly different from the groupings described in this section and are mentioned only because they are delimited in a similar way (with brackets). For more on this subject see "Validation Expressions (a.k.a. Regular Expressions)" later in this guide.
Wildcards are devices used to avoid some of the tedium associated with building common groupings. There are eight special characters that are classified as wildcards. We listed them in the quick reference above, but we'll cover them again here with a little more context:
# - A numeric digit. This is equivalent to: [0123456789] (or [0-9]).
< - Any letter converted to lowercase; similar to [a-z].
> - Any letter converted to uppercase, similar to [A-Z].
& - Any letter, regardless of case ([a-zA-Z]).
@ - Any alphanumeric character; manually constructed as: [a-zA-Z0-9]
! - Any alphanumeric character, converted to lowercase. This is similar to [a-z0-9] except that capital letters are converted rather than discarded.
^ - Any alphanumeric character, converted to uppercase. Similar to [A-Z0-9] except that lowercase letters are converted rather than discarded.
_ - Any character at all. If constructed manually, this would be a very large group indeed.
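The wildcard list above can be summarized as a lookup table pairing each wildcard with the characters it accepts and the conversion it applies. The Python sketch below is an assumption-laden model, not the control's actual code: it considers ASCII letters and digits only, and the names `WILDCARDS` and `apply_wildcard` are hypothetical.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def _unchanged(ch):
    return ch

# wildcard -> (accepted characters, conversion applied to an accepted key)
WILDCARDS = {
    "#": (DIGITS, _unchanged),
    "<": (LETTERS, str.lower),
    ">": (LETTERS, str.upper),
    "&": (LETTERS, _unchanged),
    "@": (LETTERS | DIGITS, _unchanged),
    "!": (LETTERS | DIGITS, str.lower),
    "^": (LETTERS | DIGITS, str.upper),
}

def apply_wildcard(wildcard, key):
    """Return the character to store for key, or None if the key is rejected."""
    if wildcard == "_":
        return key  # any character at all
    accepted, convert = WILDCARDS[wildcard]
    return convert(key) if key in accepted else None
```

Note how `<` differs from the group [a-z] in this model: an uppercase `Q` is stored as `q` rather than being discarded.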
Wildcards can stand on their own, looking for all the world like constants, but actually acting as groups in their own right. Wildcards can also be embedded within proper groups, coupled together with literals, ranges, and other wildcards. For example, a four-digit hexadecimal number can be enforced with the following:
which could be expanded to:
The backslash (\) is also referred to as the escape character. It is used to "escape" the special meanings of all reserved characters (including itself). For example, the following expression:
[abc#]
will accept any one of the thirteen characters: 'a', 'b', 'c', '0', '1', '2', '3', '4', '5', '6', '7', '8', or '9'.
A similar expression with a backslash "escaping" the wildcard ("\#") changes things:
[abc\#]
The above notation signifies one of only four characters: 'a', 'b', 'c', or '#'.
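One way to picture the escape character at work is a small tokenizer that classifies each character of a group body as either a wildcard or a literal. The Python sketch below is hypothetical: `tokenize` is an invented name, only the eight wildcards are treated as reserved for brevity, and treating a trailing lone backslash as a literal is an assumption.

```python
WILDCARD_CHARS = set("#<>&@!^_")  # wildcards only, for brevity

def tokenize(body):
    """Split mask text into (kind, char) pairs, honoring the backslash escape."""
    tokens = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # The escaped character loses any reserved meaning.
            tokens.append(("literal", body[i + 1]))
            i += 2
        elif ch in WILDCARD_CHARS:
            tokens.append(("wildcard", ch))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens
```

With this model, the body `abc#` ends in a wildcard token, whereas `abc\#` ends in a literal `#`, and a doubled backslash produces a single literal backslash.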
When repeating a complex character pattern a few times, or even a simple pattern several times, it is often useful to specify a repeat value. This is accomplished by wrapping a numerical value in braces immediately following the character pattern to be duplicated. By doing so, the previous example of accepting a four-digit hexadecimal number can be reduced to:
There are very specific rules which govern repeat values. Most of these are intuitive but we state them here in the interest of clarity:
- Repeat values apply to the complete pattern for an input character, which means that they cannot occur within groupings. Typically, a repeat value is specified immediately following a group. When braces occur within brackets, they are simply interpreted as literals and so will be included in the list of acceptable input.
- The repeat value must be numeric. If the value contained within the braces cannot be translated into a number, the braces will again be interpreted as character literals.
- The repeat value is adjusted to include the token being repeated. That is, the parser decrements the value by one to account for the character position already defined. In the previous example with a repeat value of 4, the parser repeats the grouping three additional times (4 minus 1). This means that repeat values of one and zero have no effect. Neither suppresses the previous character pattern, and neither repeats that pattern again.
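The rules above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function `expand_repeats` is hypothetical, the mask is assumed to be well formed, escapes and validation expressions are not handled, and the group `[#a-f]` in the usage note is an invented example.

```python
def expand_repeats(mask):
    """Return one pattern string per input position, expanding {n} repeat values."""
    positions = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch == "[":
            # Take the group whole, so braces inside it remain literals.
            end = mask.index("]", i)
            positions.append(mask[i:end + 1])
            i = end + 1
        elif ch == "{" and positions and "}" in mask[i:]:
            end = mask.index("}", i)
            value = mask[i + 1:end]
            if value.isdecimal():
                # The pattern already occupies one position: add n - 1 copies.
                positions.extend([positions[-1]] * (int(value) - 1))
                i = end + 1
            else:
                positions.append(ch)  # not numeric: the brace is a literal
                i += 1
        else:
            positions.append(ch)
            i += 1
    return positions
```

For instance, `expand_repeats("[#a-f]{4}")` yields four copies of the group, while `#{1}` and `#{0}` both yield a single `#` position, matching the rule that repeat values of one and zero have no effect.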
Validation Expressions (a.k.a. Regular Expressions)
So far, all of the functionality we've covered falls under the category of "iterative expressions", that is, expressions that test one character of input at a time. For simple patterns, iterative expressions are all we need concern ourselves with. By the time the user finishes entering values into a social-security expression:
###-##-####
or a phone number pattern:
(###) ###-####
we are guaranteed to have data that is formatted as we require it, but for more complex types of data, the iterative expression falls short. Consider the following expression for a 24 hour clock:
This example pattern works well for most of the mask, accepting minutes in the range of 00 to 59 and eliminating the possibility of minutes input as 60 or above. The hour portion works reasonably well too. It allows us to enter 00 to 23 while excluding 30 or above. While this is close to what we want, it is not quite right since it also allows hours from 24 to 29 to be input. Alone, both 2 and 9 are allowable. It is only when they are used together that one disallows the other. Clearly we need something more than iterative expressions to validate input.
As it happens, there is another type of notation which lends itself well to this sort of validation: regular expressions.
This is not a guide to regular expressions; there are many of those available on the internet. They are pertinent to this document, however, because they can be embedded within edit mask notation to serve as "validation expressions." To do this, we prefix a regular expression with a validation marker ('*') and enclose it in brackets as we would a grouping. The shortcomings of the previous time example can be solved by inserting the appropriate regular expression, enforcing a true 24 hour clock pattern:
With this revision, when a user attempts to enter an invalid hour such as 29, the edit mask compares the input against the validation expression. Upon failure the caret is moved back so that the user can try again.
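The kind of check a validation expression performs on the hour can be approximated in Python. The sketch below is an assumption: it uses Python's `re` syntax (with `\d` standing in for a digit) rather than the edit mask notation itself, and `hour_is_valid` is a hypothetical helper name.

```python
import re

# Assumed Python rendering of a true 24-hour "hour" rule:
# a leading 0 or 1 followed by any digit, or a 2 followed by 0 through 3.
HOUR_RULE = re.compile(r"[01]\d|2[0-3]")

def hour_is_valid(typed):
    """Return True if the two typed hour characters form a valid 24-hour value."""
    return HOUR_RULE.fullmatch(typed) is not None
```

Here `hour_is_valid("23")` passes while `hour_is_valid("29")` fails, which is the case a character-by-character test cannot catch on its own.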
Note that the regular expression tests only the input from the start of the validation bracket to the previous validation marker (or the start of input if one does not exist). To limit the range of characters, you may place a single validation marker (without brackets) at the start of the pattern to input. For example, the following will allow input of two letters followed by a two-digit number between 01 and 16 (e.g.: AK16):
The validation expression tests only the input since the validation marker (underlined in this example), which excludes the first two letter characters.
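The scoping effect of a validation marker can be modeled by testing only a slice of the input. In the Python sketch below, the rule for 01 through 16 is written in Python's `re` syntax and the names `NUMBER_RULE` and `validate_since_marker` are hypothetical; the marker position is passed in explicitly as an assumption about how a parser might track it.

```python
import re

# Assumed Python rendering of "a two-digit number between 01 and 16".
NUMBER_RULE = re.compile(r"0[1-9]|1[0-6]")

def validate_since_marker(text, marker_pos):
    """Test only the input typed after the validation marker's position."""
    return NUMBER_RULE.fullmatch(text[marker_pos:]) is not None
```

With the marker placed after the two letters, `validate_since_marker("AK16", 2)` succeeds and `validate_since_marker("AK17", 2)` fails; the letters themselves never reach the regular expression.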
Sample Useful Expressions
The following are a handful of sample patterns. It should be apparent by now that this list is by no means exhaustive, but these are included here for your convenience:
24 hour clock:
[ 012]#[*[ 01]#|2[0-3]]:[0-5]#
12 hour clock:
[ 01]#[*[ 0]#|1[0-2]]:[0-5]# [AP]M
Date (from 1/1/1900 through 12/31/2199):
[ 01]#[*[ 0]#|1[0-2]]/*[ 0123]#[*[ 012]#|3]/*#[*19|2]##
A note on limitations in validation: this pattern enforces the date as much as possible in the current spec. Only values between 1 and 12 are accepted for month and only values between 1 and 31 are accepted for day; however, version 1.1 of the CodingMonk edit mask notation does not support context between separate validation expressions. So rules limiting the number of days to 30 for months 4, 6, 9, and 11, as well as 28 or 29 days for month 2 depending on the year, are not yet possible. Look to revision 1.2 to support this.
Percentage (between 0 and 100):
###[*100| ##| #]%
Temperature in Celsius or Fahrenheit (between 0.0 and 199.9 degrees):
[ 01][ #]#[*###| ##| #].# [CF]