Beginner's guide to regular expressions

    Regular expressions are an extremely flexible and powerful way to find precise matches and patterns in strings of text, with fine-grained control. The basics are easy to learn, and you may be surprised at how quickly you can use them to describe precise patterns (like the structure of a valid e-mail address, IP address or URL).

    Rather than talk about it at length, here is an example of a regular expression which validates that a name is exactly three letters long:

    ^[a-zA-Z]{3}$

    So what do all of these characters mean? The ^ (caret) character means "start of string". At the start of the string, the expression expects characters from the range in square brackets (a-z or A-Z, i.e. lower-case, upper-case or a mix), and {3} requires exactly three of them. The $ sign means "end of string". Because the expression uses both the caret and dollar metacharacters, the entire string must consist of exactly three alphabetic characters for the regular expression to match. To accept names of at least three letters, you would write {3,} instead of {3}.
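To see the anchors at work, here is a minimal sketch in Python. The article itself doesn't tie the pattern to any language, so Python's standard re module is an assumption made for illustration, and the helper name is_three_letter_name is hypothetical. The sketch also tries the {3,} variant for "at least three letters".

```python
import re

# The article's pattern: exactly three ASCII letters, anchored at both ends.
NAME_EXACTLY_THREE = re.compile(r"^[a-zA-Z]{3}$")
# Variant with {3,}: three or more letters.
NAME_AT_LEAST_THREE = re.compile(r"^[a-zA-Z]{3,}$")


def is_three_letter_name(text: str) -> bool:
    """Hypothetical helper: True if the whole string is exactly three letters."""
    return NAME_EXACTLY_THREE.search(text) is not None


for candidate in ["Bob", "Al", "Anna", "b0b"]:
    print(candidate, is_three_letter_name(candidate),
          NAME_AT_LEAST_THREE.search(candidate) is not None)
# Bob True True
# Al False False
# Anna False True
# b0b False False
```

One Python-specific detail: $ also matches just before a trailing newline, so re.search would accept "Bob\n". If that matters, re.fullmatch, which requires the pattern to consume the whole string, is the stricter check.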

    Let's try some others. What if you wanted to verify that someone has entered a valid telephone number? Regular expressions cannot check whether a phone number actually exists, but they can verify that the field has the right length and uses the right characters. A UK landline number is usually 11 characters long, all of them digits.

    ^[0-9]{11}$

    Fairly simple. Again, the caret (^) means "start of string"; the square brackets define a character class, in this case the digits 0 to 9; then the quantifier {11} requires exactly 11 of them. Finally, the $ metacharacter means "end of string", as before. If you left out the dollar metacharacter at the end, users could enter their telephone number followed by any other characters and the regular expression would still consider it valid. Without the $, the expression in plain English would read: “the string must start with 11 numeric characters (0 to 9)”. What comes after that doesn't matter to the regex engine, because it has already found the match you asked for.
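The effect of dropping the $ is easy to demonstrate. This is a minimal Python sketch, again using the standard re module as an illustrative choice; the sample inputs are made up and are not real phone numbers.

```python
import re

WITH_END_ANCHOR = re.compile(r"^[0-9]{11}$")     # whole string must be 11 digits
WITHOUT_END_ANCHOR = re.compile(r"^[0-9]{11}")   # only the first 11 chars are checked

# Made-up sample inputs, not real phone numbers.
samples = ["01632960123", "01632960123 call after 6pm", "0163296012"]
for s in samples:
    print(repr(s),
          WITH_END_ANCHOR.search(s) is not None,
          WITHOUT_END_ANCHOR.search(s) is not None)
# '01632960123' True True
# '01632960123 call after 6pm' False True
# '0163296012' False False
```

The sketch uses [0-9] rather than \d on purpose: in Python 3, \d in a str pattern also matches non-ASCII decimal digits (such as Arabic-Indic digits), which a phone field probably shouldn't accept.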

    Some others:

    Matching a username
    ^[a-zA-Z0-9_-]{3,14}$
    A username between 3 and 14 characters long, where the permitted characters are a-z or A-Z (or a mix), 0-9, _ and -

    This expression uses metacharacters like:
    • ^ - start
    • [] - character class (e.g. a-z, A-Z, etc.)
    • {} - numeric quantifier
    • $ - end
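Here is the username pattern exercised in a short Python sketch. The function is_valid_username is a hypothetical helper name, not part of any library, and the sample usernames are illustrative.

```python
import re

# 3-14 characters from a-z, A-Z, 0-9, underscore and hyphen.
# The hyphen is the last character inside [], so it is a literal "-",
# not part of a range.
USERNAME = re.compile(r"^[a-zA-Z0-9_-]{3,14}$")


def is_valid_username(name: str) -> bool:
    """Hypothetical helper wrapping the article's username pattern."""
    return USERNAME.search(name) is not None


for name in ["john_doe-99", "jo", "a_very_long_username", "bad name"]:
    print(repr(name), is_valid_username(name))
# 'john_doe-99' True
# 'jo' False
# 'a_very_long_username' False
# 'bad name' False
```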


    Matching a date of birth (DD/MM/YYYY)
    ^(0[1-9]|[1-2][0-9]|3[0-1])\/(0[1-9]|1[0-2])\/(1|2)[0-9]{3}$

    This is a lengthy expression that uses some metacharacters like:
    • ^ - start
    • | - OR (alternator)
    • \ - escape character, so the next character is treated literally - in the case above, the forward slash (/), which must be escaped when / is also used as the pattern delimiter (as in PHP or JavaScript)
    • [] - character class (e.g. a-z, A-Z, etc.)
    • {} - numeric quantifier
    • () - group for defining the scope of the alternator
    • $ - end

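    The expression is long for a good reason - it checks the range of each field, so it will not match common invalid dates (although it cannot know how many days each month has, so an impossible date such as 31/02/2013 still matches). Invalid dates it rejects include: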
    • 32/01/2013
    • 31/31/2013
    • 31/12/13
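A regular expression can only check the shape and digit ranges of a date, not whether the day exists in that month. The Python sketch below pairs a compact DD/MM/YYYY pattern (day 01-31 written as 0[1-9]|[12][0-9]|3[01], month 01-12, year 1000-2999) with datetime.strptime for the calendar check. Combining the two is our own suggestion rather than something the article prescribes, and the helper names are hypothetical. Python patterns are plain strings rather than /.../ literals, so the forward slash needs no escaping here.

```python
import re
from datetime import datetime

# Day 01-31, month 01-12, year 1000-2999. The slashes need no backslash
# because this pattern is a Python string, not a /.../ literal.
DATE_SHAPE = re.compile(r"^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/[12][0-9]{3}$")


def looks_like_date(text: str) -> bool:
    """Format and range check only (hypothetical helper)."""
    return DATE_SHAPE.search(text) is not None


def is_real_date(text: str) -> bool:
    """Regex check first, then ask datetime whether the date exists."""
    if not looks_like_date(text):
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


for s in ["10/01/2013", "32/01/2013", "31/31/2013", "31/12/13", "31/02/2013"]:
    print(s, looks_like_date(s), is_real_date(s))
# 10/01/2013 True True
# 32/01/2013 False False
# 31/31/2013 False False
# 31/12/13 False False
# 31/02/2013 True False
```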

    A list of important metacharacters...
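    • ^ - start of string (or the start of each line, when multiline mode is enabled)
    • $ - end of string (or the end of each line, in multiline mode)
    • \ - to match a metacharacter literally, put a backslash before it - for example, \$ matches a literal dollar sign
    • {} - numeric quantifier, e.g. {1,3} means “between 1 and 3 repetitions of the preceding item”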
    • () - group, in the cases above used for defining the scope of the alternator metacharacter
    • | - alternator, or the “OR” clause, e.g. php|perl matches either “php” or “perl”; to make those the only strings that match, group and anchor the alternatives: ^(php|perl)$
    • ? - matches the preceding item 0 or 1 times
    • * - matches the preceding item 0 or more times
    • + - matches the preceding item 1 or more times
    • {x,} - matches x times or more (x being a numeric value)
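The quantifiers, alternation and escaping in this list can be checked quickly in Python. This is a sketch using the standard re module; the sample patterns and strings are illustrative and not taken from the article. re.fullmatch is used so each result reflects whether the whole text matches, much like wrapping the pattern in ^ and $.

```python
import re

# (pattern, text) pairs; re.fullmatch succeeds only if the pattern
# matches the entire text.
examples = [
    (r"colou?r", "color"),    # ?  : "u" 0 or 1 times
    (r"colou?r", "colour"),
    (r"ab*c", "ac"),          # *  : "b" 0 or more times
    (r"ab+c", "ac"),          # +  : "b" 1 or more times, so this fails
    (r"ab+c", "abbbc"),
    (r"[0-9]{1,3}", "1234"),  # {1,3}: between 1 and 3 digits, so this fails
    (r"[0-9]{3,}", "12345"),  # {3,} : 3 or more digits
    (r"php|perl", "perl"),    # |  : either alternative
    (r"\$[0-9]+", "$42"),     # \$ : a literal dollar sign
]
results = [re.fullmatch(p, t) is not None for p, t in examples]
for (p, t), ok in zip(examples, results):
    print(f"{p:12} {t:7} {ok}")

# Unanchored alternation finds "perl" inside a longer word ...
print(re.search(r"php|perl", "superlative") is not None)      # True
# ... while grouping and anchoring restricts it to the whole string.
print(re.search(r"^(php|perl)$", "superlative") is not None)  # False

# ^ is start of string by default, start of every line with re.MULTILINE.
text = "first line\nsecond line"
print(re.findall(r"^\w+", text))                # ['first']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']
```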


    Learn more:
    How do I use regular expressions in PHP?