Using Regular Expressions

Haxe has built-in support for regular expressions. They can be used to verify the format of a string or to extract regularly structured data from a given text. A regular expression literal starts with ~/ and ends with a single / :

    var r : EReg = ~/world/;
    var str = "hello world";
    trace(r.match(str)); // true : 'world' was found in the string
    trace(r.match("hello !")); // false

You can use the standard regular expression patterns, including (but not limited to) :

  • . : any character
  • * : repeat zero-or-more
  • + : repeat one-or-more
  • ? : optional zero-or-one
  • [A-Z0-9] : character ranges
  • [^\r\n\t] : character not-in-range
  • (...) : parentheses to group characters and capture them
  • ^ : beginning of string/line
  • $ : end of string/line
  • | : alternation ("OR")
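
As a small illustration of how these patterns combine, the sketch below checks whether a string is a decimal number. The class name PatternDemo and the helper isDecimal are made up for this example.

    class PatternDemo {
        // Hypothetical helper: true when s consists of digits with an
        // optional fractional part.
        public static function isDecimal(s : String) : Bool {
            // ^ and $ anchor the pattern to the whole string,
            // [0-9]+ requires one or more digits,
            // (\.[0-9]+)? makes the fractional part optional.
            var r = ~/^[0-9]+(\.[0-9]+)?$/;
            return r.match(s);
        }

        public static function main() {
            trace(isDecimal("42"));   // true
            trace(isDecimal("3.14")); // true
            trace(isDecimal("3."));   // false : a digit must follow the dot
            trace(isDecimal("abc"));  // false
        }
    }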

For example, the following regular expression matches most common email addresses (it is a simplified pattern that only accepts a two- or three-letter top-level domain) :

    ~/[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z][A-Z][A-Z]?/i;

Note that the i at the end of the regular expression is a flag that enables case-insensitive matching.
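
One possible way to use this pattern for validation is sketched below. The class EmailDemo and the helper looksLikeEmail are hypothetical names, and the ^ and $ anchors are an addition made for this sketch so that the whole string has to match rather than only a part of it.

    class EmailDemo {
        // Same simplified pattern as above, with ^ and $ added (an
        // assumption of this sketch) to reject extra leading or
        // trailing text.
        static var emailPattern = ~/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z][A-Z][A-Z]?$/i;

        public static function looksLikeEmail(s : String) : Bool {
            return emailPattern.match(s);
        }

        public static function main() {
            trace(looksLikeEmail("john@example.com"));    // true
            trace(looksLikeEmail("JOHN@EXAMPLE.ORG"));    // true : the i flag ignores case
            trace(looksLikeEmail("john at example.com")); // false : no @
            trace(looksLikeEmail("john@example.info"));   // false : four-letter domain
        }
    }

The last call shows the limit of the simplified pattern: longer top-level domains are rejected, so real-world validation would need a more permissive expression.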

The possible flags are the following :

  • i : case insensitive matching
  • g : global replace or split, see below
  • m : multiline matching, ^ and $ match the beginning and end of each line instead of only the beginning and end of the string
  • s : the dot . also matches newlines (Haxe/Neko only)
  • u : use utf8 matching (Haxe/Neko only)
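
The effect of the i and m flags can be observed with a short sketch such as the following. FlagDemo, containsSecond and hasSecondLine are hypothetical names chosen for this example.

    class FlagDemo {
        // Looks for the lowercase word "second", with or without the i flag.
        public static function containsSecond(s : String, ignoreCase : Bool) : Bool {
            var r : EReg;
            if (ignoreCase) r = ~/second/i; else r = ~/second/;
            return r.match(s);
        }

        // Looks for a line that is exactly "Second", with or without the m flag.
        public static function hasSecondLine(s : String, multiline : Bool) : Bool {
            var r : EReg;
            if (multiline) r = ~/^Second$/m; else r = ~/^Second$/;
            return r.match(s);
        }

        public static function main() {
            var text = "first\nSecond\nthird";
            trace(containsSecond(text, false)); // false : case differs
            trace(containsSecond(text, true));  // true
            trace(hasSecondLine(text, false));  // false : ^ and $ anchor to the whole string
            trace(hasSecondLine(text, true));   // true : ^ and $ anchor to each line
        }
    }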

Groups

You can extract information from a match by using groups :

   var str = "Nicolas is 26 years old";
   var r = ~/([A-Za-z]+) is ([0-9]+) years old/;
   r.match(str);
   trace(r.matched(1)); // "Nicolas"
   trace(r.matched(2)); // "26"

r.matched(0) always returns the whole matched substring, and r.matchedPos() returns the position and length of this substring in the original string :

   var str = "abcdeeeeefghi";
   var r = ~/e+/; // e+ rather than e* : e* would match the empty string at position 0
   r.match(str);
   trace(r.matched(0)); // "eeeee"
   trace(r.matchedPos()); // { pos : 4, len : 5 }
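
Since the group accessors are only meaningful after a successful match, it is usually safer to test the result of match first. The sketch below shows one way to do so; GroupDemo and extractAge are hypothetical names, and matchedLeft and matchedRight are EReg methods that are not covered in the text above.

    class GroupDemo {
        // Hypothetical helper: returns the age found in a sentence,
        // or -1 when the pattern does not match.
        public static function extractAge(s : String) : Int {
            var r = ~/([A-Za-z]+) is ([0-9]+) years old/;
            // Only read the groups once match() has succeeded.
            if (!r.match(s)) return -1;
            return Std.parseInt(r.matched(2));
        }

        public static function main() {
            trace(extractAge("Nicolas is 26 years old")); // 26
            trace(extractAge("no age here"));             // -1

            var r = ~/e+/;
            if (r.match("abcdeeeeefghi")) {
                trace(r.matchedLeft());  // "abcd" : text before the match
                trace(r.matchedRight()); // "fghi" : text after the match
            }
        }
    }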

Replace

A regular expression can also be used to replace a part of the string :

   var str = "aaabcbcbcbz";
   var r = ~/b[^c]/g; // g : replace all instances
   trace(r.replace(str,"xx")); // "aaabcbcbcxx"

You can use $X, where X is the group number, to reuse a matched group in the replacement :

   var str = "{hello} {0} {again}";
   var r = ~/{([a-z]+)}/g;
   trace(r.replace(str,"*$1*")); // "*hello* {0} *again*"
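
Group references can also reorder parts of the matched text. The following sketch, in which ReplaceDemo and toDayFirst are hypothetical names, rewrites a year-month-day date as day/month/year :

    class ReplaceDemo {
        // Hypothetical helper: turns "2011-08-11" into "11/08/2011".
        public static function toDayFirst(text : String) : String {
            var r = ~/([0-9]+)-([0-9]+)-([0-9]+)/;
            // $1, $2 and $3 refer to the year, month and day groups.
            return r.replace(text, "$3/$2/$1");
        }

        public static function main() {
            trace(toDayFirst("modified 2011-08-11")); // "modified 11/08/2011"
        }
    }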

Split

A regular expression can also be used to split a string into several substrings. In that case, the delimiter used to split is not a constant string but a regular expression :

  var str = "XaaaYababZbbbW";
  var r = ~/[ab]+/g;
  trace(r.split(str)); // ["X","Y","Z","W"]
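
A common use is splitting a list whose separators are surrounded by a varying amount of whitespace. A minimal sketch, with SplitDemo and splitList as hypothetical names :

    class SplitDemo {
        // Hypothetical helper: splits a comma-separated list and drops
        // the spaces around each comma.
        public static function splitList(s : String) : Array<String> {
            // \s*,\s* matches a comma together with any surrounding whitespace.
            var r = ~/\s*,\s*/g;
            return r.split(s);
        }

        public static function main() {
            trace(splitList("red , green,blue ,  yellow")); // ["red","green","blue","yellow"]
        }
    }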

Implementation Details

The implementation of regular expressions depends on the platform :

  • in JavaScript, the browser provides the implementation through the RegExp object
  • in Neko, the PCRE library is used
  • in Flash9, the native implementation is used
  • FIXME in Flash 6/8, the implementation is not yet available but will be a pure Haxe version (compatible, but very slow since it's not native)

version #10828, modified 2011-08-11 10:17:19 by dmpost