All Haxe targets except Neko support Unicode in strings by default. The compile-time define `target.unicode` is set on targets where Unicode is supported.
A string in Haxe code represents a valid sequence of Unicode codepoints. Due to differing internal representations of strings across targets, only the basic multilingual plane (BMP) is supported consistently: every BMP Unicode codepoint corresponds to exactly one string character.
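As a small illustration of the BMP guarantee, the sketch below compares the length of a BMP-only string with that of a string containing one non-BMP character. The class name `LengthDemo` and the sample strings are chosen for illustration only. It assumes a target where `target.unicode` is set and a source file saved as UTF-8.

```haxe
class LengthDemo {
	// Returns the target-specific number of string characters.
	public static function charCount(s:String):Int {
		return s.length;
	}

	static function main() {
		// 'a', U+00E9 and U+20AC are all in the BMP, so each one
		// corresponds to exactly one string character.
		trace(charCount("aé€"));

		// U+1F602 lies outside the BMP. The reported length depends on the
		// target's internal representation (for example, one extra
		// character on targets that store it as a surrogate pair).
		trace(charCount("ab😂"));
	}
}
```

Code that needs the same result on every target should avoid relying on `length` for strings that may contain non-BMP characters.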
It is still possible to work with strings including non-BMP characters on all targets without having to manually decode surrogate pairs by using the Unicode iterators API provided in the standard library.
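A minimal sketch of codepoint-wise iteration, assuming the `haxe.iterators.StringIteratorUnicode` class from the standard library (Haxe 4 or later). The class name `CodepointDemo` and the helper `countCodepoints` are hypothetical names introduced for this example.

```haxe
import haxe.iterators.StringIteratorUnicode;

class CodepointDemo {
	// Counts Unicode codepoints instead of target-specific string characters.
	public static function countCodepoints(s:String):Int {
		var n = 0;
		for (cp in new StringIteratorUnicode(s)) {
			n++;
		}
		return n;
	}

	static function main() {
		var s = "ab😂";
		// One iteration per codepoint; on UTF-16 targets the iterator
		// combines a surrogate pair into a single value.
		for (cp in new StringIteratorUnicode(s)) {
			trace("U+" + StringTools.hex(cp, 4));
		}
		trace(countCodepoints(s));
	}
}
```

The iterator yields codepoints as `Int` values, so any per-character logic (classification, re-encoding) can operate on them directly without checking for surrogates.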
On some targets, the internal representation is UTF-16, which means that non-BMP Unicode codepoints are represented using surrogate pairs. The compile-time define `target.utf16` is set when the target uses UTF-16 internally.
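The two defines can be checked with conditional compilation. The following is a sketch rather than a recommended pattern. The class name `EncodingInfo` and the returned descriptions are made up for illustration, and the conditions are parenthesized because the define names contain a dot.

```haxe
class EncodingInfo {
	// Picks a description at compile time based on the target's defines.
	public static function describe():String {
		#if (target.utf16)
		return "Unicode strings, UTF-16 internally";
		#elseif (target.unicode)
		return "Unicode strings, not UTF-16 internally";
		#else
		return "no Unicode support in strings";
		#end
	}

	static function main() {
		trace(describe());
	}
}
```

Only the branch that matches the compilation target is compiled, so target-specific surrogate handling can be isolated in the `target.utf16` branch.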
Some Haxe targets disallow null-bytes (Unicode codepoint 0) in strings. Additionally, some Haxe core APIs assume a null-byte terminates strings. To consistently deal with binary data, including null-bytes, use the `haxe.io.Bytes` API.
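A short sketch of holding data with embedded null-bytes in `haxe.io.Bytes` from the standard library instead of a `String`. The class name `BinaryDemo`, the helper `makePayload`, and the byte values are illustrative assumptions.

```haxe
import haxe.io.Bytes;

class BinaryDemo {
	// Builds a 4-byte buffer that contains null-bytes between printable bytes.
	public static function makePayload():Bytes {
		var b = Bytes.alloc(4);
		b.set(0, 0x48); // 'H'
		b.set(1, 0x00); // null-byte, stored like any other byte
		b.set(2, 0x69); // 'i'
		b.set(3, 0x00); // trailing null-byte
		return b;
	}

	static function main() {
		var b = makePayload();
		trace(b.length); // 4
		trace(b.get(1)); // 0
		trace(b.toHex()); // 48006900
	}
}
```

Because `Bytes` stores raw byte values, its length and contents do not depend on how the target represents strings.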
|Target|`target.unicode`|`target.utf16`|Internal encoding|Null-byte allowed|
|---|---|---|---|---|
|C++|yes|yes|ASCII or UTF-16 (if needed)|yes|
|Python|yes|no|Latin-1, UCS-2, or UCS-4 (see PEP 393)|yes|