10.1.2 Unicode

since Haxe 4.0.0

All Haxe targets except Neko support Unicode in strings by default. The compile-time define target.unicode is set on targets where Unicode is supported.

A string in Haxe code represents a valid sequence of Unicode codepoints. Due to differing internal representations of strings across targets, only the basic multilingual plane (BMP) is supported consistently: every BMP Unicode codepoint corresponds to exactly one string character.

Strings containing non-BMP characters can still be processed on all targets without manually decoding surrogate pairs by using the Unicode iterators API provided in the standard library (haxe.iterators.StringIteratorUnicode and haxe.iterators.StringKeyValueIteratorUnicode).
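As a minimal sketch of the iterator approach, the following walks a string that contains one non-BMP character. It assumes the `unicodeIterator` static extension from `haxe.iterators.StringIteratorUnicode`; the sample string and the `count` variable are chosen only for illustration.

```haxe
using haxe.iterators.StringIteratorUnicode;

class Main {
    static function main() {
        // "a", then U+1F600 (outside the BMP), then "b"
        var s = "a\u{1F600}b";
        var count = 0;
        // Each step yields a whole code point as an Int, so a surrogate
        // pair on a UTF-16 target is not split into two values.
        for (codepoint in s.unicodeIterator()) {
            trace(StringTools.hex(codepoint));
            count++;
        }
        // Number of code points visited, as opposed to s.length,
        // which counts string characters and may differ per target.
        trace(count);
    }
}
```

Comparing `count` with `s.length` on different targets is a quick way to see where the two notions of length diverge.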


Encoding

On some targets, the internal representation is UTF-16, which means that non-BMP Unicode codepoints are represented using surrogate pairs. The compile-time define target.utf16 is set when the target uses UTF-16 internally.
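The sketch below shows how the `target.utf16` and `target.unicode` defines might be used with conditional compilation to branch on the string representation. The choice of U+1F600 as the sample character is an assumption for illustration; its UTF-16 encoding is the surrogate pair D83D DE00.

```haxe
class Main {
    static function main() {
        // U+1F600 lies outside the basic multilingual plane
        var s = "\u{1F600}";
        #if target.utf16
        // On UTF-16 targets the code point is stored as a surrogate pair,
        // so it occupies two string characters.
        trace(s.length);
        trace(StringTools.hex(s.charCodeAt(0))); // high surrogate
        trace(StringTools.hex(s.charCodeAt(1))); // low surrogate
        #elseif target.unicode
        // Unicode target with a different internal encoding; the length
        // reported here depends on the target.
        trace(s.length);
        #else
        // No Unicode support by default (Neko): the string is treated as bytes.
        trace("Unicode is not supported on this target");
        #end
    }
}
```

Code that has to behave identically across targets is usually better served by the Unicode iterators than by branching like this; the branch is mainly useful for target-specific fast paths.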

Null-bytes in strings

Some Haxe targets disallow null-bytes (Unicode codepoint 0) in strings. Additionally, some Haxe core APIs assume a null-byte terminates strings. To consistently deal with binary data, including null-bytes, use the haxe.io.Bytes API.
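A minimal sketch of keeping null-bytes in a `haxe.io.Bytes` buffer instead of a string follows. The byte values are arbitrary sample data.

```haxe
import haxe.io.Bytes;

class Main {
    static function main() {
        // Allocate a 4-byte buffer and fill it, including two null-bytes
        var data = Bytes.alloc(4);
        data.set(0, 0x41);
        data.set(1, 0x00); // embedded null-byte
        data.set(2, 0x42);
        data.set(3, 0x00); // trailing null-byte

        trace(data.length);  // 4: null-bytes do not terminate the buffer
        trace(data.get(1));  // 0
        trace(data.toHex()); // 41004200
    }
}
```

Because `Bytes` stores raw bytes with an explicit length, the null-bytes are preserved on every target, whether or not that target allows them in strings.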

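Target details

| Target | target.unicode | target.utf16 | Internal encoding | Null-byte allowed |
| --- | --- | --- | --- | --- |
| Flash | yes | yes | UTF-16 | no |
| JavaScript | yes | yes | UTF-16 | yes (except in some old browsers) |
| ActionScript 3 | yes | yes | UTF-16 | no |
| C++ | yes | yes | ASCII or UTF-16 (if needed) | yes |
| Java | yes | yes | UTF-16 | yes |
| JVM | yes | yes | UTF-16 | yes |
| C# | yes | yes | UTF-16 | yes |
| Python | yes | no | Latin-1, UCS-2, or UCS-4 (see PEP 393) | yes |
| Lua | yes | no | UTF-8 | yes |
| PHP | yes | no | binary | yes |
| Eval | yes | no | UTF-8 | yes |
| Neko | no | no | binary | yes |
| HashLink | yes | yes | UTF-16 | no |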