Primitive types

A primitive type, such as int32 or rstring, is one that is not composed of other types.

By supporting many primitive types, SPL gives you fine control over data representation, which is crucial for performance in high-volume streams. Tight representation is important both to keep the data on the wire small, and to reduce serialization and deserialization time. SPL supports the following primitive types:

Table 1. Primitive types in SPL

The type and representation of primitive types in SPL. The types int, uint, float, decimal, complex, and xml have three or more table rows.

Type Representation
boolean true or false
enum User-defined enumeration of identifiers
intb Signed b-bit integer. The signed integer types can be:
int8 -128 to 127
int16 -32,768 to 32,767
int32 -2,147,483,648 to 2,147,483,647
int64 -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
uintb Unsigned b-bit integer. The unsigned integer types can be:
uint8 0 to 255
uint16 0 to 65,535
uint32 0 to 4,294,967,295
uint64 0 to 18,446,744,073,709,551,615
floatb IEEE 754 binary b-bit floating point number. The binary floating types can be:
float32 Single-precision (equivalent to float in Java): ±, significand 24 binary digits, exponent 2^-126 to 2^127
float64 Double-precision (equivalent to double in Java): ±, significand 53 binary digits, exponent 2^-1022 to 2^1023

decimalb IEEE 754 decimal b-bit floating point number. The decimal floating types can be:
decimal32 ±, significand 7 decimal digits (0 to 9.999999), exponent 10^-95 to 10^96
decimal64 ±, significand 16 decimal digits, exponent 10^-383 to 10^384
decimal128 ±, significand 34 decimal digits, exponent 10^-6,143 to 10^6,144
complexb 2b-bit complex number. The complex types can be:
complex32 Both real and imaginary parts are float32
complex64 Both real and imaginary parts are float64
timestamp Point in time, with nanosecond precision
rstring Sequence of raw bytes that supports string processing when the character encoding is known.
Note: SPL functions that manipulate an rstring must take the character encoding of the string into account. In general, an SPL function can handle single-byte character encodings. If the string uses a multi-byte encoding, such as UTF-8, your application must process it accordingly.
ustring String of UTF-16 Unicode characters, based on the ICU library
blob Sequence of raw bytes
rstring[n] Bounded-length sequence of, at most, n raw bytes that supports string processing when the character encoding is known
xml Holds XML values
xml<"schemaURI"> Holds XML values that match the schemaURI
Figure 1. Hierarchy of SPL types

The names of numeric types include their bit-width to make the naming consistent and to avoid unwieldy names such as “long long unsigned int”. Users can also define their own type names. See Type definitions.

An example of an enumeration is:
type tracing = enum { off, error, warn, info, debug, trace };

Any of the identifiers off, ..., trace can be used where a value of enumeration tracing is expected. The scope of the identifiers off, ..., trace is the same as the scope that contains the type definition. Enumerations are ordered (they permit comparison with <, >, <=, and >=) but not numeric (they do not permit arithmetic with +, -, *, and so on).
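
The ordering rule can be illustrated with a small helper. This is a minimal sketch, not part of the specification: the function name atLeast and its parameter names are hypothetical, and the type definition is repeated only to keep the fragment self-contained.

```spl
type tracing = enum { off, error, warn, info, debug, trace };

// Hypothetical helper: true if 'level' is at or above 'threshold'.
// Comparison is permitted because enumerations are ordered;
// an expression such as 'level + 1' would be rejected because they are not numeric.
boolean atLeast(tracing level, tracing threshold) {
  return level >= threshold;
}
```

With this definition, atLeast(warn, error) would compare two identifiers in their declaration order, where error precedes warn.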

As in C and Java, literals for int, uint, float, and decimal can have optional type suffixes. For example, 123 is signed (int32) whereas 123u is unsigned (uint32). One suffix indicates the kind of number.

Suffix Meaning
s Signed integer (default for integer literals)
u Unsigned integer
f Binary floating-point (default for floating point literals)
d Decimal floating-point

Another suffix indicates the number of bits.

Suffix Meaning
b (byte) 8-bit
h (halfword) 16-bit
w (word) 32-bit (default for integer literals)
l (long) 64-bit (default for floating point literals)
q (quad-word) 128-bit

More examples of literals with type suffixes: 0.0005 (float64), 0.5e-3 (float64), 3.5d (decimal64), 3.5w (float32), 123d (decimal64), 123dq (decimal128).
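
The literals above can be placed in declarations to show how the suffix selects the type. This is a sketch only: the function name literalExamples and the variable names are hypothetical, and each literal is taken from the examples in this section.

```spl
// Hypothetical function that only groups the declarations.
void literalExamples() {
  int32      a = 123;     // no suffix: signed, 32-bit
  uint32     b = 123u;    // u: unsigned, 32-bit by default
  float64    c = 0.0005;  // no suffix: binary floating point, 64-bit
  float64    d = 0.5e-3;  // exponent notation, still float64
  float32    e = 3.5w;    // w: 32-bit, binary floating point by default
  decimal64  f = 3.5d;    // d: decimal floating point, 64-bit
  decimal64  g = 123d;    // d applied to an integer-looking literal
  decimal128 h = 123dq;   // d and q: decimal, 128-bit
}
```

Each declared type matches the type of its literal, so the declarations need no casts.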

SPL supports hexadecimal literals, which are written with a 0x prefix. Valid suffixes for hexadecimal literals are s (signed integer) and u (unsigned integer). By default, a hexadecimal literal is a signed integer. Its data length is determined by the number of hexadecimal digits specified, including any leading zeros. A maximum of 16 hexadecimal digits is supported (int64, uint64). Specifying more than 16 hexadecimal digits results in an error.

Some examples of hexadecimal literals: 0xf (int8), 0x00fu (uint16), 0x12345 (int32), -0x12345s (int32), 0x0123456789ABCDEF (int64)
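
The same hexadecimal literals can be written as declarations, which makes the effect of the digit count visible. This is a sketch: the function and variable names are hypothetical, and the literals and their types are the ones listed above.

```spl
// Hypothetical function that only groups the declarations.
void hexExamples() {
  int8   a = 0xf;                 // 1 digit: 8-bit, signed by default
  uint16 b = 0x00fu;              // 3 digits including leading zeros: 16-bit, unsigned
  int32  c = 0x12345;             // 5 digits: 32-bit
  int32  d = -0x12345s;           // explicit s suffix, negated
  int64  e = 0x0123456789ABCDEF;  // 16 digits: 64-bit, the maximum length
}
```

Note that 0x00fu and 0xfu would denote the same value but different types, because the leading zeros count toward the data length.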

String literals are written in single quotation marks or double quotation marks. SPL supports two string types, "Unicode" and "raw". ustring contains Unicode characters that are encoded as UTF-16, and rstring contains raw bytes. This behavior allows the developer to pick Unicode when international character sets are important, and to pick raw strings when constant-time random access and a tight representation are important. A type suffix in the literals indicates the string kind: r indicates rstring (the default without suffix) and u indicates ustring.

String literals can use escape sequences of the form \uhhhh, where the four hexadecimal digits hhhh specify a character. For example, "pi\u00f1ata"u uses the escape \u00f1 to specify ñ (an n with a tilde on top). These escapes can appear in a ustring or an rstring. Other string escape sequences are similar to those in C or Java.
Table 2. String escape character
String escape character Meaning
\a Alert
\b Backspace
\f Form feed
\n Newline
\r Carriage return
\t Horizontal tab
\v Vertical tab
\' Single quotation mark
\" Double quotation mark
\? Question mark
\0 Null character
\\ Literal backslash
Note: String literals that are written in single quotation marks can contain double quotation marks without needing the double quotation mark string escape character (for example, '"').

Recall from the topic Lexical syntax that SPL files are written in UTF-8, so letters such as ñ can also appear directly in a string literal, without the escape sequence. Both ustring and rstring can contain internal null characters, which, unlike in C, are not considered terminating. In other words, characters whose encoding is zero carry no special meaning, and the length of a string is independent of whether it contains such characters.
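
The string rules above can be summarized in a few declarations. This is a sketch under stated assumptions: the function and variable names are hypothetical, and the string contents are invented for illustration.

```spl
// Hypothetical function that only groups the declarations.
void stringExamples() {
  rstring a = "raw by default";   // no suffix: rstring
  rstring b = 'also raw'r;        // explicit r suffix, single quotation marks
  ustring c = "pi\u00f1ata"u;     // u suffix: ustring, with a \uhhhh escape
  rstring d = 'She said "hi"';    // double quotation marks need no escape here
  rstring e = "a\0b";             // embedded null character, not a terminator
  int32   n = length(e);          // the null is counted like any other character
}
```

The last two lines show the difference from C: the null character in the middle of the string does not shorten it.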

String literals can also contain newlines. The newline character is part of the string. For example:
rstring myString=
"A long
 string with a newline in it.";
This example is equivalent to:
rstring myString="A long\n string with a newline in it.";

A literal for a complex number is written as a list literal with a cast, as in (complex32)[1.0, 2.0]. The real and imaginary components of a complex number can be extracted by using the SPL built-in functions real() and imag().
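
As a short sketch of the cast form and the two accessor functions (the function and variable names are hypothetical, and the result types are assumed to match the component type of complex32):

```spl
// Hypothetical function that only groups the declarations.
void complexExamples() {
  complex32 z  = (complex32)[1.0, 2.0];  // list literal with a cast
  float32   re = real(z);                // real component
  float32   im = imag(z);                // imaginary component
}
```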

The timestamp type is designed to allow a high degree of precision as well as avoid overflow conditions (it can represent values that range over billions of years) by following widely accepted standards. It uses a 128-bit representation, where 64 bits store the seconds since the epoch as a signed integer, 32 bits store the nanoseconds, and 32 bits store an optional identifier of the computer where the measurement was taken, which can be useful for after-the-fact drift compensation. The epoch, time zone, and so on, depend on the library functions used to manipulate time stamps; for more information about these functions, see the API documentation. A timestamp can be initialized by using one of the SPL functions or from a float64. There is no literal for a timestamp.
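
The following sketch shows two ways to obtain a timestamp without a literal. It assumes that the standard library functions createTimestamp, getSeconds, and getNanoseconds are available with the signatures used here, and that a cast is the way to initialize a timestamp from a float64; check the API documentation before relying on either assumption. The function and variable names are hypothetical.

```spl
// Hypothetical function that only groups the declarations.
void timestampExamples() {
  // Assumed signature: createTimestamp(int64 seconds, uint32 nanoseconds)
  timestamp t = createTimestamp((int64)1700000000, 0u);

  int64  secs  = getSeconds(t);      // the 64-bit signed seconds field
  uint32 nanos = getNanoseconds(t);  // the 32-bit nanoseconds field

  // Assumption: initialization from a float64 is written as a cast.
  timestamp half = (timestamp)1.5;
}
```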

Many operators and functions are overloaded to work with different types. For example, the operator + can add various types of numbers, but it can also concatenate strings. Likewise, the function length(x) is overloaded to accept x of type rstring or ustring.

To permit efficient marshalling and unmarshalling of network packets, SPL offers a bounded-size variant of rstring, list, set, and map types. For example, rstring[5] can store any rstring of at most 5 characters, and each character in an rstring takes 1 byte. If all parts of a data value have a fixed size, then parts can be found at a fixed offset without decoding the whole. The compiler prohibits implicit conversions from unbounded to bounded types, but the user can override that by explicit casts. Type bounds, whether in variable declarations or in casts, must be compile-time constants. A cast from any string to a bounded string truncates the value if it is too long. SPL does not offer bounded ustring values because bounding the number of Unicode characters would not achieve the goal of fixing the size of the network byte representation after conversion. SPL limits all strings, bounded or unbounded, raw or Unicode, to at most 2^31-1 characters.
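
A minimal sketch of the cast and truncation rules follows. The function and variable names are hypothetical, and the explicit cast back to an unbounded rstring is used only to stay on the safe side.

```spl
// Hypothetical function that only groups the declarations.
void boundedExamples() {
  rstring    s = "streams";
  rstring[5] b = (rstring[5])s;  // explicit cast; the value is truncated to "strea"
  rstring    u = (rstring)b;     // explicit cast back to the unbounded type
}
```

Writing rstring[5] b = s; without the cast would be rejected, because the conversion from an unbounded to a bounded type is never implicit.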

Blobs are sequences of at most 2^63-1 raw bytes. A blob can be initialized from a list<uint8>. There is no literal for a blob.
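
As a sketch of initializing a blob from a list<uint8>, the following fragment assumes that the initialization is written as a cast; the function and variable names are hypothetical. The two-digit hexadecimal literals with a u suffix are uint8 values under the literal rules described earlier.

```spl
// Hypothetical function that only groups the declarations.
void blobExamples() {
  list<uint8> bytes = [0x48u, 0x69u];  // two uint8 elements
  blob        b     = (blob)bytes;     // assumption: a cast performs the initialization
}
```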

The xml type holds XML values. It is not comparable. There is no bounded version of the xml type, so an xml attribute cannot be part of a façade tuple. An xml value can only be assigned by using =, passed as a parameter, or returned from a function. SPL supports two variations of the xml type:
  • xml: Holds well-formed XML documents. A runtime check is done on assignment or conversion to ensure that the values are well-formed XML. A C++ exception is thrown and the operator is terminated if the value that is being assigned is not well-formed. The convertToXML standard library function can be used to check whether a conversion causes an exception.
  • xml<"schemaURI">: Holds XML values that match the schemaURI. It is checked at run time, unless the value is known to be valid already. A C++ exception is thrown and the operator is terminated if the value that is being assigned is not valid for the schema. The Teracloud® Streams instance can optimize the checking if the right side is known to be well-formed (that is, it is from an xml type with the same schema). The schemaURI is fetched on demand. Using a networked schemaURI might cause SPL program failures if the schema on the network is changed without changing the SPL program, a networked connection is not available, or the site that is referenced by the URI is not available. Maintain the schema locally as part of the application. Application directory relocation needs to be addressed as well. The schema should be relative to the data directory, or available at the same absolute path from all computers.
XML literals have the form "wellFormedXML"x or 'wellFormedXML'x. Examples of valid XML literals include:
"<?xml version=\"1.0\"?><x a=\"b\">55</x>"x
'<x a="b">55</x>'x
XML literals are checked by the SPL compiler to see whether they are well-formed. The compiler checks for validity if the left side of an assignment or formal parameter has an xml<schemaURI> form. A warning is generated if the XML literal is not well-formed, or if it is not valid for assignment or passing to an xml<schemaURI> type.
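
The following sketch contrasts an XML literal, which the compiler checks, with a string that is converted at run time. It assumes that convertToXML takes a mutable xml result and a string and returns a boolean that indicates success; verify the exact signature in the standard library documentation. The function and variable names are hypothetical.

```spl
// Hypothetical function that only groups the declarations.
void xmlExamples() {
  // Literal: checked for well-formedness by the compiler.
  xml doc = '<x a="b">55</x>'x;

  // Run-time conversion: test the result instead of risking an exception.
  mutable xml parsed = '<empty/>'x;
  boolean ok = convertToXML(parsed, "<x>55</x>");  // assumed signature
}
```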
Note: The language specification purposely does not specify a byte order because users are oblivious to these details within an SPL application. Compilers are expected to provide flags to choose, for example, network byte order or native byte order. The internal representation of both bounded and unbounded strings and blobs stores a separate length field. As with all types, the exact layout is implementation-dependent and not exposed at the language level. Typically, bounded types are padded if the length is lower than the bound, which allows subsequent attributes to be stored at a fixed offset in a network packet. The implementation might also reduce the number of bits in the length field according to the bound to save space. The maximum tuple size when a tuple's contents are serialized for network transmission is 2^32-2 bytes. At the C++ level, an SPLSerializationException is thrown if the maximum tuple size limit is exceeded. A developer might elect to catch and handle this exception when primitive operators are implemented, avoiding sudden termination of the operator at run time.
Tip: Use the decimal floating point types in financial, commercial, and user-centric programs to avoid losing decimal digits to binary rounding. For unstructured data of bounded size, use a list<uint8>[n] instead of a blob. For more information, see Composite types.