Util Module

The Util module contains:

  1. Some simple debug macros.
  2. iToForeach, iToMap, iToFlatMap, iUntilForeach and related functions: a more succinct and expressive alternative to the Standard Library's Range class.
  3. Many useful functions and extension methods.
  4. Powerful, fast, efficient Array-based collections for primitive values and compound value classes. These work both on the Java platform (the JVM) and in the web browser when compiled to JavaScript.
  5. A functional error system using the EMon trait and its Good and Bad subclasses.
  6. A parser for RSON (Rich Succinct Object Notation). This includes a lexer for tokenisation and a parser that produces an AST (abstract syntax tree).
  7. RSON: Rich Succinct Object Notation. A persistence system to write objects to text and to read text back into memory as objects, using a consistent, properly structured grammar hierarchy, default values, multiple values and repeat values.
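To illustrate items 2 and 5, here is a minimal Scala sketch. The signatures shown (iToForeach(iFrom, iTo)(f), iToMap returning a List, and map, flatMap and getElse on an EMon whose Bad case holds a list of error strings) are assumptions made for illustration, not necessarily the Util module's actual API.

```scala
// Hypothetical sketches for illustration; not the Util module's real signatures.
object UtilSketch {
  // Calls f for each i from iFrom to iTo inclusive (assumes iTo < Int.MaxValue).
  def iToForeach(iFrom: Int, iTo: Int)(f: Int => Unit): Unit = {
    var i = iFrom
    while (i <= iTo) { f(i); i += 1 }
  }

  // Calls f for each i from iFrom up to, but excluding, iUntil.
  def iUntilForeach(iFrom: Int, iUntil: Int)(f: Int => Unit): Unit = {
    var i = iFrom
    while (i < iUntil) { f(i); i += 1 }
  }

  // Maps each i from iFrom to iTo inclusive, collecting results in order.
  def iToMap[B](iFrom: Int, iTo: Int)(f: Int => B): List[B] = {
    val buf = List.newBuilder[B]
    iToForeach(iFrom, iTo)(i => buf += f(i))
    buf.result()
  }

  // Like iToMap, but concatenates the lists produced by f.
  def iToFlatMap[B](iFrom: Int, iTo: Int)(f: Int => List[B]): List[B] = {
    val buf = List.newBuilder[B]
    iToForeach(iFrom, iTo)(i => buf ++= f(i))
    buf.result()
  }

  // A simplified error monad: Good wraps a value, Bad carries error messages.
  sealed trait EMon[+A] {
    def map[B](f: A => B): EMon[B] = this match {
      case Good(v)   => Good(f(v))
      case Bad(errs) => Bad(errs)
    }
    def flatMap[B](f: A => EMon[B]): EMon[B] = this match {
      case Good(v)   => f(v)
      case Bad(errs) => Bad(errs)
    }
    def getElse[B >: A](default: => B): B = this match {
      case Good(v) => v
      case Bad(_)  => default
    }
  }
  final case class Good[+A](value: A) extends EMon[A]
  final case class Bad[+A](errs: List[String]) extends EMon[A]
}
```

The loop helpers read as "for i from 1 to 3" without allocating a Range object, which is the kind of succinctness item 2 refers to. The EMon sketch lets a chain of map and flatMap calls short-circuit on the first Bad while keeping its error messages.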

Tokeniser

The Tokeniser will create the following tokens:
  • KeyTokens: _ ? ?? ???
  • Identifiers: alphanumeric tokens starting with a letter or an underscore character.
  • Operators
  • Numeric literals
  • Separators: , . .. ... {} etc.
  • String literals
  • Character literals
  • Comments

KeyTokens, identifiers and literals are all expressions. Operators, separators and comments are not. Identifiers include lexemes such as if, IF, true and TRUE: there are no alphabetic keywords in RSON syntax. Consumers of RSON syntax can of course treat whatever identifiers they want as keywords appropriate to their use case.
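The expression rule above can be encoded directly in a token type hierarchy. The following is a minimal sketch; all type and method names (ExprToken, NonExprToken, isConsumerKeyword and so on) are hypothetical and only illustrate the classification.

```scala
// Hypothetical token model; names are illustrative, not the Util module's API.
object TokenSketch {
  sealed trait Token { def isExpression: Boolean }

  // KeyTokens, identifiers and literals are expressions.
  sealed trait ExprToken extends Token { def isExpression: Boolean = true }
  // Operators, separators and comments are not.
  sealed trait NonExprToken extends Token { def isExpression: Boolean = false }

  final case class KeyToken(str: String) extends ExprToken // _ ? ?? ???
  final case class IdentifierToken(str: String) extends ExprToken
  final case class NumericLitToken(str: String) extends ExprToken
  final case class StringLitToken(value: String) extends ExprToken
  final case class CharLitToken(value: Char) extends ExprToken
  final case class OperatorToken(str: String) extends NonExprToken
  final case class SeparatorToken(str: String) extends NonExprToken // , . .. ... { } etc.
  final case class CommentToken(text: String) extends NonExprToken

  // There are no alphabetic keywords: "if" and "TRUE" lex as identifiers.
  // A consumer may choose its own keyword set on top of that.
  def isConsumerKeyword(t: Token, keywords: Set[String]): Boolean = t match {
    case IdentifierToken(s) => keywords.contains(s)
    case _                  => false
  }
}
```

Keeping keywords out of the lexer means one tokeniser serves every consumer; each consumer decides which identifiers are special.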

Identifiers are categorised into 3 types:
  • IdentUnder: an identifier beginning with an underscore character.
  • IdentLow: an identifier beginning with a lower case alphabetic character.
  • IdentUp: an identifier beginning with an upper case alphabetic character. Some of these tokens will also be considered valid raw Base32 tokens, and a subset of these will also be considered valid raw hexadecimals; however, all the alphabetic characters must then be upper case.
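A minimal sketch of this categorisation follows, assuming (as the Hexadecimal and Base32 section states) that Base32 uses the upper case letters A to W without O, and that an IdentUp only counts as raw Base32 or raw hexadecimal when all its letters are upper case. The object and function names are hypothetical.

```scala
// Hypothetical identifier classifier; names are illustrative only.
object IdentSketch {
  sealed trait IdentKind
  case object IdentUnder extends IdentKind
  case object IdentLow extends IdentKind
  case object IdentUp extends IdentKind

  // Classifies by the first character. None if the string is empty or the
  // first character is neither '_' nor a letter with case.
  def identKind(ident: String): Option[IdentKind] = ident.headOption match {
    case Some('_')            => Some(IdentUnder)
    case Some(c) if c.isLower => Some(IdentLow)
    case Some(c) if c.isUpper => Some(IdentUp)
    case _                    => None
  }

  private def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'
  // Base32 letters: A to W with 'O' unused. Hex letters: A to F.
  private val base32Letters: Set[Char] = ('A' to 'W').toSet - 'O'
  private val hexLetters: Set[Char] = ('A' to 'F').toSet

  // True if every character is a digit or an upper case Base32 letter.
  def isRawBase32Up(ident: String): Boolean =
    ident.nonEmpty && ident.forall(c => isAsciiDigit(c) || base32Letters.contains(c))

  // True if every character is a digit or an upper case hex letter.
  def isRawHexUp(ident: String): Boolean =
    ident.nonEmpty && ident.forall(c => isAsciiDigit(c) || hexLetters.contains(c))
}
```

Because the hex letters are a subset of the Base32 letters, every identifier that passes isRawHexUp also passes isRawBase32Up, matching the "subset" wording above.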

Numeric literals come in 4 types, with negative numbers described in item 5.
  1. Floating point numbers: 6.02e23 6.02e-23 6.02e-23d. Apart from the negative number tokens of item 5, this is the only case where a minus (dash) character is included as part of a token. Floating point tokens can have optional trailing lower case alphabetic characters.
  2. Explicit hexadecimals: 0x433A 0x2222 0xff000bc. The alphabetic characters must be all lower case or all upper case.
  3. Explicit Base32 tokens: 0y433G 0y222C 0yWW000MP. The alphabetic characters must all be upper case.
  4. DigitCode tokens. These consist of one or more sequences of digits separated by decimal points. As well as integer and fractional decimal numbers, they can be used for version numbers, IP addresses and other codes. They can themselves be further divided into:
    • Valid natural integers: 0 21 567. These are also valid raw hexadecimal and raw Base32 tokens.
    • Valid natural integers with trailing lower case alphabetic characters: 0f 21snap 567d. These may be valid raw hexadecimal or Base32 tokens if the alphabetic letters all fall within the correct set of letters.
    • Raw hexadecimals that start with a digit, where the alphabetic characters must be all lower or all upper case: 3A 2d1e 567D. These are also valid raw Base32 tokens.
    • Raw Base32 tokens that start with a digit and are not valid raw hexadecimals, where the alphabetic characters must be all lower or all upper case: 3G 2d1s 567R
    • Valid fractional decimal numbers, which may have trailing lower case alphabetic characters: 0.0f 2084.4 3.1rc
    • DigitCode tokens with 3 or more digit sequences: 2.13.4 0.0.3snap 192.168.1.1
  5. There will be tokens for negative numbers: -5 -5.32 -5.87e7 -0xA434 will each be lexically processed as a single token. This means that raw hexadecimals and raw Base32 tokens may be processed as 1 or 2 tokens depending on whether they start with a digit. This should not cause a problem as long as they are not combined with dot operators in dot expressions.
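The DigitCode subdivision by number of dot-separated parts can be sketched as below. This is a hypothetical classifier, not the Tokeniser's implementation: it strips trailing lower case letters, then counts digit sequences. Tokens with letters inside the digits, such as the raw hexadecimal 2d1e, are deliberately left unclassified here.

```scala
// Hypothetical DigitCode classifier; names are illustrative only.
object DigitCodeSketch {
  sealed trait DigitCodeKind
  case object NaturalInt extends DigitCodeKind        // 0 21 567 21snap
  case object FractionalDecimal extends DigitCodeKind // 0.0f 2084.4 3.1rc
  case object MultiPartCode extends DigitCodeKind     // 2.13.4 192.168.1.1

  private def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // Splits a token into its body and any trailing lower case letters.
  def splitSuffix(token: String): (String, String) = {
    val body = token.reverse.dropWhile(c => c >= 'a' && c <= 'z').reverse
    (body, token.drop(body.length))
  }

  // None unless the body is one or more non-empty digit runs separated by '.'.
  def classify(token: String): Option[DigitCodeKind] = {
    val (body, _) = splitSuffix(token)
    val parts = body.split("\\.", -1).toList // -1 keeps empty parts like "1..2"
    if (parts.exists(p => p.isEmpty || !p.forall(isAsciiDigit))) None
    else parts.length match {
      case 1 => Some(NaturalInt)
      case 2 => Some(FractionalDecimal)
      case _ => Some(MultiPartCode)
    }
  }
}
```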

LetterChar = unicode_letter
NonZeroDigit = '1' ... '9'
DigitChar = '0' | NonZeroDigit
HexaLowerChar = 'a' ... 'f'
HexaUpperChar = 'A' ... 'F'
HexaLetterChar = HexaLowerChar | HexaUpperChar
HexaChar = DigitChar | HexaLetterChar
LetterOrDigitChar = LetterChar | DigitChar
LetterOrUnderscoreChar = LetterChar | '_'
UnderscoreThenLetterOrDigit = '_', LetterOrDigitChar
Dot3Token = "..."
Dot2Token = ".."
DotToken = '.'

IdentifierToken = (LetterChar | UnderscoreThenLetterOrDigit), { LetterOrDigitChar | UnderscoreThenLetterOrDigit }
DeciLitToken = '0' | (NonZeroDigit, { DigitChar })
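The IdentifierToken and DeciLitToken productions translate into simple hand-written scanners. This sketch assumes LetterChar means a Unicode letter only, so that an underscore must always be followed by a letter or digit (a lone "_" is a KeyToken). The function names are hypothetical; each returns the length of the match at a position, or 0 for no match.

```scala
// Hypothetical hand-written scanners for two productions of the grammar above.
object LexSketch {
  private def isLetter(c: Char): Boolean = c.isLetter // unicode_letter
  private def isDigit(c: Char): Boolean = c >= '0' && c <= '9'
  private def isLetterOrDigit(c: Char): Boolean = isLetter(c) || isDigit(c)

  // Returns the length of the IdentifierToken starting at pos, or 0 if none.
  def scanIdentifier(s: String, pos: Int): Int = {
    // Length of one unit at i: 2 for '_' followed by a letter or digit,
    // 1 for a letter (or a digit when allowDigit is true), else 0.
    def unitLen(i: Int, allowDigit: Boolean): Int =
      if (i >= s.length) 0
      else if (s(i) == '_') {
        if (i + 1 < s.length && isLetterOrDigit(s(i + 1))) 2 else 0
      } else if (isLetter(s(i)) || (allowDigit && isDigit(s(i)))) 1
      else 0

    val first = unitLen(pos, allowDigit = false)
    if (first == 0) 0
    else {
      var i = pos + first
      var step = unitLen(i, allowDigit = true)
      while (step > 0) { i += step; step = unitLen(i, allowDigit = true) }
      i - pos
    }
  }

  // Returns the length of the DeciLitToken starting at pos, or 0 if none.
  def scanDeciLit(s: String, pos: Int): Int =
    if (pos >= s.length || !isDigit(s(pos))) 0
    else if (s(pos) == '0') 1 // '0' stands alone: no leading zeros
    else {
      var i = pos + 1
      while (i < s.length && isDigit(s(i))) i += 1
      i - pos
    }
}
```

Note that "a__b" scans as the identifier "a" only, because the grammar has no production for two consecutive underscores.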

Abstract Syntax Tree

After the source has been tokenised, it is parsed into an abstract syntax tree (AST). The basic idea is that an RSON file can be one of three things:
  • An Unstatemented file. It contains just an expression, without a semicolon at the end; for example, it could be just an Int or a String.
  • A Statemented file.
  • The empty file. It may contain comments but no expressions. The empty file is a thing in itself but also a special case of a Statemented file with zero statements.

A statement can be one of three things:
  • An Unclaused Statement. It contains just an expression, without a comma at the end; for example, it could be just an Int or a String.
  • A Claused Statement.
  • The empty Statement. It may contain comments but no expressions. The empty Statement is a thing in itself but also a special case of a Claused Statement with zero Clauses.
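The two three-way splits above map naturally onto sealed hierarchies. This hypothetical sketch models the current design, in which Statements and Clauses contain an expression but are not themselves expressions; the Expr cases are placeholders, and all names are illustrative rather than the module's real types.

```scala
// Hypothetical AST shapes mirroring the lists above; not the module's real types.
object AstSketch {
  // Placeholder expressions; a real AST would have many more cases.
  sealed trait Expr
  final case class IntExpr(value: Int) extends Expr
  final case class StrExpr(value: String) extends Expr

  // Clauses and statements contain expressions but are not expressions.
  final case class Clause(expr: Expr)

  sealed trait Statement
  final case class UnclausedStatement(expr: Expr) extends Statement
  final case class ClausedStatement(clauses: List[Clause]) extends Statement
  case object EmptyStatement extends Statement

  sealed trait RsonFile
  final case class UnstatementedFile(expr: Expr) extends RsonFile
  final case class StatementedFile(statements: List[Statement]) extends RsonFile
  case object EmptyFile extends RsonFile

  // The empty forms are the general forms with zero children.
  def normaliseStatement(s: Statement): Statement = s match {
    case ClausedStatement(Nil) => EmptyStatement
    case other                 => other
  }
  def normaliseFile(f: RsonFile): RsonFile = f match {
    case StatementedFile(Nil) => EmptyFile
    case other                => other
  }
}
```

Whether Statement and Clause should instead extend Expr is the open design question this section raises; in this sketch, changing that is a one-line edit per type.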

There is currently some uncertainty as to whether the source should be parsed into a series of statements or into an expression. Currently Statements and Clauses contain an expression but are not themselves expressions. This is blocking my coding at the moment.

AST precedence, from lowest to highest, after brace block parsing:
  • SemicolonToken: delimits the end of a Statement. The last Statement of a block or file may have, but does not need, a trailing Semicolon. A Statement without commas is considered Unclaused.
  • CommaToken: delimits the end of a Clause. The last Clause of a Statement may have, but does not need, a trailing Comma, unless it is a single-Clause Statement, in which case it must have a trailing Comma to distinguish it from an Unclaused Statement.
  • Assignment operators. == and != are normal operators. Any other operator ending with a '=' character is an assignment operator.
  • The single Colon Token.
  • From here down the list, precedence is determined by the first character of the operator. An infix operator ending in a ':' is expected to be dispatched from its right-hand operand.
  • ^
  • &
  • = !
  • :
  • + -
  • * / %
  • All other special characters
  • Whitespace
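The precedence list can be expressed as a lookup from an operator string to a rank. This is a hypothetical sketch that follows the list's rules literally; the numeric ranks are arbitrary (only their order matters), whitespace is not modelled because it is not an operator string, and excluding the single colon from right-hand dispatch is an assumption.

```scala
// Hypothetical precedence table; numeric ranks are arbitrary, only their order matters.
object PrecedenceSketch {
  // Higher rank binds more tightly. Follows the list above, lowest first.
  def precedence(op: String): Int = {
    require(op.nonEmpty, "operator must be non-empty")
    if (op == ";") 1
    else if (op == ",") 2
    else if (op.endsWith("=") && op != "==" && op != "!=") 3 // assignment
    else if (op == ":") 4 // the single Colon Token
    else op.head match {  // first character decides from here on
      case '^'             => 5
      case '&'             => 6
      case '=' | '!'       => 7
      case ':'             => 8
      case '+' | '-'       => 9
      case '*' | '/' | '%' => 10
      case _               => 11 // all other special characters
    }
  }

  // Operators ending in ':' dispatch from their right-hand operand
  // (assumption: the single Colon Token is excluded).
  def dispatchesFromRhs(op: String): Boolean = op != ":" && op.endsWith(":")
}
```

Read literally, the assignment rule also captures operators such as <= and >=, since they end in '='; a real implementation may want to special-case them.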

Hexadecimal and Base32

Hexadecimal is written with upper case letters. Base32 is written with the digits followed by the upper case letters A to W, with the letter 'O' unused.

A 10, B 11, C 12, D 13, E 14, F 15, G 16, H 17, I 18, J 19, K 20, L 21, M 22, N 23, P 24, Q 25, R 26, S 27, T 28, U 29, V 30, W 31
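The digit values above define a 32-symbol alphabet, so encoding and decoding follow the usual positional scheme. This is a minimal sketch with hypothetical names; accepting lower case input is an assumption (based on the raw Base32 tokens described in the Tokeniser section), and overflow for very long inputs is not handled.

```scala
// Hypothetical Base32 helpers using the alphabet above (digits, then A to W without O).
object Base32Sketch {
  val alphabet: String = "0123456789ABCDEFGHIJKLMNPQRSTUVW"

  // Decodes a Base32 string; lower case letters are upper-cased first.
  // Returns None for empty input or any character outside the alphabet.
  // Overflow of Long is not checked in this sketch.
  def decode(s: String): Option[Long] =
    if (s.isEmpty) None
    else s.toUpperCase(java.util.Locale.ROOT).foldLeft(Option(0L)) { (acc, c) =>
      val d = alphabet.indexOf(c)
      acc.flatMap(a => if (d < 0) None else Some(a * 32 + d))
    }

  // Encodes a non-negative value, most significant digit first.
  def encode(n: Long): String = {
    require(n >= 0, "negative values are not supported in this sketch")
    if (n == 0) "0"
    else {
      val sb = new StringBuilder
      var m = n
      while (m > 0) { sb.append(alphabet((m % 32).toInt)); m /= 32 }
      sb.reverse.toString
    }
  }
}
```

For example, 255 is 7 × 32 + 31, so it encodes as "7W". Leaving out 'O' avoids confusion with the digit 0.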

Miscellaneous

A lower case letter will be used after numerals in names.