Lexer

The Lexer analyzes the characters of a source text and produces an array of tokens.

class Lexer {}

Constructors

this
this(SourceText srcText, LexerTables tables, Diagnostics diag = null)

Constructs a Lexer object.

Destructor

A destructor is present on this object, but not explicitly documented in the source.

Members

Functions

decodeUTF8
dchar decodeUTF8(ref cchar* ref_p)

Decodes the next UTF-8 sequence.
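
As an illustration of what decoding one UTF-8 sequence through a ref pointer involves, here is a minimal sketch. It is not dil's implementation: the name decodeSketch, the choice to return U+FFFD on malformed input, and the assumption that the text ends with a 0 byte are all hypothetical.

```d
/// Hypothetical helper, not dil's implementation: decodes one UTF-8
/// sequence at p and advances p past it. Returns U+FFFD on malformed input.
/// Assumes the text is terminated by a 0 byte, so reads never run past it.
dchar decodeSketch(ref const(char)* p)
{
    enum dchar replacement = 0xFFFD;
    uint lead = *p++;
    if (lead < 0x80)
        return cast(dchar) lead; // ASCII

    uint extra; // number of continuation bytes
    uint min;   // smallest code point allowed for this length
    if ((lead & 0xE0) == 0xC0) { extra = 1; min = 0x80; lead &= 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; min = 0x800; lead &= 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; min = 0x10000; lead &= 0x07; }
    else
        return replacement; // stray continuation byte or invalid lead byte

    uint d = lead;
    foreach (i; 0 .. extra)
    {
        if ((*p & 0xC0) != 0x80)
            return replacement; // a 0 sentinel also stops here
        d = (d << 6) | (*p++ & 0x3F);
    }
    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    if (d < min || (d >= 0xD800 && d <= 0xDFFF) || d > 0x10FFFF)
        return replacement;
    return cast(dchar) d;
}
```

Because the pointer is passed by reference, repeated calls walk through the text one code point at a time.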

dlxCallback
bool dlxCallback(Token* t)

Callback function to TokenSerializer.deserialize().

endX
cchar* endX()

Returns the end pointer excluding the sentinel string.

error
void error(cchar* columnPos, MID mid, ...)
void error(LineLoc line, cchar* columnPos, MID mid, ...)

Forwards the error parameters to the error() overload that creates the report.

error
void error(TypeInfo[] _arguments, va_list _argptr, LineLoc line, cchar* columnPos, cstring msg)

Creates an error report and appends it to a list.

errorFilePath
cstring errorFilePath()

Returns the file path for error messages.

errorLineNumber
size_t errorLineNumber(size_t lineNum)

Returns the error line number.

finalizeFloat
void finalizeFloat(Token* t, cstring float_string)

Sets the value of the token.

finalizeSpecialToken
void finalizeSpecialToken(Token* t)

Sets the value of the special token.

firstToken
Token* firstToken()

Returns the first token of the source text. This can be the EOF token. Structure: [NullToken, HEAD, Newline, FirstToken, ..., NullToken]

fromDLXFile
bool fromDLXFile(ubyte[] data)

Loads the tokens from a dlx file.

getBuffer
CharArray getBuffer()

Acquires the current buffer.

head
Token* head()

Returns the HEAD token.

isInText
bool isInText(cchar* p)

Returns true if p points inside the source text.

isNewlineEnd
bool isNewlineEnd(cchar* p)

Returns true if p points to the last character of a Newline.

lastToken
Token* lastToken()

Returns the EOF token.

lineNum
size_t lineNum()

Returns the current line number.

lookupFloat
Float lookupFloat(cstring str)

Looks up a Float in the table.

lookupNewline
NewlineValue* lookupNewline()

Looks up a newline value.

lookupString
StringValue* lookupString(cstring str, char postfix)

Looks up a StringValue. Copies str if it's not a slice from the src text.

lookupString
cbinstr lookupString(cbinstr bstr)

Forwards to tables.lookupString().

newToken
Token* newToken()

Returns the next free token from the array. NB: The bytes are not zeroed out.

new_
T* new_()

Allocates memory for T.

peek
void peek(ref Token* t)

Advances t one token forward.

scan
void scan(Token* t)

The main method which recognizes the characters that make up a token.

scanAll
void scanAll()

Scans the whole source text until EOF is encountered.

scanBlockComment
void scanBlockComment(Token* t)

Scans a block comment.

scanCharacter
void scanCharacter(Token* t)

Scans a character literal.

scanDelimitedString
void scanDelimitedString(Token* t)

Scans a delimited string literal.

scanEscapeSequence
dchar scanEscapeSequence(ref cchar* ref_p, out bool isBinary)

Scans an escape sequence.

scanEscapeString
void scanEscapeString(Token* t)

Scans an escape string literal.

scanFloat
void scanFloat(Token* t)

Scans a floating point number literal.

scanHexFloat
void scanHexFloat(Token* t)

Scans a hexadecimal floating point number literal. BNF:
HexFloat := "0" [xX] (HexDigits? "." HexDigits | HexDigits) HexExponent
HexExponent := [pP] [+-]? DecDigits
HexDigits := [a-fA-F\d] [a-fA-F\d_]*
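
The HexFloat grammar above can be read as a small recognizer. The following is a sketch under stated assumptions, not dil's code: the name matchHexFloat and its return convention are hypothetical, and DecDigits, which this section does not define, is assumed to be one or more plain decimal digits.

```d
/// Hypothetical recognizer for the HexFloat grammar above; not dil's code.
/// Returns the number of characters matched, or 0 if s does not start
/// with a HexFloat.
size_t matchHexFloat(const(char)[] s)
{
    static bool isHex(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
               (c >= 'A' && c <= 'F');
    }
    static bool isDec(char c) { return c >= '0' && c <= '9'; }

    size_t i = 0;
    // HexDigits := [a-fA-F\d] [a-fA-F\d_]*
    bool hexDigits()
    {
        if (i >= s.length || !isHex(s[i]))
            return false;
        while (i < s.length && (isHex(s[i]) || s[i] == '_'))
            i++;
        return true;
    }

    // "0" [xX]
    if (s.length < 2 || s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
        return 0;
    i = 2;
    // (HexDigits? "." HexDigits | HexDigits)
    bool hasInt = hexDigits();
    if (i < s.length && s[i] == '.')
    {
        i++;
        if (!hexDigits())
            return 0;
    }
    else if (!hasInt)
        return 0;
    // HexExponent := [pP] [+-]? DecDigits
    if (i >= s.length || (s[i] != 'p' && s[i] != 'P'))
        return 0;
    i++;
    if (i < s.length && (s[i] == '+' || s[i] == '-'))
        i++;
    // DecDigits is not defined in this section; plain [0-9]+ is assumed.
    size_t digitsStart = i;
    while (i < s.length && isDec(s[i]))
        i++;
    return i > digitsStart ? i : 0;
}
```

Under this reading the exponent is mandatory, so "0x1" alone is not a HexFloat, and a "." must be followed by at least one hex digit.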

scanHexString
void scanHexString(Token* t)

Scans a hexadecimal string literal.

scanNestedComment
void scanNestedComment(Token* t)

Scans a nested comment.

scanNormalString
void scanNormalString(Token* t)

Scans a normal string literal.

scanNumber
void scanNumber(Token* t)

Scans a number literal.

scanRawString
void scanRawString(Token* t)

Scans a raw string literal.

scanShebang
void scanShebang()

The "shebang" may optionally appear once at the beginning of a file. BNF:
Shebang := "#!" AnyChar* EndOfLine
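
A minimal sketch of the Shebang rule follows. It is not dil's implementation: shebangLength is a hypothetical name, and only "\n" and "\r" are treated as EndOfLine here, which may be narrower than what the lexer accepts.

```d
/// Hypothetical sketch of the Shebang rule; not dil's implementation.
/// Returns the length of the shebang line without its line ending,
/// or 0 if text does not begin with "#!".
size_t shebangLength(const(char)[] text)
{
    if (text.length < 2 || text[0 .. 2] != "#!")
        return 0;
    size_t i = 2;
    // AnyChar*: consume up to, but not including, the end of the line.
    // Only "\n" and "\r" are treated as EndOfLine in this sketch.
    while (i < text.length && text[i] != '\n' && text[i] != '\r')
        i++;
    return i;
}
```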

scanSpecialTokenSequence
void scanSpecialTokenSequence(Token* t)

Scans a special token sequence.

scanTokenString
void scanTokenString(Token* t)

Scans a token string literal.

scan_
void scan_(Token* t)

An alternative scan method. Profiling shows it's a bit slower.

setBuffer
void setBuffer(CharArray buffer)

Takes over buffer if its capacity is greater than the current one.

text
cstring text()

Returns the source text string.

tokenList
Token[] tokenList()

Returns the list of tokens excluding special beginning and end tokens.

Static functions

cases
char[] cases(string[] strs...)

Generates case statements for token strings.
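
Generating case statements as text suggests compile-time code generation with a string mixin. The sketch below shows that general technique only. The exact text that cases() produces is not documented here, so casesSketch, its output format, and operatorIndex are hypothetical.

```d
/// Hypothetical generator; the text dil's cases() emits may differ.
/// Builds one `case "<str>": return <index>;` line per string.
/// The strings are assumed not to contain quotes or backslashes.
string casesSketch(string[] strs...)
{
    import std.conv : to;
    string code;
    foreach (i, s; strs)
        code ~= "case \"" ~ s ~ "\": return " ~ i.to!string ~ ";\n";
    return code;
}

/// Example consumer: the generated cases are mixed into a switch at
/// compile time.
int operatorIndex(string op)
{
    switch (op)
    {
        mixin(casesSketch("+=", "-=", "*="));
        default:
            return -1;
    }
}
```

The generator runs during compilation, so the switch ends up with ordinary case labels and no runtime cost for building them.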

copySansUnderscores
char[] copySansUnderscores(cchar* begin, cchar* end)

Returns a zero-terminated copy of the string where all underscores are removed.
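
A minimal sketch of such a copy, assuming the terminating 0 byte is part of the returned slice (the documentation does not say whether it is). The name copySansUnderscoresSketch is hypothetical and this is not dil's code.

```d
/// Hypothetical sketch, not dil's implementation: copies [begin, end)
/// without '_' characters and appends a terminating 0 byte.
char[] copySansUnderscoresSketch(const(char)* begin, const(char)* end)
{
    // Worst case: nothing is removed, plus one byte for the terminator.
    auto result = new char[](cast(size_t)(end - begin) + 1);
    size_t n = 0;
    for (auto q = begin; q < end; q++)
        if (*q != '_')
            result[n++] = *q;
    result[n] = 0;
    return result[0 .. n + 1];
}
```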

encodeUTF8
void encodeUTF8(ref CharArray str, dchar d)

Encodes the character d and appends it to str.
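
For illustration, a sketch of UTF-8 encoding that appends to a buffer. A plain char[] stands in for dil's CharArray, whose interface is not shown in this section, and encodeUTF8Sketch is a hypothetical name. The code point is assumed to be valid.

```d
/// Hypothetical sketch using a plain char[] in place of dil's CharArray.
/// Appends the UTF-8 encoding of d to str; d is assumed to be a valid
/// code point (not a surrogate, at most U+10FFFF).
void encodeUTF8Sketch(ref char[] str, dchar d)
{
    if (d < 0x80)
        str ~= cast(char) d; // one byte: 0xxxxxxx
    else if (d < 0x800)
    {
        str ~= cast(char)(0xC0 | (d >> 6));
        str ~= cast(char)(0x80 | (d & 0x3F));
    }
    else if (d < 0x10000)
    {
        str ~= cast(char)(0xE0 | (d >> 12));
        str ~= cast(char)(0x80 | ((d >> 6) & 0x3F));
        str ~= cast(char)(0x80 | (d & 0x3F));
    }
    else
    {
        str ~= cast(char)(0xF0 | (d >> 18));
        str ~= cast(char)(0x80 | ((d >> 12) & 0x3F));
        str ~= cast(char)(0x80 | ((d >> 6) & 0x3F));
        str ~= cast(char)(0x80 | (d & 0x3F));
    }
}
```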

findInvalidUTF8Sequence
cstring findInvalidUTF8Sequence(cbinstr bstr)

Searches for an invalid UTF-8 sequence in bstr.

formatBytes
cstring formatBytes(cchar* start, cchar* end)

Formats the bytes between start and end (excluding end).

scanPostfix
char scanPostfix(ref cchar* p)

Scans the postfix character of a string literal.
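
In D, a string literal may be followed by one of the postfix characters c, w or d. Assuming that is the postfix meant here, a sketch could look as follows. The name scanPostfixSketch and the convention of returning 0 when no postfix is present are hypothetical.

```d
/// Hypothetical sketch, not dil's implementation: if p points at a D
/// string postfix ('c', 'w' or 'd'), returns it and advances p.
/// Otherwise returns 0 and leaves p unchanged.
char scanPostfixSketch(ref const(char)* p)
{
    switch (*p)
    {
    case 'c', 'w', 'd':
        return *p++;
    default:
        return '\0';
    }
}
```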

scanUnicodeAlpha
bool scanUnicodeAlpha(ref cchar* ref_p)

Returns true if the current character to be decoded is a Unicode alpha character.

Static variables

chars_line
uint chars_line;

The four characters of "line" packed into a uint.

chars_q
ushort chars_q;

q" as a ushort.

chars_q2
ushort chars_q2;

q{ as a ushort.

chars_r
ushort chars_r;

r" as a ushort.

chars_shebang
ushort chars_shebang;

#! as a ushort.

chars_x
ushort chars_x;

x" as a ushort.
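
Storing a two-character prefix as a ushort presumably lets the lexer test for it with a single integer comparison. The sketch below illustrates that idea under that assumption. pack2 and load2 are hypothetical names, and how dil actually initializes and uses these variables is not shown in this section.

```d
import core.stdc.string : memcpy;

/// Hypothetical sketch: packs two characters into a ushort in the byte
/// order that a 2-byte load from memory produces on this machine.
ushort pack2(char a, char b)
{
    version (LittleEndian)
        return cast(ushort)(a | (b << 8));
    else
        return cast(ushort)((a << 8) | b);
}

/// Reads the two bytes at p as a ushort. memcpy is used so the read is
/// valid even when p is not 2-byte aligned.
ushort load2(const(char)* p)
{
    ushort v;
    memcpy(&v, p, 2);
    return v;
}
```

With a constant such as pack2('q', '"') computed once, checking whether the text at p starts with q" becomes load2(p) == constant.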

Structs

LineLoc
struct LineLoc

Groups line information.

Variables

allocator
ChunkAllocator allocator;

Allocates memory for non-token structs.

buffer
CharArray buffer;

A buffer for string values.

diag
Diagnostics diag;

For diagnostics.

end
cchar* end;

Points one character past the end of the source text.

errors
LexerError[] errors;

List of errors.

hlinfo
Token.HashLineInfo* hlinfo;

Info set by "#line". Holds the original file path and the modified one (by #line).

inTokenString
uint inTokenString;

> 0 if inside q{ }

lineLoc
LineLoc lineLoc;

Current line.

p
cchar* p;

Points to the current character in the source text.

srcText
SourceText srcText;

The source text.

tables
LexerTables tables;

Used to look up token values.

tokens
TokenArray tokens;

Array of Tokens.

Meta