luaSub
Subclassing the Lua syntax
Copyright © 2007-08 Asko Kauppi. All rights reserved.
This document was revised on 22-Jan-2008, and applies to version 0.41.
NOTE:
As of the time of writing, luaSub still contains some bugs; the code and documents are published as advance information and in order to get help in debugging. Thanks. :)
LuaSub allows "subclassing" Lua syntactically, generating subsets or extensions of the language itself, to be used on all sources or separately for each source file needing them.
luaSub is based on the work on token filters, but takes a radically different approach. It provides a second front end to the Lua language, the 'luas' executable, akin to the standard 'lua' front end. Lua libraries do not need to be patched. If wanted, the pure Lua output of a given source can be produced (e.g. for tying luaSub together with the 'luac' compiler, or LuaJIT).
luaSub is (to be) tested on the following systems:
There is nothing operating system specific in luaSub; it should be usable anywhere Lua itself is.
luaSub adds -s and -o flags to the regular Lua command line options.
The location of the syntax mods can be customized using the LUAS_PATH environment variable. It is treated similarly to the normal LUA_PATH and LUA_CPATH variables (see Lua documentation).
Default paths are:
./?.luas;/usr/local/share/lua/5.1/?.luas    (Linux, OS X)
.\\?.luas;!\\lua\\?.luas                    (Win32)
Usage from a source file
A common complaint about syntax mods has been that they make code reuse harder. LuaSub allows each individual source file to call in just the syntax mods it relies upon, and nothing else (unless given as default on the command line). This behaviour is merged into the 'shebang' line of the source file, which shall follow this syntax:
#![...optional path...]luas ... -s mod [-s mod2] ...
#!...env luas ... -s mod [-s mod2] ...
As long as 'luas' or 'env luas' is mentioned as the program name, luaSub will look for '-s' flags in the line. The mods named there are applied to the source.
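The shebang rule can be illustrated with a small sketch in plain Lua. The helper name shebang_mods is hypothetical and this is not how luaSub itself reads the line; it assumes that each '-s' flag is separated from its mod name by whitespace.

```lua
-- Hypothetical helper, not part of luaSub: list the mods named by '-s'
-- flags on a shebang line, or return nil if the line does not run 'luas'.
local function shebang_mods(line)
  if line:sub(1, 2) ~= "#!" then return nil end

  local words = {}
  for w in line:sub(3):gmatch("%S+") do words[#words + 1] = w end

  -- The program name is either '[path/]luas' or a bare 'luas' after 'env'.
  local start
  for i, w in ipairs(words) do
    if w == "luas" or w:match("[/\\]luas$") then start = i; break end
  end
  if not start then return nil end

  -- Assumption: flag and mod name are always two separate words.
  local mods = {}
  local i = start + 1
  while i <= #words do
    if words[i] == "-s" and words[i + 1] then
      mods[#mods + 1] = words[i + 1]
      i = i + 2
    else
      i = i + 1
    end
  end
  return mods
end

local mods = shebang_mods("#!/usr/bin/env luas -s drain -s jacket.OFF")
print(table.concat(mods, " "))   --> drain jacket.OFF
```

A line that names some other program, such as '#!/usr/bin/lua', yields nil in this sketch, so no mods would be applied.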
Mods with parameters
Some mods, such as jacket, can take a parameter that defines how the mod behaves. Jacket, for example, can be loaded as jacket, as jacket.ON (the default) or as jacket.OFF, which strips out function parameter and return value constraints without checking them at runtime. In other words, it puts the mod into disabled mode.
Anything following a dot ('.') is given as a string parameter to the loaded mod. If the mod does not look for parameters, this information is simply ignored.
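As a rough illustration of the naming rule, the sketch below splits a mod specification at its first dot. The function split_mod is a hypothetical name used only here; luaSub's own handling is not shown in this document.

```lua
-- Hypothetical helper: split "name.PARAM" into the mod name and its string
-- parameter. Everything after the first dot is treated as the parameter.
local function split_mod(spec)
  local name, param = spec:match("^([^.]+)%.(.*)$")
  if name then
    return name, param
  end
  return spec, nil      -- no dot: the mod is loaded without a parameter
end

print(split_mod("jacket.OFF"))   --> jacket  OFF
print(split_mod("jacket"))       --> jacket  nil
```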
Baked in mod
If a source file requires a local mod, the mod itself can be carried alongside the source. Like this:
[optional shebang line]
<<[<<<....<<<]
function( syntax )
    ... mod here
end
<<[<<<....<<<]
... actual Lua source

See tests/test-bakedin.lua for a sample.
Two '<<' are enough as the delimiters for start/end of the mod section, but more can be used for a visual effect.
Sample mods
There are some premade syntax mods within the luaSub source package. Even if you do not use them as such, they serve as templates for your own mods and as samples of the different techniques usable in making mods, so they are well worth studying.
The mods currently carried are:
continue    Adds the 'continue' keyword to loops.
doend       do ... end -> function() ... end, and chaining functions without parentheses
drain       ,,v= func(); getting rid of the '_' variable
incdec      +=, -=, *=, /=, ..= etc.
index0      Making Lua tables index from [0..n-1]
jacket      Runtime type checks for function parameters and return values
macro       Defining and using macros
nokeywords  Allows use of 'x.function', 'x.nil' etc. as table keys
select      #... and ...[exp], instead of calling 'select()'
tricond     cond ? a : b, where 'a' may be false or nil
using       Making functions of a table (such as 'math.*') locals
Writing mods
We'll walk through a single syntax mod, namely 'drain', to give a brief insight into how the mods work and what they look like. For a wider understanding, please see all the bundled mods that come with luaSub (above). Their code contains more comments; here we're merely listing the source itself.
Initializer
Each syntax mod returns a function used for tying it onto the parser's syntax tree. It looks like this (comments removed):
return function( syntax )
    local explist= ref"explist"
    local var= ref"var"
    syntax.chunk= { LOCAL_NIL, syntax.chunk }
    syntax.varlist= { opt(var), optN(",",opt(var)), peek("="), CATCH }
    local t= syntax.stat
    assert( t[0] == "alt" )
    local t_local= t["local"]
    assert( t_local )
    assert( t_local[1] )
    assert( t_local[2] == nil )
    t["local"][1]= { { optN(opt("
The syntax variable is a Lua table describing the syntax, and/or any mods being applied to it. The whole purpose of the initializer is to modify some of that table, binding its own mods into it at appropriate places.
To see the whole default syntax, see syntax.lua, which is compiled into luaSub and holds the default Lua syntax tree.
ref"explist" makes a reference to the .explist branch of the syntax. There is a BIG difference between using ref and just reading syntax.explist (which is also there); syntax.explist takes the current syntax of that branch and ignores mods that would be loaded after the current one. ref"explist" is tied only after all the mods have been loaded, and using it thus means "anything 'explist' would mean at parsing time". If your mod is the only one loaded, they are the same. If there are many mods, using the right one is crucial!
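The difference between the two forms is essentially early versus late binding. The sketch below is a toy model in plain Lua: the syntax table holds strings instead of real syntax entities, and this ref is a hypothetical stand-in that simply looks the name up when called. It does not show how luaSub implements ref.

```lua
-- Toy model only: 'syntax' is a plain table of strings, and 'ref' resolves
-- its name when asked, not when created.
local syntax = { explist = "default explist" }

local function ref(name)
  return function() return syntax[name] end
end

-- "Mod A" is loaded first and captures both forms.
local early = syntax.explist      -- value at the time mod A is loaded
local late  = ref "explist"       -- looked up only when called

-- "Mod B" is loaded afterwards and replaces the branch.
syntax.explist = "explist as changed by mod B"

print(early)    --> default explist
print(late())   --> explist as changed by mod B
```

The direct read keeps the definition mod A happened to see, while the reference picks up whatever the branch means once every mod has been loaded.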
With the definition of syntax.chunk, we are not using ref but the current value of syntax.chunk. We essentially fill in our own function (LOCAL_NIL) to the syntax tree, and let it be followed by whatever syntax.chunk has when we see it.
It is important to make as few assumptions as possible about the existing syntax tree. It might already be modded several times by the time it gets to your mod; do NOT rely on it to be in its default shape. If you do, use asserts to document and verify every assumption you make about its shape. This will break your mod at loading time if it were to conflict with some other mod, but that is really a Good Thing.
The definition of varlist does not refer to or use the old varlist definition at all; this is intentional, and essentially nullifies any changes other mods would have done to it. Normally, avoid this rudeness in your mods.
The globals provided by luaSub to define your syntax are the same as used in syntax.lua for describing the default syntax.
{ ... }         list of entities, all needing to match in order (or the
                whole list fails)
alt{ [key]= ..., ... }
                If some string index matches, the key and its corresponding
                value are the matching entity (and scope). If matching them
                fails, the whole 'alt' fails.
                If no string index matches, [1..N] are tried, in order.
alt1toN_but_last{ ... }
                Special version of 'alt' used for parsing the problematic
                prefixexp entity; should not be needed by custom mods.
opt( ... )      like {...} but optional (if nothing matches, parsing the
                above entity will still proceed)
optN( ... )     like 'opt' but with 0..N repetition
peek( ... )     matches if the upcoming tokens (not read yet) match these
                in order
ref( str )      refers to the final definition of the named syntax entity
                (this is source file specific, as all luaSub syntaxes are)
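To make the matching rules above more concrete, here is a minimal sketch of a matcher over an array of token strings. It is a toy written for this document, not luaSub's parser: it is greedy, never backtracks, ignores modifier functions, and leaves out ref and alt1toN_but_last. The match function and the tag helper are hypothetical names; only the [0] type tag convention is taken from luaSub.

```lua
-- Entities are tables whose [0] field names their type; a plain list has
-- no [0]. Literal tokens are strings.
local function tag(kind)
  return function(...) local t = { ... }; t[0] = kind; return t end
end
local opt, optN, peek = tag "opt", tag "optN", tag "peek"
local function alt(t) t[0] = "alt"; return t end

local match   -- match(entity, tokens, pos) -> next position, or nil

local function match_seq(items, tokens, pos)
  for _, item in ipairs(items) do
    pos = match(item, tokens, pos)
    if not pos then return nil end
  end
  return pos
end

function match(e, tokens, pos)
  if type(e) == "string" then
    if tokens[pos] == e then return pos + 1 end
    return nil
  end
  local kind = e[0]
  if kind == nil then                       -- list: all items, in order
    return match_seq(e, tokens, pos)
  elseif kind == "opt" then                 -- optional: failure consumes nothing
    return match_seq(e, tokens, pos) or pos
  elseif kind == "optN" then                -- 0..N repetitions
    while true do
      local nxt = match_seq(e, tokens, pos)
      if not nxt or nxt == pos then return pos end
      pos = nxt
    end
  elseif kind == "peek" then                -- must match, consumes nothing
    return match_seq(e, tokens, pos) and pos or nil
  elseif kind == "alt" then
    local key = tokens[pos]
    if key ~= nil and e[key] ~= nil then    -- string key decides the branch
      return match(e[key], tokens, pos + 1)
    end
    for _, item in ipairs(e) do             -- otherwise try [1..N] in order
      local nxt = match(item, tokens, pos)
      if nxt then return nxt end
    end
    return nil
  end
  error("unknown entity type: " .. tostring(kind))
end

-- Shaped like the varlist of 'drain', with the literal "v" standing in for 'var'.
local varlist = { opt("v"), optN(",", opt("v")), peek("=") }
print(match(varlist, { ",", ",", "v", "=", "f" }, 1))   --> 4
print(match(varlist, { "v", "(", ")" }, 1))             --> nil
```

The first call returns 4: the three tokens before '=' are consumed, while '=' itself is only peeked at and left for the enclosing entity.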
The LOCAL_NIL and CATCH values are modifier functions (to be studied in a minute). They are called during parsing, with the entity containing them as the 'scope' where they can affect tokens as [1..N] and/or [-N..-1], inserting, replacing, copying and/or moving at will.
LOCAL_NIL adds a statement right at the beginning of our chunk, whereas CATCH handles the actual mod: adding references to __nil__ wherever no target variable was explicitly stated (which indeed is the meaning of the whole mod).
CATCH needs to be inserted in two places in the syntax tree; otherwise it'd apply to local declarations only. The rest of the initializer checks that syntax.stat.local is how we expect it, and applies the change.
NB. assert( t[0]=='alt' ) checks that t is an 'alt' entity; all luaSub entities are tables with [0] carrying their type (for the list, [0]==nil).
We tied LOCAL_NIL to be called at the beginning of each chunk.
local __NIL__= "__nil__"
local function LOCAL_NIL( p )
    p:replace( 1,0, "local", __NIL__, ";" )
end
The p parameter is the parser object, with the following methods:
[str [,val]]= p:tokenat( idx_int )
    Returns token at position idx (1..N / -1..-N) or nil for other indices.
uint= p:n()
    Returns the number of tokens in current scope (N).
void= p:replace( idx_int, eat_uint [, ..._str/num ] )
    Inserts given tokens (the string params) before/in place of idx position in token flow. Replaces eat number of existing tokens.
void= p:copy( idx_int, n_uint )
    Copy n tokens from idx (1..N / -N..-1) to the end of current scope.
void= p:move( idx_int, n_uint )
    Move n tokens from idx (1..N / -N..-1) to the end of current scope.
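The method descriptions can be tried out on a toy stand-in for the parser object. The Scope table below is hypothetical and is not luaSub's implementation: tokens are plain strings in an array, tokenat returns only the string (no value), -1 is taken to mean the last token, and an index outside the scope is assumed to mean "append" for replace.

```lua
-- Toy stand-in for the parser scope object 'p'. NOT luaSub's implementation.
local Scope = {}
Scope.__index = Scope

local function new_scope(tokens)
  return setmetatable({ tokens = tokens }, Scope)
end

-- Map 1..N or -1..-N to an array position; nil for any other index.
function Scope:pos(idx)
  local n = #self.tokens
  if idx >= 1 and idx <= n then return idx end
  if idx <= -1 and idx >= -n then return n + 1 + idx end
  return nil
end

function Scope:n() return #self.tokens end

function Scope:tokenat(idx)
  local pos = self:pos(idx)
  return pos and self.tokens[pos] or nil
end

-- Drop 'eat' tokens starting at idx, then insert the new tokens there.
-- Assumption: an index outside the scope means "append at the end".
function Scope:replace(idx, eat, ...)
  local pos = self:pos(idx) or (#self.tokens + 1)
  for _ = 1, eat do
    if pos <= #self.tokens then table.remove(self.tokens, pos) end
  end
  for k, tok in ipairs({ ... }) do
    table.insert(self.tokens, pos + k - 1, tok)
  end
end

-- Append copies of 'count' tokens, starting at idx, to the end of the scope.
function Scope:copy(idx, count)
  local pos = assert(self:pos(idx), "index outside scope")
  for k = 0, count - 1 do
    table.insert(self.tokens, self.tokens[pos + k])
  end
end

-- Take 'count' tokens out, starting at idx, and append them at the end.
function Scope:move(idx, count)
  local pos = assert(self:pos(idx), "index outside scope")
  for _ = 1, count do
    local tok = table.remove(self.tokens, pos)
    table.insert(self.tokens, tok)
  end
end

-- The LOCAL_NIL modifier from above, run against the toy scope.
local __NIL__ = "__nil__"
local function LOCAL_NIL(p)
  p:replace(1, 0, "local", __NIL__, ";")
end

local p = new_scope({ "print", "(", "x", ")" })
LOCAL_NIL(p)
print(table.concat(p.tokens, " "))   --> local __nil__ ; print ( x )
p:move(1, 3)
print(table.concat(p.tokens, " "))   --> print ( x ) local __nil__ ;
```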
The second mod function does the actual token manipulation:
local function CATCH( p )
    if p:n() == 0 then -- (nothing) =
        p:replace( 1,0, __NIL__ )
    else
        local last_was_comma= true -- preceding ',' shall add a name
        local i=1
        while i <= p:n() do
            local c= p:tokenat(i)
            if c=="=" then
                break
            elseif c == "," then
                if last_was_comma then -- ',' without preceding name -> fill in
                    p:replace( i,0, __NIL__ )
                    i= i+1
                end
                last_was_comma= true
            else
                last_was_comma= false
            end
            i= i+1
        end
        if last_was_comma then
            p:replace( i+1,0, __NIL__ )
        end
    end
end
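To see what CATCH does to a token stream, it can be run against a minimal stand-in for p. The toy_scope function below is hypothetical and implements only the three methods CATCH calls, on a plain array of token strings; it assumes the scope holds the tokens before '=' and that names appear as their own text. CATCH itself is copied from the listing above.

```lua
local __NIL__ = "__nil__"

-- Minimal stand-in for 'p' (NOT luaSub's parser object).
local function toy_scope(tokens)
  local p = { tokens = tokens }
  function p:n() return #self.tokens end
  function p:tokenat(i) return self.tokens[i] end
  function p:replace(i, eat, ...)
    for _ = 1, eat do table.remove(self.tokens, i) end
    for k, tok in ipairs({ ... }) do
      table.insert(self.tokens, i + k - 1, tok)
    end
  end
  return p
end

local function CATCH( p )
    if p:n() == 0 then -- (nothing) =
        p:replace( 1,0, __NIL__ )
    else
        local last_was_comma= true -- preceding ',' shall add a name
        local i=1
        while i <= p:n() do
            local c= p:tokenat(i)
            if c=="=" then
                break
            elseif c == "," then
                if last_was_comma then -- ',' without preceding name -> fill in
                    p:replace( i,0, __NIL__ )
                    i= i+1
                end
                last_was_comma= true
            else
                last_was_comma= false
            end
            i= i+1
        end
        if last_was_comma then
            p:replace( i+1,0, __NIL__ )
        end
    end
end

-- ",,v= func()": the variable list holds the tokens  ,  ,  v
local drained = toy_scope({ ",", ",", "v" })
CATCH(drained)
print(table.concat(drained.tokens, " "))   --> __nil__ , __nil__ , v

-- "= func()": an empty variable list
local empty = toy_scope({})
CATCH(empty)
print(table.concat(empty.tokens, " "))     --> __nil__
```

Each comma that has no name in front of it gets __nil__ inserted before it, so the two unwanted return values land in the throwaway local declared by LOCAL_NIL.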
Note that luaSub modifies entities within the scopes, innermost scope first. This means that outer mod functions may get token streams that have already been modified, either by themselves or by some other mod. This is good, since it allows a mod, for example, to deal with a certain function body (but not its local functions) in the right way.
One Rule of Being a Good Mod is that token streams should be modified at the end of a scope, not in the middle. In other words, when a mod is applied, it must be definite that this is really the syntax branch that will be taken. This is the responsibility of the mod writer.
The backbone of luaSub is GPL 2 licensed. Syntax mods are MIT/X11 licensed.
Please see LICENSE for details.
======================
Token Filter patch (by LHF)
======================

Token Filters: A Macro Facility for Lua (PDF)
http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/#tokenf

The father of all token filtering, this patch was given out as a platform for experiments. In fact, luaSub also started its life as token filters using this patch.

Pros:
- simple
- from (one of the) Lua authors
- filters doable in either Lua or C
- uses the Lua parser itself (100% compatible)

Cons:
- needs patching Lua core or luac
- slow(ish)
- not syntax aware
- co-operation of token filters is not necessarily easy

Because it is not syntax aware, the Token Filter patch is suited to simple, replacement kind of token processing.

=============
LuaMacro (by Steve Donovan)
=============

http://lua-users.org/wiki/LuaMacros

Pros:
- higher abstraction than Token Filter patch itself
- syntax aware (it seems?)

Cons:
- needs Token Filter patch
- macros defined as tokens, not as Lua code
- macro definitions need to be in a separate file
- same set of macros for all source
- some limitations ("need to choose a name, ... macros are not triggered on arbitrary tokens")

========
Luma (by Fabio Mascarenhas)
========

http://luaforge.net/projects/luma/
http://luma.luaforge.net/

Pros:
- simple, modular, well defined
- excellent web site & documentation
- macros are clearly visible from regular Lua code

Cons:
- macros are separate from regular Lua code
- macro parameters are a string (no syntax highlighting)
- what's the added value to just using functions?

Luma serves an area of its own, where macros are not so much meant as Lua syntax mods (like luaSub and MetaLua would have them) but as additional tools that are intended to look different. Then again, I wonder why not just use Lua functions?

I still don't see much added value in using macros that look like this:

try [[
    error("error!")
    assert(false)
catch err
    assert(err)
    ran_catch = true
finally
    ran_finally = true
]]

One major issue with placing all macro contents within the [[..]] string is that syntax highlighting in an editor no longer works. Neither would line numbers in error messages, I bet.

The author of MetaLua gives good words to Luma, but I don't really see the light in its approach.

===========
MetaLua (by Fabien)
===========

http://metalua.luaforge.net/
http://metalua.luaforge.net/metalua-manual.html
http://metalua.luaforge.net/quicktour.html

"Metalua is an extension of Lua, which essentially addresses the lack of a macro system, by providing compile-time metaprogramming (CTMP) and the ability for user to extend the syntax from within Lua."

MetaLua is a complex project, and "a language" in itself as its website states.

Pros:
- no patching involved
- "Full compatibility with Lua 5.1 sources and bytecode: clean, elegant semantics and syntax, amazing expressive power, good performances, near-universal portability."
- documentation and web site seemingly excellent

Cons:
- complex
- keeps the whole syntax tree in memory (a whole chunk)
- non-Lua looking syntax: "all meta-operations happen between +{...} and -{...}, and visually stick out of the regular code."

MetaLua is 100% Lua, which is both a strength and a weakness, depending on how one wants to see it. Parsing performance would most likely be less than that of luaSub, but this has not been measured (comparisons welcome!).

luaSub is 5000 lines C, 600 lines Lua. MetaLua is 5288 lines Lua.

To me, it seems both MetaLua and luaSub are out to solve the same issues. Maybe MetaLua is a bit farther away from Lua, into a "language on its own" area, whereas luaSub tries, per definition and name, to be just a twist of Lua itself.

A comparison from someone willing to take both of these on a test run would be interesting.
ftools/fsyntax are mentioned in the Lua list archives regarding token filtering. They were Lua scripts adding syntax awareness to LHF's token filter patch. Essentially, they can be regarded as luaSub 0.1.
For feedback, questions and suggestions: