Blocks And Indentation
Languages like Python (context free movie) are indentation-based instead of using tokens such as { }, BEGIN END or similar for blocks.
Developer opinion about assigning semantic meaning to white space is divided. Many love Python's choice, but you can also find negative articles about it, such as the CACM Kode Vicious column by George V. Neville-Neil (Sanity vs. Invisible Markings: Tabs vs. spaces), which polemically suggests using the pile of poo Unicode character instead.
Block Structure using Tokens
Any reasonable style guideline requires blocks to be indented properly, even if this is redundant when begin and end tokens such as { } are used. Hence, for any multi-line block, using tokens in properly formatted source code is redundant and a waste of space.
However, things become tricky when
- blocks are optional
- short blocks should be in a single line
- there are semantics for an empty block, as in Fuzion, where f { } declares a feature, while f calls a feature.
One solution might be to add additional keywords to avoid ambiguity, e.g., defining a routine with f do { } would make the braces unnecessary.
Block Structure in other languages
C/C++/Java/C# family: { }
Control statements such as if and while take a single statement as their branch or loop body:
    if (condition)
      statement;
    else
      statement;
    while (condition)
      statement;
A clear separation between the condition and the statement is achieved by putting the condition in parentheses.
A block is a statement that consists of braces with a semicolon separated sequence of statements:
{ statement1; statement2; statement3; }
Consequently, braces can be used in control statements as follows:
    if (condition) {
      statement1;
      statement2;
      statement3;
    } else {
      statement4;
      statement5;
      statement6;
    }
    while (condition) {
      statementA;
      statementB;
      statementC;
    }
Some cases require a block, e.g., a function definition:
    void f(String s) { print(s); }
    // illegal:
    void f(String s) print(s);
Pascal/Modula/Oberon/Eiffel family: begin / end
Control statements start a sequence of statements with a keyword and end it with a keyword. Statements are separated by semicolons:
    IF condition THEN
      statement1;
      statement2;
      statement3
    ELSE
      statement4;
      statement5;
      statement6
    END
    WHILE condition DO
      statementA;
      statementB;
      statementC
    END
Declarations of procedures use BEGIN / END:
    PROCEDURE f(VAR x: STRING);
    BEGIN
      print(x)
    END f;
Python/Nim family: colon and indentation
Within a control statement, statement sequences are introduced with a colon : followed by zero or more new lines, each indented deeper than the surrounding control statement:
    if condition:
        statement1
        statement2
        statement3
    else:
        statement4
        statement5
        statement6
    while condition:
        statementA
        statementB
        statementC
Functions use the same pattern, a colon followed by indentation:
    def f(x):
        print(x)
LF as brace
When a block extends over several lines, standard formatting rules require the statements to be indented to a level deeper than the statement surrounding the block:
    if cond {
      say("we are in the if clause")
      say("and we have checked that ")
      say("cond is true")
    } else {
      say("we are in the else clause")
      say("hence cond is not true")
    }
    say("this is always executed")
or, alternatively
    if cond
    {
      say("we are in the if clause")
      say("and we have checked that ")
      say("cond is true")
    }
    else
    {
      say("we are in the else clause")
      say("hence cond is not true")
    }
    say("this is always executed")
In both these cases, the braces { } are redundant with the indentation. If the parser treated a LF followed by code with deeper indentation like an opening brace { and the corresponding LF followed by shallower indentation like the matching closing brace }, the code could become:
    if cond
      say("we are in the if clause")
      say("and we have checked that ")
      say("cond is true")
    else
      say("we are in the else clause")
      say("hence cond is not true")
    say("this is always executed")
Additional blank lines or comments should be ignored regardless of their indentation.
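The rule above can be sketched as a small preprocessing pass. This is only an illustration under stated assumptions, not how the Fuzion parser is implemented: the helper name indentation_to_braces is hypothetical, indentation is assumed to consist of spaces only, comments are assumed to start with //, and every reverse-indenting line is assumed to return to an indentation level that was used before.

```python
def indentation_to_braces(source: str) -> str:
    """Rewrite indentation-structured code with explicit braces (hypothetical helper)."""
    out: list[str] = []
    levels: list[int] = []  # indentation widths of the currently open blocks
    for raw in source.splitlines():
        stripped = raw.strip()
        # Blank lines and comment-only lines are ignored, whatever their indentation.
        if not stripped or stripped.startswith("//"):
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        if not levels:
            levels.append(indent)  # the first code line defines the outermost level
        elif indent > levels[-1]:
            out[-1] += " {"  # LF followed by deeper indentation acts like '{'
            levels.append(indent)
        while len(levels) > 1 and indent < levels[-1]:
            levels.pop()  # LF followed by shallower indentation acts like '}'
            out.append(" " * levels[-1] + "}")
        out.append(raw.rstrip())
    while len(levels) > 1:  # close blocks that are still open at the end of input
        levels.pop()
        out.append(" " * levels[-1] + "}")
    return "\n".join(out)


example = """\
if cond
  say("cond is true")

      // a stray comment, ignored
  say("still in the if clause")
else
  say("cond is not true")
say("this is always executed")
"""
print(indentation_to_braces(example))
```

The sketch drops comments and blank lines from its output to keep the logic short; a real parser would skip them without discarding them.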
Blocks in a Single Line
Braces can still be used in a single line:
    if col = red    { if use_rgb { r = 0xff; g = 0x00; b = 0x00 } else { c = 0; m = 1;   y = 1; k = 0 } }
    if col = yellow { if use_rgb { r = 0xff; g = 0xff; b = 0x00 } else { c = 0; m = 0;   y = 1; k = 0 } }
    if col = pink   { if use_rgb { r = 0xff; g = 0x80; b = 0xff } else { c = 0; m = 0.5; y = 1; k = 0 } }
If we eagerly parse blocks that do not start with a LF until the EOL, we could even omit the braces here:
    if col = red    if use_rgb r = 0xff; g = 0x00; b = 0x00 else c = 0; m = 1;   y = 1; k = 0
    if col = yellow if use_rgb r = 0xff; g = 0xff; b = 0x00 else c = 0; m = 0;   y = 1; k = 0
    if col = pink   if use_rgb r = 0xff; g = 0x80; b = 0xff else c = 0; m = 0.5; y = 1; k = 0
This appears a bit confusing, though.
Disallowing nesting in single lines, like Python, would require the code to look like this:
    if col = red
      if use_rgb
        r = 0xff
        g = 0x00
        b = 0x00
      else
        c = 0
        m = 1
        y = 1
        k = 0
    if col = yellow
      if use_rgb
        r = 0xff
        g = 0xff
        b = 0x00
      else
        c = 0
        m = 0
        y = 1
        k = 0
    if col = pink
      if use_rgb
        r = 0xff
        g = 0x80
        b = 0xff
      else
        c = 0
        m = 0.5
        y = 1
        k = 0
which makes it hard to see the symmetry and spot bugs that are easier to spot when using one-liners.
It seems best to allow blocks in a single line iff braces are used.
Routine Declaration
The simplest syntax to declare a routine is a feature name followed by a block of statements. A call is just the feature name:
    feat1     // declare feat1 as a routine
    {
      stmnt1
      stmnt2
    }
    feat1     // call feat1
Not using braces but LF with indentation like this
    feat1     // declare feat1 as a routine
      stmnt1
      stmnt2
    feat1     // call feat1
can become problematic if the block becomes empty, e.g., by commenting out the statements:
    feat1     // !!! this is now a call to feat1 !!!
      // stmnt1
      // stmnt2
    feat1     // call feat1
This change turns the feature declaration into a feature call. Semantic changes of this kind caused by commenting out a statement are not acceptable, so we need a different solution.
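To make the hazard concrete, here is a minimal sketch of the naive rule "a name followed by a deeper-indented body is a declaration, otherwise it is a call". The function classify_top_level is hypothetical and only models this one rule; it assumes space indentation, // comments, and that top-level lines start in column 0.

```python
def classify_top_level(source: str) -> list[tuple[str, str]]:
    """Classify each column-0 line as 'declaration' or 'call' (naive rule)."""
    # Keep only lines that still carry code once // comments are stripped.
    code = []
    for raw in source.splitlines():
        text = raw.split("//", 1)[0].rstrip()
        if text.strip():
            code.append(text)
    result = []
    for i, line in enumerate(code):
        if line[0] == " ":
            continue  # an indented line belongs to the body of a declaration
        has_body = i + 1 < len(code) and code[i + 1][0] == " "
        result.append((line.strip(), "declaration" if has_body else "call"))
    return result


with_body = """\
feat1     // declare feat1 as a routine
  stmnt1
  stmnt2
feat1     // call feat1
"""
commented_out = """\
feat1     // !!! this is now a call to feat1 !!!
  // stmnt1
  // stmnt2
feat1     // call feat1
"""
print(classify_top_level(with_body))
print(classify_top_level(commented_out))
```

With the body commented out, no indented code line remains, so both occurrences of feat1 are classified as calls.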
Routine Declaration with braces
Requiring braces for any routine declaration solves this:
    feat1 {   // declare feat1 as a routine
      // stmnt1
      // stmnt2
    }
    feat1     // call feat1
Routine Declaration with keyword
Requiring another keyword like is would also solve this:
    feat1 is  // declare feat1 as a routine
      // stmnt1
      // stmnt2
    feat1     // call feat1
Empty Blocks
The Fuzion syntax for feature declarations still poses problems when indenting line breaks are used, since there is no way to distinguish an empty block from no block:
    feature1 {
      print("in feature1")
    }
    feature2
      print("in feature2")
    feature3 { }
    feature4
      // does nothing
    feature5 {
      print("in feature5")
    }
Here, feature4 would be parsed as a call to feature4, not a declaration, while feature1, feature2, feature3 and feature5 would be parsed as feature declarations.
Requiring is in feature declarations
This can be solved by requiring a keyword is or similar:
    feature1 is {
      print("in feature1")
    }
    feature2 is
      print("in feature2")
    feature3 is { }
    feature4 is
      // does nothing
    feature5 is {
      print("in feature5")
    }
is could be made optional if { is used:
    feature1 {
      print("in feature1")
    }
    feature2 is
      print("in feature2")
    feature3 { }
    feature4 is
      // does nothing
    feature5 {
      print("in feature5")
    }
We could also add a notation such as . as a shortcut for { }:
    feature1 {
      print("in feature1")
    }
    feature2 is
      print("in feature2")
    feature3.
    feature4.   // does nothing
    feature5 {
      print("in feature5")
    }
Requiring keyword before feature declarations
Alternatively, we could add a keyword such as feature (or proc, function, etc.) before a feature declaration:
    feature feature1 {
      print("in feature1")
    }
    feature feature2
      print("in feature2")
    feature feature3 { }
    feature feature4
      // does nothing
    feature feature5 {
      print("in feature5")
    }
Block Structure in Fuzion
The basic idea is to use a block structure based on braces { } and semicolon ; as in the C/C++/Java/C# family (and make those familiar with this style immediately happy), but to have additional rules that make these symbols optional when line breaks and indentation clearly separate statements or blocks.
Flat Line Breaks as Statement Delimiters
Let's start with statement sequences: Statements within a block are separated by the statement delimiter. A statement delimiter is either a semicolon ; or a line break that maintains the indentation level of the statement sequence (a flat line break). So the following statement sequences are legal and equivalent:
    // using semicolon as statement delimiter
    statement1; statement2; statement3

    // using flat line break that maintains indentation level as statement delimiter
    statement1
    statement2
    statement3

    // mixing semicolon and flat line break as statement delimiter
    statement1
    statement2; statement3

    // mixing semicolon and flat line break as statement delimiter
    statement1; statement2
    statement3
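The equivalence of the four forms can be checked with a few lines of code. This is a minimal sketch, not the Fuzion implementation: split_statements is a hypothetical helper that assumes all lines of the block share one indentation level (so every line break is flat) and that no semicolon occurs inside a string literal.

```python
def split_statements(block: str) -> list[str]:
    """Split a flat statement sequence at ';' and at flat line breaks."""
    statements = []
    for line in block.splitlines():
        for part in line.split(";"):
            if part.strip():  # ignore empty pieces, e.g. after a trailing ';'
                statements.append(part.strip())
    return statements


forms = [
    "statement1; statement2; statement3",
    "statement1\nstatement2\nstatement3",
    "statement1\nstatement2; statement3",
    "statement1; statement2\nstatement3",
]
for form in forms:
    print(split_statements(form))
```

All four forms yield the same list of three statements.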
Indenting Line Breaks as Block Delimiters
A line break is called indenting if it increases the indentation level, while a line break that decreases the indentation level is reverse-indenting. With these definitions, we can now allow line breaks as block delimiters: An opening curly brace { is equivalent to an indenting line break, while a closing curly brace } is equivalent to a reverse-indenting line break, so we can define blocks as follows:
    // blocks using braces
    if condition {
      statement1
      statement2
      statement3
    } else {
      statement4
      statement5
      statement6
    }
    while condition {
      statementA
      statementB
      statementC
    }

    // blocks using braces in separate lines
    if condition
    {
      statement1
      statement2
      statement3
    }
    else
    {
      statement4
      statement5
      statement6
    }
    while condition
    {
      statementA
      statementB
      statementC
    }

    // blocks using indentation
    if condition
      statement1
      statement2
      statement3
    else
      statement4
      statement5
      statement6
    while condition
      statementA
      statementB
      statementC

    // blocks using indentation and semicolons
    if condition
      statement1; statement2; statement3
    else
      statement4; statement5; statement6
    while condition
      statementA; statementB; statementC
This syntax is in a sense more radical than Python's syntax: we can do without the colon after the condition because the indenting line break is sufficient.
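The three kinds of line break defined in this section can be told apart by comparing the indentation of adjacent code lines. The following is a sketch only; classify_line_breaks is a hypothetical name, and the code assumes space indentation and skips blank lines.

```python
def classify_line_breaks(source: str) -> list[str]:
    """Label every line break between code lines as indenting, flat or reverse-indenting."""
    lines = [line for line in source.splitlines() if line.strip()]
    indents = [len(line) - len(line.lstrip(" ")) for line in lines]
    kinds = []
    for previous, current in zip(indents, indents[1:]):
        if current > previous:
            kinds.append("indenting")          # plays the role of '{'
        elif current < previous:
            kinds.append("reverse-indenting")  # plays the role of '}'
        else:
            kinds.append("flat")               # plays the role of ';'
    return kinds


code = """\
if condition
  statement1
  statement2
else
  statement4
while condition
  statementA
"""
print(classify_line_breaks(code))
```

A reverse-indenting line break that drops several levels at once would correspond to several closing braces; the sketch reports only the kind of break, not the number of levels.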
Single line code
Single line code using braces { }
Using braces and semicolons, it is still possible to create more complex code in a single line if this appears useful:
    // single-line statements using braces and semicolons
    if condition { statement1; statement2; statement3 } else { statement4; statement5; statement6 }
    while condition { statementA; statementB; statementC }
Such code is particularly convenient for repeated code patterns that differ only in a few places:
    match color
      | yellow  => if rgb { r = 0xff; g = 0xff; b = 0x00 } else { c = 0.0; m = 0.0; y = 1.0; k = 0.0 }
      | magenta => if rgb { r = 0xff; g = 0x00; b = 0xff } else { c = 0.0; m = 1.0; y = 0.0; k = 0.0 }
      | cyan    => if rgb { r = 0x00; g = 0xff; b = 0xff } else { c = 1.0; m = 0.0; y = 0.0; k = 0.0 }
      | red     => if rgb { r = 0xff; g = 0x00; b = 0x00 } else { c = 0.0; m = 1.0; y = 1.0; k = 0.0 }
      | green   => if rgb { r = 0x00; g = 0xff; b = 0x00 } else { c = 1.0; m = 0.0; y = 1.0; k = 0.0 }
      | blue    => if rgb { r = 0x00; g = 0x00; b = 0xff } else { c = 1.0; m = 1.0; y = 0.0; k = 0.0 }
      | white   => if rgb { r = 0xff; g = 0xff; b = 0xff } else { c = 0.0; m = 0.0; y = 0.0; k = 0.0 }
      | black   => if rgb { r = 0x00; g = 0x00; b = 0x00 } else { c = 0.0; m = 0.0; y = 0.0; k = 1.0 }
      | grey    => if rgb { r = 0x80; g = 0x80; b = 0x80 } else { c = 0.0; m = 0.0; y = 0.0; k = 0.5 }
Single line code using eager parsing to EOL
If the code in a statement sequence cannot be parsed as part of a preceding expression, we could do without braces and parse statements eagerly until a keyword such as else or the end of the line is encountered:
    // single-line statements using eager parsing and semicolons
    if condition statement1; statement2; statement3 else statement4; statement5; statement6
    while condition statementA; statementB; statementC
The code above then becomes:
    match color
      | yellow  => if rgb r = 0xff; g = 0xff; b = 0x00 else c = 0.0; m = 0.0; y = 1.0; k = 0.0
      | magenta => if rgb r = 0xff; g = 0x00; b = 0xff else c = 0.0; m = 1.0; y = 0.0; k = 0.0
      | cyan    => if rgb r = 0x00; g = 0xff; b = 0xff else c = 1.0; m = 0.0; y = 0.0; k = 0.0
      | red     => if rgb r = 0xff; g = 0x00; b = 0x00 else c = 0.0; m = 1.0; y = 1.0; k = 0.0
      | green   => if rgb r = 0x00; g = 0xff; b = 0x00 else c = 1.0; m = 0.0; y = 1.0; k = 0.0
      | blue    => if rgb r = 0x00; g = 0x00; b = 0xff else c = 1.0; m = 1.0; y = 0.0; k = 0.0
      | white   => if rgb r = 0xff; g = 0xff; b = 0xff else c = 0.0; m = 0.0; y = 0.0; k = 0.0
      | black   => if rgb r = 0x00; g = 0x00; b = 0x00 else c = 0.0; m = 0.0; y = 0.0; k = 1.0
      | grey    => if rgb r = 0x80; g = 0x80; b = 0x80 else c = 0.0; m = 0.0; y = 0.0; k = 0.5
Is this too much?
Complex Statements
Sometimes, it might be necessary to break a statement into several lines to avoid overly long lines, e.g.
    if perform_affine_transform
      (new_x, new_y, new_z) := apply_affine_transformation_matrix(old_x, old_y, old_z, affine_matrix(factor_xx, factor_xy, factor_xz, factor_yx, factor_yy, factor_yz, factor_zx, factor_zy, factor_zz), translation_vector(translate_x, translate_y, translate_z))
is better written like this (ideally it would be broken up into several statements, but that is not the point here):
    if perform_affine_transform
      (new_x, new_y, new_z) := apply_affine_transformation_matrix(
        old_x, old_y, old_z,
        affine_matrix     (factor_xx, factor_xy, factor_xz,
                           factor_yx, factor_yy, factor_yz,
                           factor_zx, factor_zy, factor_zz),
        translation_vector(translate_x, translate_y, translate_z))
Additional line breaks within one statement should be allowed provided that:
- The first line break within a statement is indenting.
- Following line breaks are either flat or align with an indenting token such as (.
- Every indenting token has a corresponding reverse-indenting token such as ) that resets the indentation level to the state before the indenting token.
Complex Conditions
A line break within an expression can cause ambiguity for the parser, e.g.
    // This cannot be parsed well:
    if z.real * z.real
      + z.imaginary * z.imaginary < 2
      draw(x,y,black)
    else
      draw(x,y,colors[depth % 16])
The problem is that the code before the indenting line break, z.real * z.real, is syntactically a valid expression, and the code after the line break, + z.imaginary * z.imaginary < 2, is syntactically a valid statement, so the parser has no easy way to tell the end of the condition from the beginning of the statement sequence.
A simple solution is to restrict line breaks within conditions to code within indenting token pairs and apply the same rules as for line breaks within statements.
Then, a legal way to add line breaks in this example would be
    if (z.real * z.real
        + z.imaginary * z.imaginary < 2)
      draw(x,y,black)
    else
      draw(x,y,colors[depth % 16])
or
    if 2 > (z.real * z.real
            + z.imaginary * z.imaginary)
      draw(x,y,black)
    else
      draw(x,y,colors[depth % 16])
but not
    // This cannot be parsed well:
    if (z.real * z.real
        + z.imaginary * z.imaginary)
      < 2
      draw(x,y,black)
    else
      draw(x,y,colors[depth % 16])
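The restriction on conditions can be expressed as a simple depth check: a line break is acceptable only while at least one indenting token is still open. The sketch below assumes that ( and [ are the only indenting tokens and that the condition contains no string literals; line_breaks_inside_pairs is a hypothetical name, and the function receives only the text of the condition, not the whole if statement.

```python
def line_breaks_inside_pairs(condition: str) -> bool:
    """Return True if every line break in the condition lies inside ( ) or [ ]."""
    depth = 0
    for ch in condition:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "\n" and depth == 0:
            return False  # line break outside any indenting token pair
    return True


legal_1 = "(z.real * z.real\n    + z.imaginary * z.imaginary < 2)"
legal_2 = "2 > (z.real * z.real\n        + z.imaginary * z.imaginary)"
illegal_1 = "z.real * z.real\n  + z.imaginary * z.imaginary < 2"
illegal_2 = "(z.real * z.real\n    + z.imaginary * z.imaginary)\n  < 2"
for condition in (legal_1, legal_2, illegal_1, illegal_2):
    print(line_breaks_inside_pairs(condition))
```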
Expressions with .
Restricting expressions to a single line would disallow code using cascading calls such as
    (1..n)
      .filter(x => (n % x) = 0)
      .print
Since no legal statement starts with a . token, we can relax the single-line restriction by allowing a line break before a . token.
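One way to picture this relaxation is as a pass that glues any line starting with a . onto the preceding statement before statements are split at line breaks. This is a minimal sketch under that assumption; join_dot_continuations is a hypothetical helper and ignores indentation entirely.

```python
def join_dot_continuations(source: str) -> list[str]:
    """Merge lines that start with '.' into the statement on the preceding line."""
    statements: list[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(".") and statements:
            statements[-1] += stripped  # continuation of a cascading call
        else:
            statements.append(stripped)  # start of a new statement
    return statements


cascade = """\
(1..n)
  .filter(x => (n % x) = 0)
  .print
say(n)
"""
print(join_dot_continuations(cascade))
```

The range expression (1..n) is unaffected because the line itself does not start with a dot.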
Routine Declarations
To be able to distinguish routines with no code from calls, Fuzion will require routine declarations to either use braces { } around the code of a routine or perform the declaration using is.
Using is also gives a nice syntax for the definition of abstract (deferred in Eiffel parlance) or intrinsic features:
    // all of these are equivalent:
    show1(o Obj) { print(o.as_string) }
    show2(o Obj) {
      print(o.as_string)
    }
    show3(o Obj)
    {
      print(o.as_string)
    }
    show4(o Obj) is print(o.as_string)
    show4(o Obj) is
      print(o.as_string)
    show5(o Obj) is { print(o.as_string) }
    show6(o Obj) is {
      print(o.as_string)
    }
    show7(o Obj) is
    {
      print(o.as_string)
    }

    // all of these are empty features
    f1 is
    f2 is { }
    f3 { }
    f4 {
    }
    f5
    {
    }

    // special features
    as_string is abstract
    infix +° (o i32) i32 => intrinsic