Seminar Project
My seminar project was Dynamic
Rebuilding of Syntax Tree and Appliance in Construction of Combined
Language Processor. This project received the principal's award at
the University of Zagreb. The purpose of the project was to construct
algorithms and data structures for efficient incremental syntax
analysis.
Figure 2-1 shows the main windows of the demonstration program
which rebuilds the syntax tree for the code written in simple expression
language.
Figure 2-1: Dynamic
Rebuilding of The Syntax Tree
The main idea behind incremental algorithm is to
repair existing syntax tree by replacing just the small part which is
invalid due to recent changes in source code (keystrokes, editing
operations). Existing linear algorithms for parsing (in this case
LR(1) parsing algorithm) are not acceptable simply because they
analyze the whole source file, which is unacceptably slow for an environment
which needs to be interactive.
Unfortunately, most computer languages were never created with
incremental properties in mind. Because of that most of them have
language constructs which are hard to incrementally analyze.
For instance,
multi-line comments (like /* */ in C language) present a problem
because of their non-local nature. When the user opens a multi-line
comment (types "/*") the rest of the input file represents
an invalid
syntax (as there is no comment closing). When the user closes
a
multi-line comment (types "*/") the rest of the input file
again becomes a valid syntax structure. The
result is that the unoptimized incremental syntax analyzer parses the whole
input file every time the user types multi-line comments. C and C++ are particularly difficult to implement in
an incremental environment.
This is due to the quite awkward lexical and syntax language constructs.
For instance, C and C++ standards define a concept of the language preprocessor. Macro defined
with a language preprocessor defines text replacements in all source
files which use that macro. If the user changes a macro definition, that
modification implies lexical and syntax change in many other files,
which often requires reparsing of the entire set of input files. Except
with the preprocessor, C and C++ have issue with ambiguous syntax rules. For
instance, the following line in C++:
can represent two syntax structures:
- declaration of variable
a with type
myfile
(if myfile
is a user defined structure), or
- function call with variable
a
as function argument (if
myfile is a
user function).
In order to determine the exact context,
the parser needs
semantic information -- which makes incremental analysis much more
complex. Efficient methods for incremental analysis of different
programming languages are still a subject of discussion in scientific
literature.Some other languages are much more suitable for
incremental analysis. For instance, BASIC was created with
incrementality in mind (line-based interpreter) and represents a
good language choice for language-oriented environments. Incremental
engine developed inside this project was used in latter
graduation thesis, which has the most
recent version of incremental syntax analyzer and gives a
demonstration of the concrete usage of this engine. |