Graduation Thesis

My graduation thesis covered the construction and analysis of a simple language-oriented development environment. Language-oriented environments are programming environments that are aware of a programming language's lexical and syntax rules. They analyze program code incrementally, which allows them to recognize language constructs such as loops, function declarations, expressions, comments, statements, class declarations and other important program structures. With all that information available during editing, the programming environment becomes more interactive and productive. Figure 1-1 shows a simple PASCAL-like program inside a language-oriented environment. The available lexical information is used for lexical coloring of the program code (often wrongly referred to as "syntax highlighting"). The available syntax information is used to highlight the syntax structure at the current cursor position (e.g. a FOR-loop language construct).

Figure 1-1: Simple language-oriented development environment

Information about the current syntax construct is useful when the structure of the code is not obvious. That is the case with nested brackets in complex expressions (Figure 1-2), ambiguous IF-statements (the mismatched, or dangling, ELSE problem) and nested program structures.

Figure 1-2: Matching brackets in complex expression
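
As a rough illustration of the bracket matching shown in Figure 1-2, the sketch below finds the innermost bracket pair around a cursor position. It is not the thesis implementation: the environment described here derives this information from its syntax analysis, while this sketch substitutes a plain stack scan over the text. The function name enclosing_brackets and the choice of bracket characters are assumptions made for the example.

```python
def enclosing_brackets(text, cursor):
    """Return (open_index, close_index) of the innermost bracket pair
    that spans `cursor`, or None if there is no such pair.

    Hypothetical helper: a stack scan standing in for syntax-tree lookup.
    """
    pairs = {"(": ")", "[": "]"}
    openers = {close: open_ for open_, close in pairs.items()}
    stack = []  # indices of brackets that are currently open

    for i, ch in enumerate(text):
        if ch in pairs:
            stack.append(i)
        elif ch in openers:
            if not stack or text[stack[-1]] != openers[ch]:
                return None  # mismatched bracket: no reliable answer
            start = stack.pop()
            # Nested pairs close innermost-first, so the first closed
            # pair that spans the cursor is the innermost one.
            if start <= cursor <= i:
                return (start, i)
    return None


expression = "a * ((b + c) * (d - e))"
print(enclosing_brackets(expression, 17))  # cursor between 'd' and '-'
print(enclosing_brackets(expression, 13))  # cursor on the middle '*'
```

An editor could use the returned indices to highlight both brackets whenever the cursor moves.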

Synchronous information about the program's syntax structure allows immediate detection of syntax errors. Figure 1-3 and Figure 1-4 show examples of errors that are detected immediately in a language-oriented environment (errors are underlined in red). In conventional environments such errors are not discovered until compile time.

Figure 1-3: Extra operator syntax error

Figure 1-4: Bad function call syntax error

Although the demonstrated environment looks simple and offers only a small set of features, the technology behind it is quite complex. Incremental lexical and syntax analysis requires much more complex algorithms than one-time linear analysis. Worse, efficient incremental syntax analysis of some languages is still an unsolved problem in the scientific literature. For more information about the efficiency of the implemented incremental syntax analyzer, see the seminar project.
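
To give a feeling for why incremental analysis needs more bookkeeping than a one-time pass, here is a minimal sketch of incremental lexical analysis. It is not the algorithm used in the thesis, which is not described on this page. It assumes a common line-based technique: the lexer caches the state at the end of every line (here only "inside a { } comment or not") and, after an edit, re-lexes following lines only until a line's end state matches the cached one. The names lex_line and IncrementalLexer and the three token kinds are hypothetical.

```python
def lex_line(line, in_comment):
    """Lex one line; return (tokens, in_comment_at_end_of_line)."""
    tokens, pos, n = [], 0, len(line)
    while pos < n:
        if in_comment or line[pos] == "{":
            end = line.find("}", pos)
            if end == -1:  # comment continues on the next line
                tokens.append(("comment", line[pos:]))
                return tokens, True
            tokens.append(("comment", line[pos:end + 1]))
            pos, in_comment = end + 1, False
        elif line[pos].isspace():
            pos += 1
        elif line[pos].isalnum():
            start = pos
            while pos < n and line[pos].isalnum():
                pos += 1
            tokens.append(("word", line[start:pos]))
        else:
            tokens.append(("symbol", line[pos]))
            pos += 1
    return tokens, in_comment


class IncrementalLexer:
    """Caches tokens and the end-of-line state for every line (assumed design)."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [None] * len(self.lines)
        self.end_state = [None] * len(self.lines)
        self._relex_from(0)

    def _relex_from(self, index):
        """Re-lex from `index`; return how many lines had to be lexed."""
        state = self.end_state[index - 1] if index > 0 else False
        count = 0
        for i in range(index, len(self.lines)):
            self.tokens[i], new_state = lex_line(self.lines[i], state)
            count += 1
            unchanged = new_state == self.end_state[i]
            self.end_state[i] = new_state
            if unchanged:
                break  # later lines start in the same state as before
            state = new_state
        return count

    def edit_line(self, index, new_text):
        self.lines[index] = new_text
        return self._relex_from(index)


lexer = IncrementalLexer(["x := 1;", "y := 2;", "z := 3;"])
print(lexer.edit_line(0, "x := 10;"))         # ordinary edit
print(lexer.edit_line(0, "x := 1; { open"))   # unterminated comment
```

An ordinary edit touches a single line, while opening a comment forces the following lines to be lexed again. Incremental syntax analysis has to handle the same kind of ripple effect on a tree instead of a list of lines, which is where most of the difficulty lies.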

Extensibility

The constructed environment can be used for a wide class of programming languages. The incremental lexical and syntax analyzers constructed for this graduation thesis use widely adopted formalisms for describing computer language structure. Lexical properties are described with a finite state machine, which is easily constructed from standard regular expressions. Syntax properties are described with an LR(1)-compatible grammar. Both lexical and syntax data are stored in text files that the environment loads at startup. Unfortunately, because of their irregular and awkward language definitions, this environment cannot be used for the C and C++ programming languages.
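
The idea of describing lexical properties with a finite state machine loaded from a text file can be sketched as follows. The file format of the actual environment is not documented on this page, so the three-column format below (state, character class, next state, plus "accept" lines that name token kinds) is purely an assumption, as are the names DFA_TEXT, load_dfa and scan. The scanner applies the usual longest-match rule.

```python
# Hypothetical textual description of a tiny finite state machine.
DFA_TEXT = """\
start  letter ident
ident  letter ident
ident  digit  ident
start  digit  number
number digit  number
accept ident  IDENTIFIER
accept number NUMBER
"""


def char_class(ch):
    """Map a character to the class names used in the table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def load_dfa(text):
    """Parse the table into (transitions, accepting) dictionaries."""
    transitions, accepting = {}, {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "accept":
            accepting[parts[1]] = parts[2]
        else:
            state, cls, target = parts
            transitions[(state, cls)] = target
    return transitions, accepting


def scan(text, transitions, accepting):
    """Split `text` into (kind, lexeme) tokens using longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        state, i, last_accept = "start", pos, None
        while i < len(text):
            target = transitions.get((state, char_class(text[i])))
            if target is None:
                break
            state, i = target, i + 1
            if state in accepting:
                last_accept = (accepting[state], i)
        if last_accept is None:
            pos += 1  # no token starts here (e.g. whitespace): skip it
        else:
            kind, end = last_accept
            tokens.append((kind, text[pos:end]))
            pos = end
    return tokens


transitions, accepting = load_dfa(DFA_TEXT)
print(scan("count1 42 x", transitions, accepting))
```

Because the table is plain data, supporting another language means supplying a different table rather than changing the scanner, which is the kind of extensibility described above.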

Download

The demonstrated language-oriented environment is built for Windows 2000 or later operating systems. If you wish to compile the source code, you will need MS Visual C++ 6.0. Here are the files:

Part of the project was modeled in UML using Sparx Systems Enterprise Architect 3.10. The UML class model of the Document/View architecture can be found here.