Frequently asked questions
General
- Why a new programming language?
- What are the design principles of Seed7?
- What is an extensible programming language?
- Are Seed7 programs portable?
- Which license does Seed7 use?
- Is Seed7 a descendant of Pascal?
- How does Seed7 compare to Java?
- What kind of programs can be written in Seed7?
- How many lines of code are in the Seed7 project?
- On which operating systems does Seed7 run?
- Is there an installer for Seed7?
- Where can I download Seed7?
- How do I uncompress the *.tgz file from the release?
- How do I compile the Seed7 interpreter?
- I got errors when compiling Seed7. What should I do?
- How do I verify that the interpreter works correctly?
- How can I install Seed7?
- How can I use the Seed7 interpreter?
- Is it possible to compile Seed7 programs?
- Is there support for syntax highlighting?
- Can I debug Seed7 programs with Eclipse?
- Can I profile Seed7 programs?
Language
- What are the reserved words of Seed7?
- How is the syntax of Seed7 defined?
- Why does Seed7 not use the C statements like C++ and Java?
- Why is the type: name declaration syntax used?
- What makes code readable?
- Isn't the code unreadable if everybody invents new statements?
- Hasn't Lisp already user defined statements and operators?
- Why does Seed7 use static type checking?
- Is the program development slowed down with static type checking?
- Does static type checking speed up programs?
- How can static type checking work if types are first-class objects?
- Why does Seed7 not use type inference?
- Why does a local type declaration trigger an error?
- Why does a two-dimensional array trigger an error?
- Are there automatic casts to the right type?
- Is Seed7 a "do what I mean" language?
- Can I use something and declare it later?
- Can functions be overloaded?
- Can I overload two functions which just differ in the result type?
- Can functions have variable parameter lists?
- Why does the initialization use the keyword 'is' instead of ':=' ?
- Is there an elegant way to initialize data?
- Why is it necessary to initialize all variables?
- Are there types like byte, small and long?
- Is Unicode supported?
- Why are strings in Seed7 mutable?
- Why are strings indexed from one?
- How are comparisons done in Seed7?
- Can Seed7 access databases?
- Are there regular expressions?
- What are scanner functions?
- Why is the div operator used for integer divisions?
- Why are & and <& defined for string concatenation?
- How is the number format specified when writing a number?
- What types of parameters does Seed7 have?
- What is an 'in' parameter?
- Is there an example where val and ref parameters have different behavior?
- What is call-by-name?
- Why are functions declared with const?
- Are there functions declared without const?
- What is an integer overflow?
- Why are integers not promoted to bigInteger when they overflow?
- Is there a garbage collection?
- Is Seed7 object oriented?
- Is everything inherited from object?
- What is the difference between overloading and object orientation?
- What is an abstract data type?
- What is multiple dispatch?
- What container classes do exist?
- Are there primitive types?
- What is the difference between object and primitive types?
- When to use an object type and when a primitive type?
- How does the assignment work?
- Why are there two forms of assignment?
- Where are the constructors?
- Are there static methods / class methods?
- Are there generics / templates?
- Is the parser part of the run-time library?
- Can I access the abstract syntax tree (AST)?
- What restrictions does Seed7 have?
- What does the term undefined behavior mean?
- What does the term memory safety mean?
- Are there exceptions?
- What happens if an exception is not caught?
- Why does a write statement raise RANGE_ERROR?
- Is there a return statement?
- Why are break and continue not supported?
- How to define break and continue?
Implementation
- How is Seed7 parsed?
- What is link time optimization?
- Can Seed7 compile to a dll/so?
- Where does the interpreter look for include libraries?
- How is the directory of the predefined include libraries determined?
- What happens during make depend?
- How does the Seed7 compiler get information about C compiler and runtime?
- What should a binary Seed7 package install?
- What is necessary to compile Seed7 with database connections?
- How to fix the error "Searching dynamic libraries failed"?
- Does the interpreter use bytecode?
- How does the analyze phase of the interpreter work?
- How does the compiler implement call-by-name parameters?
- What does action "XYZ_SOMETHING" mean?
- Why are there dollar signs in some places?
- Why does "seed7_05.s7i" contain a version number?
- Can I use an "abc.s7i" include file to boot to the abc language?
Why a new programming language?
Because Seed7 has several features which are not found in other programming languages:
- The possibility to declare new statements (syntactically and semantically) in the same way as functions are declared (There are also user definable operators with priority and associativity).
- Declaration constructs for constant-, variable-, function-, parameter-, and other declarations are described in Seed7 (The user can change existing declaration constructs or invent new ones).
- Templates use no special syntax. They are just functions with type parameters or a type result.
- Seed7 has abstract data types. For example the types array, hash, struct and set. They are not hard coded in the compiler but are abstract data types written in Seed7. User defined abstract data types are possible as well.
- The object orientation of Seed7 allows multiple dispatch. That means that a function or method is connected to more than one type.
- Seed7 is a syntactically and semantically extensible language: Almost all of the Seed7 language (statements, operators, declaration constructs, and more) is defined in Seed7 in an include file (seed7_05.s7i).
- The application program contains an include statement and the s7 interpreter is booted with the language description when it starts. This way it is possible to define language variants or a totally different language.
What are the design principles of Seed7?
The design principles are:
- Can interpret scripts or compile large programs:
- The interpreter starts quickly. It can process 400000 lines per second. This allows a quick edit-test cycle. Seed7 can be compiled to efficient machine code (via a C compiler as back-end). You don't need makefiles or other build technology for Seed7 programs.
- Error prevention:
- Seed7 is statically typed and memory safe. Variables must always have a value, there are no pointers and there is no NULL. All errors, including integer overflow, trigger an exception.
- Source code portability:
- Most programming languages claim to be source code portable, but often you need considerable effort to write portable code. In Seed7 it is hard to write unportable code. Seed7 programs can be executed without changes. Even the path delimiter (/) and database connection strings are standardized. Seed7 has drivers for graphic, console, etc. to compensate for different operating systems.
- Readability:
- Programs are more often read than written. Seed7 uses several approaches to improve readability.
- Well defined behavior:
- Seed7 has a well-defined behavior in all situations. Undefined behavior like in C does not exist.
- Overloading:
- Functions, operators and statements are not only identified by identifiers but also via the types of their parameters. This allows overloading the same identifier for different purposes.
- Extensibility:
- Every programmer can define new statements and operators. This includes new operator symbols. Even the syntax and semantics of Seed7 are defined in libraries.
- Object orientation:
- There are interfaces and implementations of them. Classes are not used. This allows multiple dispatch.
- Multiple dispatch:
- A method is not attached to one object (this). Instead, it can be connected to several objects. This is like the overloading of functions.
- Performance:
- Seed7 is designed to allow compilation to efficient machine code. Several high level optimizations are also done.
- No virtual machine:
- Seed7 is based on the executables of the operating system. This removes another dependency.
- No artificial restrictions:
- Historic programming languages have a lot of artificial restrictions. In Seed7 there is no limit for length of an identifier or string, for the number of variables or number of nesting levels, etc.
- Independent of databases:
- A database independent API supports the access to SQL databases. The database drivers of Seed7 consist of 30000 lines of C. This way many differences between databases are abstracted away.
- Possibility to work without IDE:
- IDEs are great, but some programming languages have been designed in a way that makes it hard to use them without an IDE. Programming language features should be designed in a way that makes it possible to work with a plain text editor.
- Minimal dependency on external tools:
- To compile Seed7 you just need a C compiler and a make utility. The Seed7 libraries avoid calling external tools as well.
- Comprehensive libraries:
- The libraries of Seed7 cover many areas.
- Own implementations of libraries:
- Many languages have no implementation of their own for essential library functions. Instead C, C++ or Java libraries are used. In Seed7 most of the libraries are written in Seed7. This reduces the dependency on external libraries. The source code of external libraries is sometimes hard to find and in most cases hard to read.
- Reliable solutions:
- Simple and reliable solutions are preferred over complex ones that may fail for assorted reasons.
What is an extensible programming language?
An extensible programming language supports mechanisms to extend the programming language, compiler/interpreter and runtime environment. The programmer is allowed to define new language constructs such as statements, declaration constructs and operators syntactically and semantically. Most programming languages allow user defined variables, functions and types, but they also use constructs which are hard-coded in the compiler/interpreter. An extensible programming language tries to avoid such hard-coded constructs in normal programs.
Extensible programming was an area of active research in the 1960s, but in the 1970s the extensibility movement was displaced by the abstraction movement. Today's software history gives almost no hint that the extensible languages movement had ever occurred. In the historical movement an extensible programming language consisted of a base language providing elementary computing facilities, and a meta-language capable of modifying the base language. A program then consisted of meta-language modifications and code in the modified base language. A popular approach to do language extension was the use of macro definitions. The constructs of the base language were hard-coded.
The design and development of Seed7 is based on independent research, which was done without knowing that the historic extensible programming language movement existed. Although Seed7 has different roots it reaches many of the original extensible programming language goals. Contrary to the historic movement Seed7 does not have a meta-language. In Seed7 a language extension is formulated in Seed7 itself. Seed7 differentiates between syntactic and semantic extensions. Syntactic extensions are described in Chapter 9 (Structured syntax definition) of the manual. The semantic extensions of Seed7 are done by declaring statements and operators as functions. For the body of loops and similar needs statically typed call-by-name parameters are used.
Are Seed7 programs portable?
Yes. Seed7 spares no effort to support source code portability. No changes are necessary, if programs are moved between different processors, between 32- and 64-bit systems or between little- and big-endian machines. Seed7 source code can also be moved between different operating systems. Several driver libraries assure that the access to operating system resources such as files, directories, network, clock, keyboard, console and graphics is done in a portable way. The libraries of Seed7 cover many areas. The goal is: There should be no need to call foreign C functions, or to execute shell (respectively cmd.exe) commands.
- Seed7 determines the properties of the underlying C compiler and C runtime library and uses code to compensate for the differences.
- Different Unicode encodings (e.g.: UTF-8 or UTF-16) in system calls (e.g. fopen()/wopen()) are hidden from the programmer.
- Portable file functions are provided in the library osfiles.s7i:
- There are functions to copy, move and remove files (and directory trees).
- File properties such as size, type, time and mode can be obtained and changed.
- The contents of a directory can be read as array of strings or via the file interface.
- A standard path representation removes all problems with drive letters and different path delimiters.
- Differences between Unix sockets and winsockets are hidden and a Seed7 socket is a file (as in Unix). The type pollData allows waiting until a socket is ready to read or write data. TLS/SSL and higher level protocols such as HTTP, HTTPS, FTP and SMTP are also supported.
- The library keybd.s7i defines the file KEYBOARD, which supports reading single key presses. The characters read from KEYBOARD are not echoed to the console and there is no need to press ENTER. There is also a portable way to check whether a key has been pressed.
- Reading keys and key combinations such as ctrl-F1 from a text console or a graphic window under different operating systems always delivers the same character code.
- There is an access to the text console, which allows cursor positioning.
- An operating system independent type for times and dates, based on the proleptic Gregorian calendar, is provided in the library time.s7i.
- A portable graphics library allows drawing, image operations, windows manipulation and bitmap fonts. Events to redraw a window and other annoyances are managed in the graphics library.
- A database library provides a database independent API to connect to MySQL, MariaDB, SQLite, PostgreSQL, Oracle, Firebird, Interbase, Db2, Informix and SQL Server databases. The ODBC interface can be used as well.
- Weaknesses of operating systems are hidden (E.g.: The Windows function utime() does not work on directories, but Seed7 allows the modification of directory access and modification times also under Windows).
Which license does Seed7 use?
Seed7 is "Free as in Freedom" and not only "Free as in Free Beer". The s7 interpreter and the example programs (extension .sd7) are under the GPL (General Public License, see the file COPYING).
The Seed7 runtime library is under the LGPL (Lesser General Public License, see the file LGPL). The Seed7 include files (extension .s7i) are a part of the Seed7 runtime library.
Seed7 allows the interpretation and compilation of programs with any license. There is no restriction on the license of your Seed7 programs.
For the development of the Seed7 compiler it will be necessary to move some source code from the s7 interpreter (under GPL) to the Seed7 runtime library (under LGPL). This will only be done for the Seed7 runtime library and only as far as necessary to make no restriction on the license of compiled Seed7 programs.
If you send me patches (I would be very pleased), it is assumed that you accept license changes from GPL to LGPL for parts of code which need to be in the runtime library to support compilation of Seed7 programs.
Is Seed7 a descendant of Pascal?
No, not really. The keywords and statements remind people of Pascal, but beneath the surface there are many differences. Don't judge a book by its cover. Seed7 is neither limited to Pascal's features, nor is it implemented like Pascal. Notable differences are:
| Feature | Standard Pascal | Seed7 |
|---|---|---|
| syntax | hard-coded in the compiler | defined in a library |
| statements | hard-coded in the compiler | defined in a library |
| operators | hard-coded in the compiler | defined in a library |
| array | hard-coded in the compiler | defined as abstract data type array |
| record / struct | hard-coded in the compiler | defined as abstract data type |
| hash table | not in the standard library | defined as abstract data type hash |
| compiler target | machine code or P-code | C, compiled to machine code afterwards |
| template | none | function with type parameters |
| abstract data type | none | function with type result |
| object orientation | none | interfaces and multiple dispatch |
Except for LL(1) parsing, no technology used by classical Pascal compilers could be used to implement Seed7.
How does Seed7 compare to Java?
Several features of Seed7 are missing in Java:
| Features missing in Java | Comment |
|---|---|
| Stand alone functions | Singletons must be used instead |
| Call-by-reference parameters | All parameters are call-by-value |
| Call-by-name parameters | All parameters are call-by-value |
| Operator overloading | In Java it is necessary to write a.add(b.multiply(c)) instead of a + b * c. |
| User defined operators | - |
| User defined statements | - |
| User defined syntax | - |
| One operator to check for equality | For POD types Java uses == and for strings name.equals(""). |
| Elegant way to express data structures | Property files and XML must be used instead |
| User defined functions to initialize data | - |
| Multiple dispatch | - |
| Checking for integer overflow | The result is modulo a power of two without any indication that this is wrong. |
| Escape sequences only as part of literals | Unicode escapes can be everywhere. That can cause unexpected effects |
What kind of programs can be written in Seed7?
Seed7 can be used in various application areas:
- Applications, like an Excel look-alike (Seed7 is a general purpose language and programs can be compiled to an executable).
- Scripts that deal with files and directories (The Seed7 Homepage and the Seed7 release are created with Seed7 scripts).
- Tools for networking (There is support for sockets, TLS/SSL, listeners, HTTP, HTTPS, FTP, SMTP and HTML parsing). E.g.:
- Programs that deal with XML (There is support for XML parsing and the possibility to read XML into a DOM data structure).
- CGI scripts (A CGI support library is available and the Comanche web server can be used to test CGI scripts).
- Programs that use the browser as user interface:
- As language to describe algorithms.
- Command line utilities. E.g.:
- 2D games. E.g.:
- Simulations. E.g.:
- Functions to explore mathematics. E.g.:
How many lines of code are in the Seed7 project?
The Seed7 package contains more than 100000 lines of C and more than 400000 lines of Seed7. For version 2024-08-12 the number of lines is:
- 183664 C source files (*.c)
- 13770 C header files (*.h)
- 259238 Seed7 source files (*.sd7)
- 190664 Seed7 library/include files (*.s7i)
C code (*.c and *.h files) can be divided into the following areas:
- 0.3% Interpreter main
- 11.6% Parser
- 2.8% Interpreter core
- 24.7% Primitive action functions
- 7.4% General helper functions
- 48.5% Runtime library
- 4.7% Compiler data library
Details about these files can be found in the file
On which operating systems does Seed7 run?
Seed7 runs on the following operating systems:
- Linux is supported with the following compilers:
- gcc (the development is done using gcc under Linux)
- clang
- icc
- tcc
- Unix (I also used Seed7 under various Unix variants, so it is probably easy to port Seed7 to a Unix variant)
- BSD (there is a FreeBSD port and an OpenBSD port)
- Windows is supported with the following compilers:
- MinGW GCC (the binary Windows release of Seed7 uses MinGW)
- Cygwin GCC (the X11 graphics needs Cygwin/X)
- clang
- MSVC cl.exe (cl.exe is the stand-alone compiler of MSVC)
- tcc
- BDS bcc32.exe (bcc32.exe is the stand-alone compiler of the BDS)
- DOS (uses DJGPP. Sockets, graphics, processes and databases are currently not supported)
- macOS is supported with the following compilers:
- gcc
- clang
For other operating systems it might be necessary to write driver modules for
screen (=text console), graphics, time or other aspects of Seed7. The package
contains various older driver modules which are not up to date, but can be used
as base to write such driver modules. For more detailed information look at the
files
Is there an installer for Seed7?
A Seed7 installer for Windows can be downloaded from:
This directory contains the latest installer and older ones. Installers have names with the following pattern:
seed7_05_yyyymmdd_win.exe
Just download the installer with the latest date (yyyymmdd). It is not a problem if the installer is older than the latest source release of Seed7, because the installer can download the latest source release. After you have downloaded the installer you can start it, either from the console (cmd.exe) or from the Windows Explorer.
The installer leads through the installation process with a dialog. It determines the latest source release of Seed7 and downloads it. If the latest release cannot be downloaded a manually downloaded source release can be used instead. The installer can also use a built-in release of Seed7. This built-in release is the one with the date of the installer.
The installer asks for an installation directory for Seed7. Afterwards it compiles Seed7 with the makefile seed7/src/mk_mingc.mak. The installer uses a built-in make utility and an encapsulated gcc. These tools do not interfere with another make or gcc, which might be installed on your computer.
Finally, the installer adds the directory with the Seed7 executables to the search path (PATH variable). Therefore, it needs administrator rights. The program to change the path is setwpath.exe. The name setwpath.exe will show up, when you are asked to allow administrative rights for the installation.
The installer can be used to update an existing Seed7 installation. The installer checks the version of an existing installation of Seed7 and offers the possibility to update. Update means that all files in the Seed7 installation directory are replaced. Therefore, it makes sense to place your own Seed7 programs and libraries at a different place.
Where can I download Seed7?
The latest source code release of Seed7 can be downloaded from:
Just click on the button Download Latest Version.
Other source code releases can be found in the directory seed7. It is strongly recommended to use the latest version. An installer for Windows can be found in the directory bin. Other executables are also in the bin directory.
Seed7 is now available at GitHub as well. You can use the command:
git clone https://github.com/ThomasMertes/seed7.git
to clone the Seed7 repository.
After downloading the Seed7 source code the interpreter can be compiled.
How do I uncompress the *.tgz file from the release?
If you have a GNU 'tar' program available you can just do
$ tar -xvzf seed7_05_yyyymmdd.tgz
If your 'tar' command does not accept the 'z' option you need to uncompress the file first with 'gunzip':
$ gunzip seed7_05_yyyymmdd.tgz
$ tar -xvf seed7_05_yyyymmdd.tar
Sometimes the browser downloads a *.gz file instead of a *.tgz file. In that case, you could also use 'gunzip' as shown above. As an alternative, you can also use 'zcat':
$ zcat seed7_05_yyyymmdd.gz > seed7.tar
$ tar -xvf seed7.tar
Under Windows you can use the 7-Zip compression/decompression utility (there is no relationship to Seed7). 7-Zip is open-source software and is available at: www.7-zip.org.
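The cases above (a *.tgz file, a *.gz file, or an already uncompressed *.tar file) can be folded into one small helper. This is only a sketch: the function name unpack_release is made up for this example and is not part of the Seed7 release. It pipes the output of 'gunzip -c' into 'tar', so it should also work with a 'tar' that lacks the 'z' option, and it leaves the downloaded archive in place.

```shell
#!/bin/sh
# unpack_release is a hypothetical helper, not part of Seed7.
# It unpacks a release archive whatever suffix the download ended up with.
unpack_release() {
    archive=$1
    case "$archive" in
        *.tgz|*.tar.gz|*.gz)
            # Decompress to stdout and let tar read the stream from stdin.
            gunzip -c < "$archive" | tar -xf -
            ;;
        *.tar)
            tar -xf "$archive"
            ;;
        *)
            echo "unpack_release: unknown archive type: $archive" >&2
            return 1
            ;;
    esac
}
```

It would be called as: unpack_release seed7_05_yyyymmdd.tgz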
How do I compile the Seed7 interpreter?
There is a detailed description how to build Seed7 in
The way to compile the interpreter depends on the operating system and the development tools used. You need a stand-alone C compiler and a make utility to compile the interpreter. A C compiler, which is only usable from an IDE, is not so useful, since some Seed7 programs (e.g. the Seed7 compiler s7c) need to call the C compiler as well.
In case a make utility is missing the program make7 can be used instead. You can download make7.exe, which is a binary version of make7 for Windows.
To compile the interpreter under Linux just go to the
make depend
make
For other cases, several makefiles are prepared for various combinations of operating system, make utility, C compiler and shell:
| makefile name | operating system | make prog | C compiler | shell |
|---|---|---|---|---|
| mk_linux.mak | Linux/Unix/BSD | (g)make | gcc | sh |
| mk_clang.mak | Linux/Unix/BSD | (g)make | clang | sh |
| mk_icc.mak | Linux/Unix/BSD | (g)make | icc | sh |
| mk_tcc_l.mak | Linux/Unix/BSD | (g)make | tcc | sh |
| mk_cygw.mak | Windows (Cygwin) | (g)make | gcc | sh |
| mk_msys.mak | Windows (MSYS) | mingw32-make | gcc | sh |
| mk_mingw.mak | Windows (MinGW) | mingw32-make | gcc | cmd.exe |
| mk_nmake.mak | Windows (MinGW) | nmake | gcc | cmd.exe |
| mk_msvc.mak | Windows (MSVC) | nmake | cl | cmd.exe |
| mk_bcc32.mak | Windows (bcc32) | make | bcc32 | cmd.exe |
| mk_bccv5.mak | Windows (bcc32) | make | bcc32 V5.5 | cmd.exe |
| mk_clangw.mak | Windows (clang) | (g)make | clang | cmd.exe |
| mk_tcc_w.mak | Windows (tcc) | (g)make | tcc | cmd.exe |
| mk_djgpp.mak | DOS | (g)make | gcc | cmd.exe |
| mk_osx.mak | macOS | make | gcc | sh |
| mk_osxcl.mak | macOS | make | clang | sh |
| mk_freebsd.mk | FreeBSD | make | clang/gcc | sh |
| mk_emccl.mak | Linux/Unix/BSD | make | emcc + gcc | sh |
| mk_emccw.mak | Windows (emcc) | mingw32-make | emcc + gcc | cmd.exe |
In the optimal case you just copy one makefile from above to 'makefile' and do (with the corresponding make program):
make depend
make
When the interpreter is compiled successfully, the executable and the libraries are placed in the 'bin' directory. Additionally, a symbolic link to the executable is placed in the 'prg' directory (Under Windows symbolic links are not supported, so a copy of the executable is placed in the 'prg' directory). The Seed7 compiler (s7c) is compiled with:
make s7c
The compiler executable is copied to the 'bin' directory. If you do several compilation attempts in succession you need to do
make clean
before you start a new attempt.
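For systems with a Unix shell, the steps above (copy a makefile, then run the clean, depend, default and s7c targets) can be sketched as a script. The helper names pick_makefile and build_seed7 are invented for this example and are not part of Seed7. The mapping from 'uname -s' to a makefile covers only a few rows of the makefile list and assumes that the default C compiler of the platform is wanted. The script must be run in the Seed7 source directory, where the makefiles are.

```shell
#!/bin/sh
# pick_makefile and build_seed7 are hypothetical helpers, not part of Seed7.

# Map the output of 'uname -s' to a makefile name from the list above.
# Assumption: the platform's usual C compiler (gcc in most rows) is used.
pick_makefile() {
    case "$1" in
        Linux*)       echo mk_linux.mak ;;
        Darwin*)      echo mk_osx.mak ;;
        FreeBSD*)     echo mk_freebsd.mk ;;
        CYGWIN*)      echo mk_cygw.mak ;;
        MSYS*|MINGW*) echo mk_msys.mak ;;
        *)            return 1 ;;
    esac
}

# Run the documented sequence in the current directory.
# Override the make program with MAKE, e.g. MAKE=mingw32-make or MAKE=gmake.
build_seed7() {
    system=$(uname -s)
    makefile=$(pick_makefile "$system") || {
        echo "build_seed7: no makefile known for $system" >&2
        return 1
    }
    make_prog=${MAKE:-make}
    cp "$makefile" makefile &&
    "$make_prog" clean &&      # removes leftovers of an earlier attempt
    "$make_prog" depend &&
    "$make_prog" &&
    "$make_prog" s7c
}
```

The chain stops at the first failing step, so the error messages of that step are the last thing on the screen.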
I got errors when compiling Seed7. What should I do?
In most cases errors indicate that some development package of your distribution is missing. If your operating system is Linux, BSD or Unix, not all development packages with header files might be installed. In this case you get some errors after typing 'make depend'. Errors such as
chkccomp.c:56:20: fatal error: stdlib.h: No such file or directory
s7.c:30:20: fatal error: stdlib.h: No such file or directory
indicate that the development package of the C library is missing. I don't know the name of this package in your distribution (under Ubuntu it has the name libc6-dev and under openSUSE Tumbleweed the name is glibc-devel), but you can search in your package manager for C development libraries and header files.
Errors such as
con_inf.c:54:18: error: term.h: No such file or directory
kbd_inf.c:53:18: error: term.h: No such file or directory
trm_inf.c:47:18: error: term.h: No such file or directory
indicate that the curses or ncurses development package is missing. I don't know the name of this package in your distribution (under Ubuntu it has the name libncurses5-dev and under openSUSE Tumbleweed the name is ncurses-devel), but you can search in your package manager for a curses/ncurses package which mentions that it contains the header files. To execute programs, you also need to install the non-developer package of curses/ncurses (in most cases it will already be installed because it is needed by other packages).
Errors such as
drw_x11.c:38:19: error: X11/X.h: No such file or directory
drw_x11.c:39:22: error: X11/Xlib.h: No such file or directory
drw_x11.c:40:23: error: X11/Xutil.h: No such file or directory
drw_x11.c:45:24: error: X11/keysym.h: No such file or directory
indicate that the X11 development package is missing. Under Ubuntu this package has the name libx11-dev and is described as:
X11 client-side library (development headers)
Under openSUSE Tumbleweed this package is named libX11-devel and is described as:
Development files for the Core X11 protocol library
Note that under X11 'client' means: The program which wants to draw. An X11 'server' is the place where the drawings are displayed. So, you must search for an X11 client developer package with headers. If you use X11 in some way (you don't do everything from the text console) the non-developer package of X11 will already be installed.
Errors such as
gcc chkccomp.c -o chkccomp
chkccomp.c:28:10: fatal error: base.h: No such file or directory
compilation terminated.
or
del version.h
process_begin: CreateProcess(NULL, del version.h, ...) failed.
make (e=2): The system cannot find the file specified.
mingw32-make: *** [clean] Error 2
indicate that your makefile contains commands for the cmd.exe
(or command.com) windows console, but your 'make' program uses
a Unix shell (
Errors such as
s7.c:28:21: error: version.h: No such file or directory
indicate that you forgot to run 'make depend' before running 'make'. Since such an attempt produces several unneeded files it is necessary now to run 'make clean', 'make depend' and 'make'.
If you get other errors, I would like to know about them. Please send a mail with detailed information (name and version) of your operating system, distribution, C compiler, the version of Seed7 you wanted to compile and the complete log of error messages to seed7-users@lists.sourceforge.net .
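The missing header errors described above can be anticipated with a quick look into the include directory before running 'make depend'. The following is a rough sketch, not a Seed7 tool: the function name missing_headers is invented, and it assumes that the headers live directly below /usr/include. Some systems keep them elsewhere, so a reported header is a hint to check the development packages and not a proof that something is missing.

```shell
#!/bin/sh
# missing_headers is a hypothetical helper, not part of Seed7.
# It prints the headers named in the error messages above that are not
# found below the given include directory (default: /usr/include).
missing_headers() {
    include_dir=${1:-/usr/include}
    for header in stdlib.h term.h X11/X.h X11/Xlib.h X11/Xutil.h X11/keysym.h
    do
        [ -f "$include_dir/$header" ] || echo "$header"
    done
}
```

An empty output of missing_headers suggests that the C library, curses/ncurses and X11 development packages are in place.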
How do I verify that the interpreter works correctly?
A comprehensive test of the s7 interpreter and the s7c compiler can be done in the directory prg with the command:
./s7 chk_all
Under Windows using ./ might not work. Just omit the ./ and type:
s7 chk_all
The program chk_all uses several check programs to do its work. First a check program is interpreted, and the output is compared to a reference. Then the program is compiled and executed, and this output is also checked. Finally, the C code generated by the compiled compiler is checked against the C code generated by the interpreted compiler. The checks of the compiler are repeated with several compiler options. If everything works correctly the output is (after the usual information from the interpreter):
compiling the compiler - okay
chkint ........... okay
chkovf ........... okay
chkflt ........... okay
chkbin ........... okay
chkchr ........... okay
chkstr ........... okay
chkidx ........... okay
chkbst ........... okay
chkarr ........... okay
chkprc ........... okay
chkbig ........... okay
chkbool ........... okay
chkenum ........... okay
chktime ........... okay
chkscan ........... okay
chkjson ........... okay
chkbitdata ........... okay
chkset ........... okay
chkhsh ........... okay
chkfil ........... okay
chkexc ........... okay
This verifies that interpreter and compiler work correctly.
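If the output of chk_all is captured (for example in an automated build) it can be compared with the listing above by a script. The sketch below makes two assumptions: the helper name unconfirmed_checks is invented, and the list of check programs is copied from the listing above, so it may differ for other Seed7 versions. The script does not rely on the wording of failure messages. It only reports every check program for which no line ending in "okay" was seen.

```shell
#!/bin/sh
# unconfirmed_checks is a hypothetical helper, not part of Seed7.
# It reads a chk_all report from stdin and prints the name of every check
# program from the listing above that has no "<name> ..... okay" line.
unconfirmed_checks() {
    report=$(cat)
    for name in chkint chkovf chkflt chkbin chkchr chkstr chkidx chkbst \
                chkarr chkprc chkbig chkbool chkenum chktime chkscan \
                chkjson chkbitdata chkset chkhsh chkfil chkexc
    do
        printf '%s\n' "$report" |
            grep -Eq "^$name \.+ okay[[:space:]]*\$" || echo "$name"
    done
}
```

It could be used as: ./s7 chk_all | unconfirmed_checks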
How can I install Seed7?
After the Seed7 interpreter and compiler have been compiled and verified they can be installed. The makefiles support the target install. You need appropriate privileges to do the installation. Depending on the operating system there are different strategies to get the privileges:
Unix-like operating systems
Just go to the directory seed7/src and type:
sudo make install
Use the make command of your computer. The sudo command will ask you for your password. If your permissions are sufficient the command creates symbolic links in the directory /usr/local/bin.
Windows
You need to open a console as administrator. Then you can go to the directory seed7/src and type:
make install
Use the make command of your computer. This adds the directory seed7/bin to the search path (PATH variable). You need to start a new console to see the effect of this change.
More details can be found in the file
How can I use the Seed7 interpreter?
The s7 interpreter is called with the command
s7 [options] sourcefile [parameters]
Note that the 'options' must be written before the 'sourcefile'. If the 'sourcefile' is not found, .sd7 is appended to its name and the interpreter searches for that file instead.
The following options are recognized by s7:
- -? or -h Write Seed7 interpreter usage.
- -a Analyze only and suppress the execution phase.
- -dx Set compile time trace level to x. Where x is a string consisting of the following characters:
  - a Trace primitive actions
  - c Do action check
  - d Trace dynamic calls
  - e Trace exceptions and handlers
  - h Trace heap size (in combination with 'a')
  - s Trace signals
- -d Equivalent to -da
- -i Show the identifier table after the analysis phase.
- -l Add a directory to the include library search path (e.g.: -l ../lib).
- -p Specify a protocol file, for trace output (e.g.: -p prot.txt).
- -q Compile quiet. Line and file information and compilation statistics are suppressed.
- -s Deactivate signal handlers.
- -tx Set runtime trace level to x. Where x is a string consisting of the following characters:
  - a Trace primitive actions
  - c Do action check
  - d Trace dynamic calls
  - e Trace exceptions and handlers
  - h Trace heap size (in combination with 'a')
  - s Trace signals
- -t Equivalent to -ta
- -vn Set verbosity level of analysis phase to n. Where n is one of the following characters:
  - 0 Compile quiet (equivalent to -q)
  - 1 Write just the header with version information (default)
  - 2 Write a list of include libraries
  - 3 Write line numbers, while analyzing
- -v Equivalent to -v2
- -x Execute even if the program contains errors.
In the program the 'parameters' can be accessed via argv(PROGRAM). The function argv(PROGRAM) delivers an array of strings. The number of parameters is 'length(argv(PROGRAM))' and 'argv(PROGRAM)[1]' returns the first parameter.
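The access to the parameters described above can be sketched with a small program. The file name showargs.sd7 and the variable name index are chosen for this illustration only:

```seed7
$ include "seed7_05.s7i";

const proc: main is func
  local
    var integer: index is 0;
  begin
    # argv(PROGRAM) delivers the parameters as an array of strings.
    writeln("Number of parameters: " <& length(argv(PROGRAM)));
    for index range 1 to length(argv(PROGRAM)) do
      # The first parameter has the index 1.
      writeln(index <& ": " <& argv(PROGRAM)[index]);
    end for;
  end func;
```

If this program is stored as showargs.sd7, a call such as s7 -q showargs one two should report two parameters and list them as 1: one and 2: two.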
Is it possible to compile Seed7 programs?
Generally, Seed7 is designed to allow the compilation to machine code. The Seed7 compiler (s7c) is written in Seed7. It uses the analyze phase of the interpreter to convert a program to call-code and then generates a corresponding C program. This C program is compiled and linked afterwards. So Seed7 compiles to efficient machine code via C. The intermediate C code is viewed as portable assembler. It is not intended for human readers. The Seed7 compiler can be called with:
s7c [ options ] source
Possible options are
- -? Write Seed7 compiler usage.
- -On Tell the C compiler to optimize with level n (n is between 1 and 3).
- -O Equivalent to -O1
- -S Specify the stack size of the executable (e.g.: -S 16777216).
- -b Specify the directory of the Seed7 runtime libraries (e.g.: -b ../bin).
- -c Specify configuration (C compiler, etc.) to be used (e.g.: -c emcc).
- -e Generate code which sends a signal, if an uncaught exception occurs. This option allows debuggers to handle uncaught Seed7 exceptions.
- -flto Enable link time optimization.
- -g Tell the C compiler to generate an executable with debug information. This way the debugger will refer to Seed7 source files and line numbers. To generate debug information which refers to the temporary C program the option -g-debug_c can be used.
- -l Add a directory to the include library search path (e.g.: -l ../lib).
- -ocn Optimize generated C code with level n. E.g.: -oc3 The level n is a digit between 0 and 3.
- -p Activate simple function profiling.
- -sx Suppress checks specified with x. E.g.: -sr or -sro The checks x are specified with letters from the following list:
  - d Suppress the generation of checks for integer division by zero.
  - i Suppress the generation of index checks (e.g. string, array).
  - o Suppress the generation of integer overflow checks.
  - r Suppress the generation of range checks.
- -tx Set runtime trace level to x. Where x is a string consisting of the following characters:
  - e Trace exceptions and handlers
  - f Trace functions
  - s Trace signals
- -wn Specify warning level n. E.g.: -w2 The level n is a digit between 0 and 2:
  - 0 Omit warnings.
  - 1 Write normal warnings (default).
  - 2 Write warnings for raised exceptions.
Is there support for syntax highlighting?
Syntax highlighting is available for several editors:
- For vim there are seed7/doc/seed7.vim and seed7/doc/sd7.vim. To install syntax highlighting for vim under Linux go to the seed7/doc directory and do:
  mkdir -p ~/.vim/syntax
  mkdir -p ~/.vim/ftdetect
  cp seed7.vim ~/.vim/syntax
  cp sd7.vim ~/.vim/ftdetect
  On some computers syntax highlighting is turned off by default. In this case it is necessary to create a .vimrc file in the home directory and enter this line:
  syntax on
  For the Mac platform this is described here.
- For the Nano editor there is seed7/doc/seed7.nanorc. You can copy this file to /usr/share/nano (you might need superuser privileges to do that).
- For Notepad++ there is seed7/doc/seed7udl.xml. Start Notepad++ and go to Language ⇒ User Defined Language ⇒ Define your language..., click on Import..., select the seed7/doc/seed7udl.xml file and press Open. Afterwards close and restart Notepad++.
- For UltraEdit there is seed7/doc/seed7.uew. To install syntax highlighting for UltraEdit you must first determine the full directory path for wordfiles: Start UltraEdit and look at Advanced ⇒ Settings (or Configuration) ⇒ Editor display ⇒ Syntax highlighting. In the right part of the popup is the field "Full directory path for wordfiles". Copy the file seed7/doc/seed7.uew to the directory for wordfiles. Afterwards Seed7 can be found under Coding ⇒ Add another language...
- For Textpad there is seed7/doc/seed7.syn.
The file
Can I debug Seed7 programs with Eclipse?
Yes, Eclipse can be easily configured to work with Seed7:
- Eclipse needs the C/C++ Development Tools (CDT) plugin.
- In Window ⇒ Preferences ⇒ C/C++ ⇒ File Types add *.sd7 as C Source File and *.s7i as C Header File.
- In Window ⇒ Preferences ⇒ General ⇒ Editors ⇒ File Associations add *.sd7 and *.s7i and make the C/C++ Editor default for them.
- Create a C project (New ⇒ Project ⇒ C/C++ ⇒ C Project).
- Right click on the project in the Project Explorer and select Properties ⇒ C/C++ General ⇒ File Types. Check if the workspace settings are used. If not add *.sd7 as C Source File and *.s7i as C Header File.
- In a console window: Compile your Seed7 program with the option -g.
- In Eclipse: Create a Debug Configuration with: Run ⇒ Debug Configurations... Right click on C/C++ Application and select New.
- In the Main tab of the new Debug Configuration: Press the Browse... button below C/C++ Application to select the executable you want to debug.
- Select the C project you created.
- In the Arguments tab you can set program arguments and current working directory.
- Press the Debug button to start the program.
Can I profile Seed7 programs?
You can use tools that profile executables such as Valgrind. Additionally the Seed7 compiler supports simple function profiling. You just need to compile a program with the option -p. If you execute this program it writes profiling data to the file profile_out, when it is finished. In "profile_out" you find a tab-separated table with microseconds, number of calls, place of the function and function name.
What are the reserved words of Seed7?
In Seed7 there are no reserved words. Instead, there are keywords which are used at various places. Some keywords introduce statements or other constructs (such as declarations). E.g.: The keywords if, while, repeat, for, and some others introduce statements. Other keywords like do, then, range, result, etc. are used in the middle of statements (or other constructs). Finally, there are also keywords like div, rem, lpad, times, etc. which are used as operator symbols.
Seed7 uses syntax declarations to specify the syntax of statements. A keyword is a name which is used somewhere in a syntax declaration. Syntax declarations reduce the possibilities to use a keyword out of context. E.g.: After the keyword if the parser always expects an expression. This makes if unusable as a variable name. This way you get error messages when you try to use if or other keywords as variable names. That behavior is just the same as in other languages which have reserved words. In summary, Seed7 prevents the misuse of keywords by other means than reserving them altogether.
In a classic compiler (e.g. a Pascal compiler), there is a distinction between reserved words and identifiers. Pascal compilers and possibly Ada, C/C++, Java and C# compilers use an enumeration type to represent the reserved words. Since Seed7 allows user defined statements (which may introduce new keywords) it is not possible to hard code reserved words in the compiler as it is done in Pascal, Ada, C/C++, Java and many other compilers.
How is the syntax of Seed7 defined?
The syntax of Seed7 is described with the Seed7 Structured Syntax Description (S7SSD). The S7SSD is similar to an Extended Backus-Naur Form (EBNF), but there are significant differences. S7SSD does not distinguish between different non-terminal symbols. Instead, it only knows one non-terminal symbol: () . S7SSD syntax rules do not define named non-terminal symbols (EBNF rules define named non-terminal symbols). S7SSD syntax rules are introduced with:
$ syntax
S7SSD syntax rules define a pattern of terminal and non-terminal symbols separated by dots. A S7SSD syntax rule also defines a priority and associativity. The syntax of the + operator is:
$ syntax expr: .(). + .() is -> 7;
The syntax of statements and other constructs is defined as if they were operators:
$ syntax expr: .while.().do.().end.while is -> 25;
S7SSD is a simple syntax description that can be used by humans as well as by compilers and interpreters. The syntax of a Seed7 program is defined in the library "syntax.s7i". When a Seed7 program is interpreted or compiled the syntax definitions are read from "syntax.s7i".
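A program can also contain its own syntax declarations. The following sketch declares a loop statement with the exit condition in the middle. The statement name loop, the parameter names and the priority 25 (taken over from the while rule above) are choices made for this illustration:

```seed7
$ include "seed7_05.s7i";

# Syntax: Keywords and the places of the parameters, separated by dots.
$ syntax expr: .loop.().until.().do.().end.loop is -> 25;

# Semantics: The statement is declared like a procedure.
const proc: loop
              (in proc: statements1)
            until (in func boolean: condition) do
              (in proc: statements2)
            end loop is func
  local
    var boolean: exitLoop is FALSE;
  begin
    repeat
      statements1;
      if not condition then
        statements2;
      else
        exitLoop := TRUE;
      end if;
    until exitLoop;
  end func;

const proc: main is func
  local
    var integer: number is 0;
  begin
    loop
      incr(number);
    until number > 3 do
      writeln(number);
    end loop;
  end func;
```

In this sketch the syntax declaration comes before the declaration of the statement, so the parser already knows the pattern when it reads the const proc declaration.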
Why does Seed7 not use the C statements like C++ and Java?
The C statements have some weaknesses which are avoided with the Seed7 statements:
The C if-statement
if (condition) statement;
allows just one statement after the condition. By using the compound statement, it is possible to have several statements after the condition:
if (condition) {
  statement1;
  statement2;
}
Adding or removing a statement in the second if-statement is always possible. In the first if-statement you must add braces if you add a statement otherwise you get an undesired effect. Adding statements to an if-statement is quite common.
Since both forms are legal and adding a statement to the first form can lead to errors Seed7 closes this possible source of errors with its if-statement:
if condition then
  statement
end if;
The following switch statement is formally correct but probably wrong:
switch (number) {
  case 1:
  case 2:
    result = 5;
  case 3:
  case 4:
    result = 8;
    break;
  default:
    result = 0;
}
Forgetting break statements in a switch is another possible source of errors which is avoided with the case-statement of Seed7:
case number of
  when {1, 2}: result := 5;
  when {3, 4}: result := 8;
  otherwise: result := 0;
end case;
Why is the type: name declaration syntax used?
Many languages of the ALGOL and C family define variables this way:
int x
If a variable is initialized something like
int x = 888;
is used. Constant declarations often look like this:
const int x = 888;
The Seed7 declaration syntax is oriented towards these examples. Variable declarations are introduced with var and constant declarations are introduced with const. The equals sign is replaced with is and a colon is used as separator between type and name. This leads to:
const integer: x is 42;
Writing the type before the name offers the opportunity to have a type declaration that is easy to recognize and uses the same syntax:
const type: stack is array integer;
It is also easy to recognize function declarations. They use the same syntax:
const func integer: computeSomething is ...
In a block with several declarations the type can be recognized easily:
var integer: index is 1;
var integer: paramValue is 0;
var bigInteger: bigSum is 0_;
Programmers used to dynamically typed languages may be confused by the colon. Those languages use something like name: type with the type annotation being optional. In Seed7 the specification of the type is mandatory.
What makes code readable?
People often mistake familiarity with a certain kind of syntax for good readability. E.g.: If you prefer statements with braces it is harder to read statements using keywords instead and vice versa. But you can get accustomed to such syntactic things and then they don't hinder readability anymore.
On the other hand, there are things that lead to spaghetti code and being accustomed to the syntax does not help. Generally, the term spaghetti code can be used for code where information is scattered, and the reader needs considerable time to gather this information. E.g.:
- In dynamically typed programming languages, it is often hard to determine the data structures used, since they are not specified. Seed7 variables and types must be declared explicitly. This improves readability.
- Compilers that do type inference spend most of their time on type checking. This means that a human reader also needs to spend most of the time on understanding the types in use. That way type inference makes writing easier and reading harder. The omission of type inference in Seed7 simplifies reading code.
Beyond that, reduced complexity also helps readability:
- Integers and strings follow the "one size fits all" principle. There are no integer types of various sizes and there is just one string type.
- Computations with bigInteger values can use the usual infix operators (+ - * div rem mdiv mod).
- There is no need to distinguish between mutable and immutable strings.
- Comparisons are always done with simple operators (= <> < <= > >=).
- There is no NULL, so no necessity to check for NULL.
- There is no inlining of code written in a different language. With inlined foreign code you need to know both languages to understand such a program.
- Code that uses exceptions has better readability than code where the result of every function must be checked for possible errors.
- There are no implicit casts that may make reading harder.
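Some of these points can be seen in a small sketch. It computes a factorial with bigInteger values and uses the same infix and comparison operators that are used for integer values. The program and its variable names are made up for this illustration:

```seed7
$ include "seed7_05.s7i";
  include "bigint.s7i";

const proc: main is func
  local
    var bigInteger: factorial is 1_;
    var integer: number is 0;
  begin
    for number range 1 to 30 do
      # The conversion from integer to bigInteger is written explicitly.
      factorial *:= bigInteger(number);
    end for;
    writeln(factorial);
    # Usual infix and comparison operators, no method call syntax.
    writeln(factorial mod 1000_);
    if factorial > 1000000_ then
      writeln("The factorial exceeds one million.");
    end if;
  end func;
```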
Isn't the code unreadable if everybody invents new statements?
There are lots of possibilities to write unreadable code without using the extension features of Seed7. The programmer is (as always) responsible to write readable programs. The variable/type/function names and other things chosen by the programmer can always lead to obfuscated code.
Defining new statements and operators is a feature which should not be used in every program by every programmer. It is a feature which allows experienced programmers to write libraries which use statement or operator syntax instead of function syntax, in areas where such a notation is already accepted practice.
Statements to access a database or operators for vector arithmetic would be such an example. Another example is a construct which can be used in the definition of text adventure games.
The possibility to define statements also allows a more precise language definition. The for/while/if statements of C++ are described in the C++ manuals with BNF and an English description. Seed7 statements can be defined in Seed7. For example:
$ syntax expr: .while.().do.().end.while is -> 25;

const proc: while (in func boolean: condition) do
              (in proc: statement)
            end while is func
  begin
    if condition then
      statement;
      while condition do
        statement;
      end while;
    end if;
  end func;
The syntax and semantics of a while-statement are described using an if-statement and recursion. For performance reasons the implementation will usually use a different approach to implement a while-loop, but this example shows the expressive power of Seed7.
Hasn't Lisp already user defined statements and operators?
Defining the semantic of a new 'statement' in Lisp is a classic example. Normally such 'statements' still use the list notation with lots of parentheses. The read macros of Lisp could be used to define the syntax of a statement, but read macros make no type checks at compile time. Any type checking must be written by the programmer and is not mandated by Lisp. The type checks will be performed at runtime. Depending on the implementation there might be warnings issued at compile time. In general: Lisp 'statement' declarations do not force compile time checks and look less elegant. Seed7 statement declarations force a type check at compile time.
While Lisp allows new and overloaded functions, the Lisp 'operators' are functions which use the prefix notation (with lots of parentheses). Again, read macros could be used to support infix operators with priority and associativity. These read macros would have the same problems as above. Although Lisp fanatics would never admit it, infix operators with priority and associativity are not really supported by Lisp. If somebody tells you that everything can be done in Lisp, send him to the next advocacy group. In general: Seed7 supports user definable infix operators with priority and associativity. Such operators can be overloaded, and the type checks are done at compile time. In Lisp all this would be a hack.
Why does Seed7 use static type checking?
With static type checking all type checks are performed during compile time. Type errors, such as an attempt to divide an integer by a string, can be caught earlier (unless this unusual operation has been defined). The key point is that type errors are found without the need to execute the program. Some type errors can be hidden in rarely executed code paths. Static type checking can find such errors easily. With dynamic type checking extensive tests are necessary to find all type errors. Even tests with 100% code coverage are not enough since the combination of all places where values are created and all places where these values are used must be considered. That means that testing cannot guarantee to find all type errors that a static type checker can find. Additionally, it would be necessary to repeat all tests every time the program is changed. Naturally, there are doubts that enough tests are done and that the tests are adjusted and repeated for every change in the program. Therefore, it can be said that compile time type checks increase the reliability of the program.
Seed7 makes sure that the object values always have the type of the object. This goal is reached with mechanisms like mandatory initialization, runtime checks and the impossibility to change arbitrary places in memory. If the generation of garbage values is avoided, it can be guaranteed that only legal values of the correct type are used as object values. This way runtime type checks are unnecessary, and the program execution can be more efficient.
Type declarations can also serve as a form of documentation because they can illustrate the intent of the programmer. Although static type checking is immensely helpful in finding type errors, it cannot replace a careful program design. Some operations allowed by the static type system can still be wrong because of different measurement units or other reasons. In the end, there are also other possible sources of errors such as range violations.
Interface types can be used if an object can have several types at runtime. In this case the interface type of the object can be determined at compile time and the type of the object value (implementation type) can vary at runtime. The static type checking can still check the interface type and the presence of interface functions. Additionally, the compiler can also check that all functions granted by the interface type are defined for the implementation type.
Is the program development slowed down with static type checking?
No, especially if the time spent to debug a program is taken into account. Except for artificial corner cases all type errors found by a "nitpicking" compiler correspond to runtime type errors that can happen in a dynamically typed language under some circumstances. That way the compile time type checks save the time necessary to find and debug those errors. The time that a compiler needs to find and flag type errors is so small that it can be ignored in this comparison.
Some people claim that adding type information to a program is a time-consuming process. This is only true if the type information is added afterwards, but it is wrong if type considerations take place during the program development. Every good programmer has some concepts about what values will be held by variables or parameters and what values will be returned by functions. A good type system helps to formalize the type concepts which are already in the mind of the programmer. That way, the ideas of the programmer are documented as well. This type documentation helps when reading the code. The maintenance costs are also reduced, since code is more often read than written.
When comparing compile time and runtime type checking it can be concluded that dynamically typed languages save some programming time by omitting type declarations, but this time must be paid back with massive interest rates to do the debugging and when the code needs to be maintained.
Does static type checking speed up programs?
Definitely yes. Static type checking can guarantee that only legal values of the correct type are used. This way run-time type checks are unnecessary, and the program execution can be more efficient. A statically typed language can be compiled to efficient machine code. In a dynamically typed language, the type checks take place at run-time. It might be necessary to do type checks many times for the same expression and not just once at compile-time. In this case there is an overhead every time a dynamically typed value is used.
Dynamically typed languages often introduce compilation and type annotations as an afterthought. This works worse than in a statically typed language. Since type annotations are optional they are not considered when the program is written, but at a later time (maybe by someone else). If the type annotations are not 100% correct unnecessary conversions might take place, which slow down the program.
How can static type checking work if types are first-class objects?
This question refers to something which seems paradoxical: If Seed7 types are created at runtime, how can they be checked at compile time? The simple answer is that a type created at runtime cannot be used to define something in the program that is currently running.
Seed7 declarations are not executed at runtime. Functions with type parameters and type result are executed at compile time. This is done in templates and abstract data types (both are executed at compile time). It is possible to have type variables and type expressions at runtime but it is not possible to declare objects with such a variable type for the program which currently runs. Such type variables and type expressions are used in the Seed7 compiler.
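A template is a procedure with a type parameter that is called at the top level, so it runs while the program is analyzed. The following sketch assumes the hypothetical names DECLARE_SMALLER_OF and smallerOf, which are not part of any Seed7 library:

```seed7
$ include "seed7_05.s7i";
  include "float.s7i";

# Template: A procedure with a type parameter.
# The declaration in its body is done for the actual type T.
const proc: DECLARE_SMALLER_OF (in type: T) is func
  begin
    const func T: smallerOf (in T: a, in T: b) is func
      result
        var T: smaller is T.value;
      begin
        if a < b then
          smaller := a;
        else
          smaller := b;
        end if;
      end func;
  end func;

# These calls are executed at compile time.
DECLARE_SMALLER_OF(integer);
DECLARE_SMALLER_OF(float);

const proc: main is func
  begin
    writeln(smallerOf(3, 8));      # Uses the integer version
    writeln(smallerOf(2.5, 1.5));  # Uses the float version
  end func;
```

When main is analyzed both versions of smallerOf are already declared, so the calls can be type checked statically like calls of any other overloaded function.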
Why does Seed7 not use type inference?
Seed7 has a basic principle that would break if type inference were used:
The type of every expression (and sub expression) is independent of the context.
To explain this principle consider the expression:
a + b
If the types of a and b are known and the definition of + applies, then the type of the expression a + b is also known. In this example a and b may be constants, variables or even sub-expressions. If one of the types of a or b is not known then the type of a + b cannot be determined. Now assume that a + b is part of a bigger expression:
c = (a + b)
In theory you could deduce the type of a + b if you know the type of c and how = works. But in this case the context of a + b would have to be taken into account. The basic principle mentioned above rules that out. According to it the context around an expression has no influence on the type of the expression. In Seed7 the type information moves inside out from sub-expressions to expressions. Within the syntax tree it moves from the bottom to the top. This rule simplifies type checking a lot. This is one of the reasons why the interpreter can process several hundred thousand lines per second.
For type inference, it would be necessary that type information also moves in the other direction. You can see: This would violate the basic principle mentioned above. As long as this principle holds you need to know the global and local declarations to find out the result type of an expression. With type inference it would be necessary to take other expressions in the local function and even expressions in other functions into account. I do not say that this is not possible (for sure it is an interesting challenge to invent an algorithm to do this). But a human reader would also need to apply this algorithm when reading the program. You must consider that a program is more often read than written.
Why does a local type declaration trigger an error?
A function declaration like:
const proc: main is func
  local
    const type: intArrayType is array integer;
    var intArrayType: arr is [](1, 2);
  ...
triggers the error:
*** tst249.sd7(6):52: Match for {var intArrayType : {arr } is {[ ] {1 , 2 } } } failed
    var intArrayType: arr is [](1, 2);
A local declaration block is parsed completely before it is executed. As a result, type declarations inside a local declaration block are not yet defined during parsing. These errors are avoided if type declarations are made at the top level. E.g.:
const type: intArrayType is array integer;

const proc: main is func
  local
    var intArrayType: arr is [](1, 2);
  begin
    writeln(length(arr));
  end func;
Why does a two-dimensional array trigger an error?
A function declaration like:
const func array array string: test (in string: value) is func
  result
    var array array string: data is 0 times 0 times "";
  ...
triggers the error:
*** tst309.sd7(4):52: Match for {func result var func type:({array type: const func array array string: test (in string: value) is func({array string }) }) : {data } is {0 times {0 times "" } } begin {data := {3 times {2 times value } } } end func } failed
The reason is: The two occurrences of array array string are considered as two different types. This error can be avoided by defining a named type for the two dimensional array at the top level:
const type: stringArray2D is array array string;

const func stringArray2D: test (in string: value) is func
  result
    var stringArray2D: data is 0 times 0 times "";
  ...
Are there automatic casts to the right type?
Seed7 cannot read the mind of the programmer. It is hard to find out what the programmer considers as the "right type". A conversion can lose information. Seemingly safe conversions may also lose information. E.g. Not all 64-bit integer values can be represented as 64-bit float values. It can also lead to unplanned behavior if the programmer is not aware of an automatic conversion. Readability is improved if conversions are done explicitly. Seed7 is strongly typed and uses explicit conversions. E.g.: The conversion from integer to float is done with the functions flt and float. Conversions from float to integer are done with round or trunc. Explicit conversions have more advantages than disadvantages:
- The overloading rules are much simpler.
- An expression can be understood without its calling context.
- Errors caused by unplanned automatic type conversions cannot happen.
- Since you have to do type conversions explicitly you are more aware of the run time overhead.
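The explicit conversions mentioned above can be sketched as follows. The variable names are chosen for this illustration:

```seed7
$ include "seed7_05.s7i";
  include "float.s7i";

const proc: main is func
  local
    var integer: count is 7;
    var float: half is 0.0;
  begin
    # Explicit conversion from integer to float.
    half := float(count) / 2.0;
    writeln(half);
    # Explicit conversions from float to integer.
    writeln(round(half));
    writeln(trunc(half));
  end func;
```

Writing count / 2.0 without the conversion would be flagged at compile time, assuming that no / operator for an integer and a float operand has been declared.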
Is Seed7 a "do what I mean" language?
The phrase "do what I mean" (DWIM) is used when computer systems attempt to anticipate what users intend to do. To put it bluntly: A program tries to read a human's mind. Since this is not possible DWIM languages use heuristics to interpret illegal or ambiguous code towards an interpretation that makes sense. Well, it makes sense for the one who invented the heuristic, which does not imply that it makes sense for everybody else.
Heuristics usually do not work in 100% of the cases. There are always corner cases where heuristics fail in an unexpected way. In this case something in your program is misinterpreted and you don't know about it. This means: Your program contains a bug of which you are not aware.
Not all human programmers have the same background. What one programmer sees as correct reinterpretation of his intentions another programmer might consider as stupid. The second programmer would certainly prefer to get an error message instead of a silent reinterpretation of his program. Of course, this silent reinterpretation also means that there is a hidden bug in the program.
There is a reason that natural languages are not used for programming. They are just too ambiguous. Established programming languages try to be unambiguous. Usually, they are stricter than the languages used in mathematics or physics. Being strict has proven to facilitate the maintenance of large programs. It is a bad idea to allow ambiguities, even when they are resolved by "do what I mean" heuristics. Therefore, Seed7 is not a "do what I mean" language.
Can I use something and declare it later?
No, everything must be declared before it is used. The possibility to declare new statements and new operators on one side and the static typing requirements with compile time checks of the parameters on the other side would make the job of analyzing expressions with undeclared functions overly complex.
Forward declarations help, if something needs to be used before it can be declared fully:
const func string: concatenate (inout file: inFile) is forward;

const func string: getString (inout file: inFile) is func
  result
    var string: stri is "";
  begin
    if inFile.bufferChar = '(' then
      ignore(getc(inFile));
      stri := concatenate(inFile);  # Call of forward declared function
      ignore(getc(inFile));
    elsif inFile.bufferChar = '"' then
      stri := getQuotedText(inFile);
    end if;
  end func;

const func string: concatenate (inout file: inFile) is func
  result
    var string: stri is "";
  begin
    stri := getString(inFile);
    while inFile.bufferChar = '&' do
      ignore(getc(inFile));
      stri &:= getString(inFile);
    end while;
  end func;
Can functions be overloaded?
Yes, functions, operators and statements can be overloaded. E.g.:
const func float: tenPercent (in float: amount) is
  return amount / 10.0;

const func float: tenPercent (in integer: amount) is
  return float(amount) / 10.0;

const func bigRational: tenPercent (in bigInteger: amount) is
  return amount / 10_;
These functions can be used with:
writeln(tenPercent(123));
writeln(tenPercent(123.0));
writeln(tenPercent(123_));
Existing operators like + can be overloaded with:
const func float: (in integer: summand1) + (in float: summand2) is
  return float(summand1) + summand2;
Note that the declaration above identifies the + operator with:
(in integer: summand1) + (in float: summand2)
This is the same syntactic pattern as the one used when the + operator is invoked:
8 + 3.9
This syntactic pattern is defined in the file "syntax.s7i" with the syntax declaration:
$ syntax expr: .() . + .() is -> 7;
The actual syntax is described with:
. () . + . ()
The dots are used to create a list of elements. If we leave out the dots we get the actual syntactic pattern:
() + ()
The place of parameters is specified with (). In declarations () is the place of a parameter declaration and in calls () is the place of an actual parameter.
To introduce new operator symbols like inProduct it is necessary to define the syntax before defining the semantic.
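A minimal sketch of such a definition for inProduct follows. The priority 6, the left associativity and the choice of array integer operands are assumptions made for this illustration:

```seed7
$ include "seed7_05.s7i";

# Syntax first: inProduct becomes an infix operator symbol.
$ syntax expr: .().inProduct.() is -> 6;

# Semantics afterwards: Inner product of two integer arrays.
# This sketch assumes that both arrays have the same index range.
const func integer: (in array integer: a) inProduct (in array integer: b) is func
  result
    var integer: product is 0;
  local
    var integer: index is 0;
  begin
    for index range minIdx(a) to maxIdx(a) do
      product +:= a[index] * b[index];
    end for;
  end func;

const proc: main is func
  local
    var array integer: v1 is [](1, 2, 3);
    var array integer: v2 is [](4, 5, 6);
  begin
    writeln(v1 inProduct v2);
  end func;
```

The declaration of the operator uses the pattern () inProduct () with parameter declarations at the places of (), and the call in main uses the same pattern with actual parameters.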
Can I overload two functions which just differ in the result type?
No, return type overloading is not supported. The example below shows what happens if the function f is overloaded with different result types:
const func integer: f (in char: x) is return ord(x);
const func string: f (in char: x) is return str(x);
This overloading of the result type triggers compile-time errors:
*** tst508.sd7(4):34: Redeclaration of "f (val char: x)"
const func string: f (in char: x) is return str(x);
----------------------------------------------------^
*** tst508.sd7(3):35: Previous declaration of "f (val char: x)"
const func integer: f (in char: x) is return ord(x);
Assume return type overloading were allowed. In this case the expression f('a') is ambiguous as it is unclear which f should be called. With return type overloading the compiler cannot determine the type of f('a') unless it analyzes the context of the call. Thus type checking expressions can become arbitrarily complex. Remember that the human reader has to do the same. Omitting return type overloading simplifies type checking a lot. This way the following holds:
The type of every expression (and sub expression) is independent of the context.
Consider this expression with operators (also applying to functions):
a = b + c * d
Converting into a tree representation gives:
    =
   / \
  a   +
     / \
    b   *
       / \
      c   d
The type information goes strictly upward (or inside out).
- The * operator might be overloaded for various parameter types.
- The actual types of c and d determine which * operator is used.
- This determines the type of the sub expression c * d.
- The + operator might be overloaded and the parameters b and c * d determine the + operator used and the type of the sub expression b + c * d.
- The type of a and b + c * d determine which = is used and the type of the whole expression.
This bottom-up approach simplifies type checking a lot for both the compiler and the human reader. With return type overloading there would be ambiguous sub expressions. For the whole expression these ambiguities might be resolved or they might stay unresolved. It is obvious that return type overloading would complicate type checking considerably (for the human reader too).
Can functions have variable parameter lists?
No, because functions with variable parameter lists, such as the C printf function, have some problems:
- Normally type checking is only possible at run time.
- The recognition of overloaded functions becomes more complicated.
Instead Seed7 has array aggregates and allows functions with arrays as parameters. So you could declare a function
const proc: print_list (in array integer: arr) is func
  local
    var integer: number is 0;
  begin
    for number range arr do
      writeln(number);
    end for;
  end func;
and call it with
print_list([](1, 1, 2, 3, 5, 8, 13, 21, 34, 55));
Why does the initialization use the keyword 'is' instead of ':=' ?
The decision to use the keyword is relates to the Structured Syntax Definition of Seed7. A variable declaration in Seed7 looks like:
var integer: number is 42
The syntax of this variable declaration is defined in the file syntax.s7i. A variable declaration with := would need a different syntax definition. An attempt to define this new syntax leads to an error:
*** tst356.sd7(3):42: ":=" redeclared with infix priority 127 not 20
syntax expr: .var.(). : .(expr). := .(expr) is -> 40;
---------------------------------------------------------^
The error rejects the desired syntax pattern:
var () : () := ()
In syntax patterns () denotes the place for any expression (=parameter). The other symbols in a syntax pattern should appear as expected. The error above highlights a conflict between an existing syntax pattern and the new one. The use of := in this pattern conflicts with the one in the assignment syntax pattern. The assignment operator (:=) is already defined as infix operator with the priority 20. The syntax pattern of the := operator is:
() := ()
In both patterns there is a parameter left of the := symbol. In case of the assignment, the priority of the left parameter must be less than 20. In case of the variable declaration, this parameter is situated in between : and :=. Such a middle parameter allows the much weaker priority 127. Code like
var integer : number := 42;
begin
would be interpreted as
var integer : (number := 42;)
begin
Since the syntax pattern would expect a := after number := 42; this would lead to the error:
*** tst356.sd7(8):47: ":=" expected found "begin"
begin
-------^
To avoid this misinterpretation, the syntax of a variable declaration with := is rejected beforehand with:
*** tst356.sd7(3):42: ":=" redeclared with infix priority 127 not 20
Languages with hard-coded syntax analysis use a trick to allow the same symbol for assignments and initialization. They read the parameter between : and := with special parsing code to read an identifier. Afterwards, they check if the identifier is followed by an assignment symbol. The Structured Syntax Definition of Seed7 does not support such tricks. Instead it provides a systematic approach for the syntax description. This general concept to define syntax is available to all programmers.
Is there an elegant way to initialize data?
Most languages allow that a constant is initialized with a constant expression. This usually rules out user defined functions (or it is restricted in other ways). Seed7 allows arbitrary expressions (including user defined functions) in initializations of constants and variables:
const integer: limit is 1000 ** 2 * 10;
var string: s7Page is getHttp("seed7.sourceforge.net");
const func array string: getWords (in string: fileName) is
  return split(lower(getf(fileName)), "\n");
var array string: dict is getWords("unixdict.txt");
const set of integer: primes is eratosthenes(limit);
const PRIMITIVE_WINDOW: pic is readBmp("head3.bmp");
const array integer: someData is [](1, 1, 2, 3, 5, 8, 13, 21, 34, 55);
A nice example is the initialization of the table stars with the function genStarDescr in the library stars.s7i.
Why is it necessary to initialize all variables?
Forgetting to initialize a variable is a common source of errors. In some programming languages uninitialized variables have a random value which could lead to errors. To avoid errors caused by uninitialized variables in Seed7 each variable must be initialized when it is declared.
Are there types like byte, small and long?
Seed7 follows the "one size fits all" principle for fixed size integers. The type integer is 64-bit signed, smaller integer types do not exist. Today's computers have 64-bit processors. Some processors do not have instructions for all the smaller integer types. On such computers, smaller integers must be converted into larger integers in order to do computations. So programs that use smaller integers might actually be slower because of this. Today computers' memory covers many gigabytes, so the pressure to save memory is also gone. If you prefer arrays with smaller integers, because they fit into the cache, you should probably stick with C or some other lower level language. Seed7 tries to stay above this low level thinking.
Support for shorter integers is only needed when reading or writing files that contain binary integers of smaller sizes. In C it is possible to write or read data structures directly to or from a file. Such C code is unportable, as it assumes that the file format uses the same endianness (little- or big-endian) as the processor. Seed7 does not support writing or reading structures directly to or from a file. Instead the library bytedata.s7i defines several functions to convert integers into and from signed and unsigned representations of various sizes. These functions also allow the endianness to be specified explicitly.
Message digest and compression algorithms do bitwise operations on 32- or 64-bit data. Bitwise operations are not supported by integer. To do that the types bin32 and bin64 have been introduced. These types support bitwise AND, OR and XOR operations, but no integer arithmetic. Hence bin32 and bin64 are not integer types but types that describe bit-patterns with 32 and 64 bits. Conversions between integer and bin32 (respectively bin64) cause no additional costs in compiled programs.
If the 64-bit signed integer type is not sufficient the type bigInteger can be used.
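The conversions and bit operations described above can be sketched as follows. This is a minimal sketch, assuming that bytedata.s7i provides bytes and bytes2Int and that bin32.s7i provides the conversions bin32 and ord with the signatures used here. The numeric values are chosen for illustration only.

```seed7
$ include "seed7_05.s7i";
  include "bytedata.s7i";
  include "bin32.s7i";

const proc: main is func
  local
    var string: fourBytes is "";
    var bin32: masked is bin32(0);
  begin
    # Convert an integer into 4 bytes with explicit signedness and endianness.
    fourBytes := bytes(1413829460, UNSIGNED, BE, 4);
    writeln(fourBytes);                            # 16#54455354 as bytes: TEST
    # Convert the bytes back into an integer.
    writeln(bytes2Int(fourBytes, UNSIGNED, BE));   # 1413829460
    # Bitwise AND is done with bin32 values, not with integers.
    masked := bin32(16#ff00) & bin32(16#0ff0);
    writeln(ord(masked) radix 16);                 # f00
  end func;
```

Using LE instead of BE would produce the same four bytes in reverse order, independent of the processor the program runs on.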
Is Unicode supported?
Seed7 characters and strings support Unicode. Unicode values are encoded with UTF-32. Functions which exchange strings with the operating system automatically convert the strings from and to UTF-32. It is possible to read and write files with Latin-1, UTF-8 and UTF-16 encoding. Functions to deal with code pages and functions to convert between different Unicode encodings are also available.
The usage of UTF-32 for strings in a program has several advantages:
- With UTF-32 it is not necessary to distinguish the normal length of a string from its byte-length. In an UTF-8 or UTF-16 string the number of code points must be computed by processing the whole string. Computing the length of an UTF-32 string does not need such an effort.
- Accessing a code point with an index into an UTF-32 string is simple as well. With UTF-8 and UTF-16 it is necessary to process all code points up to the index. It has been argued that most strings are processed sequentially. To process UTF-8 strings sequentially, multi-byte encodings must be decoded to code points. UTF-32 strings don't need any decoding when code points are accessed. Additionally the processing of UTF-32 strings is not restricted to be sequential.
- UTF-8 has invalid byte sequences. In UTF-16, single surrogate characters are invalid. A string library that is based on UTF-8 or UTF-16 must check for valid byte sequences. UTF-32 does not have invalid byte sequences. UTF-32 can hold non-Unicode characters, but this can be used as an advantage.
- The overlong encodings of UTF-8 allow several encodings for the same character. According to the standard overlong encodings are not valid UTF-8 representations of the code point. An UTF-8 string library must also consider overlong encodings. UTF-32 and UTF-16 do not have overlong encodings.
- Using a byte-index into an UTF-8 or UTF-16 string triggers a search for the beginning of an UTF-8 or UTF-16 code point. E.g.: If an UTF-8 string is split into two parts a search for the beginning of an UTF-8 byte sequence is necessary. UTF-32 does not need such an effort.
- UTF-32 uses more memory, but today's computers are equipped with a lot of memory. The pressure to save memory is gone and the simple and fast handling of UTF-32 strings outweighs the increased memory usage. Using UTF-32 leads to faster programs than any approach that tries to save memory by examining strings.
- An UTF-32 string can also hold Ascii, Latin-1, UTF-8 or UTF-16 encoded strings. An UTF-32 string can even hold characters from a code page. Of course it is necessary to know the encoding of such strings. The possibility to hold all these encodings is an advantage that only UTF-32 offers.
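The difference between the length of a string and its UTF-8 byte-length can be illustrated with a short program. This is a sketch that assumes the function toUtf8 from unicode.s7i (older releases may use a different name for this conversion). The word "Größe" is written with numeric escapes so that the example does not depend on the editor encoding.

```seed7
$ include "seed7_05.s7i";
  include "unicode.s7i";

const proc: main is func
  local
    var string: word is "Gr\246;\223;e";   # G r o-umlaut sharp-s e
  begin
    writeln(length(word));           # 5 code points
    writeln(length(toUtf8(word)));   # 7 bytes in UTF-8
    writeln(ord(word[3]));           # 246, the code point at index 3
  end func;
```

The index access word[3] needs no decoding, because every code point of the string occupies one element.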
Sometimes it is argued that UTF-32 encodes code points and that an actual character might consist of several code points (e.g. by using combining characters). This is not only a problem of UTF-32. Practically all programs that use Unicode assume that a code point is a character. Unicode contains many precomposed characters, so that most of the time a code point is in fact a character. Most of the programs have no problem with that simplification. If a program needs to handle combining characters it must check for that, independent of the code point encoding.
In a Seed7 program all operations with strings can be done with the type string. Having just one string type simplifies things.
Conversions to upper and to lower case use the default Unicode case mapping, where each character is considered in isolation. Characters without case mapping are left unchanged. The mapping is independent from the locale. Individual character case mappings cannot be reversed, because some characters have multiple characters that map to them.
Seed7 source code allows Unicode in char literals, string literals, block comments and line comments. Interpreter and compiler assume that a Seed7 program is written with UTF-8 encoding. Therefore a program editor with UTF-8 encoding should be used.
Unicode names are supported as well. The support for Unicode names is switched off by default and must be activated with the pragma:
$ names unicode;
This pragma allows variables with e.g. German umlauts or Cyrillic letters. This way beginners can use variable and function names from their native language.
Why are strings in Seed7 mutable?
Java, C# and several other languages use immutable strings which allow for simple and quick assignments (just a pointer is assigned). But they also have disadvantages. Almost everything else besides assignments becomes more expensive. Every time immutable strings are changed, the whole string content must be copied. If you want to change a string often, this becomes very costly. For that reason Java introduced the mutable string class StringBuffer (and later StringBuilder). Maintaining the string data of immutable strings is also an overhead that costs time as it requires bookkeeping and garbage collection.
The string handling of mutable strings can be optimized, such that copying the string content can be avoided in many cases. This is done by the Seed7 interpreter and compiler. You get cheap string parameter passing, string slicing and assignment without being bothered with immutable and mutable string types (which is essentially an implementation detail). Mutable strings also give us consistent language semantics (strings are not handled differently than other objects).
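A small sketch of what mutable strings look like in a program. It assumes the string operators @:= (assign a character at a position) and &:= (append); the variable name greeting is just an illustration.

```seed7
$ include "seed7_05.s7i";

const proc: main is func
  local
    var string: greeting is "hello";
  begin
    greeting @:= [1] 'H';     # replace the first character in place
    greeting &:= " world";    # append to the existing string
    writeln(greeting);        # Hello world
  end func;
```

No separate builder type is involved: the same type string is used for the variable that is changed and for the values that are written.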
Why are strings indexed from one?
Here is a little example to explain that. Please read the second line from the following list:
- You probably forgot that you should just read the second line.
- This is the second line from the list.
- You probably also count your partners starting from zero.
- Next time read the instructions more carefully.
It should be obvious: The number one has been invented as the starting point to count something. The first character in this sentence is T not h. So the question is: Why does everybody believe that in computer science the first character has the index 0? Basically this originates in the language C. Arrays and strings in C are viewed as pointer + offset. So it is natural that the first offset is 0. From C this concept spread to many other programming languages. Seed7 breaks with this tradition as it uses the number one again for the purpose it was invented for, thousands of years ago, long before zero was introduced.
How are comparisons done in Seed7?
In Seed7 the operators = (equal) and <> (not equal) are defined for all types. Additionally many types also define the operators < (less than), <= (less than or equal to), > (greater than) and >= (greater than or equal to). These operators do exactly what the corresponding type considers as the correct comparison.
In Java and other languages you are discouraged to use the normal equality comparison operator (==) for string comparisons. Instead you need to use an expression like name.equals(""). The == operator just compares references, which is almost never the desired operation. Seed7 is much more consistent in this regard, because the = operator is generally used to check for equality. It is just not necessary to tell every newcomer that == is used to compare integers, but that it should never be used to compare strings.
Most types of Seed7 define the function compare(A, B), which returns -1 (if A is less than B), 0 (if A equals B) or 1 (if A is greater than B). This function defines a total order over the values of a type even if < has not been defined or if < does not define a total order. E.g.:
type     comparisons       compare   comment
float    = <> < <= > >=    compare   According to IEEE 754 a NaN is neither less than, equal to, nor greater than any value, including itself. Float compare(A, B) considers all NaN values as greater than Infinity.
complex  = <>              compare   Compares real and imaginary part.
bitset   = <> < <= > >=    compare   The comparisons < <= > >= check for subsets and supersets. Bitset compare(A, B) compares by determining the biggest element that is not present or absent in both sets.
Hash tables use compare(A, B) to manage their elements.
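The difference between the comparison operators and compare can be sketched with the float values from the table above. This is a minimal example, assuming that float.s7i defines the constants NaN and Infinity.

```seed7
$ include "seed7_05.s7i";
  include "float.s7i";

const proc: main is func
  begin
    writeln(NaN = NaN);                # FALSE, as required by IEEE 754
    writeln(compare(NaN, Infinity));   # 1, NaN is ordered after Infinity
    writeln(compare(1.0, 2.0));        # -1
    writeln(compare("abc", "abd"));    # -1
  end func;
```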
Can Seed7 access databases?
A database library provides a database independent API, which defines how a client may access a database. Seed7 accomplishes database independence by using database drivers as abstraction layers between the application and the database. There are database drivers for MySQL, MariaDB, SQLite, PostgreSQL, Oracle, Firebird, Interbase, Db2, Informix and SQL Server databases. Databases can also be accessed via the ODBC interface. How the database independent API of Seed7 works can be seen in the following example:
const proc: dbDemo is func
  local
    var database: currDb is database.value;
    var sqlStatement: statement is sqlStatement.value;
    var integer: index is 0;
  begin
    currDb := openDatabase(DB_MYSQL, "testDb", "testUser", "testPassword");
    if currDb <> database.value then
      statement := prepare(currDb, "select * from testTable");
      execute(statement);
      while fetch(statement) do
        for index range 1 to columnCount(statement) do
          write(column(statement, index, string) <& ", ");
        end for;
        writeln;
      end while;
      close(currDb);
    end if;
  end func;
In the manual there is a chapter about the database abstraction API.
Are there regular expressions?
Regular expressions are a powerful feature. Unfortunately they also lead to code that is hard to maintain. The regular expression language is usually embedded in a surrounding programming language. Like the format strings of C, regular expressions are parsed and processed at run-time. As a consequence, checking a regular expression at compile-time is not easy. Computing a regular expression usually takes more time than a dedicated function for a specific purpose. There are other difficulties too. Regular expressions work typeless but Seed7 does not. For these reasons regular expressions are currently not supported, but there are alternatives (see below). For simple cases the functions startsWith and endsWith can be used. E.g.:
if endsWith(fileName, ".sd7") then ...
There are variants of replace like
- replace1 - Replace one occurrence
- replaceN - Replace all occurrences including the ones created by replacements
- replace2 - Replace occurrences of search1 followed by search2
Seed7 has support for lexical scanner functions which can be used as replacement for regular expressions in many situations.
What are scanner functions?
Scanner functions read a symbol from a string or file. The symbol read is removed from the beginning of the string or file. If the variable stri has the value "12ab" the function
getDigits(stri)
returns "12" and stri has the value "ab" afterwards. The library scanstri.s7i supports scanning from a string. It defines the following functions:
skipComment, getComment, skipClassicComment, skipLineComment, getLineComment, getDigits, getInteger, getNumber, getNonDigits, getQuotedText, getCommandLineWord, getSimpleStringLiteral, getEscapeSequence, getCharLiteral, getStringLiteral, getCStringLiteralText, getName, skipSpace, skipSpaceOrTab, skipWhiteSpace, getWhiteSpace, getWord, skipLine, getLine, getSymbolOrComment, getSymbol, skipXmlComment, getXmlTagOrContent, getXmlCdataContent, getXmlTagHeadOrContent, getSymbolInXmlTag, skipXmlTag, getNextXmlAttribute, getHtmlAttributeValue, getNextHtmlAttribute, getHttpSymbol
The library scanfile.s7i supports scanning from a file. It defines functions similar to the ones defined by scanstri.s7i.
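The getDigits example from above can be written as a complete program. This sketch uses only getDigits and getName from scanstri.s7i; the variable name token is an illustration.

```seed7
$ include "seed7_05.s7i";
  include "scanstri.s7i";

const proc: main is func
  local
    var string: stri is "12ab";
    var string: token is "";
  begin
    token := getDigits(stri);   # reads the digits and removes them from stri
    writeln(token);             # 12
    writeln(stri);              # ab
    token := getName(stri);     # reads the name that is left
    writeln(token);             # ab
  end func;
```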
Scanner functions use the LL(1) approach, which is used in compilers. Practically no compiler uses regular expressions to parse a program. The example below uses scanner functions to read a key-value pair from a file:
const proc: getKeyValuePair (inout file: inFile,
    inout string: propertyName, inout string: propertyValue) is func
  begin
    skipWhiteSpace(inFile);
    propertyName := getName(inFile);
    skipWhiteSpace(inFile);
    if inFile.bufferChar = '=' then
      inFile.bufferChar := getc(inFile);
      propertyValue := getLine(inFile);
    else
      propertyValue := "";
      skipLine(inFile);
    end if;
  end func;
Scanner functions work strictly from left to right. They examine one character and make decisions based on this character. How scanner functions work is described in the manual.
Why is the div operator used for integer divisions?
In Pascal and Ada the keyword div is used as integer division operator. Other languages like C and its descendants use / for integer division. Using div has some advantages:
- It opens the opportunity to use / for a different purpose. The library rational.s7i defines / to create a rational number.
- An integer division truncates the result. In the common case the result is not equal to that of a floating point division (E.g.: flt(4 div 3) returns 1.0, but flt(4) / flt(3) returns 1.333333). This difference is emphasized by using different operator symbols.
- A negative result of a division can be rounded towards zero or towards minus infinity. Seed7 provides both possibilities with the two integer division operators div and mdiv.
The chapter about the type integer in the manual describes properties of integer divisions and contains tables that show their behavior.
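The two rounding directions can be seen with a negative dividend. The sketch below also shows the remainder operators rem and mod, which are not discussed above; they are assumed here to be the remainders that correspond to div and mdiv respectively.

```seed7
$ include "seed7_05.s7i";

const proc: main is func
  begin
    writeln((-7) div 2);    # -3, rounded towards zero
    writeln((-7) rem 2);    # -1, remainder that belongs to div
    writeln((-7) mdiv 2);   # -4, rounded towards minus infinity
    writeln((-7) mod 2);    # 1, remainder that belongs to mdiv
  end func;
```

For a positive dividend and divisor both pairs of operators give the same results.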
Why are & and <& defined for string concatenation?
The operators & and <& both concatenate strings, but they have different purposes.
The & operator is intended for string concatenations in normal expressions. The & operator does not convert an integer (or some other value) to a string.
The priority of & is defined to execute the concatenation before doing a comparison. E.g.:
name & extension = check
has the meaning
(name & extension) = check
So the & operator can be used like + - * (the expression is evaluated and its result can be compared).
The <& operator is intended for write statements. It is overloaded for many types. As long as the first or the second parameter is a string it does convert the other parameter to a string (with the function str) and does the concatenation afterwards.
The priority of <& is defined to also allow the output of boolean expressions. E.g.:
name <& extension = check
has the meaning
name <& (extension = check)
Note that extension and check could be e.g. integers. The result of 'extension = check' is converted to string with the function str. So
writeln(name <& extension = check)
would write (if name is "asdf: " and extension is not equal to check):
asdf: FALSE
The <& operator can be defined for new types with enable_io respectively enable_output. The description of the Seed7 file API also contains a chapter about the conversion to strings and back.
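The different priorities of & and <& can be tried out with a short program. The values of extension and check are arbitrary integers chosen for illustration.

```seed7
$ include "seed7_05.s7i";

const proc: main is func
  local
    var string: name is "asdf: ";
    var integer: extension is 3;
    var integer: check is 4;
  begin
    # <& has a weaker priority than =, so the comparison is done first.
    writeln(name <& extension = check);   # asdf: FALSE
    # & has a stronger priority than =, so the concatenation is done first.
    writeln(name & "x" = "asdf: x");      # TRUE
    # <& converts the integer to a string before concatenating.
    writeln("count: " <& 42);             # count: 42
  end func;
```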
How is the number format specified when writing a number?
The operator radix converts an integer or bigInteger number to a string using a radix. E.g.:
writeln(48879 radix 16);
The operator RADIX does the same with upper case characters. E.g.:
writeln(3735928559_ RADIX 16);
The operator lpad converts a value to string and pads it with spaces at the left side. E.g.:
writeln(98765 lpad 6);
The operator rpad converts a value to string and pads it with spaces at the right side. E.g.:
writeln(name rpad 20);
The operator digits converts a float to string in decimal fixed point notation. The number is rounded to the specified number of digits. E.g.:
writeln(3.1415 digits 2);
The operator sci converts a float to string in scientific notation. E.g.:
writeln(0.012345 sci 4);
The operator exp is used to specify the number of exponent digits. E.g.:
writeln(1.2468e15 sci 2 exp 1);
All these operators can be combined. E.g.:
writeln("decimal: " <& number lpad 10);
writeln("hex: " <& number radix 16 lpad 8);
writeln("scientific: " <& number sci 4 exp 2 lpad 14);
What types of parameters does Seed7 have?
There are call-by-value and call-by-reference parameters. The formal parameter can be constant or variable. The combination of these features allows four types of parameters:
parameter   evaluation strategy   access right
val         call-by-value         const
ref         call-by-reference     const
in var      call-by-value         var
inout       call-by-reference     var
For call-by-value parameters (val and in var) the actual parameter value is copied when the function is called. For call-by-reference parameters (ref and inout) the function uses a reference to the actual parameter value. Since a call-by-reference parameter is not copied it can provide better performance for structured types like strings, arrays, structs and hashes.
What is an 'in' parameter?
An in parameter describes, that the actual parameter value is going into the function. Inside the function an in parameter cannot be changed. In parameters are the most commonly used evaluation strategy for parameters.
An in parameter is either a val (call-by-value) parameter or a ref (call-by-reference) parameter. Every type defines an in parameter:
- For types with little memory requirements in is a val (call-by-value) parameter:
- For types with bigger memory requirements in is a ref (call-by-reference) parameter:
Usually it is not necessary to care, if an in parameter uses call-by-value or call-by-reference. A programmer can just use in parameters to specify, that the actual parameter value is going into the function. A programmer can use val or ref to overrule this behavior in cases, where the default in parameter specified by a type is not desired.
Is there an example where val and ref parameters have different behavior?
Normally val and ref parameters behave the same. Only in corner cases their behavior differs. This is shown with the following example:
$ include "seed7_05.s7i";
var integer: aGlobal is 1;
const proc: aFunc (val integer: valParam, ref integer: refParam) is func
  begin
    writeln(valParam <& " " <& refParam);
    aGlobal := 2;
    writeln(valParam <& " " <& refParam);
  end func;
const proc: main is func
  begin
    aFunc(aGlobal, aGlobal);
  end func;
The program above writes:
1 1
1 2
The different behavior is triggered when 2 is assigned to the global variable aGlobal:
- The val parameter (valParam) is unaffected by the change of aGlobal, because the actual parameter value was copied when the function was called.
- The ref parameter (refParam) changes when aGlobal is changed.
The effect happens for any type, not just for integer parameters. The same effect also happens, when an additional inout parameter is used instead of a global variable and when the function is called with the same variable as actual parameter for all three parameters.
If a programmer has to deal with such corner cases it is necessary to explicitly use val or ref.
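The variant with an additional inout parameter, mentioned above, can be sketched like this. The parameter name inoutParam and the variable name aVariable are illustrations. According to the description above, the program should write the same two lines as the version with the global variable.

```seed7
$ include "seed7_05.s7i";

const proc: aFunc (val integer: valParam, ref integer: refParam,
    inout integer: inoutParam) is func
  begin
    writeln(valParam <& " " <& refParam);
    # All three parameters were called with the same variable,
    # so this assignment also changes what refParam refers to.
    inoutParam := 2;
    writeln(valParam <& " " <& refParam);
  end func;

const proc: main is func
  local
    var integer: aVariable is 1;
  begin
    aFunc(aVariable, aVariable, aVariable);
  end func;
```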
What is call-by-name?
Call-by-name is an evaluation strategy for parameters. The actual call-by-name parameter is not evaluated before the function is called. When the function is executed the call-by-name parameter might be executed once, many times or not at all. Examples of call-by-name parameters are:
- The conditions of while-loops
- The statements in loop bodies
- The statements that are conditionally executed in an if-statement
- The right operand of the boolean operators and and or
As can be seen, call-by-name parameters are used all the time, without realizing it. A call-by-name parameter is a function without parameters. Function types such as proc or func boolean are used as type of formal call-by-name parameters. An expression with the correct type is allowed as actual call-by-name parameter. This actual parameter expression is not evaluated when the function is called. Instead the call-by-name expression is evaluated every time the formal call-by-name parameter is used. A 'conditional' function (similar to the ?: ternary operator) is defined with:
const func integer: conditional (in boolean: condition,
    ref func integer: trueValue, ref func integer: falseValue) is func
  result
    var integer: conditionalResult is 0;
  begin
    if condition then
      conditionalResult := trueValue;
    else
      conditionalResult := falseValue;
    end if;
  end func;
Seed7 does not require a special notation (like brackets) for actual call-by-name parameters, therefore the 'conditional' function can be called with:
conditional(a >= 0, sqrt(a), a ** 2)
Depending on the condition 'a >= 0' only one of the expressions 'sqrt(a)' and 'a ** 2' is evaluated. This evaluation takes place when 'trueValue' or 'falseValue' is assigned to 'conditionalResult'.
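That only one of the call-by-name parameters is evaluated can be made visible with a function that has a side effect. The helper traced below is hypothetical and exists only for this illustration; conditional is the function defined above.

```seed7
$ include "seed7_05.s7i";

# Hypothetical helper: reports when it is evaluated and returns its argument.
const func integer: traced (in integer: number) is func
  result
    var integer: tracedResult is 0;
  begin
    writeln("evaluating " <& number);
    tracedResult := number;
  end func;

const func integer: conditional (in boolean: condition,
    ref func integer: trueValue, ref func integer: falseValue) is func
  result
    var integer: conditionalResult is 0;
  begin
    if condition then
      conditionalResult := trueValue;
    else
      conditionalResult := falseValue;
    end if;
  end func;

const proc: main is func
  begin
    # Only traced(1) should be evaluated; traced(2) is never executed.
    writeln(conditional(TRUE, traced(1), traced(2)));
  end func;
```

With val or in parameters of type integer both calls of traced would be executed before conditional is entered.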
Why are functions declared with const?
A function declaration like
const func boolean: isZero (in integer: number) is return number = 0;
uses const, because the body of the function
return number = 0;
will not change at run-time. So isZero will not suddenly compute something else. For the same reason, procedures are also defined with const:
const proc: procedureName ...
Are there functions declared without const?
There are function declarations that are not introduced with const. Below is a call-by-name parameter declared with in:
const proc: whileTrueDo (in func boolean: callByNameParam) is func
  begin
    while callByNameParam do
      noop;
    end while;
  end func;
The call-by-name parameter callByNameParam refers to the function provided as an actual parameter when whileTrueDo is called:
whileTrueDo(getc(IN) <> '\n');
Inside whileTrueDo, the call-by-name parameter cannot be changed, but depending on the actual parameter, it can refer to different functions in different invocations.
For functions without parameters, there is support to declare functions with var. Except for test programs, this feature is not used, since object orientation provides a much better mechanism to execute different functions at run-time.
What is an integer overflow?
An integer overflow occurs if a calculation produces a result that cannot be stored in an integer variable. E.g.:
1234567890 * 9876543210
The correct result is 12193263111263526900 but this value does not fit into a 64-bit signed integer variable. The lowest 64 bits of the result correspond to -6253480962446024716 which is obviously wrong. Very popular languages such as C, C++, Java, Objective-C and Go do not care about integer overflow. Programs in these languages continue to execute with a wrong value instead of the correct result. This wrong value can then trigger dangerous things. A program can make wrong decisions or produce wrong output, without any hint that an integer overflow occurred. In Seed7 the exception OVERFLOW_ERROR is raised if an overflow occurs. If performance is important the overflow checking can be switched off with the compiler option -so.
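A minimal sketch of how the overflow from the example above shows up in a program, assuming that overflow checking has not been switched off. One factor is kept in a variable so that the multiplication happens at run-time. The second part uses bigInteger from bigint.s7i to compute the correct result.

```seed7
$ include "seed7_05.s7i";
  include "bigint.s7i";

const proc: main is func
  local
    var integer: factor is 1234567890;
    var integer: product is 0;
  begin
    block
      product := factor * 9876543210;
      writeln(product);
    exception
      catch OVERFLOW_ERROR:
        writeln("OVERFLOW_ERROR raised");
    end block;
    # With bigInteger the multiplication cannot overflow.
    writeln(bigInteger(factor) * 9876543210_);   # 12193263111263526900
  end func;
```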
Why are integers not promoted to bigInteger when they overflow?
In some languages, an integer expression that overflows is promoted to bigInteger and the correct result is returned as bigInteger. Seed7 does not follow this approach because it costs significant performance.
The approach that promotes integers requires a new encoding for integers. A new combined integer type that can encode a fixed size integer and a big integer in one memory location. Information about the representation that is used must also be encoded in the memory location of the combined integer.
An encoding for a combined integer that is actually used by integer promoting languages is: The lowest bit of a 64-bit value decides if a fixed size integer or a bigInteger is encoded in the remaining 63 bits. If the lowest bit is zero, the higher 63 bits would be the actual integer in a two's complement representation. If the lowest bit is one, the higher 63 bits would encode a reference to the actual bigInteger value. This combined integer encoding (with the lowest bit as decision-maker) considers memory consumption and run-time overhead.
For performance consideration the most common situation is used: An addition of two small integers (that fit into the fixed size part of a combined integer) and no overflow occurs.
- The addition of two normal integers without overflow check takes one machine instruction.
- The addition of two normal integers with overflow check takes two machine instructions (most hardware architectures support a jump on overflow instruction):
  add
  jump on overflow
- Code that adds two combined integers (that promote to bigInteger if the integer addition overflows) takes more instructions. The pseudocode for the addition of a and b is below:
  slow = (a | b) & 1;
  res = a + b;
  if (!slow && !overflow_occurred) {
    return res;
  } else {
    return bignum_add(a, b);
  }
The happy path requires approximately 5 machine instructions for an addition:
  |
  &
  +
  !slow
  !overflow_occurred
Besides that, there is more overhead:
- Every time a combined integer is changed by an assignment (or at the end of a variable scope), the old value must be freed. This means, it must be checked if the actual value is a big integer, and if this is the case, the memory of the big integer must be freed.
- Arrays and structs containing combined integers must also do this for all their values.
- Every time a normal integer is needed (e.g. as parameter of an external function) the combined integer must be converted to a normal integer. At least a check of the lowest bit and a shift of the value is necessary to obtain the normal integer.
- If a pointer to a normal integer is needed the combined integer needs to be converted and a conversion back is also needed.
- Multiplications and divisions need an additional shift of the result to obtain the combined integer representation.
The automatic promotion to big integer (with the combined integer type) reduces performance. The approach that only checks for integer overflow needs two instructions for an addition. It needs no additional overhead for assignments, at the end of a variable scope, or in other situations. The possibility of optimizations applies to both approaches that check for integer overflow, but not everything can be optimized away.
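The happy path of the combined integer addition can be sketched in C. This is only an illustration of the encoding described above, not code from Seed7 or from any integer promoting language. The names combined_t, encode_small, decode_small and combined_add are hypothetical, the slow path (promotion to a big integer) is left out, and the overflow check assumes the GCC/Clang builtin __builtin_add_overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical combined integer: bit 0 == 0 marks a small integer that is
   stored in the upper 63 bits, bit 0 == 1 would mark a reference to a
   big integer (not modeled in this sketch). */
typedef int64_t combined_t;

/* Encode a small integer. The caller must make sure the value fits into
   63 bits, otherwise the multiplication overflows. */
static combined_t encode_small(int64_t value)
{
    return value * 2;
}

/* Decode a small integer. The encoded value is even, so the division
   is exact for negative values as well. */
static int64_t decode_small(combined_t v)
{
    return v / 2;
}

/* Happy path of the addition. Returns true and stores the encoded sum
   in *res if both operands are small integers and no overflow occurs.
   Returns false if the slow path (big integer addition) would be needed. */
static bool combined_add(combined_t a, combined_t b, combined_t *res)
{
    bool slow = ((a | b) & 1) != 0;
    /* Assumption: GCC/Clang builtin that reports a signed overflow. */
    bool overflow = __builtin_add_overflow(a, b, res);
    return !slow && !overflow;
}
```

Because both encoded operands are shifted by one bit, adding the encoded values directly yields the encoded sum, and an overflow of the 64-bit addition corresponds to an overflow of the 63-bit small integer.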
For these reasons, Seed7 does not support the automatic promotion of integer expressions that overflow. Instead, it checks for integer overflow and raises the exception OVERFLOW_ERROR if necessary.
Is there a garbage collection?
There is automatic memory management, but there is no garbage collection process that interrupts normal processing. There is no situation where a garbage collection needs to "stop the world". The automatic memory management of Seed7 uses different mechanisms. Memory usage can be categorized and for every category a specific strategy of automatic memory management is used:
- Memory used by local variables and parameters is automatically freed, when a function is left. The interpreter maintains a list of local values and frees them. The compiler inserts code, to free the memory used by local variables, in front of each return statement.
- Memory allocated for intermediate results is freed automatically in a stack like manner. An arithmetic expression such as (1+2)*3+4 can be evaluated with the help of a stack (which stores the intermediate results 3 and 9). For structured values it is possible to maintain a stack of pointers to the values in the same way. The interpreter uses a temp flag, which is present in every interpreter object, to free memory. The compiler determines the point where intermediate results can be freed at compile time. Functions, such as the assignment, can abstain from freeing the intermediate result and just assign it to the variable. This way it is not always necessary to copy arbitrarily complex values. All these things can be decided by the compiler.
- The memory of strings, bigIntegers, bitsets, arrays, hashes and bstrings is referenced just once. These types do not need reference counting.
- In-parameters for larger data types are always by reference. Reference parameters borrow the reference to memory until the function is left. No copying of memory is necessary and the owner of the actual parameter is in charge of freeing the memory.
- Arrays, hashes and other containers manage their memory. E.g.: When an element is removed from a hash table the memory used by the element is freed as well as the hash table internal data. If the container itself is removed all its elements are removed as well. In Seed7 there are no pointers to array elements, hash keys or hash values. So there is no possibility that pointers become dangling.
- Windows, processes, databases, sql statements and programs use a reference counter to free the data.
- A struct value can be referred to by one struct variable and by several interface variables. Struct values use a reference counter to free the struct, if no reference to it exists.
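The reference counting mentioned in the last two list items can be illustrated with a small C sketch. It is not the code of the Seed7 runtime. The names struct_value, usage_count, value_create, value_retain and value_release are hypothetical, and the counter live_values exists only to make the effect observable.

```c
#include <stdlib.h>

/* Hypothetical reference counted value. */
typedef struct {
    unsigned int usage_count;  /* number of variables referring to the value */
    int data;
} struct_value;

static int live_values = 0;    /* demonstration only: values not yet freed */

/* Create a value that is referred to by exactly one variable.
   Returns NULL if no memory is available. */
static struct_value *value_create(int data)
{
    struct_value *value = malloc(sizeof *value);
    if (value != NULL) {
        value->usage_count = 1;
        value->data = data;
        live_values++;
    }
    return value;
}

/* A further variable (e.g. an interface variable) refers to the value. */
static struct_value *value_retain(struct_value *value)
{
    value->usage_count++;
    return value;
}

/* A variable stops referring to the value. The memory is freed as soon
   as no reference to the value exists. */
static void value_release(struct_value *value)
{
    if (--value->usage_count == 0) {
        free(value);
        live_values--;
    }
}
```

The memory is released at the moment the last reference disappears, so no separate collection phase is needed for such values.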
Is Seed7 object oriented?
Yes, but object orientation is organized differently compared to other object oriented languages. In a nutshell: It is based on interfaces and allows multiple dispatch. Chapter 7 (Object orientation) of the manual contains a detailed description of the Seed7 object orientation.
An example of an object oriented type is file. A file describes references to values with some other type. A value of a file can have one of the following types: null_file, external_file, echoFile, lineFile, etc. Each of these file value types acts differently to the same requests.
For the type file two kinds of functions are defined:
- Functions which work for all files the same way.
- Dynamic functions which are just an interface. At run time the corresponding function defined for the type of the value is used.
Compared to Java the type file can be seen as interface or abstract class, while the type of the file value can be seen as the class implementing the interface.
Is everything inherited from object?
There can be several base types, each with their own hierarchy. In many object oriented languages the class object is used as element of all container classes. Abstract data types provide a better and type safe solution for containers and other uses of the root class object. Therefore a single rooted hierarchy is not needed.
What is the difference between overloading and object orientation?
Overloading is resolved at compile time while object orientation uses dynamic dispatch which decides at runtime which method should be called. Overloading resolution uses static types to decide. Dynamic dispatch uses the implementation type, which is only known at runtime, to decide. Besides this difference overloading resolution and dynamic dispatch both use the same approach to do the work: The types and the access rights of all parameters are used in the decision process.
What is an abstract data type?
An abstract data type defines, like every other type, a set of functions to handle data. An abstract data type leaves, like an interface type from OO, the details of the data representation open. The difference between the two is:
- An interface type is resolved to an implementation type at runtime.
- An abstract data type is resolved to a concrete type at compile time, when it is used.
Usually an abstract data type uses parameters to resolve to a concrete type. Examples of abstract data types are arrays, structs and hashes. An abstract array type needs the element type as parameter. E.g.:
array string
This array has string elements and uses integer indices. An abstract array, where the index type is also specified as parameter is:
array [char] string
This array has string elements and uses char indices. Arrays are present in many programming languages, but they are usually hard-coded into the compiler / interpreter. Seed7 does not follow this direction. Instead it introduces abstract data types as common concept behind arrays, structs, hashes and other types. Like templates abstract data types are implemented with functions that are executed at compile time. In contrast to templates abstract data types return a type as result.
What is multiple dispatch?
Multiple dispatch means that a function or method is connected to more than one type. The decision which method is called at runtime is done based on more than one of its arguments. The classic object orientation is a special case where a method is connected to one class and the dispatch decision is done based on the type of the 'self' or 'this' parameter. The classic object orientation is a single dispatch system.
In a multiple dispatch system the methods cannot be grouped to one class and it makes no sense to have a 'self' or 'this' parameter. All parameters are taken into account when the dispatch decision is done. In the following example the interface type Number uses multiple dispatch:
const type: Number is sub object interface;

const func Number: (in Number: a) + (in Number: b) is DYNAMIC;
The DYNAMIC declaration creates an interface function for the '+' operator. The interface type Number can represent an Integer or a Float:
const type: Integer is new struct
    var integer: data is 0;
  end struct;

type_implements_interface(Integer, Number);

const type: Float is new struct
    var float: data is 0.0;
  end struct;

type_implements_interface(Float, Number);
The declarations of the converting '+' operators are:
const func Float: (in Integer: a) + (in Float: b) is func
  result
    var Float: sum is Float.value;
  begin
    sum.data := flt(a.data) + b.data;
  end func;

const func Float: (in Float: a) + (in Integer: b) is func
  result
    var Float: sum is Float.value;
  begin
    sum.data := a.data + flt(b.data);
  end func;
The declarations of the normal '+' operators (which do not convert) are:
const func Integer: (in Integer: a) + (in Integer: b) is func
  result
    var Integer: sum is Integer.value;
  begin
    sum.data := a.data + b.data;
  end func;

const func Float: (in Float: a) + (in Float: b) is func
  result
    var Float: sum is Float.value;
  begin
    sum.data := a.data + b.data;
  end func;
The decision which '+' operator should be called at runtime is based on the implementation type (Integer or a Float) of both arguments of the '+'.
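For readers who think in C, the idea of a dispatch on both arguments can be sketched with a table of function pointers that is indexed by the implementation types of both operands. This is only an illustration of the concept, not the way Seed7 implements DYNAMIC functions. All names (impl_type, number, add_table, number_add and the helper functions) are hypothetical.

```c
/* Implementation types that a hypothetical 'number' can have. */
typedef enum { TYPE_INTEGER, TYPE_FLOAT, TYPE_COUNT } impl_type;

typedef struct {
    impl_type type;                  /* implementation type, known at run time */
    union { long i; double f; } data;
} number;

typedef number (*add_func)(number a, number b);

static number make_integer(long value)
{
    number n;
    n.type = TYPE_INTEGER;
    n.data.i = value;
    return n;
}

static number make_float(double value)
{
    number n;
    n.type = TYPE_FLOAT;
    n.data.f = value;
    return n;
}

/* One function per combination of implementation types. */
static number add_int_int(number a, number b) { return make_integer(a.data.i + b.data.i); }
static number add_int_flt(number a, number b) { return make_float((double) a.data.i + b.data.f); }
static number add_flt_int(number a, number b) { return make_float(a.data.f + (double) b.data.i); }
static number add_flt_flt(number a, number b) { return make_float(a.data.f + b.data.f); }

/* Dispatch table: the row is selected by the type of the first argument,
   the column by the type of the second argument. */
static const add_func add_table[TYPE_COUNT][TYPE_COUNT] = {
    { add_int_int, add_int_flt },
    { add_flt_int, add_flt_flt }
};

/* The counterpart of the interface function: decides at run time. */
static number number_add(number a, number b)
{
    return add_table[a.type][b.type](a, b);
}
```

In a single dispatch system the table would have just one dimension, selected by the type of the 'self' or 'this' parameter.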
What container classes do exist?
Abstract data types are used to replace container classes. When using an abstract data type as container you have to specify the type of the element in the type declaration. Therefore abstract data types are always type safe. Typeless container classes with object elements do not exist. The only thing which comes near to this is the ref_list which is used in the reflection. A ref_list should not be misused as container class. Predefined abstract data types are:
- array: The type 'array baseType' describes sequences of identical elements of a 'baseType'.
- hash: The type 'hash [keyType] baseType' describes hash tables with elements of 'baseType' which can be accessed using an index of 'keyType'.
- set: The type 'set of baseType' describes a set of elements of a 'baseType'.
- struct: The type 'struct ... end struct' describes all structured types.
Usage examples of abstract data types are:
array string
array [boolean] string
hash [string] boolean
hash [string] array array string
set of char
set of integer
Are there primitive types?
As in C++, Java, C# and other hybrid object oriented languages there are predefined primitive types in Seed7. These are integer, char, boolean, string, float, rational, time, duration and others. In addition to the predefined primitive types, there is also the possibility to declare new primitive types.
What is the difference between object and primitive types?
Variables with object types contain references to object values. This means that after
a := b
the variable 'a' refers to the same object as variable 'b'. Therefore changes of the object value that 'a' refers to, will affect variable 'b' as well (and vice versa) because both variables refer to the same object.
For primitive types a different logic is used. Variables with primitive types contain the value itself. This means that after
a := b
both variables are still distinct and changing one variable has no effect on the other.
If 'a' and 'b' are declared to have type 'aType' which contains the integer field 'property' you can do the following:
b.property := 1;
a := b;
b.property := 2;
Everything boils down to the question: What value does 'a.property' have now?
- If 'aType' is an object type a.property has the value 2 because 'a' and 'b' both refer to the same object.
- If 'aType' is a primitive type a.property still has the value 1 because 'a' and 'b' are distinct objects.
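The two outcomes can be reproduced in C, where a struct variable holds the value itself and a pointer variable holds a reference. This is a minimal sketch to illustrate the difference. The type a_type and both function names are hypothetical.

```c
typedef struct {
    int property;
} a_type;

/* Primitive type logic: the variables contain the values themselves. */
static int primitive_type_demo(void)
{
    a_type a, b;
    b.property = 1;
    a = b;              /* the value is copied */
    b.property = 2;     /* changes only 'b' */
    return a.property;  /* 'a' still holds the copy with the value 1 */
}

/* Object type logic: the variables contain references to one object. */
static int object_type_demo(void)
{
    a_type object;
    a_type *a, *b;
    b = &object;
    b->property = 1;
    a = b;               /* only the reference is copied */
    b->property = 2;     /* changes the object that 'a' refers to as well */
    return a->property;  /* both refer to the same object with the value 2 */
}
```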
When to use an object type and when a primitive type?
You should declare a new primitive type if you don't need the object oriented paradigm in which a variable (or a constant) is just a reference to the object. Another indication for a primitive type is that you don't need two concepts of equality (an == operator and an equal method).
How does the assignment work?
For object types just the reference to the object value is copied. For primitive types the value itself is copied. Since values can be very big (think of arrays of structs with string elements) value copies can be time consuming.
In pure object oriented languages the effect of independent objects after the assignment is reached in a different way: Every change to an object creates a new object and therefore the time consuming copy takes place with every change. Because usually changes to an object are more frequent than assignments this approach can be even more time consuming than the approach using value copies for the assignment.
Why are there two forms of assignment?
Seed7 has an approach for the assignment where practical arguments count more than the classic object oriented principles. In Seed7 every type has its own logic for the assignment where sometimes a value copy and sometimes a reference copy is the right thing to do. Strictly speaking there are many forms of assignment since every type can define its own assignment. Whether a value copy works like a deep or a shallow copy can also be defined depending on the type.
For example: For integer, char and string variables a value copy is what most people expect. For files you don't expect the whole file to be copied with an assignment, therefore a reference copy seems appropriate.
And by the way: Although it is always stated that in object oriented languages everything is done with methods, this is just not true. Besides statements and operators in C++ and Java, which are special, even Smalltalk treats the assignment and the comparison specially. Seed7 does not have such special treatment for the assignment and the comparison operators.
Where are the constructors?
Seed7 does not need constructors, but you can define normal functions which create a new value in a similar way as constructors do it.
Seed7 uses a special create statement ( ::= ) to initialize objects. Explicit calls of the create statement are not needed.
The lifetime of an object goes like this:
- Memory is reserved for the new object (stack or heap memory make no difference here).
- The content of the new memory is undefined (It may contain garbage), therefore a create statement is necessary instead of an assignment.
- The create statement copies the right expression to the left expression taking into account that the left expression is undefined.
- If the object is variable other values can be assigned using the assign statement ( := ). The assignment can assume that the left expression contains a legal value. This allows that for strings (and some other types which are just references to a memory area) the memory containing the old string value (and not the memory of the object itself) can be freed if necessary.
- At the end of the lifetime of an object the destroy statement is executed. For strings (and some other types which are just references to a memory area) the memory containing the string value (and not the memory of the object itself) is freed.
- The memory of the object is freed.
The first three steps are usually hidden in the declaration statement.
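The difference between create, assign and destroy can be sketched in C for a string-like type that is just a reference to a memory area. This is a simplified illustration and not the implementation used by Seed7. The names str_obj, dup_chars, str_create, str_assign and str_destroy are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: just a reference to a memory area. */
typedef struct {
    char *mem;
} str_obj;

/* Helper: copy the characters to new memory (NULL if memory is exhausted). */
static char *dup_chars(const char *source)
{
    size_t size = strlen(source) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, source, size);
    }
    return copy;
}

/* Create: the destination is undefined (it may contain garbage),
   so nothing may be freed. */
static void str_create(str_obj *dest, const char *source)
{
    dest->mem = dup_chars(source);
}

/* Assign: the destination contains a legal value. The memory of the old
   string value (not the memory of the object itself) is freed. */
static void str_assign(str_obj *dest, const char *source)
{
    char *fresh = dup_chars(source);  /* copy first: source may be dest->mem */
    if (fresh != NULL) {
        free(dest->mem);
        dest->mem = fresh;
    }
}

/* Destroy: the memory of the string value is freed at the end of the
   lifetime. The memory of the object itself is handled by its owner. */
static void str_destroy(str_obj *obj)
{
    free(obj->mem);
    obj->mem = NULL;
}
```

Calling str_assign on an object that was never created would free a garbage pointer, which is the reason why initialization needs its own operation.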
Are there static methods / class methods?
Seed7 allows defining functions (procedures and statements) without a corresponding class. If this is not desired Seed7 uses a special parameter, the 'attr' (attribute) parameter, to achieve the functionality of static methods (elsewhere named class methods) in a more general way. How a static method is declared is shown in the following example:
const func integer: convert_to (attr integer, in char: ch) is func
  result
    var integer: converted is 0;
  begin
    converted := ord(ch);
  end func;
The function 'convert_to' can be called as
number := convert_to(integer, 'a');
Since the result of a function is not used to determine an overloaded function, this is sometimes the only way to use the same function name for different purposes as in:
ch := convert_to(char, 1);
stri := convert_to(string, 1);
ok := convert_to(boolean, 1);
num := convert_to(typeof(num), 1);
Attribute parameters allow a function to be attached to a certain type. But this concept is much more flexible than static methods (or class methods). A function can also have several 'attr' parameters and 'attr' parameters can be at any parameter position (not just the first parameter). Furthermore the type can be the result of a function as for example typeof(num).
Are there generics / templates?
The generics (templates) of Ada, C++ and Java use special syntax. In Seed7 you get this functionality for free without special syntax or other magic.
Generally all Seed7 functions can be executed at compile time or at runtime. The time of the function execution depends on the place of the call. Declarations are just a form of statement and statements are a form of expression. A Seed7 program consists of a sequence of declarations (expressions), which are executed one by one at compile time. These expressions can also invoke user defined functions.
A function body can contain declaration statements. When such a function is executed at compile time, it defines things that are part of the program. It is an error to execute such a function at runtime.
Seed7 uses the word template to describe a function which is executed at compile time and declares some things while executing (at compile time). Naturally a template function can have parameters. Especially types as parameters are useful with template functions. That way a template function can declare objects with the type value of a parameter.
It is necessary to call template functions explicitly. They are not invoked implicitly like the C++ template functions. The explicit calls of template functions make it obvious what is going on. This way the program is easier to read.
Is the parser part of the run-time library?
Yes, the library progs.s7i defines the type program, which describes a Seed7 program. The functions parseFile and parseStri can be used to parse a file or a string, respectively. The function execute can be used to execute a program. E.g.:
$ include "seed7_05.s7i";
  include "progs.s7i";

const proc: main is func
  local
    var program: aProg is program.value;
  begin
    if length(argv(PROGRAM)) >= 1 then
      aProg := parseFile(argv(PROGRAM)[1]);
      if aProg <> program.value then
        execute(aProg, argv(PROGRAM)[2 ..]);
      end if;
    end if;
  end func;
Can I access the abstract syntax tree (AST)?
Yes, but you cannot access the AST of the program that currently runs. Instead you can parse a program and access its AST. The functions parseFile and parseStri return a program object. The type program provides access to an enriched AST, the call-code. You can get the list of globally declared objects as ref_list. A ref_list is a list of references to objects. The type reference describes a reference to an object. The program below writes the names of all global objects in the program panic.sd7:
$ include "seed7_05.s7i";
  include "progs.s7i";

const proc: main is func
  local
    var program: aProg is program.value;
    var reference: aRef is NIL;
  begin
    aProg := parseFile("panic.sd7");
    if aProg <> program.value then
      for aRef range globalObjects(aProg) do
        writeln(str(aRef));
      end for;
    end if;
  end func;
What restrictions does Seed7 have?
Historic compilers used fixed size memory areas to store the data of the compiled program. Limitations like source line length, identifier length, string length or number of nesting levels can be found in language manuals. If you reach such a limit an otherwise correct program will not compile. In Seed7 restrictions of other languages have been removed:
- There is no limitation for the length of an identifier and all characters of an identifier are significant.
- Statements and parentheses can be nested without limitation in depth.
- The number of parameters and local variables is not limited.
- Strings can contain any characters (also the NUL character). This allows holding binary information in strings.
- Although strings are not NUL terminated they have no size limitation. (Except when memory is exceeded)
- String literals can have any length.
- There is no limitation in the length of a source line.
- There is no level limitation for nesting includes.
What does the term undefined behavior mean?
Undefined behavior is a term used in the language specification of C and in other programming languages. Undefined behavior usually means that the behavior of the program is unpredictable. In C dividing by zero, accessing an array out of bounds, dereferencing NULL or a signed integer overflow all trigger undefined behavior. Seed7 has a well defined behavior in all situations, even in situations where the language specification of C would refer to undefined behavior.
What does the term memory safety mean?
Memory safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access. This means that in all possible executions of a program, there is no access to invalid memory. The violations include:
- buffer overflow
- buffer over-read
- use after free
- null pointer dereference
- using uninitialized memory
- double free
In Seed7 there is no possibility to access memory outside of the defined datatypes. For all accesses to containers like array and string the indices are checked to be inside the allowed range. In Seed7 there are no pointers that can access arbitrary memory areas. All computations of memory sizes are protected against integer overflow.
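Two of the checks mentioned above, the index check and the protection of memory size computations against integer overflow, can be sketched in C. This is an illustration of the kind of check, not code from the Seed7 runtime. The names int_array, array_get and checked_alloc_size are hypothetical, and the 1-based indexing is an assumption of this sketch. A return value of false stands for the point where an exception would be raised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array that knows its own length. */
typedef struct {
    const int *elems;
    size_t length;
} int_array;

/* Read the element with a 1-based index. The index is checked to be
   inside the allowed range before any memory is accessed. */
static bool array_get(const int_array *arr, size_t index, int *value)
{
    if (index < 1 || index > arr->length) {
        return false;  /* out of range: no memory access takes place */
    }
    *value = arr->elems[index - 1];
    return true;
}

/* Compute count * elem_size. The multiplication is only done if the
   result fits into size_t, so the size cannot wrap around. */
static bool checked_alloc_size(size_t count, size_t elem_size, size_t *bytes)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return false;  /* the size computation would overflow */
    }
    *bytes = count * elem_size;
    return true;
}
```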
Are there exceptions?
Yes, Seed7 has exceptions which are similar to Ada exceptions. In chapter 16.3 (Exceptions) of the manual you will find a detailed description of the Seed7 exceptions. The use of exceptions also improves readability. E.g.:
doA();
doB();
doC();
In this example the normal flow of control can be seen easily. If doA(), doB() or doC() trigger an exception the program is terminated. The program is safe without the need to do something.
Let's assume that exceptions are not supported and that the functions doA(), doB() and doC() will return error codes. In C you can ignore function results, so this would be legal C code. But in this case the code is unsafe since the error codes get ignored. In a language without exceptions, it is necessary to change the code to check for errors. E.g.:
if (doA() == ERRORVALUE_A) {
  ... handling of errors triggered by doA() ...
} else if (doB() == ERRORVALUE_B) {
  ... handling of errors triggered by doB() ...
} else if ((errorVar = doC()) == ERROR_X || errorVar == ERROR_Y) {
  ... handling of errors triggered by doC() ...
} else {
  ... code that follows doC() ...
}
This can lead to horrible code where it is easy to overlook a bug.
What happens if an exception is not caught?
If an exception is not caught the program is terminated and the s7 interpreter writes a stack trace:
*** Uncaught exception NUMERIC_ERROR raised with
{integer: <SYMBOLOBJECT> *NULL_ENTITY_OBJECT* div fuel_max }

Stack:
in (val integer: dividend) div (val integer: divisor) at integer.s7i(95)
in init_display at lander.sd7(840)
in setup at lander.sd7(909)
in main at lander.sd7(1541)
This stack trace shows that a div operation causes a NUMERIC_ERROR (probably a division by zero) in line 840 of the file lander.sd7. A short examination in lander.sd7 shows that an assignment to 'fuel_max' was commented out to show how stack traces work.
A compiled program creates a much shorter crash message:
*** Uncaught exception NUMERIC_ERROR raised at tmp_lander.c(764)
In this case the mentioned file name and line number refer to the temporary C file or the Seed7 runtime library. To get useful information there are two possibilities:
- Start the program in the interpreter instead.
- Compile the program with the options -g -e and start it from a debugger.
If s7c is called with the option -g it instructs the C compiler to generate debugging information. This way a debugger like gdb can run the program and provide information. The option -e tells the compiler to generate code which sends a signal, if an uncaught exception occurs. This option allows debuggers to handle uncaught Seed7 exceptions. Note that -e sends the signal SIGFPE. This is done even if the exception is not related to floating point operations.
Chapter 16.5 (Stack trace) of the manual contains a detailed description how to debug compiled Seed7 programs.
Why does a write statement raise RANGE_ERROR?
Writing a Unicode character beyond '\255;' to the standard output file (STD_OUT) raises RANGE_ERROR. The file STD_OUT only supports Latin-1 or ASCII characters <= '\255;'. This can be avoided by using STD_CONSOLE instead of STD_OUT. You need to include the library console.s7i with
include "console.s7i";
and assign STD_CONSOLE to OUT at the beginning of the main function
const proc: main is func
  ...
  begin
    OUT := STD_CONSOLE;
This way write will accept any Unicode character.
Writing Unicode characters beyond '\255;' to a file opened with open also raises RANGE_ERROR. This can be avoided by opening the file with openUtf8.
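The reason for the limit is that a byte oriented file has just one byte per character, while UTF-8 uses up to four bytes for a character. The C sketch below contrasts the two encodings. It is not taken from the Seed7 runtime, and the function names latin1_encode and utf8_encode are hypothetical. A return value of false in latin1_encode corresponds to the situation in which Seed7 raises RANGE_ERROR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store ch as a single Latin-1 byte. Code points beyond 255 do not fit
   into one byte, so they are rejected. */
static bool latin1_encode(uint32_t ch, unsigned char *out)
{
    if (ch > 255) {
        return false;
    }
    *out = (unsigned char) ch;
    return true;
}

/* Encode ch as UTF-8. Returns the number of bytes written (1 to 4),
   or 0 if ch is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t ch, unsigned char out[4])
{
    if (ch < 0x80) {
        out[0] = (unsigned char) ch;
        return 1;
    } else if (ch < 0x800) {
        out[0] = (unsigned char) (0xC0 | (ch >> 6));
        out[1] = (unsigned char) (0x80 | (ch & 0x3F));
        return 2;
    } else if (ch < 0x10000) {
        if (ch >= 0xD800 && ch <= 0xDFFF) {
            return 0;  /* surrogates are not valid characters */
        }
        out[0] = (unsigned char) (0xE0 | (ch >> 12));
        out[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (ch & 0x3F));
        return 3;
    } else if (ch <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (ch >> 18));
        out[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (ch & 0x3F));
        return 4;
    }
    return 0;
}
```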
Is there a return statement?
There is no return statement in Seed7. Instead there is a return construct that can be used to declare a function:
const func boolean: flipCoin is return rand(FALSE, TRUE);
This is a shortcut for a function declaration with a result variable:
const func boolean: flipCoin is func
  result
    var boolean: coinState is FALSE;
  begin
    coinState := rand(FALSE, TRUE);
  end func;
The return above is not a statement but an alternate possibility to declare a function. This alternate function declaration differs from the normal function declaration that uses:
func ... end func;
Using return as statement triggers an error.
const func boolean: flipCoin is func
  result
    var boolean: coinState is FALSE;
  begin
    return rand(FALSE, TRUE);  # THIS WILL NOT WORK
  end func;
The return construct of Seed7 is not comparable to the return statements of other programming languages. Since no return statement exists, it is not possible to leave a function from the middle of a loop (except when exceptions are used).
Why are break and continue not supported?
Just like goto statements, break and continue violate the concept of structured programming. A programmer should imagine loops as:
while primaryCondition and loopCount <= 1000 and seconds <= 3600 do
  if data <> "undefined" and not doSkip then
    do_something_useful;
  end if;
end while;
instead of:
while primaryCondition do
  if loopCount > 1000 then break;
  if seconds > 3600 then break;
  if data = "undefined" then continue;
  if doSkip then continue;
  do_something_useful;
end while;
A goto or anything like it compromises readability. Non-structured statements are frequently used as shortcuts to avoid restructuring the program's flow. Programmers should overcome the temptation to introduce break or continue. Usually, this is a sign that the code is too complex and should be refactored. Seed7 provides many loops that help in this regard.
Seed7 can define the syntax and semantics of all structured statements easily. E.g.: It is possible to define a structured loop statement with an exit in the middle. This can be used with:
loop
  ch := getc(inFile);
until ch = '\n' do
  stri &:= ch;
end loop;
Instead of insisting on break and continue, it would make sense to propose structured statements that can replace them.
The break and continue statements are often seen as a trick to get more performance at the expense of readability. These performance wins are questionable. Given today's optimization techniques, the compiler might generate the same machine code from clean source code without break and continue.
For the reasons stated above, and to promote structured programming, break, continue and goto are not supported.
How to define break and continue?
The context of break and continue determines what they do. In this regard, they are not statements on their own but part of a surrounding statement. The surrounding statement defines what break and continue do. Every time a programmer defines a new statement, it would be necessary to specify the behavior of break and continue. This could be done by specifying labels (which do not exist in Seed7):
const proc: loop (in proc: statements) end loop is func
  begin
    repeat
      continueLabel:
      statements;
    until FALSE;
    breakLabel:
  end func;
So labels and goto would be needed to introduce break and continue. Beyond that, all user defined loops would need to consider them.
Since break and continue are not structured statements, there is no straightforward way to implement them. However, they could be implemented with exceptions. These exceptions must be caught by the loop statement. In the example below, a special loop statement is introduced to catch these exceptions:
syntax expr: .loop.().end.loop is -> 25;

const EXCEPTION: DO_BREAK is enumlit;
const EXCEPTION: DO_CONTINUE is enumlit;

const proc: break is return raise DO_BREAK;
const proc: continue is return raise DO_CONTINUE;

const proc: loop (in proc: statements) end loop is func
  local
    var boolean: exitLoop is FALSE;
  begin
    repeat
      block
        statements;
      exception
        catch DO_BREAK: exitLoop := TRUE;
        catch DO_CONTINUE: noop;
      end block;
    until exitLoop;
  end func;
How is Seed7 parsed?
The scanner (tokenizer) uses simple hard coded rules to read tokens. Whitespace and comments are skipped by the scanner and identifiers are looked up in a table of defined symbols.
Based on the scanner the syntax analysis uses a recursive descent LL(1) parser. This means that a lookahead of one symbol is used to make syntactic decisions. The rules for parsing parentheses, call expressions and dot expressions are hard coded. For all other expressions the recursive descent parser is data driven. The data which drives the parser is actually a syntax description tree. Syntax descriptions like
$ syntax expr: .while.().do.().end.while is -> 25;
are used to create the syntax description tree. The result of the syntax analysis is an abstract syntax tree (AST).
The AST is processed again to add semantic information. All things of the program that have been defined and that are currently available are maintained in a dictionary. For overloaded functions and statements this dictionary has the form of a tree. The expressions from the AST are matched with the dictionary. If a match fails, because a corresponding declaration is not found, you will get an error like:
*** chkloop.sd7(35):52: Match for {while "X" do {1 + 2 } end while } failed
If all expressions are found in the dictionary the matching process leads to an enriched AST, the call-code. Call-code can be executed by the interpreter. Alternatively the compiler can generate C code from it.
What is link time optimization?
Traditionally C source files are compiled separately into object files. These object files are later linked together into one executable file. Optimizations that span two object files cannot be done this way. Link time optimization (LTO) makes these optimizations possible. Gcc and clang support LTO by writing their intermediate representations to the object files. This way interprocedural optimizations can be done when the object files are linked. The C compiler, linker and archiver all need to support LTO.
When Seed7 is compiled the program chkccomp.c checks if all involved components (C compiler, linker and archiver) support LTO. Currently this check is only done for gcc and clang.
The Seed7 compiler supports the option -flto, which triggers the necessary steps to do LTO. If LTO is not supported the option -flto has no effect.
Can Seed7 compile to a dll/so?
Basically compiling to a dll/so would be possible. But there are obstacles. Regarding libraries, the approach used by Seed7 and other languages differs considerably. Seed7 libraries can be included directly, while the usual dll/so needs an additional header file which describes the interfaces of the dll/so. The Seed7 overloading does not use name mangling and a possible name mangling of Seed7 would be incompatible with the ones used by other languages. Fundamental types like string, bigInteger, array, hash and file could not be used directly from other languages. When a Seed7 dll/so is only used by Seed7, there would still be the name mangling and the header file issue. Compiling to a dll/so is not high on the priority list, but if someone implements it, it will be merged.
Where does the interpreter look for include libraries?
Include libraries with absolute path (an absolute path starts with a forward slash) are only searched at the specified place. All other include libraries are searched in several directories. This is done according to a list of library directories (a library search path). The directories of the list are checked one after another for the requested include file. As soon as the include file is found the search is stopped and the file is included. The following directories are in the list of library directories:
- The directory of the interpreted program. E.g.: When the program /home/abc/test/pairs.sd7 is interpreted the directory /home/abc/test is in the list of library directories.
- The directories that are specified at the command-line with the option -l.
- The directory containing the predefined Seed7 include libraries. This directory is hard-coded in the interpreter (an absolute path like /directory_where_Seed7_was_installed/seed7/lib). The hard-coded library directory is determined when the interpreter is compiled. When the interpreter was not compiled from source (binary release) the path ../lib relative to the current working directory is used.
- The directory specified with the SEED7_LIBRARY environment variable.
- Directories specified in the source file with the library pragma. E.g.: The line: $ library "/home/abc/seed7/lib" adds the directory /home/abc/seed7/lib to the list of library directories.
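The search through the list of library directories can be sketched in C as follows. This is only an illustration and not the code of the interpreter: the function names find_include and file_exists are hypothetical, and a file is assumed to be present if fopen() can open it for reading.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: a file counts as present if it can be opened for reading. */
int file_exists(const char *path)
{
    FILE *file = fopen(path, "r");

    if (file == NULL) {
        return 0;
    }
    fclose(file);
    return 1;
}

/*
 * Look for the include file `name` (hypothetical helper).
 * On success the path of the first match is written to `found`
 * and 1 is returned. Otherwise `found` is set to the empty
 * string and 0 is returned.
 */
int find_include(const char *name, const char *const lib_dirs[],
                 size_t dir_count, char *found, size_t found_size)
{
    size_t idx;
    int length;

    assert(name != NULL && found != NULL && found_size > 0);
    if (name[0] == '/') {
        /* Absolute path: only the specified place is checked. */
        if (strlen(name) < found_size && file_exists(name)) {
            strcpy(found, name);
            return 1;
        }
    } else {
        /* The directories of the list are checked one after another. */
        for (idx = 0; idx < dir_count; idx++) {
            length = snprintf(found, found_size, "%s/%s",
                              lib_dirs[idx], name);
            if (length > 0 && (size_t) length < found_size &&
                    file_exists(found)) {
                return 1;  /* The search stops at the first match. */
            }
        }
    }
    found[0] = '\0';
    return 0;
}
```

A caller would pass the directories in the order of the list above, so a file in the directory of the program takes precedence over a file with the same name in a directory that comes later in the list.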
Seed7 interpreter and compiler (s7c) use the same list of library
directories (the same library search path). When Seed7 is compiled from
source code both (interpreter and compiler) will find the Seed7 include
files automatically. Interpreter and compiler from the binary release will
only find library include files when the path
How is the directory of the predefined include libraries determined?
The directory of the predefined include libraries is hard-coded in the interpreter.
This information is determined when the Seed7 interpreter is compiled. The command
make depend writes a line, which defines the C preprocessor variable
SEED7_LIBRARY, to the file
#define SEED7_LIBRARY "/home/abc/seed7/lib"
The preprocessor macro SEED7_LIBRARY is used by the function init_lib_path(), which
is defined in
Interpreter and compiler use the same strategy to determine the directory with predefined include libraries.
What happens during make depend?
The instructions how to compile the interpreter state that you need a makefile that is specific for your combination of operating system, make utility, C compiler and shell. When you use the command
make depend
your specific makefile writes three configuration files:
config file   included by   copied to
chkccomp.h    chkccomp.c
base.h        chkccomp.c    version.h
settings.h                  version.h
These files contain C preprocessor macros with configuration values that are specific for the OS and the C compiler. The command make depend also compiles chkccomp.c. This program includes base.h and chkccomp.h. After the compilation chkccomp is executed with:
./chkccomp version.h "S7_LIB_DIR=$(S7_LIB_DIR)" "SEED7_LIBRARY=$(SEED7_LIBRARY)" "LINK_TIME=$(LINK_TIME)"
When chkccomp runs it copies the files base.h and settings.h to version.h. Then it tests the properties of the OS and the C compiler with various small test programs. The results of these tests are also written to version.h.
Afterwards chkccomp appends further information to version.h (which includes the absolute path to the seed7 directory). The environment variables S7_LIB_DIR and SEED7_LIBRARY allow the specification of the final path to the Seed7 directory. If the seed7 directory will not move afterwards these variables can be left empty. The environment variables S7_LIB_DIR and SEED7_LIBRARY can be specified when make depend is invoked. E.g.:
make S7_LIB_DIR=<lib-dir-path> SEED7_LIBRARY=<library-path> LINK_TIME=<value> depend
Each of the options S7_LIB_DIR, SEED7_LIBRARY and LINK_TIME can be omitted. The following environment variables are defined by chkccomp:
Variable             Comment
S7_LIB_DIR           Absolute path to directory with static libraries
SEED7_LIBRARY        Absolute path of directory with *.s7i libraries
CC_ENVIRONMENT_INI   Name of environment ini file in the S7_LIB_DIR
LINK_TIME            Link libraries at 'BUILD' time or at 'RUN' time
When a package is created the option LINK_TIME can be set to BUILD. This means that all libraries must be present when an executable is invoked. In this case all the linked libraries must be added to the dependencies of the package. The default of the option LINK_TIME is RUN. This means that the executables do not depend on certain libraries. The libraries are searched for at run-time, when they are needed. This way a library can be added later and Seed7 can then use it.
How does the Seed7 compiler get information about C compiler and runtime?
The Seed7 compiler needs detailed information about the C compiler and its runtime library. This information is created when the Seed7 interpreter is compiled. The command make depend compiles and executes the program "chkccomp.c", which writes configuration values as C preprocessor macros to version.h. E.g.:
#define CC_SOURCE_UTF8
#define SEED7_LIB "seed7_05.a"
#define DRAW_LIB "s7_draw.a"
#define CONSOLE_LIB "s7_con.a"
#define DATABASE_LIB "s7_db.a"
#define COMP_DATA_LIB "s7_data.a"
#define COMPILER_LIB "s7_comp.a"
#define S7_LIB_DIR "/home/abc/seed7/bin"
Many of the preprocessor macros of "version.h" are determined with test programs. E.g.:
#define RSHIFT_DOES_SIGN_EXTEND 1
#define TWOS_COMPLEMENT_INTTYPE 1
#define ONES_COMPLEMENT_INTTYPE 0
#define LITTLE_ENDIAN_INTTYPE 1
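A test program of this kind can be sketched in C as follows. This is not the code of chkccomp.c. The function names are hypothetical and, as a simplifying assumption, the checks are done for the C type int instead of the integer type used by Seed7. The function write_config writes the results as preprocessor macros, similar to the lines shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 if a right shift of a negative int keeps the sign bit. */
int rshift_does_sign_extend(void)
{
    int minus_one = -1;

    return (minus_one >> 1) == -1;
}

/* 1 if negative ints use the two's complement representation. */
int twos_complement_int(void)
{
    int minus_one = -1;

    /* In two's complement the lowest bits of -1 are all set. */
    return (minus_one & 3) == 3;
}

/* 1 if negative ints use the ones' complement representation. */
int ones_complement_int(void)
{
    int minus_one = -1;

    /* In ones' complement -1 is the bitwise inverse of 1. */
    return (minus_one & 3) == 2;
}

/* 1 if the least significant byte of an int is stored first. */
int little_endian_int(void)
{
    unsigned int one = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

/* Write the test results as C preprocessor macros to `out`.
   Returns the number of macro lines written. */
int write_config(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "#define RSHIFT_DOES_SIGN_EXTEND %d\n",
            rshift_does_sign_extend());
    fprintf(out, "#define TWOS_COMPLEMENT_INTTYPE %d\n",
            twos_complement_int());
    fprintf(out, "#define ONES_COMPLEMENT_INTTYPE %d\n",
            ones_complement_int());
    fprintf(out, "#define LITTLE_ENDIAN_INTTYPE %d\n",
            little_endian_int());
    return 4;
}
```

The results depend on the C compiler and the hardware, which is the reason why such values are determined with test programs instead of being fixed in the source code.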
The preprocessor macros used by
The Seed7 compiler uses the runtime libraries SEED7_LIB, CONSOLE_LIB, DRAW_LIB,
COMP_DATA_LIB and COMPILER_LIB in the directory S7_LIB_DIR when it links object
files to an executable. Config values like RSHIFT_DOES_SIGN_EXTEND,
TWOS_COMPLEMENT_INTTYPE and LITTLE_ENDIAN_INTTYPE are used to control the kind
of C code produced by the Seed7 compiler. The library cc_conf.s7i also
provides access to config values that do not come from "version.h", but are
defined in
#define WITH_STRI_CAPACITY 1
#define ALLOW_STRITYPE_SLICES 1
These configuration values describe data structures and implementation strategies used by the Seed7 runtime library. They do not depend on the C compiler and its runtime library, but they may change between releases of Seed7.
What should a binary Seed7 package install?
A binary Seed7 package needs to install four groups of files:
- The executables of the interpreter (s7 from seed7/bin) and the compiler (s7c from seed7/bin or seed7/prg).
- The Seed7 include libraries (files from seed7/lib with the extension .s7i).
- The static Seed7 object libraries (the files seed7_05.a, s7_draw.a, s7_con.a, s7_db.a, s7_data.a and s7_comp.a from seed7/bin).
- Documentation files (the files COPYING and LGPL and all files from seed7/doc).
The table below shows the suggested directories for Linux/Unix/BSD:
Directory              Macro           Group of files
/usr/bin               -               Executables (s7 + s7c)
/usr/lib64/seed7/lib   SEED7_LIBRARY   Seed7 include libraries
/usr/lib64/seed7/bin   S7_LIB_DIR      Static libraries
The macros must be defined when the interpreter is compiled. This can be done by calling make depend with:
make S7_LIB_DIR=/usr/lib64/seed7/bin SEED7_LIBRARY=/usr/lib64/seed7/lib depend
Afterwards the interpreter can be compiled with 'make' and the Seed7 compiler can be compiled with 'make s7c'. These three make commands can be combined into:
make S7_LIB_DIR=/usr/lib64/seed7/bin SEED7_LIBRARY=/usr/lib64/seed7/lib depend s7 s7c
Alternatively the Seed7 compiler can be compiled as post-install step.
This requires that
s7 s7c -O2 s7c
It is also possible to compile the Seed7 compiler in the build directory. In this case it is necessary to specify the directories SEED7_LIBRARY and S7_LIB_DIR with the options -l and -b:
./s7 -l ../lib s7c -l ../lib -b ../bin -O2 s7c
Compiling s7c with a make command should be preferred.
What is necessary to compile Seed7 with database connections?
The Seed7 runtime library provides the possibility to connect to several databases. During the compilation of the Seed7 interpreter the program "chkccomp.c" searches for the availability of database connector libraries and the corresponding database include files (*.h header files). The connector libraries are provided by the database and can be static or dynamic. Often a connector library also provides a database include file (column DB include below). If the database include file is missing Seed7 uses its own database include file (the one from the column Other *.h below). The names of the connector libraries can be specified in the makefile (macro definitions can be written to "chkccomp.h"). The names of the macros for the connector library names are provided in the columns Static lib macro and Dynamic lib macro below. The table below lists the currently supported databases:
Database     DB include   Other *.h   DB driver    Static lib macro   Dynamic lib macro
MySQL        mysql.h      db_my.h     sql_my.c     MYSQL_LIBS         MYSQL_DLL
MariaDB      mysql.h      db_my.h     sql_my.c     MYSQL_LIBS         MYSQL_DLL
SQLite       sqlite3.h    db_lite.h   sql_lite.c   SQLITE_LIBS        SQLITE_DLL
PostgreSQL   libpq-fe.h   db_post.h   sql_post.c   POSTGRESQL_LIBS    POSTGRESQL_DLL
Oracle       oci.h        db_oci.h    sql_oci.c    OCI_LIBS           OCI_DLL
Firebird     ibase.h      db_fire.h   sql_fire.c   FIRE_LIBS          FIRE_DLL
Interbase    ibase.h      db_fire.h   sql_fire.c   FIRE_LIBS          FIRE_DLL
DB2          sqlcli1.h    db_odbc.h   sql_db2.c    DB2_LIBS           DB2_DLL
Informix     infxcli.h    db_odbc.h   sql_ifx.c    INFORMIX_LIBS      INFORMIX_DLL
SQL Server   sql.h        db_odbc.h   sql_srv.c    SQL_SERVER_LIBS    SQL_SERVER_DLL
ODBC         sql.h        db_odbc.h   sql_odbc.c   ODBC_LIBS          ODBC_DLL
TDS          sybdb.h      db_tds.h    sql_tds.c                       TDS_DLL
If no static library is provided in the makefile (by writing it to "chkccomp.h") a default value is used by "chkccomp.c". This default value differs between Linux, macOS and Windows:
Static lib macro   Linux connector lib   macOS connector lib   Windows connector lib
MYSQL_LIBS         -lmysqlclient         -lmysqlclient         mariadbclient.lib or mysqlclient.lib
SQLITE_LIBS        -lsqlite3             -lsqlite3             sqlite3.lib
POSTGRESQL_LIBS    -lpq                  -lpq                  libpq.lib
OCI_LIBS           -lclntsh              -lclntsh
FIRE_LIBS          -lfbclient            -lfbclient            fbclient.dll or gds32.dll
DB2_LIBS           libdb2.a              libdb2.a              db2cli.lib
INFORMIX_LIBS      iclit09b.a            iclit09b.a            iclit09b.lib
SQL_SERVER_LIBS
ODBC_LIBS          -lodbc                -liodbc               -lodbc32 or odbc32.lib
If no dynamic library is provided in the makefile (by writing it to "chkccomp.h") a default value is used by "chkccomp.c". This default value differs between Linux, macOS and Windows:
Dynamic lib macro   Linux connector lib      macOS connector lib     Windows connector lib
MYSQL_DLL           libmysqlclient.so        libmysqlclient.dylib    libmariadb.dll or libmysql.dll
SQLITE_DLL          libsqlite3.so            libsqlite3.dylib        sqlite3.dll
POSTGRESQL_DLL      libpq.so or libpq.so.5   libpq.dylib             libpq.dll
OCI_DLL             libclntsh.so             libclntsh.dylib         oci.dll
FIRE_DLL            libfbclient.so           libfbclient.dylib       fbclient.dll or gds32.dll
DB2_DLL             libdb2.so                libdb2.dylib            db2cli.dll
INFORMIX_DLL        iclit09b.so              iclit09b.dylib          iclit09b.dll
SQL_SERVER_DLL      libtdsodbc.so            libtdsodbc.dylib        sqlsrv32.dll
ODBC_DLL            libodbc.so               libiodbc.dylib          odbc32.dll
TDS_DLL             libsybdb.so              libsybdb.dylib          sybdb.dll
For Oracle it is assumed that the environment variable ORACLE_HOME has been set. Static libraries are preferred over dynamic libraries. When no connector library can be found a dynamic library is expected. This way the database can be connected if a dynamic database connector library is installed later.
For a Seed7 package this means: During the compilation of Seed7 the development packages of all supported databases should be installed. This way the original headers are used instead of the headers provided by Seed7. When dynamic database connector libraries are used the Seed7 package must require these packages.
Depending on the configuration the database connector library is linked statically or dynamically. If a dynamic database connector library cannot be found at runtime the function openDatabase raises the exception DATABASE_ERROR.
How to fix the error "Searching dynamic libraries failed"?
Opening a database might trigger the error
Database error: Searching dynamic libraries failed: someName
where someName is the name of a DLL or shared library that could not be loaded. This can have different reasons:
- The library is neither in the directories that the operating system uses for DLLs / shared libraries nor in the directory of the executable (e.g. the directory of s7 respectively s7.exe). Each operating system has dedicated directories for DLLs / shared libraries. These can be found in the operating system documentation.
- The library is 64-bit and the program (s7 respectively s7.exe) is 32-bit or vice versa. In this case you need to make sure that both are either 32-bit or 64-bit.
- The library depends on some other library and fails to load this other library. In this case it might be necessary to change the source code to load this other library in advance.
To determine if s7 is 32-bit or 64-bit execute in the seed7/prg directory:
s7 confval
and look at what it writes for POINTER_SIZE.
When Seed7 is compiled the command make depend logs information about the DLLs / shared libraries used. A log line like
SQLite: DLL / Shared library: libsqlite3.so (present)
indicates that it is possible to dynamically link libsqlite3.so. The phrases (not present) and (cannot load) indicate that loading the DLL / shared library at build-time failed. This alone is not a problem since at run-time a list of possible DLLs / shared libraries is processed. Aside from that a DLL / shared library might be installed later after Seed7 has been compiled.
For DLLs / shared libraries with an absolute path further information like (32-bit) or (64-bit) is written:
SQLite: DLL / Shared library: C:/sqlite/sqlite3.dll (cannot load) (32-bit)
The command make depend also writes the macros MYSQL_DLL, SQLITE_DLL, POSTGRESQL_DLL, ODBC_DLL, OCI_DLL, FIRE_DLL, DB2_DLL, INFORMIX_DLL, SQL_SERVER_DLL and TDS_DLL to the file src/version.h. These macros contain lists of DLLs / shared libraries with absolute paths or just DLL / shared library names. If a library is needed at run-time the list of the corresponding macro is processed until loading a library succeeds. If all attempts to load a library fail you get the error: "Searching dynamic libraries failed".
The PostgreSQL library (defined with POSTGRESQL_DLL) might depend on other libraries. If this is the case the libraries defined by the macros LIBINTL_DLL, LIBEAY32_DLL, LIBCRYPTO_DLL and LIBSSL_DLL are loaded (if these macros are defined in src/version.h) before loading POSTGRESQL_DLL.
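Processing a list of DLLs / shared libraries until loading succeeds can be sketched in C as follows. This is an illustration under assumptions and not the code of the Seed7 runtime library: the names load_func, load_first and demo_loader are hypothetical. The function demo_loader is a stand-in that pretends that only one library is installed. A real loader would call something like dlopen() or LoadLibrary() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader type: returns a handle or NULL if loading fails. */
typedef void *(*load_func)(const char *library_name);

/* Stand-in loader for illustration only: it pretends that just
   "libsqlite3.so.0" is installed. */
void *demo_loader(const char *library_name)
{
    static int dummy_handle;

    if (strcmp(library_name, "libsqlite3.so.0") == 0) {
        return (void *) &dummy_handle;
    }
    return NULL;
}

/*
 * Try the library names in list order. Returns the handle of the
 * first library that can be loaded and stores its name in
 * *loaded_name. Returns NULL if all attempts fail.
 */
void *load_first(const char *const names[], size_t count,
                 load_func load, const char **loaded_name)
{
    size_t idx;
    void *handle;

    assert(load != NULL && loaded_name != NULL);
    for (idx = 0; idx < count; idx++) {
        handle = load(names[idx]);
        if (handle != NULL) {
            *loaded_name = names[idx];
            return handle;
        }
    }
    /* The caller would now report that searching the
       dynamic libraries failed. */
    *loaded_name = NULL;
    return NULL;
}
```

Since the list is processed at run-time, a library that was missing when Seed7 was compiled is still found if it has been installed in the meantime.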
Does the interpreter use bytecode?
No, the analyze phase of the Seed7 interpreter produces call-code which consists of values and function calls. This call-code is just handled in memory and never written to a file. After the analyze phase the call-code is interpreted.
How does the analyze phase of the interpreter work?
The analyzer reads successive expressions. The expressions are read with a table-driven LL(1) recursive descent parser. The parser is controlled by Seed7 syntax definitions. The parser calls a scanner, which skips whitespace and reads identifiers and literals. Each parsed expression is searched in the internal database of defined objects. This search process is called matching. The matching resolves overloaded functions and generates call-code for the parsed expression. Call-code uses a data structure which is similar to S-Expressions. The analyzer executes the call-code of the parsed and matched expressions. Normally parsed and matched expressions represent declaration statements. Executing a declaration statement adds new defined objects to the internal database.
How does the compiler implement call-by-name parameters?
Every function with call-by-name parameters is searched for recursive calls. If no recursive call of the function is present it can be implemented with code inlining. In this case every call of the function is inlined and the actual call-by-name parameters replace all occurrences of the formal call-by-name parameter in the function body.
If a function cannot be implemented with code inlining (recursive calls occur) pointers to a closure structure are used as formal call-by-name parameters. This closure structure contains a function pointer and a structure which represents the environment of the closure. If a formal call-by-name parameter is used, the function of the closure structure is called with a pointer to the closure environment as parameter.
When a function with call-by-name parameters is called the following things are done: For every actual call-by-name parameter a closure structure with the function pointer and the closure environment structure is generated. An actual function representing the closure code is generated as well. Before a function with a call-by-name parameter is called a closure structure variable is initialized. This includes initializing the function pointer and the environment data of the closure structure variable. Finally a pointer to the closure structure variable is used as actual call-by-name parameter.
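The closure approach can be sketched in C as follows. This is a hand-written illustration and not the output of the Seed7 compiler: all names (closure, closure_env, next_value, sum_evaluations, call_site_example) are hypothetical. The function sum_evaluations is recursive, so it stands for a function that cannot be implemented with code inlining.

```c
#include <assert.h>
#include <stddef.h>

/* Environment of the closure: the variables used by the
   actual call-by-name parameter. */
typedef struct {
    int *counter;
} closure_env;

/* Closure structure: a function pointer and the environment. */
typedef struct {
    int (*func)(void *env);
    void *env;
} closure;

/* Function representing the closure code of the actual parameter:
   it increments the counter and delivers the new value. */
int next_value(void *env)
{
    closure_env *captured = (closure_env *) env;

    *captured->counter += 1;
    return *captured->counter;
}

/* Recursive function with the formal call-by-name parameter `expr`.
   Every use of `expr` calls the closure function with a pointer
   to the closure environment. */
int sum_evaluations(int times, const closure *expr)
{
    int current;

    assert(expr != NULL && expr->func != NULL);
    if (times <= 0) {
        return 0;
    }
    current = expr->func(expr->env);
    return current + sum_evaluations(times - 1, expr);
}

/* Call site: the closure structure variable is initialized and a
   pointer to it is used as actual call-by-name parameter. */
int call_site_example(void)
{
    int counter = 0;
    closure_env env;
    closure actual_param;

    env.counter = &counter;
    actual_param.func = next_value;
    actual_param.env = &env;
    return sum_evaluations(3, &actual_param);  /* 1 + 2 + 3 */
}
```

The actual parameter is evaluated again each time the formal parameter is used, so the three evaluations deliver 1, 2 and 3. With a call-by-value parameter the expression would be evaluated only once.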
What does action "XYZ_SOMETHING" mean?
Actions are used to call a corresponding C function in the interpreter. For example:
The action "INT_ADD" corresponds to the function 'int_add' in the file
Chapter 14 (Primitive actions) of the manual contains a detailed description of the primitive actions. In the interpreter all action functions get the parameters as list. The action functions take the parameters they need from the list, perform the action and deliver a result.
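The idea of action functions that take their parameters from a list can be sketched in C as follows. This is a strongly simplified illustration and not the code of the interpreter: the types list_elem, action_func and action_entry, the table and the function find_action are hypothetical, and the list holds plain int values instead of interpreter objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter list: a linked list of int values. */
typedef struct list_elem {
    int value;
    const struct list_elem *next;
} list_elem;

/* All action functions get the parameters as list. */
typedef int (*action_func)(const list_elem *arguments);

/* The action function takes the two parameters it needs from
   the list, performs the addition and delivers the result. */
int int_add(const list_elem *arguments)
{
    assert(arguments != NULL && arguments->next != NULL);
    return arguments->value + arguments->next->value;
}

/* A second action function, for illustration. */
int int_mult(const list_elem *arguments)
{
    assert(arguments != NULL && arguments->next != NULL);
    return arguments->value * arguments->next->value;
}

/* Hypothetical table that maps action names to C functions. */
typedef struct {
    const char *name;
    action_func func;
} action_entry;

static const action_entry action_table[] = {
    {"INT_ADD", int_add},
    {"INT_MULT", int_mult}
};

/* Find the C function that corresponds to an action name.
   Returns NULL if the action is unknown. */
action_func find_action(const char *name)
{
    size_t idx;

    for (idx = 0; idx < sizeof(action_table) / sizeof(action_table[0]); idx++) {
        if (strcmp(action_table[idx].name, name) == 0) {
            return action_table[idx].func;
        }
    }
    return NULL;
}
```

Because all action functions share the same signature, they can be stored in one table and called in a uniform way, regardless of how many parameters each of them actually needs.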
Why are there dollar signs in some places?
The syntax and semantics of Seed7 are defined in the library seed7_05.s7i. So when the interpreter or compiler starts reading a Seed7 program, it knows almost nothing about Seed7. No statements, functions, operators, types or variables are predefined. All these things come from the seed7_05.s7i library. Without seed7_05.s7i, there are just a few hard-coded things such as comments and literals. Among the hard-coded things are include statements which are introduced with a dollar sign. For that reason a Seed7 program needs a
$ include "seed7_05.s7i";
at the beginning. The seed7_05.s7i library starts with including the syntax.s7i library. The file syntax.s7i contains many $ commands. First the type type is defined. Declarations of other types, system variables and syntax descriptions of operators and statements follow. At several places the $ is used to force the analyzer to use a hard-coded expression recognition instead of the configurable one. After finishing the inclusion of syntax.s7i, the file seed7_05.s7i contains some $ declarations until the const declaration statement is established. From that point onward almost no $ statements are needed.
Among other things the seed7_05.s7i library defines include statements and syntax statements, that work without $.
Why does "seed7_05.s7i" contain a version number?
The number 05 is actually a 'branch info'. As if C had headers like
<stdlib_c78.h> /* For K&R C programs */
<stdlib_c89.h> /* For ANSI C */
<stdlib_c99.h> /* For C99 */
and your program must include one of these three headers as first include file (Other include files have no version/branch info in the name). That way nobody is forced to upgrade an old program (to get no warnings or to make it compile). You can leave your old K&R program from 1980 as is. If you decide to rewrite your K&R program to use prototypes, you change the <stdlib...> include file as well.
Programming languages change over long time periods. This results in different language standards. Seed7 tries to address this problem from the beginning. Since most of Seed7's constructs (statements, operators, types, ... ) are defined in seed7_05.s7i this is the right place to do it.
Can I use an "abc.s7i" include file to boot to the abc language?
Theoretically yes. In practice there would be several problems. For example:
- All primitive actions are defined such that they fit to Seed7.
- Some concepts like goto, return and break are not supported.
- Some things like comments and $ pragmas are hard-coded.
But basically booting various languages was one of the goals of the extensible programming language Seed7 and the s7 interpreter.
In practice it turned out to be a better approach to steal concepts from other programming languages and to integrate them in Seed7 than to split the development in different branches.
The capability to boot a language can be used to allow slightly different future versions of Seed7 to coexist with the current version. This is also the reason why the file seed7_05.s7i contains a version number (05).