C from newlisp?
Posted: Wed Oct 07, 2009 10:02 am
I saw some quoted discussion on Kazimir's blog, and there is something that may be relevant to the wider topic of how to use C from scripting languages in general, including newlisp.
There is an excellent open-source project called "tcc", a tiny C compiler. The binary is slightly over 100k (123k on my platform).
It understands all of ANSI C plus some extensions; it can compile libraries or standalone executables, and it can be used for "scripting" if the first line of the file is a shebang invocation of #!/path/to/tcc -run (plus some other options, e.g. library includes).
TCC also contains an assembler.
The main point is that it runs roughly an order of magnitude faster than GCC (the ratio was 9 in a test compilation of the source of the Links web browser, if I am not mistaken).
It also builds on Windows, i.e. it is cross-platform.
HOME PAGE: http://bellard.org/tcc/
--------------
THEREFORE there are basically several options if one wants to use C or even assembler from a scripting language.
(a) write your C, then invoke "tcc -run", piping the output back into your script.
"tcc -run ......" will compile it on the fly and not create an "a.out".
Alternatively, use a simple wrapper that checks whether your inline code has already been compiled and reuses the generated binary, saved as a small file, so that compilation is not repeated on subsequent runs.
(b) write your C, then compile it with tcc as a library; use the newlisp built-in function to load the tiny lib you created on the fly and talk to it using newlisp facilities.
(c) TCC itself can be compiled as a static libtcc.a
Its APIs are outlined in its header file. It is possible, generally speaking, to produce an extended version of newlisp with this lib compiled in (just the way a library that implements httpd or some hashing is compiled into it).
I do not believe it is the best way, though. First, one needs to learn a whole bunch of API functions. Second, while it would not fatten the newlisp binary that much, such an add-on would prevent newlisp from remaining a standalone executable, as it would tie it to some other files (e.g. headers), i.e. it would require a "system installation". This lack of dependencies is one feature that makes newlisp drastically different from (and better than) most scripting languages, in my view.
The amount of wrapping of the C code to be used with such a tiny, fast on-the-fly compiler should be negligible, and I would say that in most cases of practical use the need to run an extra process for the C sections will not affect the usability of the script.
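To show how little wrapping option (b) needs, here is a minimal sketch of the C side. The file name fib.c, the library name libfib.so and the function fib are my own choices for illustration. I would expect something like "tcc -shared -o libfib.so fib.c" to build it and (import "./libfib.so" "fib") to load it on the newlisp side, but the exact library name and path depend on the platform.

```c
/* fib.c: a plain C function meant to be loaded from a shared library. */

/* Iterative Fibonacci: fib(0) = 0, fib(1) = 1, fib(10) = 55.
 * Negative arguments return 0. */
int fib(int n)
{
    int a = 0, b = 1;

    while (n-- > 0) {
        int next = a + b;  /* advance the pair (a, b) one step */
        a = b;
        b = next;
    }
    return a;
}
```

Note that the C code contains nothing specific to newlisp: taking and returning plain integers is enough for a simple import.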
Two more major points:
- tcc is so small, it generates straightforward stuff in microseconds (5-8 microseconds for something like the Fibonacci test prog). One CAN use that for DYNAMIC generation of C code from your script, not only for pre-compilation of some static parts of a program
- tcc can help in using external libraries which are difficult to use from newlisp itself, directly. One can write a simple wrapper in a few lines which will present the result of an invocation of library functions in the form convenient for passing over to newlisp (e.g. as a string or some list, whatever).
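As a sketch of the second point: the standard C function ldiv() returns a struct by value, which is awkward to consume directly from a scripting language. A few lines of C can flatten the result into text that looks like a newlisp list. The helper name format_ldiv and the "(quot rem)" output format are assumptions made for this example only.

```c
/* ldiv_wrap.c: present a struct-returning library call as list-like text. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: call ldiv() and render its result as "(quot rem)".
 * Returns the number of characters written, or -1 if the divisor is zero
 * or the buffer is too small. */
int format_ldiv(char *buf, size_t size, long num, long den)
{
    if (den == 0)
        return -1;

    ldiv_t r = ldiv(num, den);
    int len = snprintf(buf, size, "(%ld %ld)", r.quot, r.rem);

    return (len < 0 || (size_t)len >= size) ? -1 : len;
}
```

A main() that prints the buffer, run through "tcc -run", would hand the script one line of text that it can parse on the newlisp side.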
----
There is another project (I'll check the name and add it), which fakes the same scripting approach. The "script" in C is in fact passed to the full GCC compiler on the first invocation, and the compiled a.out is called on subsequent ones.
This of course is (a) slower and (b) much heavier on I/O, at least.
So I believe the TCC road is practical and will cover most real-life uses: take a blazingly fast C compiler, which can of course link with any existing C libraries, and use it to write convenient wrappers wherever newlisp's own operators run out, and/or to write pseudo-inlined sections of code in C or assembler which can then be used from the scripting language.
-----
P.S. Python people already went that way, as a matter of fact:
http://www.cs.tut.fi/~ask/cinpy/
"Cinpy is a Python library that allows you to implement functions with C in Python modules. The functions are compiled with tcc (Tiny C Compiler) in runtime. The results are made callable in Python through the ctypes library."