get-wide-string builtin function?
Posted: Mon Jul 07, 2014 9:35 am
I was able to work out a user-defined function that converts a wchar_t array to a layout that appears compatible with what the 'unicode' function produces. However, I was wondering whether it could be turned into a builtin primitive. It seems like a good idea to me because there are already builtin functions for converting UTF-8 to UTF-32 ('unicode') and UTF-32 to UTF-8 ('utf8'). Basically, the idea is to have a function like 'get-string', but one that works on UTF-32 character arrays acquired from a C library. I tried 'get-string' itself, and it simply does not work, since it stops at the first zero byte and most UTF-32 characters contain zero bytes. Here is my approach using existing builtin functions:
In the code below, 'result' is a pointer to a wchar_t string returned by a shared library on Linux, where wchar_t is 4 bytes wide.
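Code:
; Walk a zero-terminated array of 4-byte characters starting at ptr
; and return its bytes, terminator included, as a newLISP string.
(define (get-wide-string ptr)
  (let (p ptr)
    ; advance p in 4-byte steps until the 32-bit zero terminator
    (while (!= ((unpack "lu" p) 0) 0)
      (set 'p (+ p 4)))
    ; copy (p - ptr + 4) bytes so the terminator is included
    ((unpack (format "s%d" (- p ptr -4)) ptr) 0)))

(println (utf8 (get-wide-string result)))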
I know wchar_t has no guaranteed size. However, for systems where wchar_t holds UTF-32, I thought it would be nice to have a builtin function that copies the wchar_t string and wraps it in a newLISP cell, producing the same form that 'unicode' returns. The result could then be fed to the 'utf8' function to convert it back to the encoding that newLISP uses.
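To make the idea concrete, here is a minimal sketch in C of the scan-and-copy step such a builtin would need. It is standalone C and does not use newLISP's internal cell API; the name copy_wide_string is hypothetical, and wrapping the buffer into a cell is left out. Using sizeof(wchar_t) instead of a hard-coded 4 means the copy itself works for any wchar_t width, but the result only matches the 'unicode' layout on systems where wchar_t is 4 bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of newLISP: copies a zero-terminated
   wchar_t string, terminator included, into a newly allocated buffer.
   Stores the number of bytes copied in *out_len.
   Returns NULL if the allocation fails; the caller frees the buffer. */
static char *copy_wide_string(const wchar_t *src, size_t *out_len)
{
    /* wcslen counts characters before the terminator; +1 includes it */
    size_t nbytes = (wcslen(src) + 1) * sizeof(wchar_t);
    char *buf = malloc(nbytes);

    if (buf == NULL)
        return NULL;

    memcpy(buf, src, nbytes);
    *out_len = nbytes;
    return buf;
}
```

This does the same work as the loop in get-wide-string, only without an 'unpack' call for every character.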
I feel this would be a good addition to newLISP because it would allow newLISP to both give UTF-32 strings to C libraries and receive them back. Right now, you can only give UTF-32 strings to C libraries. While I seldom see UTF-32 used by C libraries, it is used by a Text User Interface library I am trying to write an interface module for. I would prefer this be implemented within the interpreter, because the conversion is slower when it runs as interpreted code.
You may have reasons why it shouldn't be. Still, I think it's a good idea, because converting very large strings would probably be faster if the code were implemented in C. Thoughts, Lutz?