(directory) in Windows and UTF8

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

(directory) in Windows and UTF8

Post by Fritz »

I have just noticed that the newLISP directory function returns its data in UTF-8, while the native Windows XP code page (in Russia) is windows-1251 (and the newlisp-edit.lsp module is encoded in windows-1251 too).

That is not a serious problem, because I can always use a construction like

Code: Select all

(map decode-utf8-to-cp1251 (directory "."))
Fortunately, the Russian alphabet has only 33 letters, so decoding is easy. I just want to ask about future plans: will newLISP eventually have a function like *nix iconv?
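
The post does not show decode-utf8-to-cp1251 itself. A minimal sketch of what such a table-free helper might look like, assuming a UTF-8-enabled build of newLISP (so that explode and char work on whole characters) and covering only the 33 Russian letters in both cases; the function body is an illustration, not the poster's actual code:

```newlisp
; Hypothetical body for the helper named in the post.
; Assumes a UTF-8 build: (explode str) yields characters and
; (char ch) yields the Unicode code point.
(define (decode-utf8-to-cp1251 str)
  (join
    (map (fn (ch)
           (let (cp (char ch))
             (cond
               ; U+0410..U+044F (А..я) map to bytes 0xC0..0xFF
               ((and (>= cp 0x410) (<= cp 0x44F)) (pack "b" (- cp 0x350)))
               ((= cp 0x401) (pack "b" 0xA8))   ; Ё
               ((= cp 0x451) (pack "b" 0xB8))   ; ё
               ; ASCII passes through unchanged; other characters are
               ; left as their UTF-8 bytes, which is NOT valid cp1251
               (true ch))))
         (explode str))))

(map decode-utf8-to-cp1251 (directory "."))
```

Anything outside ASCII and the Russian alphabet would need a fuller table or a real converter such as iconv.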

Cyril
Posts: 183
Joined: Tue Oct 30, 2007 6:27 pm
Location: Moscow, Russia

Re: (directory) in Windows and UTF8

Post by Cyril »

Fritz wrote: I just want to ask about future plans: will newLISP eventually have a function like *nix iconv?
You can use iconv on Windows too. Many Unix-like applications for Windows ship with the iconv library these days: on my system there are... let me count... six instances of iconv.dll (yes, this IS DLL hell!). If you do not have one, download it from GnuWin32 (the library file is named libiconv2.dll in that package, but it is otherwise the same). The library interface is a bit low-level, but newLISP does a very good job of accessing low-level interfaces. The following code is only a demo, but it works for me:

Code: Select all

(import "iconv.dll" "libiconv_open")        ; see 1
(import "iconv.dll" "libiconv")
(import "iconv.dll" "libiconv_close")

(setq cd (libiconv_open "cp866" "utf-8"))   ; see 2, 3

(setq in ((directory ".") -1))              ; see 4
(setq out (pack "n1024"))                   ; see 5

(setq inbuf (pack "lu" (address in)))
(setq inlen (pack "lu" (length in)))
(setq outbuf (pack "lu" (address out)))
(setq outlen (pack "lu" (length out)))

(libiconv cd (address inbuf) (address inlen) (address outbuf) (address outlen))

(libiconv_close cd)

(println out)

(exit)

Some comments:

1) Put iconv.dll in the current directory, or write the full path in the import statements; replace "iconv.dll" with "libiconv2.dll" if you downloaded it from the location mentioned above; everything else stays the same;

2) I assume you are using a UTF-8-enabled build of newLISP; in a plain 8-bit build, (directory) just returns cp1251;

3) I convert the name to cp866, not to cp1251, because the console window uses cp866; with cp1251 it works the same way;

4) I created the file "привет.txt" just for this demo, so the name of the last file in the directory is the one converted;

5) I hope a 1024-byte buffer is enough;

6) There is absolutely NO error checking in this demo; production code must have some.

This demo happily prints "привет.txt". I hope you can elaborate this into production-quality code. ;-)

Update (two hours later):

The code above was incomplete: the out buffer still contained the full 1024-byte string, although most of the bytes were zeros. Such a string can be printed, but is not of much use otherwise. To extract the converted string from the buffer, subtract the resulting outlen, as modified by the libiconv function (it is in fact the "bytes left" value), from the original buffer length. So, instead of printing the out value directly, write the following at the end:

Code: Select all

(setq result (slice out 0 (- (length out) (get-int outlen))))

(println result)
Hope this helps.
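
Putting the demo and the update together, the whole conversion might be wrapped roughly as follows. This is only a sketch under the same assumptions as the demo: a 32-bit, UTF-8-enabled newLISP (pointers and lengths fit the 32-bit "lu" format), an iconv.dll that exports the libiconv_* names, a 1024-byte output buffer, and still no error checking. The name convert-name is hypothetical.

```newlisp
(import "iconv.dll" "libiconv_open")
(import "iconv.dll" "libiconv")
(import "iconv.dll" "libiconv_close")

; convert-name is a hypothetical wrapper around the steps shown above
(define (convert-name in to from)
  (letn ((cd     (libiconv_open to from))       ; target encoding comes first
         (out    (pack "n1024"))                ; output buffer, 1024 zero bytes
         (inbuf  (pack "lu" (address in)))      ; pointer cell, advanced by libiconv
         (inlen  (pack "lu" (length in)))       ; input bytes still to convert
         (outbuf (pack "lu" (address out)))     ; pointer cell, advanced by libiconv
         (outlen (pack "lu" (length out))))     ; bytes still free in out
    (libiconv cd (address inbuf) (address inlen)
                 (address outbuf) (address outlen))
    (libiconv_close cd)
    ; bytes written = buffer size minus the "bytes left" value
    (slice out 0 (- (length out) (get-int outlen)))))

(println (convert-name ((directory ".") -1) "cp866" "utf-8"))
```

The result is then an ordinary string of exactly the converted length, so it can be compared, concatenated, or written to a file.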
With newLISP you can grow your lists from the right side!

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

Re: (directory) in Windows and UTF8

Post by Fritz »

Cyril wrote:(import "iconv.dll" "libiconv")
The iconv.dll that comes with GIMP did the job for me! Thanks!

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: (directory) in Windows and UTF8

Post by Lutz »

You could further simplify and speed up a little:

Code: Select all

(setq inbuf (pack "lu" in))
(setq inlen (pack "lu" (length in)))
(setq outbuf (pack "lu" out))
(setq outlen (pack "lu" (length out)))

(libiconv cd inbuf inlen outbuf outlen)
Strings are automatically passed by their address to 'pack' and to imported functions, and therefore don't need the 'address' operator.

Cyril
Posts: 183
Joined: Tue Oct 30, 2007 6:27 pm
Location: Moscow, Russia

Re: (directory) in Windows and UTF8

Post by Cyril »

A bit too late, but: a newLISP wrapper for iconv exists on Dmitry Chernyak's site [here]. It was written as Unix-specific, but it is easy to adapt for Windows: just change the library name (from ".so" to ".dll") and add the prefix "lib" to all three imported functions. Warning: the module works fine for me when correct arguments are passed, but its error handling seems to be broken. At least during my evaluation, it either went into an infinite loop or returned unexpected results on wrong (unconvertible) arguments. And no, I have not put much effort into investigating this, just a quick glance, sorry.

IVShilov
Posts: 23
Joined: Wed Apr 12, 2017 1:58 am

Re: (directory) in Windows and UTF8

Post by IVShilov »

Hello, I have a problem importing iconv on WinXP (an old notebook): newLISP terminates right after the first function call, libiconv_open.
[Screenshot: newLISP crash when calling iconv]

Importing from kernel32.dll and user32.dll works fine, but with iconv from GnuWin32 I have had no luck.

How can I debug FFI calls? Calling MessageBoxA without parameters ends in the same process crash, but with the right set of parameters it works; so maybe

Code: Select all

 (libiconv_open "cp866" "utf-8") 
is the wrong call?
How can I reliably figure out which parameters, and of which types, I have to pass to which FFI function?
I have the "Dependency Walker" utility; do I need something more?

With other versions of iconv.dll found on the system, newLISP behaves the same way: it crashes.

ralph.ronnquist
Posts: 228
Joined: Mon Jun 02, 2014 1:40 am
Location: Melbourne, Australia

Re: (directory) in Windows and UTF8

Post by ralph.ronnquist »

This problem might be because the passed-in strings are temporary, so their allocated space might be reclaimed too early, before the actual call is made. If that's the case, a wrapper like the following might do the trick:

Code: Select all

(let ((IN "CP866") (OUT "UTF-8")) (libiconv_open (address IN) (address OUT)))
You might also get the same effect by declaring the imported function in full, with parameters as char*.

As far as I can tell from a cursory glance at nl-import.c, the parameters are deleted before the actual function call is made. I may well be misreading or misunderstanding it, though, so don't hold your breath.
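
The full declaration mentioned above might look roughly like this. It is a sketch that assumes a newLISP version with the extended import syntax (10.4.0 or later, built with libffi) and a DLL that exports libiconv_open with the usual iconv_open signature; declaring the opaque descriptor as "void*" is also an assumption, and whether this avoids the crash has not been verified here.

```newlisp
; Extended import: return type first, then one type per parameter.
; The conversion descriptor is an opaque pointer, declared as "void*".
(import "iconv.dll" "libiconv_open" "void*" "char*" "char*")

; Target encoding first, then source encoding, as in the earlier demo.
(setq cd (libiconv_open "cp866" "utf-8"))
(println cd)   ; the descriptor, shown as an integer address
```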
