Encoding surrealism in WIN10: UTF-8-NEWLISP in CMD.EXE
Posted: Wed Apr 04, 2018 9:13 am
I spent 8 hours figuring out HOW it works in the Windows cmd.exe console and found a paradox.
Two paradoxes.
Try this yourself; all the code in this post is copied and pasted from the cmd.exe window.
Start CMD.EXE, then newlisp.exe without any init.lsp, and pass it a valid Cyrillic file path as the first parameter:
Code: Select all
D:\tmp>r:\bin\newlisp\newlisp.exe -n "D:\tmp\Ё.doc"
newLISP v.10.7.1 64-bit on Windows IPv4/6 UTF-8 libffi, options: newlisp -h
> (last (main-args))
"D:\\tmp\\╨╕.doc" # two symbols, not one: it's UNICODE
> (load {R:\bin\newlisp\modules\iconv.lsp})
MAIN
> (file? (last (main-args))) # maybe it is understood as a valid path?
nil
> (Iconv:convert (last (main-args)) {UTF-8} {CP866}) # OK, de-UNICODE it
"D:\\tmp\\и.doc"
> # one symbol, but it should be "Ё"!
After hours of encoding and decoding between UTF-8, CP866 and CP1251, a lucky shot in the dark gave me paradox one: the UTF-8 path, once converted to CP866, must then be converted again, this time from CP1251 to CP866:
Code: Select all
> (Iconv:convert (Iconv:convert (last (main-args)) {UTF-8} {CP866}) {CP1251} {CP866})
"D:\\tmp\\Ё.doc"
> # no logic, but now we have a readable file path!
> (file? (Iconv:convert (Iconv:convert (last (main-args)) {UTF-8} {CP866}) {CP1251} {CP866}) ) # but what does newLISP itself think about it?
nil
newLISP thinks there is no such file, but I think there is: I can see "D:\\tmp\\Ё.doc".
Paradox two:
Code: Select all
> (write-file {D:\tmp\1.txt} {1}) # OK, newLISP, is a file you create yourself...
1
> (file? {D:\tmp\1.txt}) # ... really a file?
true
> (write-file {D:\tmp\Ё.txt} {Ё}) # OK, now the special case
1
> (file? {D:\tmp\Ё.txt})
true
> (file? {D:\tmp\Ё.doc})
nil
>
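Since the console redraws every byte through its own code page, it may help to look at the raw bytes newLISP is holding rather than at the glyphs. A minimal sketch, assuming a helper called hex-bytes (a made-up name, not part of newLISP) that uses only the built-ins unpack, dup, length and format:

```newlisp
; hex-bytes is a hypothetical helper, not a newLISP built-in.
; It returns the bytes of a string as a list of two-digit hex strings.
(define (hex-bytes str)
  (map (fn (b) (format "%02X" b))
       (unpack (dup "b" (length str)) str)))

; The two bytes that a CP866 console draws as "╨╕":
(println (hex-bytes "\xD0\xB8"))

; Applied to the values from the transcript (results depend on the machine):
; (hex-bytes (last (main-args)))
; (hex-bytes {D:\tmp\Ё.txt})
```

Comparing the dump of the command-line argument with the dump of the same path typed at the prompt should show whether the two routes deliver different bytes for "Ё".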
OK, explorer.exe, what do you think about that?
Ё.doc: Ё.txt
People, I think only some kind of data flow diagram could clearly show what is going on under the hood of the GUI and where the silent charset translations take place. As far as I know, CMD.EXE works in CP866, the file system stores file paths in CP1251, and newlisp.exe works internally in UTF-8. Let's discuss.
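One reading that is consistent with the transcripts, offered as a sketch rather than a confirmed explanation: the same byte means a different letter in each code page. The byte values below come from the published CP866 and CP1251 tables; the names cp866, cp1251 and show-byte are made up for illustration. Letters are named by code point in plain ASCII so that the output does not itself depend on the console code page.

```newlisp
; Illustrative tables: what two particular bytes mean in each code page.
(set 'cp866  '((0xA8 "U+0438 small i")    (0xF0 "U+0401 capital IO")))
(set 'cp1251 '((0xA8 "U+0401 capital IO") (0xF0 "U+0440 small er")))

; show-byte is a hypothetical helper that describes one byte in both code pages.
(define (show-byte b)
  (format "0x%02X = %s in CP866, %s in CP1251"
          b (lookup b cp866) (lookup b cp1251)))

(println (show-byte 0xA8))
(println (show-byte 0xF0))
```

Under that reading, "Ё" reaches newLISP as the CP1251 byte 0xA8, is taken for the CP866 letter "и" and re-encoded as UTF-8 (bytes D0 B8, which CP866 draws as "╨╕"). The two Iconv:convert calls undo exactly that and end at 0xF0, which the console draws as "Ё". Whether file? then accepts 0xF0 depends on how newLISP hands the bytes to Windows. The 1 returned by write-file for {Ё} suggests that text typed at the prompt also arrives as a single CP866 byte, so {D:\tmp\Ё.txt} may have been created under a different name on disk.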