How to take one byte from a string
Posted: Wed Oct 07, 2009 9:15 pm
I'm trying to read a string byte by byte (to convert it from an 8-bit codepage to UTF-8). But (pop the-string) returns a seemingly random number of bytes instead of one, and so does (the-string 0), etc.:
http://img7.imageshost.ru/imgs/091008/3 ... /11005.png
(set-locale "C") did not help either. The only working way I have found is to write the string to a temporary file and then read it back with the read-char function.
Maybe there is a shorter way, without writing a file? I need this function on both Linux and Windows, and the Windows temp directory has a different name.
Code:
; Usage: (cyr-win-utf "text in windows-1251 encoding")
; Decodes text from windows-1251 to utf-8
(module "crypto.lsp") ; provides crypto:md5, used below for the temp file name
(define (cyr-win-utf t-linea)
  ; Loading encoding table (windows-1251 byte value -> UTF-8 string)
  (set 'en-win-1251 '((255 "я") (254 "ю") (253 "э") (252 "ь") (251 "ы")
    (250 "ъ") (249 "щ") (248 "ш") (247 "ч") (246 "ц") (245 "х") (244 "ф")
    (243 "у") (242 "т") (241 "с") (240 "р") (239 "п") (238 "о") (237 "н")
    (236 "м") (235 "л") (234 "к") (233 "й") (232 "и") (231 "з") (230 "ж")
    (184 "ё") (229 "е") (228 "д") (227 "г") (226 "в") (225 "б") (224 "а")
    (223 "Я") (222 "Ю") (221 "Э") (220 "Ь") (219 "Ы") (218 "Ъ") (217 "Щ")
    (216 "Ш") (215 "Ч") (214 "Ц") (213 "Х") (212 "Ф") (211 "У") (210 "Т")
    (209 "С") (208 "Р") (207 "П") (206 "О") (205 "Н") (204 "М") (203 "Л")
    (202 "К") (201 "Й") (200 "И") (199 "З") (198 "Ж") (168 "Ё") (197 "Е")
    (196 "Д") (195 "Г") (194 "В") (193 "Б") (192 "А")))
  ; saving string to a temp file (the /tmp/ path works on Linux only)
  (set 't-file-name (append "/tmp/" (crypto:md5 (string (random)))))
  (write-file t-file-name t-linea)
  ; loading characters to the t-out, one byte at a time
  (set 't-out "")
  (set 't-file (open t-file-name "read"))
  (while (set 't-char (read-char t-file))
    ; bytes missing from the table are passed through (char)
    (push (or (lookup t-char en-win-1251) (char t-char)) t-out -1))
  (close t-file)
  ; removing the temp file so it does not pile up in /tmp/
  (delete-file t-file-name)
  t-out)
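One possible way to avoid the temporary file, sketched here and not checked on both platforms: as far as I understand, length counts bytes rather than characters, and the "b" format of unpack reads one unsigned byte, so repeating "b" once per byte should give the whole string as a list of numbers. The names string-bytes and decode-bytes are made up for this sketch, and the table is passed in as an argument instead of being set inside the function.

```newlisp
; Hypothetical helper: returns the raw byte values of str as a list of integers.
; Assumes (length str) is the byte count and "b" means one unsigned byte.
(define (string-bytes str)
  (unpack (dup "b" (length str)) str))

; Hypothetical decoder: maps each byte through an association list such as
; en-win-1251; bytes missing from the table are passed through (char).
(define (decode-bytes str table)
  (let (out "")
    (dolist (b (string-bytes str))
      (push (or (lookup b table) (char b)) out -1))
    out))

; Usage: build a 3-byte test string with pack, then list its bytes.
(println (string-bytes (pack "bbb" 192 193 65)))
```

If this holds up, the body of cyr-win-utf could shrink to (decode-bytes t-linea en-win-1251), with no dependence on the temp directory name.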