Manipulating byte strings -- SOLVED
Posted: Tue Dec 29, 2020 1:44 am
[See solution in thread below.]
I'm trying to implement several versions of the Lempel-Ziv-x and Snappy compression algorithms. Ordinarily, I like to get my logic straight in Lisp, and then, if I need the speed, I'll port the tight loops to a C library. In this case, however, newLISP has been atypically difficult to debug. I wonder if there are some simple code patterns I'm overlooking.
It would, of course, be simpler to use a non-UTF-8 enabled build of newLISP, but I want to compress UTF-8 strings that I'm processing within newLISP.
So given a UTF-8 string us, I understand that (slice us i 1) will give me an 8-bit "char". I also found that defining
Code:
(define (byte s (i 0))
  (char s i true))
helped in some situations. But then I ran into problems trying to unpack a code like 32765 into two bytes. I thought I could get the low byte, 253, as follows:
Code:
> (mod 32765 256)
253
;; but
> (byte (mod 32765 256))
ý
;; and
> (byte (byte (mod 32765 256)))
195
And while, as mentioned above, the following use of (char) looks ok
Code:
> (char (char (mod 32765 256)))
253
> (char (mod 32765 256))
"ý"
> (length "ý")
2
the UTF-8 char length messes with the byte discipline of the compression algorithms.
At last, I found that (pack) can work:
Code:
> (pack "b" (& 32765 0xff))
"�"
;; and
> (byte (pack "b" (& 32765 0xff)))
253
;; (and for the high byte):
> (byte (pack "b" (/ 32765 256)))
127
But, a little confusingly, there were still some gotchas. For example, (pack) doesn't work with (mod):
Code:
> (byte (pack "b" (mod 32765 256)))
16
So, long story short, I've got these manipulations more-or-less working, but I wonder if there's a more direct way to manipulate such bytes and 8-bit chars?
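For reference, here is a minimal sketch of the split and rejoin of a 16-bit code using only (pack) and (unpack), assuming a UTF-8 enabled build. As far as I know, (mod) returns a floating-point value in newLISP while (%), (&) and (>>) stay with integers, which is my best guess at why the (pack "b" (mod ...)) call yields 16; the sketch therefore avoids (mod) entirely. The names split16, join16 and bytes-of are hypothetical helpers made up for illustration, not built-ins.

```newlisp
;; Sketch only: split16, join16 and bytes-of are hypothetical helper names,
;; not newLISP built-ins. Assumes a UTF-8 enabled build.

;; Split a 16-bit code into a 2-byte string, high byte first.
;; Integer operators (>> and &) are used instead of (mod), which returns a float.
(define (split16 n)
  (pack "bb" (& (>> n 8) 0xff) (& n 0xff)))

;; Rebuild the 16-bit code from a 2-byte string produced by split16.
(define (join16 s)
  (let (bytes (unpack "bb" s))
    (+ (<< (bytes 0) 8) (bytes 1))))

;; Return all bytes of a string as a list of integers in 0..255.
;; (length) counts bytes, so one "b" format character is emitted per byte.
(define (bytes-of s)
  (unpack (dup "b" (length s)) s))

(set 'code (split16 32765))
(println (length code))       ; byte length of the packed string
(println (unpack "bb" code))  ; the two bytes as integers
(println (join16 code))       ; round trip back to the original code
(println (bytes-of "ý"))      ; raw UTF-8 bytes of a two-byte character
```

If this behaves as expected, the two bytes printed for 32765 should match the 127 and 253 obtained with (pack) earlier in the post, and no call to (char) is needed, so the UTF-8 character length never enters the picture.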