suggestions for better string mangling

Notices and updates
Locked
cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

suggestions for better string mangling

Post by cormullion »

I wrote this yesterday, converting it as best I could from Python:

Code: Select all

(define (edits1 word) 
 (let ((l (length word)) (res '()))
    (for (i 0 (- l 1)) (push (replace (nth i (string word)) (string word) "") res -1))  ; deletion
    (for (i 0 (- l 2)) (push (swap i (+ i 1) (string word)) res -1))                    ; transposition
    (for (i 0 (- l 1)) (for (c (char "a") (char "z")) 
        (push (replace (nth i (string word)) (string word) (char c)) res -1)            ; alteration
        (push (replace (nth i (string word)) (string word) (string (char c) (nth i (string word)) )) res -1))) ; insertion
    res))
The idea was to take a word and to generate a set of most of the possible single-letter modifications of that word. So given "newlisp", you'd get deletions: "ewlisp" "nwlisp" "nelisp"..., transpositions: "enwlisp" "nwelisp" ..., insertions: "anewlisp" "bnewlisp" ..., and alterations: "aewlisp" "bewlisp"....

Lutz said I was using 'string casts' - I think he meant those

Code: Select all

(string word)
functions; they're to stop replace modifying the original word too soon. But I'm sure there's a better way of doing the job. Anyone got any bright ideas?

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

The function (edits1 word) is entered with 'word' already being a string. This means we don't need the first 'string' cast in 'replace'. We still need the second cast, because we want 'replace' to be non-destructive, as we need the same word in following 'replace' statements again:

Code: Select all

; current code in edits1
(for (i 0 (- l 1)) 
  (push (replace (nth i (string word)) (string word) "") res -1))

; leaving out one cast and implicit indexing
(for (i 0 (- l 1)) 
  (push (replace (word i) (string word) "") res -1))
This is still the same logic as before but slightly shorter and faster.

But here is another approach for extracting the single letters using the select function and much faster:

Code: Select all

; new logic with select and almost 3 times as fast as the original
(for (i 0 (- l 1)) 
  (push (select word (replace i (sequence 0 (- l 1)))) res -1))
The 'replace' takes out one number of (0 1 2 3 4 5 ...) then uses the remaining as a parameter to the 'select' statement.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

That's clever! It's surprising that it's so much faster, too.

Thanks!

Locked