I've converted the python program written by Michael Nielsen from his his excellent on-line book at http://neuralnetworksanddeeplearning.com to newLisp. Presented some challenges as I'm not really well versed in Python or the library used (numpy) and I needed to brush up my knowledge on matrices as well to get it all working reasonably fast. (thanks Lutz for the fast matrix functions!).
With this version you can play with stochastic gradient descent for a feedforward neural network.I've included load and convert functions for the MNIST dataset, which is kind of a standard to test learning algorithms against. Download it from http://yann.lecun.com/exdb/mnist/ and with below program you're good to go.
Let me know if you encounter any bugs (most probably!) and Lutz let me know if and how this can can be included as a module.
Code: Select all
;; Mini-batch stochastic gradient descent learning algorithm for a feedforward neural network in newLISP. Based on a python program and book by Michael Nielsen, see http://neuralnetworksanddeeplearning.com/about.html and below for the MIT license. ;Usage (nn:sgd) with arbitrary number and sequence of named parameters . ; For instance (nn:sgd eta 1 network '(784 100 10) lmbda 1), other parameters will use default values, see below. ;; network contains the number of neurons in the respective layers of the network. For example, if the list was [2, 3, 1] then it would be a three-layer network, with the first layer containing 2 neurons, the second layer 3 neurons, and the third layer 1 neuron. For the included MNIST data and load functions the first (input) layer has to contain 784 neurons and the last (ouptput) layer 10. There can be 0-n number of (hiden) layers and neurons in between. ;; epochs The number of times the whole dataset is traversed ;; mini-batch-size the numnber of instances trained together ;; eta the learning rate, positive number, try 0.1 or 1 or 10 and then tweak ;; number the number of instances to train upon (max 60000 for number+test data together for the MNIST training data ;; test-data the number of instances to test against, used for accuracy calculation ;; w-init weight initiliazer function, choice between default-w and large-w ;; cost cost function, choice between quadratic and cross-entropy ;; lmbda used for regularization to prevent overfitting, must be greater or equal to zero. ;; i-path path to the input file, for MNIST the name of the file is train-images.idx3-ubyte, ;; l-path path to output file For MNIST the name of the file is train-labels.idx1-ubyte, download from http://yann.lecun.com/exdb/mnist/ (context 'nn) (define-macro (sgd) (set 'network '(784 20 10) 'epochs 10 'mini-batch-size 10 'eta 4 'number 10000 'test-data 100 'w-init default-w 'cost cross-entropy 'lmbda 0 'i-path "Downloads/train-images.idx3-ubyte" 'l-path "Downloads/train-labels.idx1-ubyte" ) (when (args) (bind (explode (args) 2) true)) (when (< (length input) number) (load-mnist-train-images (+ number test-data) i-path) (load-mnist-train-labels (+ number test-data) l-path)) (set 'biases (map (fn(x) (array x 1 (normal 0 1 x))) (rest network))) (set 'weights (map w-init (rest network) (chop network))) (dotimes (j epochs) (setq start (time-of-day)) (setq mini-batches (explode (randomize (sequence 0 number)) mini-batch-size)) (dolist (mb mini-batches) (update-mini-batch mb)) (if test-data (silent (fork (println "Epoch:"(+ j 1) " Duration:"(/ (sub (time-of-day) start) 1000) " Accuracy:" (format "%2.2f%%" (mul 100 (evaluate number (+ number (- test-data 1)))))))) (println "Epoch: " j " complete")))) (define (update-mini-batch mini-batch) (set 'batchsize (length mini-batch)) (backprop mini-batch) (set 'biases (map (fn(b nb) (mat - b (mat * nb (div eta batchsize)))) biases (map (fn(x) (multiply x (array batchsize 1 '(1)))) nabla-b))) (set 'weights (map (fn(w nw) (mat - (mat * w (sub 1 (mul eta (div lmbda number)))) (mat * nw (div eta batchsize)))) weights nabla-w))) (define (backprop lst) (set 'activations (list (transpose (map(fn (x) (flat (input x)))lst))) 'b-biases (map (fn(x) (multiply x (array 1 batchsize '(1)))) biases) 'b-weights weights 'nabla-b '() 'nabla-w '() 'zs '()) (dolist (bw (map list b-biases b-weights)) (push (mat + (multiply (bw 1) (activations -1)) (bw 0)) zs -1) (push (sigmoid (zs -1)) activations -1)) (set 'delta (cost (activations -1) (transpose (map (fn(x) (flat (output x)))lst)) (zs -1))) (push delta nabla-b) (push (multiply delta (transpose (activations -2))) nabla-w) (setq x -2) (while (> x (sub (length network))) (set 'delta (mat * (multiply (transpose (b-weights (+ x 1))) delta) (sigmoid-prime (zs x)))) (push delta nabla-b) (push (multiply delta (transpose (activations (sub x 1)))) nabla-w) (dec x))) ;; weight initializers (define (default-w x y) (array x y (map (fn(z) (div z (pow x 0.5))) (normal 0 1 (* x y))))) (define (large-w x y) (array x y (normal 0 1 (* x y)))) ;; cost functions (define (quadratic a y z) (mat * (mat - a y) (sigmoid-prime z))) (define (cross-entropy a y z) (mat - a y)) ;; utility functions (define (ff f) (let(a (list (input f))) (argmax (flat(nth '(-1 -1)(map (fn (b w) (push (sigmoid (mat + (multiply w (a -1)) b)) a -1)) biases weights)))))) (define (ev nr) (if (= (ff nr) (argmax(flat(output nr)))) 1 0)) (define (evaluate start end) (let (teller 0) (for (x start end) (inc teller (ev x))) (div teller (+(- end start)1)))) (define (load-mnist-data var file number block offset) (let (buff (read-file file) parse-bytes (dup "b " block) lst '()) (for (x offset (+ (- offset 1) (* number block)) block) (push (explode (unpack parse-bytes (slice buff x block))) lst -1)) (set var lst))true) (define (load-mnist-train-images number i-path) (load-mnist-data (sym 'input) i-path number 784 16) (set 'input (map m-normalize (eval 'input)))true) (define (load-mnist-train-labels number l-path) (load-mnist-data (sym 'output) l-path number 1 8) (set 'output (convert (eval 'output)))true) (define (convert labels) (map (fn (x) (setq t (dup 0 10)) (setq (t (x 0)) 1)(explode t)) labels)) (define (m-map function matrix) (map (fn(x) (map function x)) matrix)) (define (sigmoid-n z) (div 1 (add 1 (exp (sub z))))) (define (sigmoid-prime-n z) (let (sig (sigmoid-n z)) (mul sig (sub 1 sig)))) (define (sigmoid m) (m-map sigmoid-n m)) (define (sigmoid-prime m) (m-map sigmoid-prime-n m)) (define (argmax lst) (find (apply max lst) lst)) (define (shape lst) (list (length lst) (length (lst 0)))) (define (normalize n (qty 255)) (div n qty)) (define (m-normalize m ) (m-map normalize m)) (define (m-log m) (m-map log m)) (define (m-log-1 m) (m-map log-1 m)) (define (log-1 n) (log (sub 1 n))) (context 'MAIN) ; NewLIps version based on Python program wriiten by Michael Nielsen, see below license of Python program. ;MIT License ;Copyright (c) 2012-2015 Michael Nielsen ;Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: ;The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. ;THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.