I am at the limits of my understanding of math here, and am looking for an example or library function that can perform a standard deviation on a list of values and give me back the magic number, so I can move on with my life :)
Lutz: I do like how the language has evolved. It's been a few years, but I have begun a new project, and I hope to abuse newLISP's IP capabilities this time around.
Later
Bob
Standard Deviation ?
Bob the Caveguy aka Lord High Fixer.
There is a statistics module "stat.lsp" in the source distribution in the modules directory and documented here: http://newlisp.org/code/modules/stat.lsp.html
Lutz
(load "stat.lsp")
(set 'lst '(4 5 2 3 7 6 8 9 4 5 6 9 2))
(stat:sdev lst) => 2.39925202
Thanks Guy: I was sure I had seen it somewhere :)
As it was the only function I needed, I packed it a bit into a oneliner:
(define (sdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (sub (length X) 1))))
I was interested how this would affect performance so I ran some 10K loop tests using both 10 and 100 element lists.
(define (testsdev)
(setq lst (random 0 10000000000 10))
(println "stat:sdev 10 = " (time (stat:sdev lst) 10000))
(println "MAIN:sdev 10 = " (time (MAIN:sdev lst) 10000))
(setq lst (random 0 10000000000 100))
(println "stat:sdev 100 = " (time (stat:sdev lst) 10000))
(println "MAIN:sdev 100 = " (time (MAIN:sdev lst) 10000))
)
The one-liner ran in roughly 60% to 80% of the time stat:sdev took, which is about what I expected:
stat:sdev 10 = 125
MAIN:sdev 10 = 78
stat:sdev 100 = 719
MAIN:sdev 100 = 579
Keep up the good work !
Bob
I'm no stat expert, but don't you have to know if your list is either:
1) sample data from a population, or
2) the population data, itself?
The STDEV formulas are different for 1 versus 2, according to Wikipedia: Standard Deviation.
(λx. x x) (λx. x x)
In my case I am dealing with:
"2) the population data, itself?"
These are finite samples to be averaged and compared over time.
Sure hope I got the right one!
Heck, it will make a nice looking chart either way :)
... the use of the word "samples" is tricky here. Basically, Wikipedia says that if "every member of a population is sampled", use sigma for the standard deviation:

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]

But if you only have a proper sample of the population (i.e. not the whole population), use the estimator (of the population's standard deviation) s:

\[ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} \]

The only difference between the two expressions is the N versus N - 1, and that \(\bar{x}\) is the population mean in the sigma expression, whereas \(\bar{x}\) is the sample mean in the s expression (no pun intended).
Lutz's sdev function is based on the latter, s, expression, i.e. it's the sample standard deviation.
The short answer is: if your data is really the entire population, you need to divide by N, not N - 1.
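For reference, the one-liner posted earlier in the thread never computes a mean explicitly. As far as I can tell it relies on the usual shortcut for the sum of squared deviations, sketched below with N as the number of values and \bar{x} as their mean (my notation, not taken from stat.lsp):

```latex
\begin{aligned}
\sum_{i=1}^{N} (x_i - \bar{x})^2
  &= \sum_{i=1}^{N} x_i^2 - 2\bar{x}\sum_{i=1}^{N} x_i + N\bar{x}^2 \\
  &= \sum_{i=1}^{N} x_i^2 - N\bar{x}^2
     \qquad \text{since } \sum_{i=1}^{N} x_i = N\bar{x} \\
  &= \sum_{i=1}^{N} x_i^2 - \frac{1}{N}\Bigl(\sum_{i=1}^{N} x_i\Bigr)^2
\end{aligned}
```

Dividing the last line by N - 1 (sample) or N (population) and taking the square root gives s or sigma. One caveat: this form subtracts two large, nearly equal numbers when the values are big relative to their spread, so it can lose floating-point precision in that case.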
Thanks a lot. As it turns out, it looks like I will be needing both: in one case I have the entire population, and in the other I find I only have a representative sample. So here, for the archives, are the short, sweet, minimum-overhead flavors of each.
(define (sdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (sub (length X) 1))))
(define (stdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (length X))))
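A quick sanity check, as a sketch only: it repeats the two definitions above so it runs on its own, and it borrows the test list from Lutz's stat.lsp example. The approximate values in the comments are worked out by hand from that list, not copied from a run.

```newlisp
; Sample standard deviation: divides by N - 1.
(define (sdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (sub (length X) 1))))

; Population standard deviation: divides by N.
(define (stdev X) (sqrt (div (sub (apply add (map mul X X)) (div (mul (apply add X) (apply add X)) (length X))) (length X))))

; Test data from the stat.lsp example earlier in the thread.
(set 'lst '(4 5 2 3 7 6 8 9 4 5 6 9 2))

(println "sample (N - 1): " (sdev lst))  ; should agree with stat:sdev, about 2.3993
(println "population (N): " (stdev lst)) ; slightly smaller, about 2.3051
```

The two results differ only by the factor sqrt((N - 1) / N), so the gap shrinks as the list gets longer.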
Thanks for your helpful pointers.
Back under my rock again, well at least for a while :)
Bob