Wednesday, March 7, 2012

Mathematical help and not really SQL

Create table stdevtest
(col1 int)
Insert stdevtest values(1)
Insert stdevtest values(2)
Insert stdevtest values(3)
Insert stdevtest values(6)
Insert stdevtest values(7)
Insert stdevtest values(8)
Insert stdevtest values(8000)
select avg(col1) from stdevtest
select stdev(col1) from stdevtest
As you can see, I tried avg and also standard deviation, but per the values
above and I aint no mathematician, how can I somehow show the business
folks, that there are some outliers and excluding those eg: value 8000, our
average is really around 4 and stdev is around 2.88
Are there other ways to do this ? In this example, atleast I can see all the
data, but when I was analyzing some data with a few 1000 rows, i did a min
and max and found min being 1 and max being like 500000 and hence the avg is
around 200000 or so which may not be what i want to tell the business
folks.. Sad part is I also dont know how to tell them otherwise
Appreciate any help you can provide..
Hi
You may want skewness see http://en.wikipedia.org/wiki/Skewness
and http://www.users.drew.edu/skass/sql/SkewKurtosis.sql.txt
Also read up on Median
http://en.wikipedia.org/wiki/Median
John
"Hassan" wrote:

> Create table stdevtest
> (col1 int)
> Insert stdevtest values(1)
> Insert stdevtest values(2)
> Insert stdevtest values(3)
> Insert stdevtest values(6)
> Insert stdevtest values(7)
> Insert stdevtest values(8)
> Insert stdevtest values(8000)
> select avg(col1) from stdevtest
> select stdev(col1) from stdevtest
> As you can see, I tried avg and also standard deviation, but per the values
> above and I aint no mathematician, how can I somehow show the business
> folks, that there are some outliers and excluding those eg: value 8000, our
> average is really around 4 and stdev is around 2.88
> Are there other ways to do this ? In this example, atleast I can see all the
> data, but when I was analyzing some data with a few 1000 rows, i did a min
> and max and found min being 1 and max being like 500000 and hence the avg is
> around 200000 or so which may not be what i want to tell the business
> folks.. Sad part is I also dont know how to tell them otherwise
> Appreciate any help you can provide..
>
>
|||Or you could try something like this:
-- Chop off top and bottom
SELECT a.*
FROM stdevtest a CROSS JOIN
(
SELECT MIN(col1) min_col1, MAX(col1) max_col1
FROM stdevtest
) b
WHERE a.col1 > b.min_col1
AND a.col1 < b.max_col1
-- Chop off top and bottom with tolerance
SELECT a.*
FROM stdevtest a CROSS JOIN
(
SELECT MIN(col1) min_col1, MAX(col1) max_col1
FROM stdevtest
WHERE col1 > 1
AND col1 < 7000
) b
WHERE a.col1 >= b.min_col1
AND a.col1 <= b.max_col1
Curtesty of Ken Henderson originally I think!?
Hope that helps.
wBob
"Hassan" wrote:

> Create table stdevtest
> (col1 int)
> Insert stdevtest values(1)
> Insert stdevtest values(2)
> Insert stdevtest values(3)
> Insert stdevtest values(6)
> Insert stdevtest values(7)
> Insert stdevtest values(8)
> Insert stdevtest values(8000)
> select avg(col1) from stdevtest
> select stdev(col1) from stdevtest
> As you can see, I tried avg and also standard deviation, but per the values
> above and I aint no mathematician, how can I somehow show the business
> folks, that there are some outliers and excluding those eg: value 8000, our
> average is really around 4 and stdev is around 2.88
> Are there other ways to do this ? In this example, atleast I can see all the
> data, but when I was analyzing some data with a few 1000 rows, i did a min
> and max and found min being 1 and max being like 500000 and hence the avg is
> around 200000 or so which may not be what i want to tell the business
> folks.. Sad part is I also dont know how to tell them otherwise
> Appreciate any help you can provide..
>
>

No comments:

Post a Comment