From Test-Scratch-Wiki
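Standard deviation (in Chinese-language contexts often also called "均方差") measures how spread out a data set is around its mean. Two data sets with the same mean do not necessarily have the same standard deviation. A large standard deviation means that most values lie far from the mean; a small standard deviation means that the values lie close to the mean.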
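How to do it in Scratch
Definition of standard deviation
The standard deviation is the arithmetic (non-negative) square root of the variance.
What is variance? The variance is the average of the squared differences between each value and the mean of all the values. In probability theory, variance measures how far a random variable deviates from its mathematical expectation (that is, its mean). An example makes this clearer.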
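Suppose we have 5 flowers whose heights are 25 cm, 60 cm, 40 cm, 45 cm and 55 cm. Their average height is:
(25 + 60 + 40 + 45 + 55) / 5 = 45
So the average height of the flowers is 45 cm. What, then, is the variance of the flowers? Taking the flowers in ascending order of height, square each height's difference from the mean, then average the results: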
Flower #1: ((25) - (45))^2 = (-20)^2 = 400
Flower #2: ((40) - (45))^2 = (-5)^2 = 25
Flower #3: ((45) - (45))^2 = (0)^2 = 0
Flower #4: ((55) - (45))^2 = (10)^2 = 100
Flower #5: ((60) - (45))^2 = (15)^2 = 225
(400 + 25 + 0 + 100 + 225) / 5 = 150
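So the variance of the flowers is 150 cm² (variance is measured in squared units). The standard deviation of the flowers is therefore the square root of 150, which is approximately 12.247 cm.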
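As a cross-check of the arithmetic above, here is a minimal Python sketch. Python is not part of the original Scratch tutorial, and the names `heights`, `squared_diffs` and `std_dev` are only illustrative.

```python
import math

# Heights of the five flowers in centimetres (from the example above).
heights = [25, 60, 40, 45, 55]

# Step 1: the mean is the sum of the values divided by how many there are.
mean = sum(heights) / len(heights)

# Step 2: square each value's difference from the mean.
squared_diffs = [(h - mean) ** 2 for h in heights]

# Step 3: the variance used here is the average of the squared differences.
variance = sum(squared_diffs) / len(heights)

# Step 4: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean)               # 45.0
print(variance)           # 150.0
print(round(std_dev, 3))  # 12.247
```

The order of the heights in the list does not matter, because both the sum and the sum of squared differences are independent of order.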
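There are two kinds of standard deviation:
1. Population standard deviation, which describes the spread of an entire population. For example, if there were only 5 flowers in the world, 12.247 would be the population standard deviation of flower height.
2. Sample standard deviation, which is calculated from only part of the data. For example, take our five flowers: there are obviously more than five flowers in the world, so these five are only a sample of all the data. When a sample is used to estimate the spread of the whole population, the computed standard deviation has to be enlarged slightly.
The only difference between the two kinds of standard deviation is how the variance is calculated. The population standard deviation follows the rule in the example above, whereas the sample standard deviation takes the sum of the squared differences from the mean and divides it by the number of data points minus 1. For example, let us go back to the flowers and recalculate their variance: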
(400 + 25 + 0 + 100 + 225) / (5 - 1) = 187.5
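Here, 5 is the number of flowers whose heights are known. 1 is subtracted from it because this is a sample standard deviation. The last step in calculating the sample standard deviation is to take the square root of 187.5, which is approximately 13.693.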
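The two divisors can be compared side by side in a short Python sketch. This is only an illustration under the assumption that Python's standard `statistics` module is available for cross-checking; the variable names are made up for this example.

```python
import math
import statistics

# Heights of the five flowers in centimetres.
heights = [25, 60, 40, 45, 55]

mean = sum(heights) / len(heights)
# Sum of squared differences from the mean (750 for this data set).
sum_sq = sum((h - mean) ** 2 for h in heights)

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum_sq / len(heights)
sample_variance = sum_sq / (len(heights) - 1)

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library implementations.
assert math.isclose(population_sd, statistics.pstdev(heights))
assert math.isclose(sample_sd, statistics.stdev(heights))

print(sample_variance)      # 187.5
print(round(sample_sd, 3))  # 13.693
```

Because the divisor is smaller, the sample standard deviation is always at least as large as the population standard deviation computed from the same values.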
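Variables
This tutorial needs one list:
- Data (the data set list)
This list holds all the data samples, such as the heights of the flowers.
Seven variables are also needed: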
- Average
- Sum
- Variance
- Standard Deviation
- Number
- Sum2
- Number2
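Code
This tutorial first demonstrates how to calculate the sample standard deviation.
The first step in calculating the sample standard deviation is to calculate the average (mean) of the numbers. The script looks like this:
when gf clicked
set [Sum v] to (0) // Initialize the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1) // (Number) is the index used to read items from the Data list.
end
set [Average v] to ((Sum) / (length of [Data v]))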
The second step in calculating the sample standard deviation is to calculate the variance:
when gf clicked
set [Sum v] to (0) // Reset the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
The last step in calculating the sample standard deviation is to take the square root of the variance:
when gf clicked
set [Sum v] to (0) // Reset the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))
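For readers who want to trace the three steps outside Scratch, the following Python sketch mirrors the same logic one step at a time. It is a rough translation rather than part of the original tutorial: the names `total`, `number`, `total2` and `number2` stand in for the Scratch variables Sum, Number, Sum2 and Number2, and the flower heights are used as assumed sample data.

```python
import math

# Stands in for the Scratch list "Data" (assumed sample data).
data = [25, 60, 40, 45, 55]

# Step 1: add up every item, then divide by the list length.
total = 0    # Sum
number = 1   # Number: Scratch list indices start at 1
for _ in range(len(data)):
    total += data[number - 1]  # Python list indices start at 0
    number += 1
average = total / len(data)

# Step 2: add up the squared differences, then divide by (length - 1).
total2 = 0   # Sum2
number2 = 1  # Number2
for _ in range(len(data)):
    diff = data[number2 - 1] - average
    total2 += diff * diff
    number2 += 1
variance = total2 / (len(data) - 1)

# Step 3: take the square root of the variance.
standard_deviation = math.sqrt(variance)

print(average)                       # 45.0
print(variance)                      # 187.5
print(round(standard_deviation, 3))  # 13.693
```

The explicit counters are kept only to stay close to the Scratch script; idiomatic Python would iterate over the list directly.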
Calculating the population standard deviation needs only a small change: the variance is divided by the length of the list instead of the length minus 1. The code is as follows:
when gf clicked
set [Sum v] to (0) // Reset the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / (length of [Data v])) // Divide by the number of items, not by (length - 1).
set [Standard Deviation v] to ([sqrt v] of (Variance))
Complete code
The code for calculating the sample standard deviation is:
when gf clicked
set [Sum v] to (0) // Reset the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum2 v] to (0)
set [Number2 v] to (1)
repeat (length of [Data v])
  change [Sum2 v] by (((item (Number2) of [Data v]) - (Average)) * ((item (Number2) of [Data v]) - (Average)))
  change [Number2 v] by (1)
end
set [Variance v] to ((Sum2) / ((length of [Data v]) - (1)))
set [Standard Deviation v] to ([sqrt v] of (Variance))
The code for calculating the population standard deviation is:
when gf clicked
set [Sum v] to (0) // Reset the variables.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (item (Number) of [Data v])
  change [Number v] by (1)
end
set [Average v] to ((Sum) / (length of [Data v]))
set [Sum v] to (0) // Reset the variables so that they can be reused.
set [Number v] to (1)
repeat (length of [Data v])
  change [Sum v] by (((item (Number) of [Data v]) - (Average)) * ((item (Number) of [Data v]) - (Average)))
  change [Number v] by (1)
end
set [Variance v] to ((Sum) / (length of [Data v]))
set [Standard Deviation v] to ([sqrt v] of (Variance))
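Since the two complete scripts differ only in the divisor, the same idea can be sketched as a single Python function with a switch. This is an illustrative sketch, not the tutorial's own method: the function name `standard_deviation` and its `sample` parameter are hypothetical, and the length check is an addition of this sketch that the scripts above do not perform.

```python
import math


def standard_deviation(data, sample=True):
    """Return the sample (default) or population standard deviation."""
    n = len(data)
    # Sample: divide by n - 1. Population: divide by n.
    divisor = n - 1 if sample else n
    if divisor <= 0:
        # Guard added in this sketch: avoids dividing by zero.
        raise ValueError("not enough data points")
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_sq / divisor)


heights = [25, 60, 40, 45, 55]
print(round(standard_deviation(heights), 3))                # 13.693
print(round(standard_deviation(heights, sample=False), 3))  # 12.247
```

A sample standard deviation needs at least two data points, because with a single value the divisor would be zero; the population version works with one value and returns 0.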