## F-test

The F-test is sensitive to non-normality. [2] [3] In the analysis of variance (ANOVA), alternative tests include Levene’s test, Bartlett’s test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate. [4]

### Formula and calculation

Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled χ²-distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.

#### Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others, nor, if the F-test is performed at level α, can we state that the treatment pair with the greatest mean difference is significantly different at level α.
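The count of six pairwise tests, and the reason multiple comparisons normally call for an adjustment, can be illustrated with a short sketch. The function names are hypothetical, and the error-rate formula assumes the individual tests are independent, which pairwise tests that share groups are not exactly; it is a rough guide rather than an exact figure.

```python
from math import comb


def pairwise_count(k: int) -> int:
    """Number of distinct treatment pairs among k treatments."""
    return comb(k, 2)


def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false rejection across m tests.

    ASSUMPTION: the m tests are independent, which is only an
    approximation for pairwise comparisons that share groups.
    """
    return 1.0 - (1.0 - alpha) ** m


m = pairwise_count(4)             # four treatments give six pairs
fwer = familywise_error(0.05, m)  # roughly 0.265 under independence
print(m, round(fwer, 3))
```

With four treatments and each pairwise test run at level 0.05, the chance of at least one false rejection under this independence assumption is roughly 0.265, which is why unadjusted pairwise testing is not equivalent to a single omnibus test at level 0.05.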

The one-way ANOVA F-test statistic is the ratio of the between-group variability (the "explained variance") to the within-group variability (the "unexplained variance"):

$$F = \frac{\text{between-group variability}}{\text{within-group variability}}.$$

The between-group variability is

$$\sum_{i=1}^{K} \frac{n_i \left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2}{K - 1},$$

where $\bar{Y}_{i\cdot}$ denotes the sample mean in the $i$-th group, $n_i$ is the number of observations in the $i$-th group, $\bar{Y}$ denotes the overall mean of the data, and $K$ denotes the number of groups.

The within-group variability is

$$\sum_{i=1}^{K} \sum_{j=1}^{n_i} \frac{\left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2}{N - K},$$

where $Y_{ij}$ is the $j$-th observation in the $i$-th out of $K$ groups and $N$ is the overall sample size. This F-statistic follows the F-distribution with degrees of freedom $d_1 = K - 1$ and $d_2 = N - K$ under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.
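As a minimal sketch of this calculation, the following computes the one-way ANOVA F-statistic as the between-group sum of squares divided by $K - 1$, over the within-group sum of squares divided by $N - K$. The function name and the three small groups are made up for illustration, and only the Python standard library is used.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, d1, d2) for a list of groups of observations."""
    k = len(groups)                   # number of groups, K
    n = sum(len(g) for g in groups)   # overall sample size, N
    grand_mean = fmean([y for g in groups for y in g])
    group_means = [fmean(g) for g in groups]

    # Between-group sum of squares: n_i * (group mean - overall mean)^2
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: (observation - its group mean)^2
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    d1, d2 = k - 1, n - k
    return (ss_between / d1) / (ss_within / d2), d1, d2


# Illustrative (made-up) data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, d1, d2 = one_way_anova_f(groups)
print(f_stat, d1, d2)
```

For these data the between-group sum of squares is 26 and the within-group sum of squares is 6, so the statistic is (26/2)/(6/6) = 13 on (2, 6) degrees of freedom. Turning it into a p-value requires the F-distribution's upper tail, for example via `scipy.stats.f.sf(f_stat, d1, d2)` if SciPy is available.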

Consider two models, 1 and 2, where model 1 is ‘nested’ within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has $p_1$ parameters, and model 2 has $p_2$ parameters, where $p_2 > p_1$, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2.

One common context in this regard is that of deciding whether a model fits the data significantly better than does a naive model, in which the only explanatory term is the intercept term, so that all predicted values for the dependent variable are set equal to that variable’s sample mean. The naive model is the restricted model, since the coefficients of all potential explanatory variables are restricted to equal zero.

Another common context is deciding whether there is a structural break in the data: here the restricted model uses all data in one regression, while the unrestricted model uses separate regressions for two different subsets of the data. This use of the F-test is known as the Chow test.

The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test.

If there are $n$ data points from which to estimate the parameters of both models, then one can calculate the F statistic, given by

$$F = \frac{\left(\dfrac{\text{RSS}_1 - \text{RSS}_2}{p_2 - p_1}\right)}{\left(\dfrac{\text{RSS}_2}{n - p_2}\right)},$$

where $\text{RSS}_i$ is the residual sum of squares of model $i$. If the regression model has been calculated with weights, then replace $\text{RSS}_i$ with $\chi^2$, the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, $F$ will have an F-distribution with $(p_2 - p_1,\ n - p_2)$ degrees of freedom. The null hypothesis is rejected if the $F$ calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
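The nested-model comparison can be sketched for the simplest case described above: an intercept-only (naive) model with one parameter against a straight-line model with two parameters. The helper names and the five data points are made up for illustration, and the straight-line fit uses the usual closed-form least-squares solution.

```python
def mean_only_rss(y):
    """RSS of the restricted (intercept-only) model: predictions equal the sample mean."""
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y)


def line_rss(x, y):
    """RSS of the unrestricted model y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def nested_f(rss1, rss2, p1, p2, n):
    """F statistic for restricted model 1 (p1 params) vs unrestricted model 2 (p2 params)."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

rss1 = mean_only_rss(y)   # restricted model, p1 = 1
rss2 = line_rss(x, y)     # unrestricted model, p2 = 2
f_stat = nested_f(rss1, rss2, p1=1, p2=2, n=len(x))
print(rss1, rss2, f_stat)
```

Here the residual sums of squares are 10 and 3.6, giving an F statistic of 6.4/1.2, about 5.33, on (1, 3) degrees of freedom. The decision step compares this value with the critical value of the F-distribution, which could be obtained with `scipy.stats.f.ppf(0.95, 1, 3)` if SciPy is available.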