# VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP functions

An aggregate function that returns the variance of a set of numbers. This is a mathematical property that signifies how far the values spread apart from the mean. The return value can be zero (if the input is a single value, or a set of identical values), or a positive number otherwise.

**Syntax:**

`{ VARIANCE | VAR[IANCE]_SAMP | VAR[IANCE]_POP } ([DISTINCT | ALL] ``expression`)

This function works with any numeric data type.

**Return type:** `DOUBLE`

in Impala 2.0 and higher; `STRING`

in earlier
releases

This function is typically used in mathematical formulas related to probability distributions.

The `VARIANCE_SAMP()`

and `VARIANCE_POP()`

functions compute the sample
variance and population variance, respectively, of the input values. (`VARIANCE()`

is an alias
for `VARIANCE_SAMP()`

.) Both functions evaluate all input rows matched by the query. The
difference is that `STDDEV_SAMP()`

is scaled by `1/(N-1)`

while
`STDDEV_POP()`

is scaled by `1/N`

.

The functions `VAR_SAMP()`

and `VAR_POP()`

are the same as
`VARIANCE_SAMP()`

and `VARIANCE_POP()`

, respectively. These aliases are
available in Impala 2.0 and later.

If no input rows match the query, the result of any of these functions is `NULL`

. If a single
input row matches the query, the result of any of these functions is `"0.0"`

.

**Examples:**

This example demonstrates how `VARIANCE()`

and `VARIANCE_SAMP()`

return the
same result, while `VARIANCE_POP()`

uses a slightly different calculation to reflect that the
input data is considered part of a larger population

.

```
[localhost:21000] > select variance(score) from test_scores;
+-----------------+
| variance(score) |
+-----------------+
| 812.25 |
+-----------------+
[localhost:21000] > select variance_samp(score) from test_scores;
+----------------------+
| variance_samp(score) |
+----------------------+
| 812.25 |
+----------------------+
[localhost:21000] > select variance_pop(score) from test_scores;
+---------------------+
| variance_pop(score) |
+---------------------+
| 811.438 |
+---------------------+
```

This example demonstrates that, because the return value of these aggregate functions is a
`STRING`

, you convert the result with `CAST`

if you need to do further
calculations as a numeric value.

```
[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
+-------------------+
| summary |
+-------------------+
| Inserted 1 row(s) |
+-------------------+
[localhost:21000] > desc score_stats;
+--------------------+--------------+---------+
| name | type | comment |
+--------------------+--------------+---------+
| standard_deviation | decimal(7,4) | |
| variance | decimal(7,4) | |
+--------------------+--------------+---------+
```

**Restrictions:**

This function cannot be used in an analytic context. That is, the `OVER()`

clause is not allowed at all with this function.

**Related information:**

The `STDDEV()`

, `STDDEV_POP()`

, and `STDDEV_SAMP()`

functions compute the standard deviation (square root of the variance) based on the results
of `VARIANCE()`

, `VARIANCE_POP()`

, and
`VARIANCE_SAMP()`

respectively. See STDDEV, STDDEV_SAMP, STDDEV_POP functions
for details about the standard deviation property.