User:MPopov (WMF)/Notes/Log transform in regression

From Meta, a Wikimedia project coordination wiki

Linear regression[edit]

The multiple linear regression model is often written as:

Where is the independent, identically distributed error term and is usually assumed to come from a Normal distribution with mean 0 and standard deviation .

The quantity is , the expected value of given the values of .

After the intercept , the coefficients represent changes in the expected value of from unit changes in the corresponding .

For example, suppose we had 2 predictors and we held the value of constant while increasing by 1 unit:

Binary (0/1) indicators are especially nice because when then the corresponding doesn't contribute to expected value of , but when then y changes by the corresponding . So if is "received new UI/UX" then the corresponding represents the effect of receiving the new UI/UX on whatever metric/KPI is.

Log transformation[edit]

Sometimes it's necessary to transform either the response/outcome variable and/or some of the regressors/predictors .

If we apply the transformation to the dependent variable we end up with:

where is usually the natural log ( aka ). Changes in regressors/predictors correspond to changes in expected value of the response/outcome on the log scale, which can pose difficulties. To talk about changes on the original scale, we have to apply the inverse of the natural log: . Then

Notice how exponent of the summed terms becomes product of exponentiated terms, turning what used to be additive changes in expected value to multiplicative changes in expected value.

To see the effect of a unit increase in , we repeat the example from earlier but this time with a log-transformed :

Log-Log transformation[edit]

If we apply the transformation to both the dependent and the independent variable, we end up with:

Suppose we increase by one percent (1%):

Therefore a 1% increase in corresponds to a multiplicative change, which will be less than 1 (a decrease) if if negative and will be more than 1.01 (an increase) if is positive. The closer to 0 that is, the closer to 1 (no change) will be.