Selecting MMM models via AIC? Some key pointers
The Akaike information criterion (AIC) is given by:
AIC = 2k - 2 ln(L)
where
k is the number of estimated parameters in the model
L is the maximized value of the model's likelihood function
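To make the formula concrete, here is a minimal Python sketch that computes AIC for a simple linear model fitted by least squares. It assumes Gaussian errors, and the data (weekly sales driven by a single media channel) and all names such as `spend`, `sales`, and `gaussian_log_likelihood` are hypothetical, made up purely for illustration.

```python
import numpy as np

def gaussian_log_likelihood(y, y_hat):
    """Maximized Gaussian log-likelihood, with the error variance set to its MLE."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    sigma2 = rss / n  # maximum-likelihood estimate of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical data: 104 weeks of sales driven by one media channel
rng = np.random.default_rng(0)
spend = rng.uniform(0, 100, size=104)
sales = 50 + 0.8 * spend + rng.normal(0, 5, size=104)

# Fit intercept + slope by ordinary least squares
X = np.column_stack([np.ones_like(spend), spend])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

ll = gaussian_log_likelihood(sales, X @ beta)
k = X.shape[1] + 1  # two coefficients plus the error variance
model_aic = aic(ll, k)
print(f"log-likelihood = {ll:.2f}, k = {k}, AIC = {model_aic:.2f}")
```

Counting the error variance as a parameter is a common convention for Gaussian models. Whichever convention you pick, apply it consistently to every model you compare.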
AIC is grounded in information theory.
On that note, we have been researching and applying information-theoretic concepts in marketing mix modeling (MMM). (Check the links to our whitepapers under Resources.)
Coming back to AIC: the equation above contains the likelihood, which we try to maximize when fitting a model.
It turns out that maximizing the likelihood is equivalent to minimizing the estimated KL divergence between the true distribution and the model.
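A short derivation shows why this holds. It is a sketch under the assumption of independent, identically distributed observations. The notation is ours, not from a specific source: p is the true density, q_θ is the model with parameters θ, and x_1, ..., x_n are the observations.

```latex
\begin{align*}
D_{\mathrm{KL}}(p \,\|\, q_\theta)
  &= \mathbb{E}_{x \sim p}\!\left[\ln \frac{p(x)}{q_\theta(x)}\right] \\
  &= \underbrace{\mathbb{E}_{x \sim p}[\ln p(x)]}_{\text{does not depend on } \theta}
     \;-\; \mathbb{E}_{x \sim p}[\ln q_\theta(x)] \\[4pt]
\arg\min_\theta D_{\mathrm{KL}}(p \,\|\, q_\theta)
  &= \arg\max_\theta \mathbb{E}_{x \sim p}[\ln q_\theta(x)] \\
  &\approx \arg\max_\theta \frac{1}{n}\sum_{i=1}^{n} \ln q_\theta(x_i)
   = \arg\max_\theta \ln L(\theta)
\end{align*}
```

The first term is fixed by the true distribution, so only the second term can be changed by the choice of θ. Replacing the expectation with the sample average gives the log-likelihood (up to the factor 1/n), which is why the two optimization problems coincide approximately for large n.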
But what is KL divergence?
From an information theory point of view, KL (Kullback-Leibler) divergence tells us how much information we lose when we approximate the true probability distribution with another distribution.
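As a small illustration, the sketch below computes the KL divergence between a "true" discrete distribution and two approximations. The distributions and names (`true_dist`, `close_model`, `poor_model`) are hypothetical, chosen only to show that a worse approximation loses more information.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms where p is zero contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

true_dist = [0.5, 0.3, 0.2]        # hypothetical "true" distribution
close_model = [0.45, 0.35, 0.20]   # approximation that is nearly right
poor_model = [0.10, 0.10, 0.80]    # approximation that is far off

kl_close = kl_divergence(true_dist, close_model)
kl_poor = kl_divergence(true_dist, poor_model)
print(f"KL(true || close) = {kl_close:.4f}")
print(f"KL(true || poor)  = {kl_poor:.4f}")
```

The divergence is zero only when the approximation matches the true distribution exactly, and it is not symmetric: KL(p || q) generally differs from KL(q || p).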
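Why we choose the model with the lowest AIC
When comparing models, we choose the model with the lowest AIC because AIC estimates the relative KL divergence between each model and the true process. The lowest AIC therefore corresponds to the smallest estimated information loss. The 2k term penalizes extra parameters, so a model cannot win simply by adding parameters to push its likelihood up.
Now you know how KL divergence and AIC are related and why we choose models with a low AIC score.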
Breaking myths about AIC in the context of MMM
One misconception about AIC is that it identifies the best model in an absolute sense.
However, the key word here is 'relative'. AIC identifies the 'best model' only relative to the other models in the candidate set.
For example, if you had 5 MMM models (fitted to the same response variable on the same data) and all 5 were badly overfitted, AIC would still rank them and point to the least bad of the five.
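Note: AIC will not warn you that all your MMM models are poorly fitted. Much like SHAP values (but more on this in future posts). In a way, selecting by AIC is like taking the minimum of a set: it tells you which element is smallest, not whether that element is small in absolute terms.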
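The relative nature of AIC can be sketched in a few lines of Python. The AIC values below are hypothetical, and `aic_deltas_and_weights` is an illustrative helper, not part of any library. It computes AIC differences from the best model and the Akaike weights commonly used with them. As a deliberately simplified stand-in for "every model is equally worse", it then shifts all AIC values by the same amount.

```python
import numpy as np

def aic_deltas_and_weights(aic_values):
    """Return AIC differences from the best model and the Akaike weights."""
    aic_values = np.asarray(aic_values, dtype=float)
    deltas = aic_values - aic_values.min()   # 0 for the best model
    rel_likelihood = np.exp(-0.5 * deltas)   # relative likelihood of each model
    weights = rel_likelihood / rel_likelihood.sum()
    return deltas, weights

# Hypothetical AIC scores for 5 MMM models fitted to the same response variable
aics = np.array([812.4, 815.1, 809.7, 830.2, 811.0])
deltas, weights = aic_deltas_and_weights(aics)
best = int(np.argmin(aics))
print("best model index:", best)
print("deltas:", np.round(deltas, 1))
print("weights:", np.round(weights, 3))

# Make every model uniformly worse: the ranking and weights do not change
shifted_deltas, shifted_weights = aic_deltas_and_weights(aics + 500.0)
print("same ranking after shift:", bool(np.allclose(deltas, shifted_deltas)))
```

Only the differences between AIC values carry information, so the absolute score of the winning model says nothing about whether it fits well. Checking that requires separate diagnostics, such as out-of-sample validation.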
P.S.: A link to a comprehensive paper by David Anderson and Kenneth Burnham on myths about AIC is under Resources.
Resources:
Facts and fallacies of AIC: https://robjhyndman.com/hyndsight/aic/
Identification of Saturation Points in MMM through Transfer Entropy: https://www.linkedin.com/posts/venkat-raman-analytics_identification-of-saturation-points-transfer-activity-7120765376694521856-SyrM?utm_source=share&utm_medium=member_desktop
Using Transfer Entropy for Feature Selection in MMM: https://www.linkedin.com/posts/ridhima-kumar7_marketingmixmodeling-marketingattribution-activity-7119672601920114688-h-Fb?utm_source=share&utm_medium=member_desktop