At Aryma Labs, we strive to incorporate an information-theoretic approach into our MMM modeling process, because correlation-based approaches have their limitations.
Of late, in our client projects, we have been using KL Divergence as a calibration metric, over and above other calibration metrics like R-squared, standard error, p-value, within-sample MAPE, etc.
I would urge readers to read my post on the difference between calibration and validation (link in resources).
Any statistician would be irked to find the two used interchangeably.
What exactly is KL Divergence?
KL Divergence is a measure of how much one probability distribution differs from another. There is a true distribution P(x) and an estimated distribution Q(x); the smaller the divergence, the more similar the two distributions are.
From an information theory point of view, KL Divergence tells us how much information we lose by approximating the true distribution P(x) with Q(x).
It might look like a distance metric, but it is not: since it is not symmetric, it cannot be one.
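For readers who want to see the math in action, here is a minimal Python sketch (the toy distributions are made up purely for illustration) that computes KL(P || Q) = Σ P(x) log(P(x)/Q(x)) for discrete distributions and shows that swapping P and Q changes the result:

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # 0 * log(0) is treated as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy example: a "true" distribution P and an approximation Q.
p = [0.90, 0.05, 0.05]
q = [1/3, 1/3, 1/3]

print(kl_divergence(p, q))  # ~0.70
print(kl_divergence(q, p))  # ~0.93 -- different, so KL is not symmetric
```

Identical distributions would give a divergence of exactly zero; the further apart they are, the larger the value.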
Why do we use KL Divergence as a calibration metric in MMM?
MMM is all about attribution. There is a true value, or true ROI, for each marketing variable, and through MMM our job is to home in on this true ROI.
Our MMM models therefore have to be unbiased so that we converge to these true ROI values.
KL Divergence provides information on how biased your model is. Think of it as an alarm that goes off when you veer too far from the ground truth.
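As a hedged sketch of how such an alarm could look in code (an illustrative setup only; the channel shares and the tolerance are made up, and this is not Aryma Labs' exact calibration procedure), one could normalize the ground-truth channel contributions and the model-implied contributions into shares and compare them with KL Divergence:

```python
import numpy as np

# Hypothetical contribution shares per channel (TV, Search, Social, Print):
# "true" shares, e.g. informed by lift experiments, vs. shares implied by the MMM.
true_shares  = np.array([0.45, 0.30, 0.15, 0.10])
model_shares = np.array([0.30, 0.40, 0.20, 0.10])

# KL(true || model); all shares are > 0 here, so no special handling of zeros.
divergence = float(np.sum(true_shares * np.log(true_shares / model_shares)))

THRESHOLD = 0.05  # illustrative tolerance, to be chosen per project
if divergence > THRESHOLD:
    print(f"KL = {divergence:.3f} -> model has veered off the ground truth, recalibrate")
else:
    print(f"KL = {divergence:.3f} -> model attribution is close to the ground truth")
```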
What causes this veering off?
Answer – Bias
Bias is generally an attribute of the estimator.
Statistically speaking, bias is the difference between an estimator's expected value and the true value of the parameter it is estimating.
An estimator is said to be unbiased if its expected value is equal to the parameter that we're trying to estimate.
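To make that definition concrete, here is a small simulation sketch (a standard textbook example, not MMM-specific): the sample mean is an unbiased estimator of the population mean, while the variance estimator that divides by n rather than n - 1 is biased downward.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_var = 5.0, 4.0
n, trials = 10, 100_000

mean_estimates, var_estimates = [], []
for _ in range(trials):
    sample = rng.normal(true_mean, np.sqrt(true_var), size=n)
    mean_estimates.append(sample.mean())
    var_estimates.append(np.var(sample))   # np.var divides by n -> biased estimator

# Bias = E[estimator] - true parameter value
print("bias of sample mean:        ", np.mean(mean_estimates) - true_mean)  # ~ 0 (unbiased)
print("bias of 1/n variance estim.:", np.mean(var_estimates) - true_var)    # ~ -true_var/n = -0.4
```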
How does bias enter your MMM model?
In MMM, multicollinearity is a notorious problem: too many overlapping signals. You need clear signals so that you can attribute changes in the KPI to the marketing variables.
To overcome this problem, statisticians use regularization (penalized regression) techniques. This inevitably introduces substantial bias into the model.
Your model, as a result, might be free of multicollinearity, but it will have veered too far from the ground truth.
We have been carrying out multiple experiments, and we have found ample evidence that regularization always causes bias in the model (see the sketch below).
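Below is a minimal simulation sketch of that point (a toy setup with made-up coefficients, not our actual client experiments): two highly collinear "media" variables are generated with known true coefficients, and the OLS and ridge estimates are averaged over many simulated datasets. The OLS averages recover the true values up to simulation noise; the ridge averages are systematically pulled away from them, i.e. biased.

```python
import numpy as np

rng = np.random.default_rng(42)
true_beta = np.array([2.0, 1.0])   # "true ROIs" of two media channels
n, trials, lam = 100, 2_000, 50.0  # lam = ridge penalty strength

ols_estimates, ridge_estimates = [], []
for _ in range(trials):
    # Two highly collinear spend variables (correlation ~ 0.95).
    x1 = rng.normal(size=n)
    x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
    X = np.column_stack([x1, x2])
    y = X @ true_beta + rng.normal(scale=2.0, size=n)

    ols_estimates.append(np.linalg.solve(X.T @ X, X.T @ y))
    ridge_estimates.append(np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y))

print("true coefficients: ", true_beta)
print("avg OLS estimate:  ", np.mean(ols_estimates, axis=0))    # ~ [2.0, 1.0] -- unbiased, but noisy
print("avg ridge estimate:", np.mean(ridge_estimates, axis=0))  # systematically off [2.0, 1.0] -- biased
```

Ridge trades that bias for lower variance, and it is exactly that trade-off that matters when the goal is true attribution.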
As they say, the cure should not be worse than the disease. Bias is a worse problem than multicollinearity, especially if your goal is true attribution.
In summary:
KL Divergence is a great calibration metric. It has helped increase the accuracy of our MMM models.
Video credit: Ari Seff (link in resources)
Resources:
Calibration is fast, Validation is slow:
https://www.linkedin.com/posts/venkat-raman-analytics_marketingmixmodeling-statistics-experimentation-activity-7167038099607289856-Ckej?utm_source=share&utm_medium=member_desktop
Why Aryma Labs does not rely on correlation alone:
https://open.substack.com/pub/arymalabs/p/why-aryma-labs-does-not-rely-on-correlation?r=2p7455&utm_campaign=post&utm_medium=web
Marketing Mix Modeling (MMM) Calibration Experiments:
https://www.linkedin.com/posts/venkat-raman-analytics_marketingmixmodeling-statistics-experimentation-activity-7163503064002277376-L0ZV?utm_source=share&utm_medium=member_desktop
Video credit: Ari Seff (Twitter):
https://x.com/ari_seff/status/1303741288911638530?s=20