In my previous posts, I wrote about why one should not use Geo tests to fix priors in Bayesian MMM. ICYMI, the link is in the resources.
Another reason why this is such a bad idea is the difference in the nature of MMM and experimentation.
What nature?
Well, MMMs are long memory models, while experiments often have no memory or only a relatively short memory.
Let me unpack what I mean by memory (this is going to get slightly stat/math heavy, but please bear with me).
MMMs are inherently autoregressive in nature. Autoregressive means the present value depends, in some form, on past values.
Autoregressive (AR) models forecast a time series from its own past values (lags). The forecast variable is expressed as a linear combination of these lags.
If Yₜ depends only on the immediate past value Yₜ₋₁ in the series, then it is depicted as below:
Yₜ = c + ϕYₜ₋₁ + εₜ (Eq1)
This is called an AR(1) model (where 1 denotes the number of AR terms and εₜ denotes the noise/error term).
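For intuition, here is a minimal Python sketch of Eq1 (the values of c and ϕ, and the function name, are my own illustrative choices):

import numpy as np

rng = np.random.default_rng(42)

def simulate_ar1(c, phi, n, sigma=1.0):
    # Y_t = c + phi * Y_{t-1} + e_t, with e_t ~ Normal(0, sigma)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + sigma * rng.standard_normal()
    return y

ar1 = simulate_ar1(c=0.5, phi=0.7, n=200)  # e.g., 200 periods of a sales-like series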
If Yₜ depends on multiple lagged values of itself, it can be depicted as below:
Yₜ = c + ϕ₁Yₜ₋₁ + ϕ₂Yₜ₋₂ + … + ϕₚYₜ₋ₚ + εₜ
This model is referred to as an AR(p) model.
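And a corresponding sketch for AR(p), here an AR(2) with made-up coefficients:

import numpy as np

rng = np.random.default_rng(0)

def simulate_arp(c, phis, n, sigma=1.0):
    # Y_t = c + phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + e_t
    p = len(phis)
    y = np.zeros(n)
    for t in range(p, n):
        lags = y[t - p:t][::-1]  # [Y_{t-1}, Y_{t-2}, ..., Y_{t-p}]
        y[t] = c + np.dot(phis, lags) + sigma * rng.standard_normal()
    return y

arp = simulate_arp(c=0.5, phis=[0.5, 0.2], n=200)  # an AR(2) series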
So, why are AR models known as long memory models?
If Yₜ depends only on the immediately previous value of the series, it is depicted as Eq1.
Similarly, as we go back one step in time, the equation is of the form:
Yₜ₋₁ = c + ϕYₜ₋₂ + εₜ₋₁ (Eq2)
If we substitute Eq2 in Eq1, we get:
Yₜ = c + ϕ(c + ϕYₜ₋₂ + εₜ₋₁) + εₜ
⇒ Yₜ = c + cϕ + ϕ²Yₜ₋₂ + ϕεₜ₋₁ + εₜ (Eq3)
Similarly, Yₜ₋₂ in Eq3 can be substituted with its own past value and error term.
In this recursive fashion, Yₜ ultimately depends on the very first value in the series, Y₁; hence AR models are known as long memory models.
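A quick numeric check of that unrolling (ϕ = 0.7 is just an illustrative value) shows why the memory is "long": the weight that Yₜ₋ₖ carries in Yₜ is ϕᵏ, which shrinks but never hits exactly zero.

phi = 0.7  # illustrative AR(1) coefficient

# Unrolling Y_t = c + phi*Y_{t-1} + e_t repeatedly, the coefficient
# attached to Y_{t-k} is phi**k: small for distant k, but never zero.
for k in [1, 5, 10, 20]:
    print(f"weight of Y_(t-{k}) in Y_t: {phi ** k:.6f}")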
👉 So how does this make MMM a long memory model?
One of the key components of an MMM is adstock. (If you need a refresher on adstock, check the link in the resources.)
A simple adstock (decay effect) model is written as:
Aₜ = Tₜ + λAₜ₋₁
where Aₜ is the adstock at time t, Tₜ is the value of the advertising variable at time t, and λ is the 'decay' or lag weight parameter. Inclusion of the Aₜ₋₁ term imparts an infinite lag structure to this model.
One can see how this equation is similar to Eq1.
Hence, MMMs are long memory models.
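A minimal sketch of that adstock equation in Python (the spend vector and λ = 0.6 are illustrative, and geometric_adstock is my own name for it):

import numpy as np

def geometric_adstock(spend, lam):
    # A_t = T_t + lam * A_{t-1}: today's adstock carries over yesterday's.
    adstock = np.zeros(len(spend))
    adstock[0] = spend[0]
    for t in range(1, len(spend)):
        adstock[t] = spend[t] + lam * adstock[t - 1]
    return adstock

# A single burst of spend keeps contributing long after it stops:
print(geometric_adstock(np.array([100.0, 0, 0, 0, 0, 0]), lam=0.6))
# values: 100, 60, 36, 21.6, 12.96, 7.776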
👉 So why is fixing priors through experimentation a bad idea?
As you have seen above, MMM has long memory: the present value of sales is simply not detached from its past values or from past advertising impact.
Experiments, on the other hand, fracture adstock.
They have only a limited view (or no view at all) of the saturation and decay of the media.
In reality, to truly account for the effect of media on the KPI, one should incorporate this carryover effect.
MMM does that. Experiments don't.
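To make "fracturing adstock" concrete, here is a small back-of-the-envelope sketch, assuming geometric decay with an illustrative λ = 0.6: a unit of spend contributes 1/(1−λ) in total over its lifetime, but a window of n periods only ever sees the fraction 1 − λⁿ of it.

lam = 0.6  # illustrative decay rate

# Total lifetime effect of one unit of spend: 1 + lam + lam^2 + ... = 1/(1-lam).
# An experiment observing only n periods captures the fraction 1 - lam**n.
for n in [1, 2, 4, 8]:
    print(f"{n}-period window captures {100 * (1 - lam ** n):.1f}% of the carryover effect")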
Resources:
Why you shouldn’t use Geo tests to fix priors in Bayesian MMM.
https://www.linkedin.com/posts/venkat-raman-analytics_marketingmixmodeling-statistics-marketingattribution-activity-7152885714853023746-3C9S?utm_source=share&utm_medium=member_desktop
What is Adstock:
https://arymalabs.com/wp-content/uploads/2023/09/Marketing-Mix-Modeling-101-Adstock.pdf
Understanding Adstock (the adstock equation in this post is taken from this paper):
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=924128