In my previous posts, I wrote about why one should not use Geo tests to fix priors in Bayesian MMM. ICYMI, the link is in the resources.
Another reason why this is such a bad idea is the difference in the nature of MMM and experimentation.
What nature?
Well, MMMs are long memory models, while experiments often have no memory or only a relatively short memory.
Let me unpack what I mean by memory (this is going to get slightly stat/math heavy, but please bear with me).
MMMs are inherently autoregressive in nature. Autoregressive means the present value depends, in some form, on past values.
Autoregressive (AR) models forecast a time series from its own past values (lags). The forecast variable is expressed as a linear combination of these lags.
If Yₜ depends only on the immediate past value Yₜ₋₁ in the series, then it is depicted as below:
Yₜ = c + ϕYₜ₋₁ + εₜ (Eq1)
This is called an AR(1) model (where 1 denotes the number of AR terms and εₜ denotes the noise/error term).
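For intuition, here is a minimal Python sketch of Eq1 (the values of c and ϕ, and the function name, are my own illustrative choices):

import numpy as np

rng = np.random.default_rng(42)

def simulate_ar1(c, phi, n, sigma=1.0):
    # Y_t = c + phi * Y_{t-1} + e_t, with e_t ~ Normal(0, sigma)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + sigma * rng.standard_normal()
    return y

ar1 = simulate_ar1(c=0.5, phi=0.7, n=200)  # e.g., 200 periods of a sales-like series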
If Yₜ depends on multiple lagged values of itself, it can be depicted as below:
Yₜ = c + ϕ₁Yₜ₋₁ + ϕ₂Yₜ₋₂ + … + ϕₚYₜ₋ₚ + εₜ
This model is referred to as an AR(p) model.
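And a corresponding sketch for AR(p), here an AR(2) with made-up coefficients:

import numpy as np

rng = np.random.default_rng(0)

def simulate_arp(c, phis, n, sigma=1.0):
    # Y_t = c + phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + e_t
    p = len(phis)
    y = np.zeros(n)
    for t in range(p, n):
        lags = y[t - p:t][::-1]  # [Y_{t-1}, Y_{t-2}, ..., Y_{t-p}]
        y[t] = c + np.dot(phis, lags) + sigma * rng.standard_normal()
    return y

arp = simulate_arp(c=0.5, phis=[0.5, 0.2], n=200)  # an AR(2) series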
So, why are AR models known as long memory models?
If Yₜ depends only on the immediately previous value of the series, it is depicted as Eq1.
Similarly, as we go back one step in time, the equation is of the form:
Yₜ₋₁ = c + ϕYₜ₋₂ + εₜ₋₁ (Eq2)
If we substitute Eq2 in Eq1, we get:
Yₜ = c + ϕ(c + ϕYₜ₋₂ + εₜ₋₁) + εₜ
⇒ Yₜ = c + cϕ + ϕ²Yₜ₋₂ + ϕεₜ₋₁ + εₜ (Eq3)
Similarly, Yₜ₋₂ in Eq3 can be substituted with its own past value and error term.
In this recursive fashion, Yₜ ultimately depends on the very first value in the series, Y₁; hence AR models are known as long memory models.
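A quick numeric check of that unrolling (ϕ = 0.7 is just an illustrative value) shows why the memory is "long": the weight that Yₜ₋ₖ carries in Yₜ is ϕᵏ, which shrinks but never hits exactly zero.

phi = 0.7  # illustrative AR(1) coefficient

# Unrolling Y_t = c + phi*Y_{t-1} + e_t repeatedly, the coefficient
# attached to Y_{t-k} is phi**k: small for distant k, but never zero.
for k in [1, 5, 10, 20]:
    print(f"weight of Y_(t-{k}) in Y_t: {phi ** k:.6f}")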
👉 So how does this make MMM a long memory model?
One of the key components of an MMM is adstock. (If you need a refresher on adstock, check the link in the resources.)
A simple adstock (decay effect) model is written as:
Aₜ = Tₜ + λAₜ₋₁
where Aₜ is the adstock at time t, Tₜ is the value of the advertising variable at time t, and λ is the 'decay' or lag weight parameter. Inclusion of the Aₜ₋₁ term imparts an infinite lag structure to this model.
One can see how this equation is similar to Eq1.
Hence, MMMs are long memory models.
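A minimal sketch of that adstock equation in Python (the spend vector and λ = 0.6 are illustrative, and geometric_adstock is my own name for it):

import numpy as np

def geometric_adstock(spend, lam):
    # A_t = T_t + lam * A_{t-1}: today's adstock carries over yesterday's.
    adstock = np.zeros(len(spend))
    adstock[0] = spend[0]
    for t in range(1, len(spend)):
        adstock[t] = spend[t] + lam * adstock[t - 1]
    return adstock

# A single burst of spend keeps contributing long after it stops:
print(geometric_adstock(np.array([100.0, 0, 0, 0, 0, 0]), lam=0.6))
# values: 100, 60, 36, 21.6, 12.96, 7.776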
👉 So why is fixing priors through experimentation a bad idea?
As you have seen above, MMM has long memory: the present value of sales is simply not detached from its past values or from past advertising impact.
Experiments, on the other hand, fracture adstock.
They have only a limited view (or no view at all) of the saturation and decay of the media.
In reality, to truly account for the effect of media on the KPI, one should incorporate this carryover effect.
MMM does that. Experiments don't.
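To make "fracturing adstock" concrete, here is a small back-of-the-envelope sketch, assuming geometric decay with an illustrative λ = 0.6: a unit of spend contributes 1/(1−λ) in total over its lifetime, but a window of n periods only ever sees the fraction 1 − λⁿ of it.

lam = 0.6  # illustrative decay rate

# Total lifetime effect of one unit of spend: 1 + lam + lam^2 + ... = 1/(1-lam).
# An experiment observing only n periods captures the fraction 1 - lam**n.
for n in [1, 2, 4, 8]:
    print(f"{n}-period window captures {100 * (1 - lam ** n):.1f}% of the carryover effect")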
Resources:
Why you shouldn’t use Geo tests to fix priors in Bayesian MMM.
https://www.linkedin.com/posts/venkat-raman-analytics_marketingmixmodeling-statistics-marketingattribution-activity-7152885714853023746-3C9S?utm_source=share&utm_medium=member_desktop
What is Adstock:
https://arymalabs.com/wp-content/uploads/2023/09/Marketing-Mix-Modeling-101-Adstock.pdf
Understanding Adstock (the adstock equation in this post is taken from this paper):
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=924128