Blogs

How Marketing Mix Modeling (MMM) can help you learn Linear Regression from first principles

How Marketing Mix Modeling (MMM) helped me learn Linear Regression from first principles. Some of you have appreciated my posts on Linear Regression and other statistics topics. Some of you also often ask me for resources to learn Linear Regression. I always provide a list of books or articles that I have personally read. But when I look back, there is one thing that helped me learn the concept of Linear Regression from first principles. And it wasn’t just the books or articles. It was practice. My understanding of Linear Regression really deepened because of Marketing Mix Modeling. 8 years ago, Ridhima Kumar taught me the nuances of MMM. I will

Read More »
Why we report both Confidence Interval and Prediction Interval in our MMM models

Why we report both Confidence Interval and Prediction Interval in our MMM models. MMM is a type of linear regression, but with a lot more bells and whistles (check the link under resources for a primer on MMM). As you may have noticed, in Linear Regression the confidence interval is always narrower than the prediction interval. You see, in Linear Regression we don’t model the raw response; rather, we model the conditional mean (expected value) -> E(Y|X). This conditional mean is the parameter that we try to estimate. Through the CI, we are trying to answer the question – “If we were to do this Linear Regression again and again (in a
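To make the distinction concrete, here is a minimal sketch (simulated data, not the author’s MMM code) that fits an OLS model with statsmodels and prints both the confidence interval for the conditional mean and the prediction interval for a new observation; the CI comes out narrower at every point.

```python
# A minimal sketch (simulated data, not the author's MMM code) showing that the
# confidence interval for the conditional mean E(Y|X) is narrower than the
# prediction interval for a new observation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=200)            # e.g. weekly media spend (made-up units)
y = 50 + 0.8 * x + rng.normal(0, 15, 200)    # response with noise

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# statsmodels returns both intervals from the same fitted model
frame = fit.get_prediction(X).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper",
             "obs_ci_lower", "obs_ci_upper"]].head())

# Width of the CI (about the conditional mean) vs the PI (about a new observation)
ci_width = frame["mean_ci_upper"] - frame["mean_ci_lower"]
pi_width = frame["obs_ci_upper"] - frame["obs_ci_lower"]
print("CI narrower than PI at every point:", bool((ci_width < pi_width).all()))
```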

Read More »
Selecting MMM models via AIC? Some key pointers

Selecting MMM models via AIC? Some key pointers 👇 The Akaike Information Criterion (AIC) is given by: AIC = 2k − 2ln(L), where k is the number of parameters and L is the likelihood. The underlying principle behind the use of AIC is information theory. Talking about information theory, we have been researching and implementing these concepts innovatively in MMM (check the link to our whitepapers under resources). Coming back to AIC: in the above equation we have the likelihood. We try to maximize the likelihood. It turns out that maximizing the likelihood is equivalent to minimizing the KL divergence. 📌 But what is KL divergence? From an information theory point of
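As a quick illustration of the formula, the sketch below (simulated data; the variable names tv, radio and noise_var are made up, not from the post) computes AIC = 2k − 2ln(L) by hand for two candidate OLS models and checks it against statsmodels’ own .aic attribute. Note that statsmodels’ parameter count k covers the regression coefficients only.

```python
# A hedged sketch: computing AIC = 2k - 2*ln(L) by hand for two candidate OLS
# models and checking it against statsmodels' .aic. Simulated data; the
# variable names (tv, radio, noise_var) are illustrative, not from the post.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 150
tv, radio, noise_var = rng.normal(size=(3, n))
sales = 3 + 1.5 * tv + 0.7 * radio + rng.normal(0, 1, n)   # noise_var plays no role

def aic_by_hand(fit):
    # k = number of estimated coefficients (statsmodels' convention excludes sigma^2)
    k = fit.df_model + fit.k_constant
    return 2 * k - 2 * fit.llf

m1 = sm.OLS(sales, sm.add_constant(np.column_stack([tv, radio]))).fit()
m2 = sm.OLS(sales, sm.add_constant(np.column_stack([tv, radio, noise_var]))).fit()

for name, m in [("tv + radio", m1), ("tv + radio + noise_var", m2)]:
    print(f"{name:22s} AIC by hand = {aic_by_hand(m):8.2f}   statsmodels = {m.aic:8.2f}")
# The extra, irrelevant regressor barely improves ln(L) but pays the 2k penalty,
# so it will usually (not always) end up with the higher AIC.
```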

Read More »
Calibration vs Validation in MMM

Calibration vs Validation. A lot of people use calibration and validation interchangeably. The two are not the same. ▪ Calibration: In a regression setting, calibration of a model is about understanding the model fit. Goodness-of-fit measures like the R-squared value, p-values, standard errors, cross-validation and within-sample MAPE/MAE/RMSE tell you how well you have fit the model. Based on these metrics, one could tune and calibrate the model better. ▪ Validation: Validation of a model is, in a way, a test of your model fit. One can normally validate a model in the following ways: – Incrementality testing – Geo-lift – Causal experiments – Out of
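One simple way to see the difference is sketched below (simulated weekly data, illustrative names, not a full MMM): the calibration numbers are computed on the data the model was fit to, while this basic form of validation holds out a later time window.

```python
# A small sketch of the distinction (simulated weekly data, illustrative names):
# the "calibration" numbers are computed on the data used to fit the model,
# while this simple validation holds out a later time window.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
weeks = 120
spend = rng.uniform(10, 100, weeks)
sales = 200 + 2.5 * spend + rng.normal(0, 25, weeks)

train, test = slice(0, 100), slice(100, weeks)    # time-ordered holdout
fit = sm.OLS(sales[train], sm.add_constant(spend[train])).fit()

def mape(actual, pred):
    return np.mean(np.abs((actual - pred) / actual)) * 100

in_pred = fit.predict(sm.add_constant(spend[train]))
out_pred = fit.predict(sm.add_constant(spend[test]))

print(f"Calibration (in-sample) : R^2 = {fit.rsquared:.3f}, MAPE = {mape(sales[train], in_pred):.1f}%")
print(f"Validation  (holdout)   : MAPE = {mape(sales[test], out_pred):.1f}%")
```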

Read More »

The relationship between Curse of Dimensionality and Degrees of Freedom

Statisticians/data scientists often remark, “Don’t add more variables, you won’t have enough degrees of freedom.” What does that mean? Let’s take the example of multiple linear regression. When building a multiple linear regression model, adding more independent variables reduces the degrees of freedom. This is also related to the “curse of dimensionality”. Adding more variables is equivalent to adding more dimensions. Your next question might be – Why is it a curse?😅 Well, the problem is that while we add more dimensions, the data we have at hand remains the same. So the data quickly becomes sparse in higher dimensions. See in the image below, how the
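A quick sketch of the first point (simulated data, illustrative numbers): the residual degrees of freedom, df = n − k − 1, shrink as regressors are added while the sample size stays fixed.

```python
# A quick sketch of the residual degrees of freedom, df = n - k - 1, shrinking
# as regressors are added while the sample size stays fixed. Simulated data;
# the numbers are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 52                                  # e.g. one year of weekly observations
y = rng.normal(size=n)

for k in [2, 5, 10, 25, 45]:
    X = sm.add_constant(rng.normal(size=(n, k)))
    fit = sm.OLS(y, X).fit()
    print(f"k = {k:2d} regressors -> residual degrees of freedom = {int(fit.df_resid)}"
          f"   (n - k - 1 = {n - k - 1})")
```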

Read More »

Why Linear Regression is not all about predictions

Often I come across posts and comments from people making claims like “Linear regression is all about predictions.” Well, they are wrong, but I don’t quite blame them. Thanks to the machine learning takeover of statistical nomenclature, any prediction task is now labelled a ‘regression task’!! This is, of course, to distinguish it from classification tasks. This peculiar nomenclature, borne out of poor understanding, is why logistic regression is wrongly considered a classification algorithm. Both Adrian Olszewski and I have written extensively about why logistic regression is regression. And thanks largely to the undeterred efforts of Adrian, even scikit-learn changed its documentation to reflect the

Read More »

Bayesian MMM is not a silver bullet for MMM’s Multicollinearity issue

I came across a post a few days back which stated that Bayesian methodologies are better at handling multicollinearity in MMM. This is simply not true. Multicollinearity is an information redundancy problem, and Bayesian methodology can’t magically solve it. Rather, the problem becomes worse in the case of Bayesian MMM, because your posterior distribution keeps getting wider as you have more multicollinearity. What does this mean in layman’s terms? Let’s take an analogy: Imagine you are on a search party to track down your pet dog that got lost in the woods. You can hear your dog barking, and you set up a search radius of, say, 1 km. But then you also hear another
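The widening is easy to demonstrate even without a Bayesian fit. In the frequentist sketch below (simulated data, made-up channel names), the coefficient standard errors blow up as two channels become nearly collinear; a Bayesian posterior under weak priors would widen in the same way rather than rescue the estimates.

```python
# A hedged, frequentist illustration (not a Bayesian fit): as two media channels
# become nearly collinear, the coefficient standard errors blow up. With weak
# priors, a Bayesian posterior for the same coefficients widens in the same way.
# Simulated data; the channel names are made up.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
tv = rng.normal(size=n)

for rho in [0.0, 0.95, 0.999]:
    digital = rho * tv + np.sqrt(1 - rho**2) * rng.normal(size=n)   # correlated channel
    y = 1.0 * tv + 1.0 * digital + rng.normal(0, 1, n)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([tv, digital]))).fit()
    print(f"corr ≈ {rho:5.3f} -> std errors: tv = {fit.bse[1]:6.2f}, digital = {fit.bse[2]:6.2f}")
# The two channels carry redundant information, so neither coefficient can be
# pinned down individually; the interval (or posterior) just keeps getting wider.
```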

Read More »

Why was ANOVA invented ?

I have interviewed many statisticians over the years. One misconception many still have is that “if we have more than 2 groups, then we can’t perform a t-test, and that is why we perform ANOVA”. It is not that we can’t perform t-tests with more than 2 groups. We certainly can. The real reason ANOVA came to be is that doing a string of t-tests would lead to two things: 1) the multiple comparison issue (multiplicity), and 2) error inflation. Remember that each time one conducts a t-test, there is a possibility of making a Type I error. As you conduct many such individual t-tests, you are simply compounding the type
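A quick sketch of that compounding (treating the tests as independent, which is an approximation but makes the point): the family-wise error rate is 1 − (1 − α)^m for m tests.

```python
# A quick sketch of the error inflation: the family-wise Type I error rate when
# running m t-tests, each at alpha = 0.05, treating the tests as independent
# (an approximation, but it makes the compounding obvious).
alpha = 0.05
for m in [1, 3, 6, 10, 45]:        # 45 = number of pairwise comparisons among 10 groups
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} tests at alpha = {alpha} -> P(at least one false positive) ≈ {fwer:.1%}")
```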

Read More »

Explaining the ‘Hourglass’ shape of Confidence Interval

A couple of weeks back I wrote a post on “Why we report both Confidence Interval and Prediction Interval in our MMM models.” If one looks at the shape of the confidence interval, one will notice that it takes the shape of an ‘hourglass’ or ‘sand clock’. Now why is that? Well, the answer again boils down to what regression does and what the confidence interval calculates. 💡In Linear Regression, we don’t model the raw response; rather, we model the conditional mean (expected value) -> E(Y|X). 💡In Linear Regression, the line or plane passes through (x̄,y̅). 💡Secondly, in the confidence interval we try to see whether the conditional mean (the parameter)
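A small sketch of the shape (simulated data, illustrative values): the CI half-width for the conditional mean at a point x0 is t·s·sqrt(1/n + (x0 − x̄)²/Sxx), so the band is narrowest at x̄ and flares out on both sides.

```python
# A small sketch of why the band is narrowest at x-bar: the CI half-width for the
# conditional mean at x0 is t * s * sqrt(1/n + (x0 - x_bar)^2 / Sxx), so it grows
# as x0 moves away from the mean of x. Simulated data for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 100, 80)
y = 10 + 0.5 * x + rng.normal(0, 8, 80)
fit = sm.OLS(y, sm.add_constant(x)).fit()

grid = x.mean() + np.array([-40, -20, 0, 20, 40])
frame = fit.get_prediction(sm.add_constant(grid)).summary_frame(alpha=0.05)
widths = frame["mean_ci_upper"] - frame["mean_ci_lower"]
for x0, w in zip(grid, widths):
    print(f"x0 = {x0:6.1f}  |x0 - x_bar| = {abs(x0 - x.mean()):5.1f}  ->  CI width = {w:5.2f}")
# Narrowest at x-bar, flaring out on both sides: the 'hourglass'.
```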

Read More »

What the word ‘confidence’ in Confidence Interval Signifies

A lot of people switch to Bayesian methods not because they are better than Frequentist ones, but mainly because they find it hard to wrap their heads around Frequentist concepts. One such concept is the Confidence Interval. One common misconception people have with respect to CIs is that “it is the range in which the probability of finding the parameter of interest is 90% or 95%”. Many conflate this with the ‘confidence’ of finding the parameter with a certain probability. But this is wrong. Let’s first look at the correct definition of a CI. Example: a 95% CI. If one ran the same statistical test taking different samples and constructed a confidence
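A hedged simulation of the frequentist definition (simulated normal data, illustrative numbers): repeat the whole sample-and-build-an-interval procedure many times, and roughly 95% of the intervals will contain the true, fixed parameter; any single interval either contains it or it does not.

```python
# A hedged simulation of what 'confidence' refers to: repeat the whole
# sample-and-build-an-interval procedure many times; roughly 95% of the
# intervals will contain the true, fixed parameter. Any single interval
# either contains it or it does not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
true_mean, sigma, n, reps = 100.0, 15.0, 30, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sigma, n)
    half = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - half <= true_mean <= sample.mean() + half)

print(f"Fraction of 95% CIs that captured the true mean: {covered / reps:.3f}")
```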

Read More »