The Relationship Between the Curse of Dimensionality and Degrees of Freedom

Statisticians and data scientists often remark: “Don’t add more variables, you won’t have enough degrees of freedom.”

What does that mean?

Let's take multiple linear regression as an example.

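When building a multiple linear regression model, each independent variable added to the model uses up one degree of freedom. With n observations, k independent variables and an intercept, only n - k - 1 residual degrees of freedom remain for estimating the error.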
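To make the bookkeeping concrete, here is a minimal sketch that counts residual degrees of freedom for an ordinary least squares fit with an intercept. The function name `residual_degrees_of_freedom`, the sample size of 30 and the randomly generated predictors are illustrative assumptions, not something from the original post.

```python
import numpy as np

def residual_degrees_of_freedom(X: np.ndarray) -> int:
    """Residual df of an OLS fit with intercept: n minus the rank of the design matrix."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    return int(n - np.linalg.matrix_rank(design))

rng = np.random.default_rng(0)
n = 30  # assumed number of observations, fixed throughout

# Hypothetical predictors: k columns of random noise, for k = 2, 10, 25
df_by_k = {k: residual_degrees_of_freedom(rng.normal(size=(n, k))) for k in (2, 10, 25)}

for k, df in df_by_k.items():
    print(f"k={k:2d} predictors -> {df} residual degrees of freedom")
```

With linearly independent predictors the result is n - k - 1, so the count drops by one for every variable added while n stays fixed.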

This is also related to the “curse of dimensionality”: adding more variables is equivalent to adding more dimensions.

Your next question might be: why is it a curse? 😅

Well, the problem is that while we add more dimensions, the amount of data we have at hand remains the same. So the data quickly becomes sparse in higher dimensions.

See in the image below how the same number of data points looks dense in 2D but starts to look very sparse in 3D.

One can imagine the sparsity in even higher dimensions!

[Image: Curse of dimensionality]

We need more data to fill up the space, but unfortunately the data at hand is limited.
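The sparsity can also be checked numerically. The sketch below is an illustration under assumed settings (100 points drawn uniformly from a unit hypercube, and a hypothetical helper named `mean_nearest_neighbor_distance`): it keeps the number of points fixed and measures how far each point is, on average, from its closest neighbor as dimensions are added.

```python
import numpy as np

def mean_nearest_neighbor_distance(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Average distance from each point to its nearest neighbor in a unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    # Pairwise Euclidean distances via broadcasting
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    return float(dists.min(axis=1).mean())

# Same number of points, increasing number of dimensions
nn_distance = {d: mean_nearest_neighbor_distance(100, d) for d in (2, 3, 10, 50)}

for d, dist in nn_distance.items():
    print(f"{d:2d} dimensions: mean nearest-neighbor distance = {dist:.3f}")
```

The nearest neighbor moves further away as the dimension grows, which is the sparsity described above: the same data covers less and less of the space.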

Adding more variables has two further consequences:

1. Loss of degrees of freedom.
2. A highly inflated R-squared value. R-squared never decreases when a variable is added, even if that variable is pure noise.
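Both effects in the list above can be seen in a small simulation. This is a sketch on made-up data, not the author's analysis: the response depends on a single real predictor, and the remaining columns are pure noise. The helper `r_squared`, the sample size and the coefficients are all assumptions chosen for illustration.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R-squared of an OLS fit with intercept, computed via least squares."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n = 30                              # assumed sample size
x = rng.normal(size=n)              # the only real predictor
y = 2.0 * x + rng.normal(size=n)    # response = signal + noise
noise = rng.normal(size=(n, 20))    # 20 columns unrelated to y

r2_values = []
for extra in (0, 5, 10, 20):
    X = np.column_stack([x, noise[:, :extra]])  # real predictor plus `extra` noise columns
    k = X.shape[1]
    r2 = r_squared(X, y)
    r2_values.append(r2)
    print(f"k={k:2d}  residual df={n - k - 1:2d}  R^2={r2:.3f}")
```

Each larger model contains the smaller one, so R-squared can only stay the same or go up, while the residual degrees of freedom shrink toward zero. The apparent improvement in fit comes from noise, not from any added explanatory power.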

So now you know why statisticians are apprehensive about adding more features to the model 😅.

An intuitive explanation of Degrees of Freedom is provided in the resources section.

Resources:

Degrees of Freedom Explained: https://www.linkedin.com/posts/aryma-labs_datascience-machinelearning-statistics-activity-6823888028017266688-4xML?utm_source=share&utm_medium=member_desktop
