A few months back, we published our whitepaper ‘Granger Causality – A possible Feature Selection Method in MMM’. ICYMI, the link is in the Resources section.
In this blog I want to highlight another useful feature selection method – Transfer Entropy.
As you may have guessed, it is an information-theoretic method. If you would like a good refresher on entropy and information gain, I highly recommend Naoki Shibuya’s article ‘Demystifying Entropy’ (link in Resources).
So coming back to Transfer Entropy.
Transfer Entropy is a non-parametric method that measures the amount of directed (time-asymmetric) transfer of information between two random processes.
The transfer entropy from a process X to a process Y is the amount of uncertainty about the future values of Y that is reduced by knowing the past values of X, given the past values of Y.
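In symbols, this is the standard definition (k and l are the history lengths considered for Y and X):

```latex
T_{X \to Y} \;=\; H\!\left(Y_t \mid Y_{t-1},\dots,Y_{t-k}\right)
\;-\; H\!\left(Y_t \mid Y_{t-1},\dots,Y_{t-k},\, X_{t-1},\dots,X_{t-l}\right)
```

That is, the reduction in the conditional entropy of Y's next value once X's past is added to Y's own past. It is zero when X's past adds nothing, and it is asymmetric in X and Y, which is what makes the measure directional.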
Our Hypothesis:
We believe that if there is significant information flow from an independent variable (say, a marketing channel’s spend) to the KPI (sales), then that independent variable becomes an ideal candidate feature for MMM model fitting.
What Transfer Entropy provides over Granger Causality:
If you read our last paper, you will notice that the operating mechanisms of Granger Causality and Transfer Entropy are quite similar. This raises the question: what does Transfer Entropy provide in addition? The answer is that Transfer Entropy, unlike Granger Causality, is not constrained to linear relationships.
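To see the nonlinearity point concretely, here is a small, self-contained Python sketch (illustrative only — not the R package we used, and all names are hypothetical). It uses a plug-in, lag-1 transfer entropy estimator on a toy pair of series where Y depends on X only through the square of X’s previous value, so the linear (Pearson) correlation is essentially zero, yet the transfer entropy is clearly positive.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of TE(X -> Y) in bits for discrete series, lag 1:
    sum over (y_t, y_{t-1}, x_{t-1}) of
    p(triple) * log2( p(y_t | y_{t-1}, x_{t-1}) / p(y_t | y_{t-1}) )."""
    triple = Counter()  # (y_t, y_{t-1}, x_{t-1})
    yy = Counter()      # (y_t, y_{t-1})
    yx = Counter()      # (y_{t-1}, x_{t-1})
    yp = Counter()      # y_{t-1}
    n = len(y) - 1
    for t in range(1, len(y)):
        triple[(y[t], y[t - 1], x[t - 1])] += 1
        yy[(y[t], y[t - 1])] += 1
        yx[(y[t - 1], x[t - 1])] += 1
        yp[y[t - 1]] += 1
    te = 0.0
    for (yt, ypv, xp), c in triple.items():
        p_full = c / yx[(ypv, xp)]        # p(y_t | y_{t-1}, x_{t-1})
        p_base = yy[(yt, ypv)] / yp[ypv]  # p(y_t | y_{t-1})
        te += (c / n) * math.log2(p_full / p_base)
    return te

random.seed(42)
x = [random.choice([-1, 0, 1]) for _ in range(20000)]
y = [0] + [x[t - 1] ** 2 for t in range(1, len(x))]  # purely nonlinear coupling

# Pearson correlation between x_{t-1} and y_t is ~0 by symmetry of x.
xs, ys = x[:-1], y[1:]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / len(xs))
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / len(ys))
corr = cov / (sx * sy)

te_xy = transfer_entropy(x, y)  # clearly positive: X's past determines Y
te_yx = transfer_entropy(y, x)  # near zero: X is i.i.d.
print(f"corr={corr:.3f}  TE(X->Y)={te_xy:.3f} bits  TE(Y->X)={te_yx:.3f} bits")
```

A linear test (correlation, or a linear Granger regression) would see essentially nothing here, while TE(X→Y) comes out close to H(Y) ≈ 0.918 bits, and TE(Y→X) stays near zero — the directionality is recovered as well.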
The experiment:
We used the R package ‘RTransferEntropy’ to compute the Transfer Entropy values, on the same dataset used for the Granger Causality paper.
How to interpret the results:
TE(X→Y) = 0.13 means that the history of the X process carries 0.13 bits of additional information for predicting the next value of Y (i.e., information about the future of Y beyond what the history of Y itself already provides).
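To make "bits of additional information" tangible, consider a toy case where the number has an exact meaning: Y simply copies X with a one-step delay, and X is a fair coin flip. Knowing X’s past then removes all 1 bit of uncertainty about Y’s next value, so TE(X→Y) should come out close to 1 bit. A self-contained, illustrative Python sketch (not the R package; lag-1 plug-in estimator, names hypothetical):

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of TE(X -> Y) in bits for discrete series, lag 1."""
    triple, yy, yx, yp = Counter(), Counter(), Counter(), Counter()
    n = len(y) - 1
    for t in range(1, len(y)):
        triple[(y[t], y[t - 1], x[t - 1])] += 1
        yy[(y[t], y[t - 1])] += 1
        yx[(y[t - 1], x[t - 1])] += 1
        yp[y[t - 1]] += 1
    te = 0.0
    for (yt, ypv, xp), c in triple.items():
        p_full = c / yx[(ypv, xp)]        # p(y_t | y_{t-1}, x_{t-1})
        p_base = yy[(yt, ypv)] / yp[ypv]  # p(y_t | y_{t-1})
        te += (c / n) * math.log2(p_full / p_base)
    return te

random.seed(7)
x = [random.randint(0, 1) for _ in range(20000)]  # fair coin flips
y = [0] + x[:-1]                                  # Y copies X with a 1-step delay
te = transfer_entropy(x, y)
print(f"TE(X->Y) = {te:.3f} bits")
```

A real-world value like 0.13 bits sits between these extremes: X’s past resolves some, but far from all, of the uncertainty about Y’s next value.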
The results:
We found that the variables we had chosen had significant information flow with the KPI. In a way, this vindicates our feature selection.
One interesting thing to notice is that there is significant overlap between the Granger-causal variables and the variables with significant Transfer Entropy.
Caveats:
We prescribe this method only as a supplement to existing feature selection approaches; it can also be used to validate features selected through other means.
Overall, we suggest using this method in combination with domain-driven feature selection to arrive at the most relevant features for the MMM model.
Resources:
- Using Granger Causality for Feature Selection – https://www.linkedin.com/posts/ridhima-kumar7_granger-causality-a-possible-feature-selection-activity-7117840404351250433-TrhB?utm_source=share&utm_medium=member_desktop
- Entropy Explained – https://naokishibuya.medium.com/demystifying-entropy-f2c3221e2550
- RTransferEntropy – https://cran.r-project.org/web/packages/RTransferEntropy/vignettes/transfer-entropy.html
- TE Paper – https://www.sciencedirect.com/science/article/pii/S2352711019300779