What is attribution modeling?

When a goal is scored in soccer, how much credit should each player get for it?
- Should the striker get the most credit for the goal?
- Should the last person who passed the ball to the striker get the most credit?
- Should the first person who initiated that rhythm of passes get the most credit?
- Should every person who passed the ball get equal credit?
- Should the later passers get more credit for the goal?
Marketing faces a similar problem: when a conversion happens, how much credit should each marketing channel get?
Why do we need an attribution model?
Companies today spend a large share of their budget on marketing. An attribution model is needed:
- To gauge the effectiveness of channels
- To measure impact of communication with customers
- To determine ROI
- To decide which actions to take
Types of attribution models:
- Rule-based attribution model
- First Click Attribution Model
This is the closest proxy we have for "how did they hear about us in the first place?"
This is useful if you need to know which keywords genuinely helped make you known to the user.
Pros:
- Easy to implement
- Helps identify new customer acquisition channels
- Gives insight into awareness-driving campaigns
Cons:
- No influence of subsequent touches
- Too much credit to lead gen programs
- Last Click Attribution Model
Most useful when the final touchpoint really was the deciding one, e.g. for impulse purchases or very price-sensitive decisions.
Pros:
- Easy to implement
- Gives insight into conversion-driving campaigns
Cons:
- No influence of prior touches
- Too much credit to converting campaigns
- Equal Weight Attribution Model
Gives equal weight to every touchpoint, regardless of when the click happens.
Pros:
- No fighting over who gets credit
- Helpful for longer revenue cycles with many clicks
Cons:
- Low-impact touchpoints get high credit
- No extra importance given to high-impact touchpoints
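As a minimal sketch, the three rules described so far (first click, last click, equal weight) can be written as functions that split one conversion's credit across an ordered list of touchpoints. The function names and the example journey are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def first_click(journey):
    """Give all credit to the first touchpoint."""
    return {journey[0]: 1.0}


def last_click(journey):
    """Give all credit to the last touchpoint."""
    return {journey[-1]: 1.0}


def equal_weight(journey):
    """Split credit evenly across touchpoints; repeated channels accumulate."""
    credit = defaultdict(float)
    for channel in journey:
        credit[channel] += 1.0 / len(journey)
    return dict(credit)


# Hypothetical journey, ordered from first touch to last touch.
journey = ["Social", "Display", "Retargeting", "Display"]

print(first_click(journey))   # {'Social': 1.0}
print(last_click(journey))    # {'Display': 1.0}
print(equal_weight(journey))  # {'Social': 0.25, 'Display': 0.5, 'Retargeting': 0.25}
```

Note how Display, which appears twice, collects two equal shares under the equal weight rule but is invisible to first click.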
- Time Decay Attribution Model
“The later you are in the click chain, the more credit you get.” If users can't remember who showed up in the first few clicks, then those clicks were probably worth less in the final decision.
Pros:
- Focused on all touchpoints
- Helpful for longer revenue cycles with many clicks
Cons:
- Artificially inflates the importance of later channels
- Low credit to acquisition channels
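One way to make "later touches get more credit" concrete is an exponential decay with a half-life. The sketch below assumes that rule; the text does not prescribe a specific decay function, and the 7-day half-life, the function name, and the example journey are all illustrative assumptions.

```python
def time_decay(touches, half_life_days=7.0):
    """Split one conversion's credit so that later touches earn more.

    touches: list of (channel, days_before_conversion) pairs.
    A touch's raw weight halves for every `half_life_days` of age
    (an assumed decay rule, not one given in the text).
    """
    raw = [(channel, 0.5 ** (days / half_life_days)) for channel, days in touches]
    total = sum(weight for _, weight in raw)
    credit = {}
    for channel, weight in raw:
        # Normalize so the shares of a single conversion sum to 1.
        credit[channel] = credit.get(channel, 0.0) + weight / total
    return credit


# Hypothetical journey: Social 14 days before purchase, Display 7 days, Retargeting same day.
touches = [("Social", 14), ("Display", 7), ("Retargeting", 0)]
for channel, share in time_decay(touches).items():
    print(f"{channel}: {share:.1%}")
# Social: 14.3%
# Display: 28.6%
# Retargeting: 57.1%
```

A shorter half-life pushes even more credit toward the final touches, which is exactly the inflation of later channels listed among the drawbacks.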
- Positional Rule-based Attribution Model
“What made you aware of us, and what made you purchase from us?”
If you need to get on a shortlist but don't care whether the user interacted with you again until they made their final decision, this model replicates that distribution of credit.
Pros:
- High credit for acquiring and converting channels
- Helpful for longer revenue cycles with many clicks
Cons:
- Less influence for middle channels
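A positional rule can be sketched as fixed shares for the first and last touches, with the remainder split evenly across the middle. The 40/20/40 split used here is a commonly seen default, not a value taken from the text, and the function name and the Email channel are hypothetical.

```python
def position_based(journey, first_share=0.4, last_share=0.4):
    """Give fixed shares to the first and last touches and split the rest
    evenly across the middle touches. The 40/40 defaults are an assumption."""
    n = len(journey)
    if n == 1:
        weights = [1.0]
    elif n == 2:
        # No middle touches: rescale the two end shares so they sum to 1.
        total = first_share + last_share
        weights = [first_share / total, last_share / total]
    else:
        middle = (1.0 - first_share - last_share) / (n - 2)
        weights = [first_share] + [middle] * (n - 2) + [last_share]
    credit = {}
    for channel, weight in zip(journey, weights):
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit


journey = ["Social", "Display", "Email", "Retargeting"]  # hypothetical journey
print({channel: round(share, 3) for channel, share in position_based(journey).items()})
# {'Social': 0.4, 'Display': 0.1, 'Email': 0.1, 'Retargeting': 0.4}
```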
- Data-driven attribution model: the second way of looking at attribution
Pros:
- Helps determine the proper attribution for each channel
- No fixed set of rules to skew the data
- Less guesswork
- Better decisions
Cons:
- Not easy to implement
- Requires statistical knowledge
- Requires a lot of data to establish statistical validity
- Shapley Attribution Model
How much value does each channel have?
Or, in other words:
How much value does a channel add to an already existing list of channels?
Let's consider a customer journey.
In this customer journey, the customer has a 1% chance of making a purchase.
Now, when we add Display to the customer journey, the customer has a 2% chance of making a purchase. That means the chance of a purchase is 100% higher when we add Display as a channel to the current customer journey.
Let's say we have 430 total conversions that happened through the paid channels. Out of these, 50 conversions had only Social as the channel in the journey, 30 conversions had only Display, ..., 60 conversions had both Social and Display, ..., and 100 conversions had Social, Display, and Retargeting.
The contribution of each channel is the sum of the conversions it adds across the channel combinations in which it appears. E.g., for Social it is:
[50 (Social is the only channel in the customer's purchase journey)] +
[60 (Social and Display occur together) - 30 (only Display occurs)]
+ ... +
[100 (Social, Display, and Retargeting occur together) - 80 (only Display and Retargeting occur)]
The weight of each channel is its contribution divided by the total contribution of all the channels.
The final contribution of each channel differs from what the rule-based models assign.
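The marginal-contribution idea above can be sketched in code. This version uses the textbook Shapley formula, which averages each marginal contribution with a weight that depends on the size of the combination; that weighting goes slightly beyond the plain sum shown above. Like the text, it treats the conversion count of each channel combination as that combination's value. Because the text elides the counts for the other combinations, the example is restricted to Social and Display, whose counts (50, 30, 60) are quoted. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, value):
    """Exact Shapley value of each channel.

    value: dict mapping a frozenset of channels to the worth of that
    combination; combinations missing from the dict are worth 0.
    """
    n = len(channels)
    result = {}
    for channel in channels:
        others = [c for c in channels if c != channel]
        total = 0.0
        for size in range(n):
            # Shapley weight for combinations of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = frozenset(subset)
                with_channel = without | {channel}
                # Marginal contribution of adding `channel` to `subset`.
                total += weight * (value.get(with_channel, 0) - value.get(without, 0))
        result[channel] = total
    return result


# Conversion counts quoted in the text for the Social/Display combinations.
value = {
    frozenset({"Social"}): 50,
    frozenset({"Display"}): 30,
    frozenset({"Social", "Display"}): 60,
}
credit = shapley_values(["Social", "Display"], value)
total = sum(credit.values())
for channel, amount in credit.items():
    print(f"{channel}: {amount:.1f} conversions ({amount / total:.1%})")
# Social: 40.0 conversions (66.7%)
# Display: 20.0 conversions (33.3%)
```

The two values add up to 60, the count for the full Social and Display combination. The nested loop visits every subset of the other channels, which is where the 2^n cost of the Shapley value comes from.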
- Markov-chain Attribution Model
Probabilistic model
Relies on the current state to predict the next state
Let's consider three customer journeys.
Journey 1: The customer comes from the Social channel, then after a few days from the Display channel, then from Retargeting, and finally makes a purchase.
Journey 2: The customer comes from the Display channel, then after a few days from the Retargeting channel, but never makes a purchase.
Journey 3: The customer comes from the Display channel only and never makes a purchase.
So, we have three customer journeys as follows:
- Journey 1: Start → Social → Display → Retargeting → Purchase
- Journey 2: Start → Display → Retargeting → No purchase
- Journey 3: Start → Display → No purchase
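Since the Markov model is a state-transition model, each step depends only on the stage the customer is currently in, not on how they got there. So we need to reshape the data into transitions and compute the probability of the customer moving from each stage to the next.
Once the data is transformed, we get the following transition probabilities for the channels in the customer journeys:
- Start → Social: 33%
- Start → Display: 66%
- Social → Display: 100%
- Display → Retargeting: 66%
- Display → No purchase: 33%
- Retargeting → Purchase: 50%
- Retargeting → No purchase: 50%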
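The transformation described above can be sketched as counting, for each state, how often each next state follows it. The state labels "Start", "Purchase", and "No purchase" and the function name are assumptions added for illustration; the three paths are the journeys from the text.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """Estimate P(next state | current state) from observed journeys."""
    counts = defaultdict(Counter)
    for path in journeys:
        # Count every consecutive pair of states along the path.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for state, following in counts.items()
    }


# The three journeys from the text, with explicit start and end states.
journeys = [
    ["Start", "Social", "Display", "Retargeting", "Purchase"],
    ["Start", "Display", "Retargeting", "No purchase"],
    ["Start", "Display", "No purchase"],
]

probs = transition_probabilities(journeys)
for state, following in probs.items():
    for nxt, p in following.items():
        print(f"{state} -> {nxt}: {p:.3f}")
```

With only three journeys these estimates are very coarse; the same counting works unchanged on a full journey log.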
The total chance of making a purchase is (33% * 100% * 66% * 50%) + (66% * 66% * 50%) = 33%.
If we remove the Social channel from the customer journeys, the path through Social no longer converts, so the chance of making a purchase becomes
0 + (66% * 66% * 50%) = 22%.
The effect of removing a channel is the share of conversions lost without it, so for Social it is 1 - 22% / 33% = 0.33.
The effect of removing the other channels is found the same way. Every converting path passes through Display and Retargeting, so removing either one leaves no conversions.
Effect of removing:
- Social Media: 1 - 22.2% / 33.3% = 0.333
- Display: 1 - 0% / 33.3% = 1
- Retargeting: 1 - 0% / 33.3% = 1
The weight of each channel is its removal effect divided by the sum of the removal effects of all channels, multiplied by the total conversions in the journeys, which is one in this case. This can be calculated as:
- Social Media: 0.333 / (0.333 + 1 + 1) = 0.143 * conversion (1) = 14.3%
- Display: 1 / (0.333 + 1 + 1) = 0.429 * conversion (1) = 42.9%
- Retargeting: 1 / (0.333 + 1 + 1) = 0.429 * conversion (1) = 42.9%
The final contribution of each channel differs from what the rule-based models assign.
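The removal calculation can be sketched end to end. This version uses the common definition of removal effect, one minus the ratio of the conversion probability without the channel to the baseline conversion probability, and then normalizes the effects into weights. The state labels and function names are assumptions for illustration, and the simple recursion relies on these journeys never revisiting a state.

```python
from collections import Counter, defaultdict

START, WIN, LOSS = "Start", "Purchase", "No purchase"  # assumed state labels


def transition_probabilities(journeys):
    """Estimate P(next state | current state) from observed journeys."""
    counts = defaultdict(Counter)
    for path in journeys:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {s: {t: c / sum(f.values()) for t, c in f.items()} for s, f in counts.items()}


def conversion_probability(probs, state=START, removed=None):
    """Probability of reaching WIN from `state`; a removed channel never converts.

    Plain recursion is enough here because no journey revisits a state.
    """
    if state == WIN:
        return 1.0
    if state == LOSS or state == removed:
        return 0.0
    return sum(p * conversion_probability(probs, nxt, removed)
               for nxt, p in probs[state].items())


# The three journeys from the text.
journeys = [
    [START, "Social", "Display", "Retargeting", WIN],
    [START, "Display", "Retargeting", LOSS],
    [START, "Display", LOSS],
]
probs = transition_probabilities(journeys)
base = conversion_probability(probs)
channels = ["Social", "Display", "Retargeting"]

# Removal effect: share of conversions lost when the channel is removed.
effects = {ch: 1 - conversion_probability(probs, removed=ch) / base for ch in channels}
total_effect = sum(effects.values())
for ch in channels:
    print(f"{ch}: removal effect {effects[ch]:.2f}, weight {effects[ch] / total_effect:.1%}")
```

For journey data with loops, such as a customer returning to Display twice, the recursion would need to be replaced with a linear-system or simulation approach.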
When to choose which data-driven attribution model
Shapley wins over Markov chain in:
- Much broader industry adoption
- Used successfully in attribution and auto-bidding platforms for years
- Backed by Nobel Prize winning research
- Slightly more straightforward approach to the attribution problem in which sequence doesn’t matter
- Easier to implement
- Results are usually more stable
- Results are less sensitive to the input data
Markov chain outshines Shapley value in:
- Considers channel sequence as a fundamental part of the algorithm, which is more closely aligned with a customer’s journey
- Potential to scale to a larger number of channels, since marginal contributions for Shapley value must be calculated over 2^n channel combinations (n being the number of marketing channels)
- Can be applied to the individual marketing campaign level which is more actionable for personalization
"Proper distribution to each channel"
Pros:
- We will be able to properly attribute revenue to the different channels for new customer acquisition.