Modeling iPad Sales with the Bass Model in R

The Bass model is a well-known tool for describing the diffusion of new products in a market. It was first presented in a 1969 paper by Frank M. Bass of Purdue University. I wanted to try this model on some real data to see how well it performs.

The Theory

What is it Good For

The original paper describing the tool is entitled “A New Product Growth For Model Consumer Durables”, and durables are indeed what it aims to predict. Moreover, the model works best when dealing with new product types. It will not work as well when predicting sales of a new generation of an established product (think the new, improved iPhone), because the assumptions about the dynamics of innovation and imitation in consumer behavior will not hold entirely in such cases. Interestingly, the model describes most product introductions well no matter what managerial decisions are made on dimensions such as advertising and pricing, although extensions of the model exist to better factor in those variables. Besides new products, the model can also be applied to the adoption of new technologies.

The model is particularly useful because predicting sales of entirely new products is challenging: such products inevitably lack historical data.

Basic Logic

The underlying logic of the model distinguishes two consumer behaviors: innovation and imitation. This is a simplification of the well-known Diffusion of Innovation curve:

diffusion-of-innovation-adoption-curve

Innovators are not influenced by social pressure; in fact, they are the ones who create it. The more customers have already made their initial purchase, the fewer potential buyers remain, so the number of new innovator purchases falls over time.

Imitators play it safe. Their probability of purchase grows the more people already own the product. Imitators can be seen as a composite of the last three categories of the Diffusion of Innovation curve.

The behavior of the two groups is displayed in the figure below:

the-bass-model-number-of-new-adopters

Please note that this curve represents new purchases, not cumulative sales or adoption. Cumulative adoption would be captured by the cumulative distribution function of the curve, which yields an S-shaped curve: a logistic-type curve in the case of the Diffusion of Innovation figure above, or something similar in the case of the general Bass model.
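The two behaviors can be made concrete with a small simulation. The sketch below is a discrete-time version of the model under illustrative assumptions: a market size of 100, an innovation coefficient of 0.03 and an imitation coefficient of 0.38 (none of these are estimated from data, and all variable names are made up for the example). In each period, innovators buy at a constant rate among those who have not bought yet, while imitators buy at a rate that grows with the share of the market that already owns the product.

```r
# Discrete-time sketch of innovator and imitator purchases.
# All three values are illustrative assumptions, not fitted estimates.
market_size <- 100   # potential market
innovation  <- 0.03  # coefficient of innovation
imitation   <- 0.38  # coefficient of imitation

periods    <- 1:20
innovators <- numeric(length(periods))
imitators  <- numeric(length(periods))
adopted    <- 0  # cumulative number of adopters so far

for (t in periods) {
  remaining     <- market_size - adopted
  # innovators: constant rate applied to whoever has not bought yet
  innovators[t] <- innovation * remaining
  # imitators: rate scales with the share of the market already adopted
  imitators[t]  <- imitation * (adopted / market_size) * remaining
  adopted       <- adopted + innovators[t] + imitators[t]
}

round(data.frame(period = periods, innovators, imitators), 2)
```

In the resulting table, innovator purchases shrink from the first period on, while imitator purchases start at zero, rise to a peak and then fall off as the market fills up.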

Model Prescription

Here is the model prescription:

$$ S(t) = m \, \frac{\frac{(p+q)^2}{p} \, e^{-(p+q)t}}{\left(1 + \frac{q}{p} \, e^{-(p+q)t}\right)^2} $$

S(t) is the sales at time t.

Another way of looking at the prescription is that we are multiplying the size of the potential market m by the share of that market adopting at time t.

There are three parameters to the model:

  • m = the market potential: the total number of people who will eventually use the product
  • p = the coefficient of innovation: the likelihood that somebody will buy the product purely based on media coverage or marketing, without relying on references from peers. On a yearly time scale the value is usually less than 0.03, and can be as low as 0.01.
  • q = the coefficient of imitation: the probability that somebody will start using the product based on references from their peers. The usual value is between 0.3 and 0.5, commonly 0.38.
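To get a feel for how the three parameters shape the curve, here is a minimal sketch that wraps the Bass sales formula in a function and evaluates it with the typical yearly values of p and q. The function name `bass_sales` and the market potential of 100 are assumptions made for illustration only.

```r
# Bass model sales at time t.
# `bass_sales` is a hypothetical helper name, not part of any package.
bass_sales <- function(t, m, p, q) {
  decay <- exp(-(p + q) * t)
  m * ((p + q)^2 / p) * decay / (1 + (q / p) * decay)^2
}

# Illustrative yearly values (assumptions): m = 100, p = 0.03, q = 0.38
years <- 1:40
sales <- bass_sales(years, m = 100, p = 0.03, q = 0.38)

which.max(sales)  # year with the highest sales
sum(sales)        # total sales over the horizon, close to m
```

Raising q pulls the peak earlier and makes it sharper, while m only scales the curve up or down.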

iPad Sales Case Study

I wanted to test this on real data in R. I downloaded quarterly sales figures for the iPad and copied them over. I will first fit a Bass model to the iPad sales data, then visualize the projected versus the actual sales.

Model Creation

# quarterly iPad sales, in millions of units
units_sold <- c(3.27, 4.19, 7.33, 4.69, 9.25, 11.12, 15.43, 11.8, 17.04, 14.04, 22.86, 19.48, 14.62, 14.08)

# let's name them just for convenience
names(units_sold) <- c("Q32010", "Q42010", "Q12011", "Q22011", "Q32011", "Q42011", "Q12012", "Q22012", "Q32012", "Q42012", "Q12013", "Q22013", "Q32013", "Q42013")

# create a vector of time indices (1 = first quarter in the data)
dates <- seq_along(units_sold)

To find the parameters, we will use R's nonlinear least squares function, nls(). We call it with the formula of the model and ask it to optimize the parameters m, p, and q against our sales data. We will initialize p and q with the typical values mentioned above and m with a rough guess of 500 million units. Even though those values of p and q are more typical of yearly than quarterly data, they seem to work relatively OK.

Bass.nls <- nls(units_sold ~ M * (((P + Q)^2/P) * exp(-(P + Q) * dates))/(1 + (Q/P) * exp(-(P + Q) * dates))^2, start = list(M = 500, P = 0.03, Q = 0.38))

And extract the found values; we will use them later.

m <- coef(Bass.nls)[1]
p <- coef(Bass.nls)[2]
q <- coef(Bass.nls)[3]

What did we get? Let’s call summary on the Bass.nls object:

> summary(Bass.nls)
 Formula: units_sold ~ M * (((P + Q)^2/P) * exp(-(P + Q) * dates))/(1 + (Q/P) * 
 exp(-(P + Q) * dates))^2

Parameters:
 Estimate Std. Error t value Pr(>|t|) 
M 2.292e+02 2.506e+01 9.147 1.79e-06 ***
P 1.030e-02 2.494e-03 4.128 0.00168 ** 
Q 2.974e-01 4.527e-02 6.569 4.03e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.499 on 11 degrees of freedom

Number of iterations to convergence: 8 
Achieved convergence tolerance: 3.013e-06

So the model estimated:

  • m = the market potential for iPads is roughly 230 million devices
  • p = the likelihood that someone buys one as an innovator (i.e. without word-of-mouth influence) is about 1% per quarter
  • q = the probability of purchase based on peer references is close to 30%
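The estimates also imply when sales should peak and how high. The Bass curve has a closed-form peak at t* = ln(q/p) / (p + q), with peak sales of m (p + q)^2 / (4q). The sketch below plugs in the rounded estimates from the summary output; the names `m_hat`, `p_hat` and `q_hat` are made up for this example.

```r
# Rounded estimates copied from the summary() output above
m_hat <- 229.2
p_hat <- 0.0103
q_hat <- 0.2974

# Closed-form peak of the Bass curve
peak_quarter <- log(q_hat / p_hat) / (p_hat + q_hat)
peak_sales   <- m_hat * (p_hat + q_hat)^2 / (4 * q_hat)

peak_quarter  # about 10.9, i.e. around the 11th quarter in the data
peak_sales    # about 18.2 (million units)
```

The 11th quarter in the data is Q1 2013, which is also the quarter with the highest actual sales, although the actual figure of 22.86 million is well above the modeled peak.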

Contrasting Model and Actual Data

Having developed the model, I would like to calculate and visualize the performance of the model’s prediction versus the actual data.

For that purpose we will first need a vector of times to predict for:

modeltimes <- seq(1, 14)
myForecast <- m * (((p + q)^2/p) * exp(-(p + q) * modeltimes))/(1 + (q/p) * exp(-(p + q) * modeltimes))^2

And plot it:

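# widen the y-axis so that actual values above the model curve are not cut off
plot(myForecast, type = "l", ylim = range(myForecast, units_sold), xlab = "quarters", ylab = "sales in million units", main = "iPad sales model vs actual")
points(dates, units_sold)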

rplot

It works! 😀
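To put a number on how well it works, we can compare the fitted curve with the actual sales. The sketch below is self-contained: it repeats the sales figures and uses the rounded estimates from the summary output, so the result is approximate. The variable names are made up for this example.

```r
# Actual quarterly sales (millions of units) and rounded fitted parameters
actual_sales <- c(3.27, 4.19, 7.33, 4.69, 9.25, 11.12, 15.43,
                  11.8, 17.04, 14.04, 22.86, 19.48, 14.62, 14.08)
m_hat <- 229.2
p_hat <- 0.0103
q_hat <- 0.2974

# Model values for the same quarters
quarters     <- seq_along(actual_sales)
decay        <- exp(-(p_hat + q_hat) * quarters)
fitted_sales <- m_hat * ((p_hat + q_hat)^2 / p_hat) * decay /
  (1 + (q_hat / p_hat) * decay)^2

# Error measures, in millions of units
fit_errors <- actual_sales - fitted_sales
rmse <- sqrt(mean(fit_errors^2))  # root mean squared error
mae  <- mean(abs(fit_errors))     # mean absolute error

c(rmse = rmse, mae = mae)
```

The root mean squared error comes out at roughly 2.2 million units per quarter. That is in line with the residual standard error of 2.499 in the summary output, which divides by the 11 degrees of freedom rather than the 14 observations.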

Looking Into the “Future”

But what if we wanted to look a bit further into the future? Of course, we already have data beyond 2013 :-) but for the sake of the exercise, let’s go back in time! This is what the model would show us about the future.

modeltimes2 <- seq(1, 35)
myForecast2 <- m * (((p + q)^2/p) * exp(-(p + q) * modeltimes2))/(1 + (q/p) * exp(-(p + q) * modeltimes2))^2

# widen the y-axis so that actual values above the model curve are not cut off
plot(myForecast2, type = "l", ylim = range(myForecast2, units_sold), xlab = "quarters", ylab = "sales in million units", main = "iPad sales model vs actual")
points(dates, units_sold)

rplot02
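An alternative to retyping the formula for every forecast horizon is to let R evaluate the fitted model through predict(). The sketch below refits the same model with the data in a data frame so that new time indices can be passed in as newdata. The names `sales_df`, `fit` and `future` are made up for this example, and the fit is assumed to converge from the same starting values as before.

```r
# Same sales data, stored in a data frame (millions of units)
sales_df <- data.frame(
  quarter = 1:14,
  units   = c(3.27, 4.19, 7.33, 4.69, 9.25, 11.12, 15.43,
              11.8, 17.04, 14.04, 22.86, 19.48, 14.62, 14.08)
)

# Same Bass formula and starting values as in the original fit
fit <- nls(
  units ~ M * (((P + Q)^2 / P) * exp(-(P + Q) * quarter)) /
    (1 + (Q / P) * exp(-(P + Q) * quarter))^2,
  data  = sales_df,
  start = list(M = 500, P = 0.03, Q = 0.38)
)

# Forecast 35 quarters by handing predict() the new time indices
future   <- data.frame(quarter = 1:35)
forecast <- predict(fit, newdata = future)

plot(future$quarter, forecast, type = "l",
     ylim = range(forecast, sales_df$units),
     xlab = "quarters", ylab = "sales in million units")
points(sales_df$quarter, sales_df$units)
```

This keeps the model formula in one place, so changing the horizon only means changing the newdata argument.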

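Another interesting question is when the model assumes the market will be saturated:

The cumulative share of the market potential (cumulative forecast sales divided by m) flattens out close to 1 around quarter 27, which is Q1 2017… right about now.

# cumulative forecast sales as a share of the market potential m
plot(cumsum(myForecast2) / m, type = "l", xlab = "quarters", ylab = "cumulative share of market potential")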

rplot03
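Saturation can also be read off the closed-form cumulative adoption share of the Bass model, F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}). The sketch below uses the rounded estimates from the summary output and an arbitrary 95% threshold; both the threshold and the variable names are assumptions made for illustration.

```r
# Rounded estimates copied from the summary() output
p_hat <- 0.0103
q_hat <- 0.2974

# Closed-form cumulative share of the market potential reached by quarter t
quarters         <- 1:35
decay            <- exp(-(p_hat + q_hat) * quarters)
cumulative_share <- (1 - decay) / (1 + (q_hat / p_hat) * decay)

# First quarter in which at least 95% of the potential has been reached
# (the 95% threshold is an arbitrary choice)
first_95 <- which(cumulative_share >= 0.95)[1]
first_95

plot(quarters, cumulative_share, type = "l",
     xlab = "quarters", ylab = "cumulative share of market potential")
abline(h = 0.95, lty = 2)
```

With these values the 95% mark is crossed in quarter 21, and the remaining few percent take several more quarters, because the curve approaches 1 only gradually.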

Of course, iPads will continue to be sold thanks to marketing, product line extensions, and sustaining innovation in the current models.