Last modified: 2018-07-07
Abstract
Count data are numerical data consisting of non-negative integers. They are usually the outcomes of an underlying counting process in continuous time (Asamoah, 2016). A counting process is a stochastic process {N(t), t ≥ 0} in which N(t) is non-decreasing and represents the total number of events that have occurred by time t; its values are therefore non-negative integers (Ross, 2010). The Poisson count model is one of the distributions most often used to fit count data. It is simple to work with because the Poisson process satisfies the properties of a counting process and the Poisson distribution is discrete. The Poisson count model has exponentially distributed interarrival times.
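To illustrate the Poisson-exponential connection, the sketch below simulates a counting process whose interarrival times are exponential with rate λ (the hypothetical parameter `rate`) and counts the events that occur in [0, t] (`horizon`). Because the resulting N(t) is Poisson with parameter λt, its sample mean and sample variance should both be close to λt. The function names are illustrative only and do not come from any library.

```python
import random
import statistics


def simulate_poisson_count(rate: float, horizon: float, rng: random.Random) -> int:
    """Number of events in [0, horizon] when interarrival times are Exponential(rate)."""
    count = 0
    elapsed = rng.expovariate(rate)  # arrival time of the first event
    while elapsed <= horizon:
        count += 1
        elapsed += rng.expovariate(rate)  # add the next exponential interarrival time
    return count


def sample_counts(rate: float, horizon: float, n_samples: int, seed: int = 0) -> list[int]:
    """Draw n_samples independent realisations of N(horizon) with a fixed seed."""
    rng = random.Random(seed)
    return [simulate_poisson_count(rate, horizon, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    counts = sample_counts(rate=2.0, horizon=1.0, n_samples=20_000)
    # For a Poisson process, mean and variance of N(t) both equal rate * t (here 2.0).
    print(statistics.mean(counts), statistics.variance(counts))
```

The simulated mean and variance agree only up to Monte Carlo error, so they will be close to, but not exactly equal to, 2.0.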
However, the Poisson count model is only valid if the data satisfy the equidispersion assumption, i.e., the variance of the data equals its mean. Data that are not equidispersed are either overdispersed (the variance is greater than the mean) or underdispersed (the variance is less than the mean), and applying the Poisson count model to data that depart significantly from equidispersion can lead to misspecification of the data distribution (Asamoah, 2016). Many models that handle overdispersion have been developed, e.g., the negative binomial model (NBD) of Greenwood and Yule (1920). Several approaches to underdispersion have also been proposed, but few offer the conceptual elegance and usefulness of the Poisson-exponential connection (McShane et al., 2008).
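A quick descriptive check of the equidispersion assumption is the dispersion index, the sample variance divided by the sample mean, which is 1 under equidispersion. The sketch below computes it and labels a sample accordingly. The `tolerance` band of ±0.1 is an arbitrary illustrative assumption, not a statistical test; a formal dispersion test would be needed before concluding that the Poisson model is misspecified.

```python
import statistics


def dispersion_index(counts: list[int]) -> float:
    """Sample variance divided by sample mean; equals 1 under equidispersion."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return statistics.variance(counts) / mean


def classify_dispersion(counts: list[int], tolerance: float = 0.1) -> str:
    """Label a sample by how far its dispersion index lies from 1 (heuristic band)."""
    index = dispersion_index(counts)
    if index > 1 + tolerance:
        return "overdispersed"  # variance greater than the mean
    if index < 1 - tolerance:
        return "underdispersed"  # variance less than the mean
    return "approximately equidispersed"
```

For example, `[0, 0, 0, 10]` has mean 2.5 and variance 25, so its index is 10 and it is labelled overdispersed.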
Exponential interarrival times imply a constant hazard function. Winkelmann (1995) developed a count model based on gamma interarrival times, whose hazard is not constant. However, the hazard function of the gamma distribution has no closed form. Winkelmann (1995) also noted that the Weibull distribution is preferred in duration analysis because its hazard function has a closed form. Another characteristic of the Weibull distribution is that its shape parameter c can vary, so that its hazard can be decreasing (0<c<1), constant (c=1), or increasing (c>1).
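The closed-form Weibull hazard can be written down directly. The sketch below assumes the parameterization used by McShane et al. (2008), with scale λ (`lam`) and shape c, survival function S(t) = exp(-λt^c), density f(t) = λct^(c-1)exp(-λt^c), and hazard h(t) = f(t)/S(t) = λct^(c-1). Other references parameterize the Weibull scale differently, so the formulas should be checked against the convention used in the full paper.

```python
import math


def weibull_survival(t: float, lam: float, c: float) -> float:
    """S(t) = exp(-lam * t**c) for t >= 0 (assumed parameterization)."""
    return math.exp(-lam * t**c)


def weibull_pdf(t: float, lam: float, c: float) -> float:
    """f(t) = lam * c * t**(c - 1) * exp(-lam * t**c) for t > 0."""
    return lam * c * t ** (c - 1) * weibull_survival(t, lam, c)


def weibull_hazard(t: float, lam: float, c: float) -> float:
    """h(t) = f(t) / S(t) = lam * c * t**(c - 1); no special functions needed."""
    return lam * c * t ** (c - 1)
```

With c=1 the hazard is the constant λ (the exponential case). With c=0.5 it decreases in t, and with c=2 it increases in t, which is the flexibility the shape parameter provides.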
(This figure can be found in the supplementary file.)
Figure 1. Probability density function of the Weibull distribution for different values of c.
McShane et al. (2008) showed that Weibull interarrival times can accommodate overdispersed data when 0<c<1 and underdispersed data when c>1. Moreover, when c=1 the Weibull interarrival time reduces to the exponential, as seen in Figure 1, so the model also handles equidispersion. The Weibull count model will be derived using a Taylor series expansion and the convolution method, and will then be extended to a heterogeneous Weibull count model by mixing.
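The Taylor-expansion-and-convolution construction can be sketched numerically. Assuming S(t) = exp(-λt^c) as in McShane et al. (2008), their series is P(N(t)=n) = Σ_{j≥n} (-1)^(j+n) (λt^c)^j α_j^n / Γ(cj+1), with α_j^0 = Γ(cj+1)/Γ(j+1) and α_j^(n+1) = Σ_{m=n}^{j-1} α_m^n Γ(c(j-m)+1)/Γ(j-m+1). The function below truncates the infinite sum after a hypothetical `terms` parameter; it is a sketch, not the paper's implementation. It reduces to the Poisson pmf when c=1, and P(N(t)=0) equals S(t) for any c.

```python
import math


def weibull_count_pmf(n: int, lam: float, c: float, t: float = 1.0, terms: int = 40) -> float:
    """P(N(t) = n) for a renewal process with Weibull(lam, c) interarrival times.

    Series expansion of McShane et al. (2008), truncated after `terms` terms.
    Assumes survival S(t) = exp(-lam * t**c).
    """
    if n < 0:
        return 0.0
    if n >= terms:
        raise ValueError("need terms > n for the truncated series")
    # g[k] = Gamma(c*k + 1) / Gamma(k + 1); this is also alpha_k^0.
    g = [math.gamma(c * k + 1) / math.gamma(k + 1) for k in range(terms)]
    alpha = g[:]  # alpha_j^0 for j = 0..terms-1
    for level in range(n):
        # Convolution step: alpha_j^{level+1} = sum_{m=level}^{j-1} alpha_m^{level} * g[j - m].
        nxt = [0.0] * terms
        for j in range(level + 1, terms):
            nxt[j] = sum(alpha[m] * g[j - m] for m in range(level, j))
        alpha = nxt
    x = lam * t**c
    # Alternating series from the Taylor expansion of the survival function.
    return sum(
        (-1) ** (j + n) * x**j * alpha[j] / math.gamma(c * j + 1)
        for j in range(n, terms)
    )
```

Because the series alternates, it loses precision when λt^c is large, and `math.gamma` overflows for large c·terms. A production version would need more terms, log-gamma arithmetic, or both. Summing `n * p` and `n**2 * p` over the resulting probabilities gives the mean and variance, so the dispersion behaviour for different c can be checked directly.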
Afterwards, we will predict the number of claims of each policyholder, assuming that it follows a heterogeneous Weibull count distribution. However, a policyholder's own claim experience is in most cases too limited to be given full credibility in predicting claim frequency. On the other hand, the policyholder's risk usually belongs to a large risk class whose collective claim experience provides information for credible statistical prediction. Both sources of information should therefore be considered in predicting a policyholder's claim frequency. One approach that does this is credibility theory, which assigns weights to the estimate based on individual experience and to the estimate based on collective experience, and combines them into a prediction of the policyholder's claim frequency.
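In symbols, the weighting described above takes the standard credibility form below. The notation is ours: P_i is the predicted claim frequency of policyholder i, X̄_i its observed mean claim frequency, μ the collective mean of the risk class, and Z_i the credibility factor. Z_i = 1 gives full credibility to the individual experience, and Z_i = 0 relies entirely on the collective experience.

```latex
P_i = Z_i \, \bar{X}_i + (1 - Z_i) \, \mu, \qquad 0 \le Z_i \le 1
```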
The Buhlmann credibility model is the simplest credibility model: it assumes that the policyholder's past claim experience consists of independent and identically distributed components, which implies that the exposure (size) is the same in every past year j. This assumption is often unrealistic in practice. Therefore, this paper uses the approach of Buhlmann and Straub (1970), which removes this restriction of the Buhlmann model by allowing the process variance of the claim frequency to depend on exposure, to predict claim frequency.
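To make the role of exposure concrete, the sketch below implements the standard nonparametric (empirical) Buhlmann-Straub estimators found in actuarial textbooks. It is not necessarily the estimation method of this paper, which may instead derive the structural parameters from the heterogeneous Weibull model. Assumptions: `freqs[i][j]` is the claim frequency per unit exposure of policyholder i in year j, `exposures[i][j]` is the exposure m_ij, and at least one policyholder has two or more years. The credibility factor is Z_i = m_i/(m_i + k) with k = v/a, where v is the expected process variance (EPV), a is the variance of hypothetical means (VHM), and m_i is the total exposure of policyholder i. In the equal-exposure Buhlmann case this reduces to Z = n/(n + k). All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BuhlmannStraubResult:
    epv: float  # estimated expected process variance (v-hat)
    vhm: float  # estimated variance of hypothetical means (a-hat)
    k: float  # credibility constant v-hat / a-hat
    collective_mean: float  # exposure-weighted overall mean
    z: list[float]  # credibility factor Z_i for each policyholder
    predictions: list[float]  # Z_i * individual mean + (1 - Z_i) * collective mean


def buhlmann_straub(freqs: list[list[float]], exposures: list[list[float]]) -> BuhlmannStraubResult:
    r = len(freqs)
    m_i = [sum(row) for row in exposures]  # total exposure per policyholder
    m = sum(m_i)
    # Exposure-weighted individual means and overall (collective) mean.
    xbar_i = [sum(w * x for w, x in zip(ws, xs)) / mi for ws, xs, mi in zip(exposures, freqs, m_i)]
    xbar = sum(mi * xi for mi, xi in zip(m_i, xbar_i)) / m

    # EPV: pooled exposure-weighted variation within each policyholder's history.
    within = sum(
        w * (x - xi) ** 2
        for ws, xs, xi in zip(exposures, freqs, xbar_i)
        for w, x in zip(ws, xs)
    )
    epv = within / sum(len(xs) - 1 for xs in freqs)

    # VHM: between-policyholder variation corrected for process noise.
    between = sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m_i, xbar_i))
    vhm = (between - epv * (r - 1)) / (m - sum(mi**2 for mi in m_i) / m)

    if vhm <= 0:
        # No evidence of heterogeneity: individual experience gets zero weight.
        k = float("inf")
        z = [0.0] * r
    else:
        k = epv / vhm
        z = [mi / (mi + k) for mi in m_i]  # more exposure gives more credibility
    predictions = [zi * xi + (1 - zi) * xbar for zi, xi in zip(z, xbar_i)]
    return BuhlmannStraubResult(epv, vhm, k, xbar, z, predictions)
```

For two policyholders with unit exposures and frequencies [1, 3] and [5, 7], the estimates are v = 2, a = 7, and Z = 0.875, giving predictions 2.25 and 5.75. Some texts replace the collective mean by the credibility-weighted mean Σ Z_i X̄_i / Σ Z_i; that choice is left open here.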
The final results of this research are (1) a heterogeneous Weibull count model that can handle data with any type of dispersion, and (2) a Buhlmann-Straub credibility model to predict the claim frequency of a policyholder whose claims follow a heterogeneous Weibull count distribution. Furthermore, we will apply the Buhlmann-Straub model to a specific dataset, which we are still in the process of acquiring.