Last modified: 2018-07-07
Abstract
Numerical data are divided into two types: continuous and discrete. One type of discrete data is count data, which are non-negative integer values giving the number of occurrences of an event in a given interval of time, space, or volume. The distribution most commonly used to describe count data is the Poisson distribution, which models the number of occurrences in an interval of time or a region (Cameron & Trivedi, 1998). A defining characteristic of the Poisson distribution is that its mean equals its variance, a condition called equidispersion. In terms of parameter estimation, when this condition is not met the estimates obtained under the Poisson distribution are inefficient, though still consistent, and the resulting inference is not appropriate (Hilbe, 2011).
Equidispersion rarely holds in applied data. Departures from it are of two kinds: underdispersion and overdispersion. In applied data the more common case is overdispersion, where the variance is greater than the mean. One cause of overdispersion is that the data are dominated by zero observations, often called excess zeros.
Lambert (1992), in a paper on the Zero-Inflated Poisson regression model, discussed a modification of the Poisson distribution, the Zero-Inflated Poisson (ZIP) distribution. This distribution is claimed to describe count data that are overdispersed because of excess zeros. Earlier, Cohen (1963) and Johnson & Kotz (1969) had discussed the ZIP model without covariates.
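As a small illustration of how zero inflation produces overdispersion, the sketch below evaluates the ZIP probability mass function and its mean and variance. The parameterization (a structural zero with probability p, otherwise a Poisson draw with mean lam) and the names zip_pmf and zip_mean_var are assumptions chosen for illustration, not notation taken from this paper.

```python
import math

def zip_pmf(k, lam, p):
    """P(Y = k): structural zero with probability p, otherwise Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (p if k == 0 else 0.0) + (1.0 - p) * poisson

def zip_mean_var(lam, p):
    """Mean and variance of the ZIP distribution under this parameterization."""
    mean = (1.0 - p) * lam
    var = (1.0 - p) * lam * (1.0 + p * lam)
    return mean, var

mean, var = zip_mean_var(lam=2.0, p=0.3)
print(mean, var)  # the variance exceeds the mean whenever p > 0 and lam > 0
```

Setting p = 0 recovers the ordinary Poisson case, in which the two moments coincide.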
The ZIP model parameters are usually estimated by maximum likelihood, and according to Schwartz (2013) more accurate conclusions are obtained when the sample size is relatively large. The behavior of the maximum likelihood estimates at relatively small sample sizes has not been analyzed. This paper discusses maximum likelihood parameter estimation for the ZIP distribution at relatively small sample sizes.
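For the ZIP distribution without covariates, setting the two score equations to zero reduces the problem to a single equation in the Poisson mean, which the sketch below solves by bisection. The function name fit_zip, the parameterization (zero-inflation probability p, Poisson mean lam), and the fallback to a plain Poisson fit when the unconstrained solution gives p < 0 are assumptions made for illustration; the estimation routine actually used in the paper is not described in this abstract.

```python
import math

def fit_zip(y):
    """Maximum likelihood fit of a ZIP sample without covariates.

    Returns (lam_hat, p_hat), or None when the sample has no positive counts.
    """
    n = len(y)
    positives = [v for v in y if v > 0]
    if not positives:
        return None  # all zeros: lam and p are not separately identifiable
    y_bar = sum(y) / n
    mean_pos = sum(positives) / len(positives)
    if mean_pos > 1.0:
        # Solve lam / (1 - exp(-lam)) = mean_pos by bisection on (0, mean_pos).
        lo, hi = 0.0, mean_pos
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if mid / -math.expm1(-mid) < mean_pos:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        p = 1.0 - y_bar / lam
        if p >= 0.0:
            return lam, p
    # No excess zeros in the sample: boundary solution, a plain Poisson fit.
    return y_bar, 0.0

print(fit_zip([0, 0, 0, 0, 1, 2, 3, 1, 4, 2]))
```

At an interior solution the fitted model reproduces both the sample mean and the observed share of zeros, which gives a quick sanity check on the output.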
In small samples, the maximum likelihood estimator has a relatively small bias. However, this bias can be virtually eliminated by using the analytical bias reduction of Cox and Snell (1963). This method reduces the bias without sacrificing the mean squared error of the maximum likelihood estimator (Schwartz, 2013).
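For reference, the Cox and Snell first-order bias of the s-th component of a maximum likelihood estimator is commonly written as below, together with the bias-corrected estimator obtained by subtracting the estimated bias. The notation is an assumption and is not taken from this abstract: \ell is the log-likelihood of a sample of size n, \kappa_{ij} = E[\partial^2 \ell / \partial\theta_i \partial\theta_j], \kappa_{ijl} = E[\partial^3 \ell / \partial\theta_i \partial\theta_j \partial\theta_l], \kappa_{ij,l} = E[(\partial^2 \ell / \partial\theta_i \partial\theta_j)(\partial \ell / \partial\theta_l)], and \kappa^{ij} is the (i, j) element of the inverse of the information matrix K = \{-\kappa_{ij}\}.

```latex
\operatorname{Bias}(\hat\theta_s)
  = \sum_{i}\sum_{j}\sum_{l} \kappa^{si}\,\kappa^{jl}
    \left[ \tfrac{1}{2}\,\kappa_{ijl} + \kappa_{ij,l} \right] + O(n^{-2}),
\qquad
\tilde\theta = \hat\theta - \widehat{\operatorname{Bias}}(\hat\theta)
```

Here the estimated bias is the leading term evaluated at the maximum likelihood estimate, so the corrected estimator has bias of order n^{-2}.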
Cordeiro and McCullagh (1991) also studied bias correction in generalized linear models (GLMs). They obtained formulas for the first-order bias of the maximum likelihood estimators of the linear parameters, linear predictors, dispersion parameter, and fitted values in GLMs. These formulas can be used to compute bias-corrected maximum likelihood estimators.
We also compare the bias-reduced method with bias correction by parametric bootstrap resampling (Efron, 1977). The bootstrap method can reduce the bias without sacrificing the mean squared error of the maximum likelihood estimator (Schwartz, 2013). Schwartz (2013) also suggests that the double bootstrap resampling method (Martin, 1990) can do the same. This paper implements both resampling methods and compares their results with those of the bias-reduced estimator.
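The parametric bootstrap correction can be sketched as follows: fit the model, simulate new samples from the fitted ZIP distribution, refit each one, and subtract the estimated bias, which gives 2 times the original estimate minus the mean of the bootstrap estimates. Everything named here (fit_zip, draw_zip, bootstrap_bias_corrected, the parameterization with zero-inflation probability p and Poisson mean lam, and the rule of skipping all-zero resamples) is a hypothetical helper or assumption for illustration, not the paper's implementation.

```python
import math
import random

def fit_zip(y):
    """ZIP maximum likelihood fit without covariates: (lam_hat, p_hat) or None."""
    positives = [v for v in y if v > 0]
    if not positives:
        return None  # all-zero sample: parameters not identifiable
    y_bar = sum(y) / len(y)
    mean_pos = sum(positives) / len(positives)
    if mean_pos > 1.0:
        lo, hi = 0.0, mean_pos  # bisection for lam / (1 - exp(-lam)) = mean_pos
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if mid / -math.expm1(-mid) < mean_pos:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        if y_bar <= lam:
            return lam, 1.0 - y_bar / lam
    return y_bar, 0.0  # boundary case: plain Poisson fit

def draw_zip(n, lam, p, rng):
    """Draw n ZIP variates; the Poisson part uses Knuth's multiplication method."""
    sample = []
    for _ in range(n):
        if rng.random() < p:
            sample.append(0)  # structural zero
            continue
        k, prod, limit = 0, rng.random(), math.exp(-lam)
        while prod > limit:
            k += 1
            prod *= rng.random()
        sample.append(k)
    return sample

def bootstrap_bias_corrected(y, n_boot=500, seed=0):
    """Return (lam, p) corrected as 2 * estimate - mean of bootstrap estimates."""
    rng = random.Random(seed)
    fit = fit_zip(y)
    if fit is None:
        raise ValueError("sample has no positive counts")
    lam_hat, p_hat = fit
    reps = []
    while len(reps) < n_boot:
        rep = fit_zip(draw_zip(len(y), lam_hat, p_hat, rng))
        if rep is not None:  # assumption: skip degenerate all-zero resamples
            reps.append(rep)
    lam_bar = sum(r[0] for r in reps) / n_boot
    p_bar = sum(r[1] for r in reps) / n_boot
    return 2.0 * lam_hat - lam_bar, 2.0 * p_hat - p_bar

print(bootstrap_bias_corrected([0, 0, 0, 0, 1, 2, 3, 1, 4, 2], n_boot=200, seed=1))
```

The corrected value of p is not constrained to the unit interval in this sketch. A double bootstrap would repeat the same resampling step inside each bootstrap replicate, at a correspondingly higher computational cost.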
We obtain the results by simulation. The simulation aims to show that the bias-reduced maximum likelihood estimator of the ZIP distribution parameters has relatively small percent bias and percent mean squared error (MSE) compared with the maximum likelihood, bootstrap bias-corrected, and double bootstrap bias-corrected estimators. The estimates for the maximum likelihood and bias-reduced maximum likelihood methods were obtained from Monte Carlo experiments.
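The abstract does not define percent bias and percent MSE. A common convention in the bias-reduction literature, assumed here, is 100 times the bias divided by the true parameter value and 100 times the MSE divided by the squared true value, both computed over the Monte Carlo replications. The function name percent_bias_and_mse is hypothetical.

```python
def percent_bias_and_mse(estimates, true_value):
    """Percent bias and percent MSE of Monte Carlo estimates of one parameter."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return 100.0 * bias / true_value, 100.0 * mse / true_value ** 2

# Two replications that straddle the true value symmetrically:
print(percent_bias_and_mse([1.0, 3.0], 2.0))  # → (0.0, 25.0)
```

Scaling by the true value makes the measures comparable across parameters of different magnitude, such as the Poisson mean and the zero-inflation probability.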
From the results we conclude that bias-reduced maximum likelihood estimation compares favorably with parametric bootstrap and double bootstrap bias reduction. It also outperforms the original maximum likelihood estimator in small samples.
Keywords: Zero-Inflated Poisson; Bias Reduction; Bootstrap; Maximum Likelihood Estimation
Acknowledgment: We wish to thank DRPM Universitas Indonesia through Hibah Publikasi Internasional Terindeks untuk Tugas Akhir Mahasiswa UI (PITTA) 2018.