Open Conference Systems, MISEIC 2019

Multivariable Semiparametric Regression with Combined Estimator of Truncated Spline and Fourier Series (Case Study: Mean Years Schooling of Regency in Java)
Helida Nurcahayani, I Nyoman Budiantara, Ismaini Zain

Last modified: 2019-10-13

Abstract



Helida Nurcahayani¹*, I Nyoman Budiantara² and Ismaini Zain³

¹ PhD Student of Statistics Department, Faculty of Mathematics, Computation and Data Science, Sepuluh Nopember Institute of Technology, Indonesia

(E-mail: helida.18062@mhs.its.ac.id)

²,³ Department of Statistics, Faculty of Mathematics, Computation and Data Science, Sepuluh Nopember Institute of Technology, Indonesia

(E-mail: nyomanbudiantara65@gmail.com, ismaini_z@statistika.its.ac.id)


Education is widely regarded as a fundamental resource for both individuals and societies. Mean Years Schooling (MYS) is one of the two education indicators in UNDP's Human Development Index (HDI) and reflects important aspects of the level of education in a country or regency. Preliminary exploration of the relationship between MYS as the response and each predictor variable shows that some relationships change pattern at certain sub-intervals, while others follow patterns that repeat at certain intervals, so the model contains both parametric and nonparametric components. A suitable method is therefore semiparametric regression with a combined estimator of truncated spline and Fourier series. Accordingly, the objective of this research is to obtain the estimator of a semiparametric regression model with a combined truncated spline and Fourier series estimator and to apply it to the MYS data of 119 regencies in Java.

 

In this study, the model is applied to MYS in 119 regencies across Java (y) with several predictor variables that affect MYS: student-school ratio (z1), percentage of government expenditure on education (z2), percentage of poor people (x), per capita population expenditure (t1), and population density (t2). The analysis begins by examining the relationship between the response and each predictor variable and determining which predictors are approximated by a linear, truncated spline, or Fourier series function. The best model is chosen from the optimum knot points, bandwidth, and oscillation parameter, selected by the minimum Generalized Cross Validation (GCV) criterion. Next, the model parameters are estimated with the Penalized Least Square (PLS) method and the coefficient of determination (R²) of the model is computed. The final step is interpreting the model and drawing conclusions.
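The three kinds of predictor terms described above can be sketched as basis functions. This is a minimal illustration only: it assumes a linear (degree 1) truncated spline and a cosine Fourier series with a trend term, and the function names, knot locations, and input values are hypothetical rather than taken from the study.

```python
import math

def truncated_power(x, knot, degree=1):
    """Truncated power term (x - knot)_+^degree: zero at or below the knot."""
    return (x - knot) ** degree if x > knot else 0.0

def spline_terms(x, knots, degree=1):
    """Polynomial terms x, ..., x^degree followed by one truncated term per knot."""
    poly = [x ** d for d in range(1, degree + 1)]
    return poly + [truncated_power(x, k, degree) for k in knots]

def fourier_terms(t, n_osc):
    """Trend term t followed by cos(k t) for k = 1..n_osc (assumed cosine form)."""
    return [t] + [math.cos(k * t) for k in range(1, n_osc + 1)]

def design_row(z, x, t, knots, n_osc):
    """One design-matrix row: intercept, linear z terms, spline in x, Fourier in each t."""
    row = [1.0] + list(z) + spline_terms(x, knots)
    for t_s in t:
        row += fourier_terms(t_s, n_osc)
    return row

# Hypothetical inputs for a single observation, for illustration only.
row = design_row(z=[25.0, 20.0], x=12.0, t=[1.0, 2.0],
                 knots=[5.0, 10.0, 15.0], n_osc=2)
print(len(row))  # number of columns in the design matrix
```

With three knots and two oscillations, each observation contributes one intercept, two linear terms, four spline terms, and three Fourier terms per Fourier predictor. In practice the Fourier predictors would typically be rescaled to a fixed interval before the cosine terms are evaluated.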

The estimator of the model is obtained through PLS optimization, and the result is the combined truncated spline and Fourier series estimator for multivariable semiparametric regression.
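For orientation, a model of this type is often written in the generic form below. The notation (the symbols \alpha, \beta, a, b, K_k, \lambda_s, p, r, M) and the particular penalty on the Fourier components are assumptions chosen for illustration; they follow a common formulation in the literature and are not necessarily the authors' exact formulas.

```latex
% Generic notation, assumed for illustration (not the authors' exact formulas).
% Constant terms of f and g_s are absorbed into the single intercept \beta_0.
\begin{align*}
y_i &= \mu_i + \varepsilon_i, \qquad
\mu_i = \beta_0 + \sum_{j=1}^{2} \beta_j z_{ji} + f(x_i) + \sum_{s=1}^{2} g_s(t_{si}),
  \qquad i = 1, \dots, n, \\
f(x) &= \sum_{u=1}^{p} \alpha_u x^{u} + \sum_{k=1}^{r} \alpha_{p+k}\,(x - K_k)_+^{p}, \\
g_s(t) &= b_s t + \sum_{k=1}^{M} a_{ks} \cos(kt), \\
\text{PLS:}\quad &\min \; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \mu_i \bigr)^2
  + \sum_{s=1}^{2} \lambda_s \int_0^{\pi} \frac{2}{\pi} \bigl( g_s''(t) \bigr)^2 \, dt, \\
&\int_0^{\pi} \frac{2}{\pi} \bigl( g_s''(t) \bigr)^2 \, dt = \sum_{k=1}^{M} k^{4} a_{ks}^{2}.
\end{align*}
```

Here (x - K_k)_+ equals x - K_k when x > K_k and zero otherwise, r is the number of knots, and M is the number of oscillations. Under this assumed penalty the criterion is quadratic in the coefficients, so its minimizer is a linear function of y.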

In this study, the numbers of knot points and oscillations considered are one, two, and three. The best model is obtained with three knots and two oscillations, giving a minimum GCV of 0.355 and R² = 88.34%. The model parameters are then estimated with the PLS method, which gives the fitted semiparametric regression model with the combined truncated spline and Fourier series estimator for the MYS data of 119 regencies across Java.
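The selection criterion and the fit measure reported above can be computed as in the following sketch. It assumes the usual GCV definition for a linear smoother with hat matrix A, GCV = MSE / (1 - tr(A)/n)², and the usual R² = 1 - SSE/SST. The function names and the toy numbers are hypothetical and are not the study's data.

```python
def gcv(y, y_hat, trace_hat):
    """Generalized cross validation score: MSE / (1 - tr(A)/n)^2."""
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n
    return mse / (1.0 - trace_hat / n) ** 2

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def pick_best(gcv_by_model):
    """Return the label of the candidate model with the smallest GCV."""
    return min(gcv_by_model, key=gcv_by_model.get)

# Toy values for illustration only (not the MYS data).
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 3.0, 3.5]
score = gcv(y, y_hat, trace_hat=2.0)
fit = r_squared(y, y_hat)
best = pick_best({"model A": 0.9, "model B": 0.4})
print(score, fit, best)
```

In a full analysis, one such GCV score would be computed for every combination of knot locations, number of oscillations, and smoothing parameter, and the combination with the smallest score would be retained.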

The predictions of the fitted model can then be compared with the actual data, as shown in Figure 1, where the predicted values are quite close to the actual values for several regencies.

Figure 1. Comparison of Actual and Predicted Data

 

Keywords: Fourier series, generalized cross validation, mean years schooling, penalized least square, semiparametric regression, truncated spline

