The Combination of Multinomial Logistic Regression and Spline Regression for Credit Risk Modeling

Muhammad Rizky Adha; Siti Nurrohmah; Sarini Abdullah

Open Conference Systems, MISEIC 2018

Last modified: 2018-07-07

Abstract

In the last 50 years, consumer credit has been the driving force of the economy in most developed countries (Sirong, Xiao, & Tingting, 2016). As a developing country, Indonesia has also seen growth in consumer credit. Bank Indonesia predicts that credit growth will reach 10-12 percent in 2018. In line with this figure, Indonesia's economic growth is in the range of 5.1-5.5 percent (cnnindonesia.com, 2017). The magnitude of credit growth will therefore affect the magnitude of economic growth. The amount of credit disbursed also affects bank performance, which in turn affects the country's economic sector.

As the volume of consumer credit grows, competition among credit providers becomes fierce, and profitability has become one of the important business considerations in credit decision making. Credit providers therefore require a model that can serve as a reference for credit decisions. However, not every credit account runs into default or attrition within the observation period. Data for which the event is not observed within a given time period are called censored data.

Survival modeling has been adopted in retail banking because of its ability to analyze censored data. It is an important tool for credit risk scoring, stress testing and credit asset evaluation. Survival data are realizations of a non-negative random variable, since they consist of the survival times of a set of observed objects. Survival data can be either complete (uncensored) or incomplete (censored). In practice, observing the survival time of every object requires considerable time and cost, so it is rarely done and survival data are usually censored. Censored data are data that cannot be observed in full because the object is lost to observation so that the actual event time is unknown, because the object experiences an event other than the one under study, or because the object has not experienced the event of interest by the end of the study (Klein & Moeschberger, 1997). Censoring can be right, left or interval censoring. Right censoring comes in two types, Type I and Type II. Under Type I right censoring, observation stops once a predetermined censoring time is reached. Under Type II right censoring, observation of n objects stops once the r-th event has occurred, with 1 < r < n (Klein & Moeschberger, 1997).
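As a minimal sketch of Type I right censoring, the Python snippet below turns event times into (observed time, event indicator) pairs for a fixed censoring time. The function name, the 12-month window and the example values are hypothetical and are not taken from the paper's dataset.

```python
from typing import List, Tuple


def right_censor_type1(
    event_times: List[float], censor_time: float
) -> List[Tuple[float, int]]:
    """Apply Type I right censoring at a fixed, predetermined time.

    Returns (observed_time, event_indicator) pairs. The indicator is 1 when
    the event happened on or before the censoring time and 0 when the record
    is censored, in which case only the censoring time is known.
    """
    observed = []
    for t in event_times:
        if t <= censor_time:
            observed.append((t, 1))            # event seen inside the window
        else:
            observed.append((censor_time, 0))  # still event-free at cutoff
    return observed


# Hypothetical months-to-event for four accounts, observed for 12 months.
records = right_censor_type1([5.0, 14.0, 12.0, 30.0], censor_time=12.0)
```

For the two censored accounts the only recorded information is that the event time exceeds 12 months, which is the situation survival models are designed to handle.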

The approach used to address this problem is a discrete-time survival model, namely the multinomial logistic regression model. Logistic regression (sometimes called the logistic model or logit model) is used to predict the probability that an event occurs. The multinomial logistic regression model extends the binomial logistic regression model to a dependent variable with more than two categories (polychotomous). One assumption that must be fulfilled in the multinomial logistic regression model is that the observations of the dependent variable are independent and its categories are mutually exclusive (i.e., no two categories can occur simultaneously). Discrete-time survival models can handle time-dependent covariates and deal with competing risks. Competing risks arise when the observed object is at risk of more than one mutually exclusive event. In this paper, the object subject to competing risks is a credit account, which may experience default, attrition, or neither.
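Discrete-time survival models are commonly fitted on "person-period" data, with one row per account per period and a categorical outcome for that period. The sketch below illustrates this layout for the competing-risk setting described above. The abstract does not describe the data layout actually used, so the outcome codes, field names and function name are assumptions made for illustration.

```python
from typing import Dict, List

# Outcome codes: an assumed labelling, not taken from the paper.
NEITHER, DEFAULT, ATTRITION = 0, 1, 2


def to_person_period(
    account_id: str, last_month: int, final_status: int
) -> List[Dict[str, object]]:
    """Expand one account into one row per observed month.

    Every month before the last is labelled NEITHER, because the account was
    still active and event-free. The last observed month carries the final
    status: DEFAULT, ATTRITION, or NEITHER for a censored account.
    """
    rows = []
    for month in range(1, last_month + 1):
        outcome = final_status if month == last_month else NEITHER
        rows.append({"account": account_id, "month": month, "outcome": outcome})
    return rows


# Hypothetical accounts: A1 defaults in month 3, A2 is censored after month 2.
rows_a1 = to_person_period("A1", last_month=3, final_status=DEFAULT)
rows_a2 = to_person_period("A2", last_month=2, final_status=NEITHER)
```

Because each row has exactly one outcome, the categories are mutually exclusive by construction, and time-dependent covariates can be attached as extra columns per month.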

In this paper, we introduce a nonparametric approach that uses spline regression to model the hazard function. The flexibility of spline functions allows us to model nonlinear and irregular hazard shapes. Incorporating spline regression into the multinomial logistic regression model yields a combination model with several advantages. First, thanks to the flexible spline regression function, it can model nonlinear, irregular and spiky hazard functions. Second, it is easy to understand and implement, and its simple parametric form makes the model easy to interpret. Third, the model can be used for prediction. Using a credit card dataset, we demonstrate how to build this model, and we also provide a statistical explanation and an assessment of prediction accuracy.
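One way to picture the combination is to let a spline in time act as the baseline of each cause-specific linear predictor and then map the predictors to probabilities with the multinomial logit link. The sketch below assumes a degree-1 truncated power basis and omits covariates. The abstract does not state the spline degree, knot placement or coefficients, so all of these are illustrative assumptions rather than the authors' specification.

```python
import math
from typing import Dict, List, Sequence


def linear_spline_basis(t: float, knots: Sequence[float]) -> List[float]:
    """Degree-1 truncated power basis: [1, t, (t - k1)+, (t - k2)+, ...]."""
    return [1.0, t] + [max(t - k, 0.0) for k in knots]


def competing_hazards(
    t: float,
    knots: Sequence[float],
    coef_default: Sequence[float],
    coef_attrition: Sequence[float],
) -> Dict[str, float]:
    """Discrete-time hazards at period t under a multinomial logit model.

    "neither" is the reference category, so its linear predictor is zero.
    Each coefficient vector needs one entry per basis function.
    """
    basis = linear_spline_basis(t, knots)
    eta_default = sum(b * c for b, c in zip(basis, coef_default))
    eta_attrition = sum(b * c for b, c in zip(basis, coef_attrition))
    denom = 1.0 + math.exp(eta_default) + math.exp(eta_attrition)
    return {
        "neither": 1.0 / denom,
        "default": math.exp(eta_default) / denom,
        "attrition": math.exp(eta_attrition) / denom,
    }


# Hypothetical knots (in months) and coefficients, chosen only for illustration.
knots = [3.0, 6.0]
hazards = competing_hazards(
    5.0,
    knots,
    coef_default=[-4.0, 0.5, -0.6, 0.0],
    coef_attrition=[-3.0, 0.1, 0.0, 0.2],
)
```

The slope of each baseline can change at every knot, which is what lets the hazard rise and fall over the life of an account. In practice the coefficients would be estimated by fitting a multinomial logistic regression on the spline basis columns together with the covariates.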

With the combination model in hand, we expect it to perform well in identifying the factors that affect default and attrition events, and to provide accurate predictions in credit risk modeling.

Keywords

credit risk modeling; multinomial logistic regression; spline regression; survival analysis