A Cellular Automata Modeling for Visualizing and Predicting Spreading Patterns of Dengue Fever



Introduction
Modeling is a simplification of a real problem that aims to study and understand phenomena in the real world. In epidemiology, a system modeling approach is commonly used for viewing the epidemic process [1]. Most models for epidemic simulation are based on Ordinary Differential Equations (ODE) or statistical models [2][3][4]. Moreover, visualization is required as the first step in epidemiological analysis to understand the spatial characteristics of a dataset [2]. Visualization is needed for identifying the pattern of a disease in a given geographical area, predicting the spreading pattern of the disease in the next period, and raising awareness among the target stakeholders based on the prediction results, which in turn helps the clinical management of the disease [2]. Unfortunately, ODE or statistical models are unable to capture spatial patterns and interactions, such as those needed for visualizing and predicting the spread of a disease [3].
To overcome these limitations, researchers have used Cellular Automata (CA) models to involve both time and space in epidemic process analysis [5]. Previous studies include developing a mathematical model of disease spread and simulating it using CA [6], analyzing several scenarios of disease spread [7], applying the CA approach to the Susceptible-Infective-Recovered (SIR) model of disease spread by considering birth and death factors and changes of the rules for each state in a dynamic CA [8], and analyzing the complex spatiotemporal patterns observed in the transmission of vector-borne infectious diseases [9].
Basically, CA is a dynamic system approach that discretizes time and space [3,5,10]. A CA consists of cells, which form the cellular space, local connections from each cell to other cells, and boundary conditions [3]. Each cell is in one state, which can change at every time-step according to local transition rules that generate a new state based on the previous state of the cell and of its neighborhood. Therefore, the concept of neighborhoods is very important. The effects of neighborhood structures on disease spreading were shown in [5] using a susceptible-infected (SI) epidemic CA model. Moreover, the role of the neighborhood in determining the model interactions was described in [11].
ISSN: 1693-6930 A Cellular Automata Modeling for Visualizing and Predicting Spreading… (Puspa Eosina)
The other important aspect that determines the accuracy of a CA model is the transition rule f. This rule can be represented as a deterministic or probabilistic function [9][10]. Many methods to find the function f as the rule of the CA model have been introduced, such as using a Markov Chain [12], the differential equations of the classical model [13], and the Genetic Algorithm [14]. This research used the Hidden Markov Model (HMM) to find a probabilistic function that represents the CA transition rule, an approach that has not yet been used by other researchers. HMM is a probabilistic model that is suitable for problems involving sequential-temporal data [15]. To show the effectiveness of the proposed model, this approach was implemented on the Dengue Fever case.
The Dengue Fever case was chosen because it counts as one of the deadly, infectious pandemic diseases in Indonesia. This disease, also called Dengue Hemorrhagic Fever (DHF), is caused by the Dengue virus and is transmitted by the Aedes aegypti mosquito as a vector. Several studies related to the monitoring of DHF in Indonesia have been conducted, such as studies that aimed to see the future trend of dengue outbreaks [16][17]. A Time Series method for showing the trend of dengue outbreaks was used in [16]. That study predicted the number of dengue fever patients for the next four years based on DHF patient data from the province of North Sumatra from 2005 to 2009. The Autoregressive Integrated Moving Average (ARIMA) method was compared with the Winter approach for predicting the number of DHF cases in the next six months [17]. That research used DHF case data from Surabaya from January 2005 to June 2010 and applied four models of the Winter method and three models of the ARIMA method.
This paper explains how to develop a spreading pattern model of DHF based on a CA model, which was used for visualizing and predicting the spreading pattern of DHF. This study focused especially on determining a probabilistic function using HMM. A dataset from a limited area, West Bogor, for the period of 2013 was used to define the state criteria. Moreover, this study only considered the infective state, with particular attention to the spatial distribution of infected areas. The evaluation was conducted by comparing the results of the proposed model with those yielded by the SIR method, a classical approach.

Research Method
To achieve the research objective, several stages were carried out: collecting datasets, defining the CA model, constructing the data, predicting the spread of disease using the obtained model, and evaluating the model.

Dataset
In collecting the datasets, the following steps were carried out: identifying the geographical study area, conducting a field study for data collection, deciding the sample used in this research, and determining the source of the data. The dataset was collected from Dinas Kesehatan Kota Bogor (DKK-Bogor) using an interview technique. The interview was conducted with the DKK-Bogor Data Officer on July 16, 2014. Table 1 shows the dataset, which contains the Dengue Fever cases that occurred in West Bogor in 2013.

Defining of CA Model
A Cellular Automaton (CA) is a discrete model consisting of points or identical cells, each of which is in one certain state at a given time. The state value allowed for a cell is a value from a predefined set of states. The state of a cell changes according to a local transition rule at the next time-step [3,18]. The cells are arranged uniformly in a cellular space that can be one-dimensional, two-dimensional or three-dimensional. The state of a cell at the next time, t+1, depends on the states of the surrounding cells, called its neighborhood, at time t. Mathematically, the CA model is defined as a 4-tuple (C, S, V, f). C represents the cellular space. S represents the set of possible state values for each cell in the cellular space. V is the set of neighborhoods around a focus cell. Function f is the local transition function that represents the update rule for each state change of each cell [6]. There are four steps for defining the CA model: defining a cellular space, defining the neighborhood used in the cellular space, defining the criteria of the possible state values, and finding a probabilistic function f that represents the CA rule.

Source: Dinas Kesehatan Kota Bogor
This research defined 16 cells in a two-dimensional cellular space (Figure 1), which represent the 16 regions in West Bogor (Table 1). Each cell represents a region according to the cell id listed in Table 1. For instance, the first cell in Figure 1 represents the region of Situ Gede (the region with id 1 in Table 1). Each cell holds non-uniform objects and describes the number of dengue cases that occurred in the region. The number of cells in the cellular space does not always have to equal the number of observed regions. For instance, a cellular space of 20 or 25 cells could be defined for the 16 observed regions by adding boundary regions (regions that are not among the 16 observed regions) [3]. This study, however, assumed null boundary conditions for the proposed model. The Von Neumann 4-neighborhood was used, a collection of five cells in which the middle cell is the focus of attention, as shown in Figure 2 [6]. The remaining cells are the cells that affect the state change of the focus cell in subsequent periods. Research related to the state changes of a cell in a two-dimensional space was performed in [19].
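The neighborhood construction described above can be sketched in a few lines of Python. This is only an illustration: the 4 x 4 row-major layout of the 16 cells, the ordering of V1 to V4 (north, east, south, west), and the names `cell_position` and `von_neumann` are assumptions and are not taken from Figure 1 or Figure 2.

```python
# Sketch: 16 regions on an assumed 4 x 4 cellular space with a
# Von Neumann neighbourhood and a null boundary.
ROWS, COLS = 4, 4
NULL = None  # value seen outside the grid (null boundary)


def cell_position(cell_id):
    """Map a 1-based cell id to (row, col), assuming row-major order."""
    return divmod(cell_id - 1, COLS)


def von_neumann(grid, row, col):
    """Return [V0, V1, V2, V3, V4]: focus, north, east, south, west."""
    offsets = [(0, 0), (-1, 0), (0, 1), (1, 0), (0, -1)]
    frame = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        inside = 0 <= r < ROWS and 0 <= c < COLS
        frame.append(grid[r][c] if inside else NULL)
    return frame


# For the demonstration each cell simply stores its own id.
grid = [[r * COLS + c + 1 for c in range(COLS)] for r in range(ROWS)]
corner = von_neumann(grid, *cell_position(1))
centre = von_neumann(grid, *cell_position(6))
print(corner)  # [1, None, 2, 5, None]
print(centre)  # [6, 2, 7, 10, 5]
```

A corner cell sees two null neighbours, which is how the null boundary condition shows up in this sketch.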
In that work, the concept of state change was used for selecting attributes. The state changes were calculated based on the change of shape of the geometry that represented the result of two-dimensional rules applied to pairs of attributes [19]. The proposed model, in contrast, defined state changes based on the data content at each location. First, the values were grouped into four categories and a color was set for each category. Next, the state changes were seen as cell color changes in the cellular space. In HMM, the state changes are described by a state transition diagram. The four colors and their state criteria are shown in Table 2. The next step was to determine a function f that represents the CA rule based on the defined parameters.
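Mapping case counts to the four state categories can be illustrated as follows. The thresholds below are hypothetical placeholders, since the actual criteria are those of Table 2; the names `STATE_UPPER_BOUNDS`, `to_state` and `to_state_series` are likewise introduced only for this sketch.

```python
# Hypothetical upper bounds per state; the real criteria are in Table 2.
STATE_UPPER_BOUNDS = [(0, "S1"), (2, "S2"), (5, "S3")]


def to_state(cases):
    """Return the state label for a number of dengue cases."""
    for upper, state in STATE_UPPER_BOUNDS:
        if cases <= upper:
            return state
    return "S4"  # anything above the last bound is the most severe state


def to_state_series(case_counts):
    """Replace each period's case count with its state label."""
    return [to_state(c) for c in case_counts]


monthly_cases = [0, 1, 3, 7, 2, 0]  # made-up counts for one region
print(to_state_series(monthly_cases))
# ['S1', 'S2', 'S3', 'S4', 'S2', 'S1']
```

Each state label would then be drawn in its own color, so that a state change appears as a color change of the cell.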

In detail, the proposed method was as follows. First, from the dataset of 16 regions, this study defined the two-dimensional space, put each region into one cell, and set an index for each cell; an array variable was then defined to represent the 16 indexed cells. Next, the number of cases for each region (the number of infected) was put into an array variable, with the data for one period stored in one array and the periods running as time-steps. Finally, using the set of state criteria (Table 2), the case counts were replaced with their states. The main problem of this research is how to find the function that represents a proper CA rule. Many methods to find function f as the rule of a CA model have been proposed; this research used HMM, a method that has not yet been used by other researchers. By ignoring both the death and the birth factors, and by assuming that the probability of a cell being infected is affected by the surrounding cells, the HMM approach was suitable for determining a probabilistic function f.
The CA characteristic was represented as a Markov process [20]. Since the dataset can be classified as a time series dataset, it was appropriate to use a probabilistic function found using HMM. HMM is a probabilistic model that is suitable for problems involving sequential-temporal data [12]. Mathematically, the HMM is written as $\lambda = (T, E, \pi)$. Based on the state criteria (Figure 3), Ergodic Hidden Markov Models (Ergodic-HMM) were applied to get a probabilistic function [21,22]. Each arrow in the state diagram represents the probability that an object changes its state value from one period to the next. The Transition Probabilities were stored in T as shown in Table 3. The Emission Probabilities were stored in E as shown in Table 4. This study assumed that the initial state of a cell at the beginning of the period was equally likely to be any of the possible state values. Thus, the prior probabilities were ignored.
The probability of an object ($C_i$) changing its state was calculated using Bayes' theorem, first for a specific case and then in general form. The Transition Probability values and the Emission Probability values were each calculated by a separate formula.
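As a minimal sketch of how such matrices could be estimated from state sequences, the code below uses simple relative-frequency counting. This is an assumption made for illustration and not necessarily the formulas used in this study. In particular, reading an emission entry as "state of a neighbor in the next period, given the state of the focus cell in the current period" is one interpretation of the description, and the function names are hypothetical.

```python
from collections import Counter

STATES = ["S1", "S2", "S3", "S4"]


def row_normalise(counts):
    """Turn counts of (row_state, col_state) pairs into row probabilities."""
    matrix = {}
    for a in STATES:
        total = sum(counts[(a, b)] for b in STATES)
        matrix[a] = {b: (counts[(a, b)] / total if total else 0.0)
                     for b in STATES}
    return matrix


def transition_matrix(series_per_cell):
    """Relative frequency of state a at period t followed by b at t+1."""
    counts = Counter()
    for series in series_per_cell:
        counts.update(zip(series, series[1:]))
    return row_normalise(counts)


def emission_matrix(focus_series, neighbour_series):
    """Relative frequency of neighbour state b at t+1 given focus state a at t."""
    counts = Counter(zip(focus_series, neighbour_series[1:]))
    return row_normalise(counts)


# Made-up state sequences for a focus cell and one neighbour.
focus = ["S1", "S2", "S2", "S1"]
neighbour = ["S1", "S1", "S3", "S1"]
T = transition_matrix([focus])
E1 = emission_matrix(focus, neighbour)
print(T["S2"])   # S2 -> S1 and S2 -> S2 each observed once
print(E1["S2"])
```

Rows for states that never occur in the data are left at zero here; a real implementation would need to decide how to treat them.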

Data Construction for Spreading Pattern
Data attributes used in this research are "name of a region" and "number of Dengue cases". The cellular space is defined as a two-dimensional space in which each cell represents a region with some Dengue cases in each period. West Bogor has 16 regions, so 16 cells were defined. Each cell contained non-uniform objects describing the Dengue cases that occurred in a region in a given period. Each cell was defined as a one-dimensional variable $X_i$, where $X_i$ represents a region as shown in Figure 1. Cell neighborhoods were defined as a one-dimensional array variable $V$ whose elements are the focus cell $V_0$ and its neighbors $V_1$ to $V_4$. The neighborhood frame, as indicated in Figure 2, moved across the cellular space, visiting each cell in turn. At each move, the initial state of each cell was checked. Next, the maximum probability of the focus cell changing its state value for the next period was calculated. State values were represented by the array $S = \{S_1, S_2, S_3, S_4\}$. To build the simulation model, Excel spreadsheets were used as a tool to find the probabilistic function, and the SciPy module in Python 3.4 was used as a tool for evaluating the proposed model.
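The selection of the most probable next state for the focus cell can be sketched as below. How the transition and emission probabilities are combined (a plain product here) is an assumption made for illustration, as are the function name `next_state`, the layout of `T` and `E`, and all numeric values.

```python
STATES = ["S1", "S2", "S3", "S4"]


def next_state(frame, T, E):
    """Pick the most probable next state of the focus cell.

    frame: [V0, V1, V2, V3, V4] current states (None outside the grid).
    T[a][b]: probability that a cell moves from state a to state b.
    E[k][a][b]: probability associated with neighbour k (k = 1..4).
    """
    focus, neighbours = frame[0], frame[1:]
    best_state, best_score = focus, -1.0
    for candidate in STATES:
        score = T[focus][candidate]
        for k, neighbour in enumerate(neighbours, start=1):
            if neighbour is not None:  # null boundary: no influence
                score *= E[k][neighbour][candidate]  # assumed combination
        if score > best_score:
            best_state, best_score = candidate, score
    return best_state


# Made-up probabilities, for illustration only.
uniform = {a: {b: 0.25 for b in STATES} for a in STATES}
T = {a: dict(row) for a, row in uniform.items()}
T["S4"] = {"S1": 0.7, "S2": 0.1, "S3": 0.1, "S4": 0.1}
E = {k: uniform for k in range(1, 5)}
frame = ["S4", "S1", None, "S2", "S1"]
print(next_state(frame, T, E))  # S1
```

Applying `next_state` to the frame of every cell, using the states of the current period only, would give one synchronous time-step of the CA.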

Predict the Disease Spread Patterns using a Proposed Model
In order to predict the spreading pattern of Dengue Fever, the CA model was applied to a new dataset. To initialize the simulation, the John von Neumann random number generator based on a CA rule was used. It is equivalent to a two-dimensional space in which the j-th cell in the i-th row is generated by taking cells from the previous row [23]. The rule is applied modulo 4, which corresponds to the number of states, so that $x_j^{(i)}$, describing the j-th cell in the i-th row, can take the values 0, 1, 2, and 3. The values of the cells in the first row, which range from 0 to 3, were randomly assigned by a simple Linear Congruential Generator.
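The initialization scheme can be illustrated with the following sketch. The first row comes from a linear congruential generator and each later row is derived from the previous row modulo 4. The specific row rule used here (sum of the left, centre and right cells, wrapping at the edges) and the LCG constants are assumptions chosen for illustration, not the rule of [23].

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Simple linear congruential generator yielding values in 0..3."""
    state = seed
    while True:
        state = (a * state + c) % m
        yield (state >> 16) % 4  # use higher bits; low LCG bits are weak


def ca_rows(width, n_rows, seed=1):
    """First row from the LCG; each later row from the previous row, mod 4."""
    gen = lcg(seed)
    rows = [[next(gen) for _ in range(width)]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Assumed rule: left + centre + right of the previous row, modulo 4.
        rows.append([(prev[j - 1] + prev[j] + prev[(j + 1) % width]) % 4
                     for j in range(width)])
    return rows


rows = ca_rows(width=4, n_rows=4, seed=2013)
for row in rows:
    print(row)
```

Each value 0 to 3 can then be read as one of the four states when seeding the cellular space.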

Verification and Validation
Evaluation of the proposed model was conducted through verification and validation. Model verification aims to ensure that the CA model has been implemented correctly. The purpose of validation is to determine whether the theory and assumptions underlying the preparation of this model are correct [24]. The SIR (Susceptible-Infected-Recovered) model is a simple mathematical model based on ODE that has been proven to be an acceptable model in the field of epidemics [25]. In the SIR model, S is the number of susceptible, I is the number of infectious, and R is the number of recovered; β represents the transmission probability of the disease and γ represents the period of infection.
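For reference, a reference SIR curve can be produced with a short simulation. The sketch below assumes the standard normalised form $dS/dt = -\beta S I / N$, $dI/dt = \beta S I / N - \gamma I$, $dR/dt = \gamma I$, integrates it with a plain forward-Euler step, and treats γ as a recovery rate (the reciprocal of the infectious period). All parameter values are made up, and the population of 16 simply mirrors the 16 areas.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps, dt=1.0):
    """Forward-Euler integration of the standard SIR equations."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 16 areas, one initially infected, 12 periods.
history = simulate_sir(s0=15, i0=1, r0=0, beta=0.9, gamma=0.3, steps=12)
infected = [round(i, 2) for _, i, _ in history]
print(infected)
```

The `infected` series is the kind of curve that can be compared against the number of infected cells per period produced by the CA model.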
Verification of the model was conducted by comparing the trend of the graphs yielded by the proposed model with the trend of the graphs yielded by the SIR model. Subsequently, validation of the model was performed by measuring the proximity of the simulation results of the proposed model to those of the SIR model, using a correlation coefficient as the similarity measure [26]. In this measure, $X_i$ and $\bar{X}$ denote the time-series data generated by the proposed model and their average, $Y_i$ and $\bar{Y}$ denote the time-series data generated by the SIR model and their average, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y. A similarity value between 0.5 and 1 indicates similar series: the closer the value is to 1, the more similar the two time series are [27].
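A correlation-based similarity of this kind can be computed as in the sketch below, which assumes the usual Pearson form with population (divide-by-n) standard deviations; the function name `pearson_similarity` and the sample series are illustrative only.

```python
from math import sqrt


def pearson_similarity(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sd_x * sd_y)


ca_infected = [1, 2, 3, 4]    # made-up CA output
sir_infected = [2, 4, 6, 8]   # made-up SIR output
print(round(pearson_similarity(ca_infected, sir_infected), 6))  # 1.0
```

With SciPy available, `scipy.stats.pearsonr(x, y)` returns the same coefficient together with a p-value.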

Results and Analysis

Probabilistic Functions Obtained as Rule on CA Model
The spreading pattern and prediction model were represented by the probabilistic function that represents the CA rule. The probabilistic function obtained in this research takes the state value of cell $V_0$ in the n-th period and gives the probability that $V_0$ is in state $S_i$ in the next period. This function allows the maximum probability of the state change to be chosen; that is, the state of cell $V_0$ in the next period is determined by the maximum probability value obtained from Equations (12) to (15). These probabilities consist of four possible values.
The emission terms were obtained from the Emission Probabilities Matrix using Equations (15) to (18). The transition terms were obtained from T, which was computed with the formula described in Table 3; the resulting probability values form the matrix of Equation 16. This matrix shows that the change from state $S_4$ to $S_1$ has the highest probability value. The matrix also shows that a change of a cell's state from $S_3$ to $S_4$ in the next period was very unlikely or never occurred. Moreover, if a cell was in state $S_4$, the state tended to change to a better condition, because the probability of the cell keeping its state was very small or the case never occurred. This suggests that preventive actions were always taken to stop the spread of Dengue Fever in this area. In this research, E is a matrix representing the state change probabilities of a certain area as affected by its neighborhood. There were four matrices E, given in Equations 17 to 20.
Equation (17) describes the probability of a state change of cell $V_1$ in the next period as affected by the state change of cell $V_0$. It shows that the probability of $V_1$ changing to $S_4$ while $V_0$ was in $S_3$ was very small or zero. Moreover, this matrix also shows that a change of area $V_1$ to $S_1$ while $V_0$ was in $S_4$ never occurred.
Equation (18) shows that a state change of $V_2$ to $S_4$, as affected by $V_0$, never occurs while $V_0$ is in state $S_1$ or $S_4$.
Equation (19) describes the probability of a state change of cell $V_3$ in the next period as affected by the state of $V_0$. Equation (20) describes the probability of a state change of cell $V_4$ in the next period as affected by the state of $V_0$. From the four matrices above, it is concluded that extreme changes of the neighborhood cells to $S_4$, as affected by the focus area, are very rare.

Prediction Model Results Obtained Using CA Model
An example of the simulation results is shown in Figure 4. The pattern was obtained using Equation (11). The inputs to this equation were the Transition Probability Matrix (Equation 16), the Emission Probability Matrices (Equations 17 to 20), and the randomized initial data obtained by Equation (6). The visualization of the obtained patterns indicated that the spread of Dengue disease lasted on average seven to eight periods. Several computational simulations showed that if the disease began to spread simultaneously in cells 11 and 16, the outbreak took longer to subside. Cells 11 and 16 represent the areas of Cilendek Barat and Pasir Kuda, respectively. It also appears that cell 15, representing the area of Pasir Mulya, is the cell most vulnerable to the spread of the disease, as shown by the color indicator for that area.

Evaluation
Verification was conducted by comparing the prediction results yielded by the proposed model (using the CA approach) with those yielded by the SIR model (a classic and popular approach). The verification results showed that the slope of the infected area-period graph (Figure 5) for the CA model (the proposed model) was similar to that for the SIR model. In addition, validation was conducted by calculating the similarity between the prediction obtained using the CA model and the one obtained using the SIR model, using Equation 10. The validation result indicated that the prediction using CA had a similarity of 0.95 to that of the SIR model. Thus, based on the verification and validation results, it can be stated that the proposed CA approach was implemented correctly and that the assumptions underlying this model are correct. The visualization of the spreading pattern yielded by this model can therefore be used for understanding and predicting the spread of Dengue disease. Such prediction is required to help prevent the spread of Dengue disease in prone regions. However, the proposed model still has a limitation in that it does not consider the behavior of people. Thus, this model is only valid for a relatively static society.

Conclusion
From the results, it was concluded that a spreading pattern model of Dengue Fever based on the CA approach was successfully developed by setting four parameters supported by HMM and using the Von Neumann neighborhood. The proposed model was able to predict the spread of Dengue disease and indicated which areas should be observed carefully. Moreover, the evaluation result showed that the CA model was capable of generating patterns similar to those generated by the SIR model, with a similarity value of 0.95.


ISSN: 1693-6930 TELKOMNIKA Vol. 14, No. 1, March 2016: 228-237

The final step in defining the CA model is finding a probabilistic function f that represents the CA rule. Function f is required to obtain a spreading pattern of Dengue Fever.

Figure 1. An Example of a Cellular Space Construction

In the HMM notation, λ is the HMM model, T is a matrix of Transition Probabilities, E is a matrix of Emission Probabilities, and π is a Prior Matrix [15]. In the CA model that has been defined, the change of a cell state to another state can be described by a State Transition Diagram. The State Transition Diagram expresses the HMM component T. The state change probabilities of a certain area affected by its neighborhoods are called emission probabilities and express the HMM component E.

Figure 4. The Prediction Results of Dengue Spreading Pattern on the CA Model

Figure 5. The Trend Graph of the Number of Infected Areas in Bogor Barat

Table 1. Number of Dengue Cases in West Bogor in 2013

Table 2. State Definition of Infected Area