An intelligent strabismus detection method based on convolution neural network

ABSTRACT


INTRODUCTION
Strabismus is an eye disorder in which the two eyes cannot align in the same direction [1]; it often appears in childhood. It is mainly caused by a problem in the optic nerve, the brain, or the extraocular muscles [2]. Risk factors include family history [3] and premature birth. Strabismus has a severe impact on daily life. It can prevent the brain from fusing the images collected by the two eyes, which leads to amblyopia [4]. If left untreated, the affected eye can degenerate, leading to blindness [5]. In addition, the binocular vision and depth perception of strabismus patients are poorer than those of healthy people. The prognosis and treatment of strabismus are therefore increasingly important, and detection is the first and one of the most essential steps. Traditionally, strabismus is detected in hospitals. Doctors use the Hirschberg test [6] to determine whether a patient has strabismus: a thin beam of light is directed into the patient's eyes to check whether the light reflex falls at the same place on both corneas. As the number of patients with strabismus increases, manual detection becomes tedious and prone to error. Automatic detection is a useful and practical approach to meeting the growing demand for strabismus detection [7]. Horner et al. [8] use telemedicine to diagnose strabismus in places where specialists are not available. In that setting, patient images are taken with high-resolution digital cameras and then sent to specialists for remote analysis and diagnosis.

In addition, several methods detect strabismus with digital tools. Attada et al. [9] use photorefraction to achieve narrow-angle strabismus detection. Loudon et al. [10] use the pediatric vision scanner to detect strabismus. Almeida et al. [11] apply a digital camera and the Hirschberg test to identify strabismus. Valente et al. [12] detect strabismus in digital videos using cover tests. Chen et al. [13] use an eye-tracking system and convolutional neural networks to detect strabismus. Most of the previous studies rely either on eye-tracking devices to capture the position of the iris or on high-resolution images captured by digital cameras. In addition, their classification step does not identify the type of strabismus. In this article, deep neural network models are used to detect strabismus automatically and to specify its type from images acquired by different sources (for example, low- to high-resolution digital images). In recent years, the use of deep neural networks has grown as efficient training algorithms have become available [14]-[20]. The rest of this paper is organized as follows: Section 2 presents the theoretical part of the proposed method. Section 3 gives the main results obtained on three datasets. Section 4 concludes the study.

METHOD

Dataset
Three datasets with a total of 285 human facial images are used in this work. The first dataset, shown in Figure 1(a), is captured with a low-resolution camera (i.e., a laptop or mobile phone camera) and a metal stand that aligns the face with the camera. The second dataset, shown in Figure 1(b), is obtained from ophthalmologists; all of its images are collected from patients using mobile phones. The third dataset, illustrated in Figure 1(c), is the impa-faced dataset [21] used in previous studies. All images are scaled to a fixed resolution (e.g., 640×480 pixels). To achieve automated strabismus detection, the datasets are carefully annotated by ophthalmologists. For model learning purposes, we divide the datasets into two subsets: a training set of 205 patient images (72%) and a testing set of 80 patient images (28%).
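The split described above can be sketched as follows. This is a minimal illustration and not the authors' procedure: the text gives only the subset sizes, so the random shuffle, the seed, and the placeholder image identifiers are all assumptions.

```python
import random

def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of `items` and split it into train and test lists.

    The random shuffle is an assumption; the paper states only the sizes.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder identifiers standing in for the 285 annotated images.
image_ids = list(range(285))
train_ids, test_ids = split_dataset(image_ids, n_train=205)

# Shares of the whole collection (about 0.72 and 0.28).
train_share = len(train_ids) / len(image_ids)
test_share = len(test_ids) / len(image_ids)
```

The sizes 205 and 80 correspond to the 72% and 28% shares quoted in the text after rounding.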

Proposed model
In this work, the proposed model uses a convolutional neural network (CNN) to learn deep feature vectors for automatic strabismus detection. The model consists of two stages. First, the eye region is segmented from the face using the Viola-Jones algorithm [22]. Second, the segmented eye regions are mapped to two output classes (strabismus: 1 or normal: 0) according to the iris position of each eye. The general flow diagram of the detection method is illustrated in Figure 2.

Segmentation of eye region
Using three datasets from different sources is intended to strengthen the model training stage. Strabismus detection depends primarily on locating the eye regions in the human face. For this purpose, we apply the Viola-Jones algorithm to detect the eye region in the patient's face [23]. Further, we select a pre-trained classifier model from a pool of classifiers to identify the eye pair location while the face is aligned in front of the camera using a metal stand. The algorithm returns the eye pair region as a rectangle described by (x, y, w, h), where (x, y) is the top-left corner of the rectangular eye region, and h and w are the height and width of the rectangle, respectively. All of these values are in pixels. In more detail, the detector searches for the eye region by sliding a window over the input image. The size of the window can be adjusted to detect objects at various scales; however, the aspect ratio remains the same. The stages of the classifier are designed to reject the negative sub-windows of an image. At this point, applying Haar features (HF) yields a bounding box around the eye pair region. The result is the extracted eye pair together with its class number. Figure 3 illustrates instances of segmented eye pair regions.
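The multi-scale window scan can be illustrated with the sketch below. It covers only the enumeration of candidate windows in (x, y, w, h) form; the Haar-feature cascade that accepts or rejects each window is not shown. The base window size, the scale step, and the stride fraction are hypothetical values chosen for illustration, not settings taken from this work.

```python
def sliding_windows(img_w, img_h, base_w, base_h,
                    scale_step=1.25, stride_frac=0.5):
    """Yield candidate windows (x, y, w, h) at increasing scales.

    The width is derived from the height so that every window keeps the
    base aspect ratio (exactly so when base_w / base_h is an integer).
    """
    scale = 1.0
    while True:
        h = round(base_h * scale)
        w = round(h * base_w / base_h)
        if w > img_w or h > img_h:
            break  # the window no longer fits inside the image
        step_x = max(1, int(w * stride_frac))
        step_y = max(1, int(h * stride_frac))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                yield (x, y, w, h)
        scale *= scale_step

# Hypothetical 40x10 base window scanned over a 640x480 image.
windows = list(sliding_windows(640, 480, base_w=40, base_h=10))
```

In a full detector, each yielded window would be passed through the cascade stages, and most windows would be discarded by the early stages.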

Separation of eye region
After obtaining the eye region as in Figure 3, the next step is to separate the left and right eye segments. For strabismus detection, both separated eye images must be of the same size. In practice, after separation we resize each image to 42×22 pixels and feed it to the CNNs as part of the training set. The target class number encodes the position of the iris (e.g., 0: center, 1: left, and 2: right), as shown in Figure 4.
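A minimal sketch of the separation step is given below, with images represented as nested lists of pixel values. Several details are assumptions made only for illustration: the eye pair box is cut at its vertical midline, 42×22 is read as width × height, and nearest-neighbour sampling is used for resizing. The sketch also does not decide which half corresponds to the patient's left or right eye.

```python
def crop(img, x, y, w, h):
    """Return the w-by-h sub-image whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]

def resize_nearest(img, out_w, out_h):
    """Resize a nested-list image with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

def split_eye_pair(img, box, out_w=42, out_h=22):
    """Cut the eye pair box at its midline and resize both halves."""
    x, y, w, h = box
    half = w // 2
    first_half = crop(img, x, y, half, h)              # left side of the image
    second_half = crop(img, x + half, y, w - half, h)  # right side of the image
    return (resize_nearest(first_half, out_w, out_h),
            resize_nearest(second_half, out_w, out_h))
```

Both returned images have the same size, which is the property the classifiers require.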

Establishing convolution neural network
After separating the eye regions, a deep learning CNN is built to classify them. A CNN is made up of neurons arranged in rectangular layers [24]. This spatial arrangement of neurons is the fundamental property of CNNs and allows them to be used in a variety of applications. Sparse connectivity, parameter sharing, and pooling are the other essential properties of a CNN.

Sparse connectivity
Sparse connectivity means that every neuron in the CNN is connected only to a small region of neurons in the preceding or the following layer. It is achieved by using a kernel that is smaller than the input. For example, in image classification the input image may contain a large number of pixels, yet useful features such as edges and shapes can be detected by kernels that cover only a few pixels. Sparse connectivity reduces the number of stored parameters, which is of great importance to the efficiency of the network.
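The following sketch illustrates sparse connectivity with a plain 2-D "valid" convolution (implemented, as is usual in CNNs, as a cross-correlation). Each output value depends only on the small patch of the input covered by the kernel. The 1×2 edge kernel and the toy image are illustrative assumptions, not values from this work.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image`; each output sees only one small patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Toy image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1] for _ in range(3)]
edge_kernel = [[-1, 1]]  # responds where intensity rises from left to right
response = conv2d_valid(image, edge_kernel)
# response == [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The nonzero responses appear only at the edge, although each output was computed from just two neighbouring pixels.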

Parameter sharing
To improve the detection performance and execution time of the proposed model, all neurons in the same feature map of a layer share the same parameters. In other words, learning one set of parameters is sufficient, instead of learning a separate set for every location. Hence, the model can detect the same pattern wherever it appears in the input image, which helps with patterns that are shifted and, to a lesser degree, slightly tilted or warped.
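The saving from sparse connectivity and parameter sharing can be made concrete by counting weights for a single-channel 42×22 input. The 3×3 kernel size is a hypothetical choice for illustration; the kernel sizes actually used are not stated in this section, and bias terms are ignored.

```python
def weight_counts(in_w, in_h, k):
    """Count weights needed to map an input to one k-by-k 'valid' output map."""
    out_w, out_h = in_w - k + 1, in_h - k + 1
    n_in, n_out = in_w * in_h, out_w * out_h
    return {
        "dense": n_in * n_out,            # every output sees every input
        "local_unshared": n_out * k * k,  # sparse, but one kernel per location
        "shared_kernel": k * k,           # sparse and shared across locations
    }

counts = weight_counts(in_w=42, in_h=22, k=3)
# counts == {"dense": 739200, "local_unshared": 7200, "shared_kernel": 9}
```

Sparse connectivity alone reduces 739,200 weights to 7,200, and sharing the kernel across all locations reduces them further to 9.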

Pooling
Pooling means that, rather than performing a convolution, the outputs of neighboring neurons are aggregated. Max-pooling is the most commonly recommended pooling function: it returns the highest value within a rectangular region. In general, a CNN is composed of several convolutional and pooling layers and ends with one or more fully connected layers. Convolutional layers are applied to discover important features, while pooling layers keep task-relevant information and discard irrelevant details [25]. Fully connected layers map the activations to the output neurons, with each output neuron corresponding to one target class.
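Max-pooling as described above can be sketched in a few lines. The 2×2 window with stride 2 is a common default and is assumed here only for illustration; the pooling sizes used in this work are not stated in this section.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Return the maximum of each size-by-size window of a feature map."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)
# pooled == [[4, 2], [2, 8]]
```

The 4×4 feature map is reduced to 2×2 while the strongest activation in each region is kept.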

Network architecture and training parameters
In this paper, two CNN models have been developed. The first model is reserved for classifying the left eye images, and the second model is used for the right eye images. The architecture of both CNN models is composed of two convolutional layers and two pooling layers. Each convolutional layer uses the rectified linear unit (ReLU) activation function [26] and is followed by one pooling layer. ReLU is an activation function that helps to improve the quality of the network. Figure 5 shows the three fully connected layers of the CNN. To avoid over-fitting, the dropout strategy [27] is applied in the network layers. It is worth mentioning that each convolutional layer has a batch normalization layer, which acts as a regularizer and has been reported to accelerate network training by up to 14 times [28]. We apply the stochastic gradient descent method [29] for training the network. In addition, regularization with weight decay is used during training. The dropout ratio is set to 0.5 and the learning rate is set to 0.002.
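The training settings above can be illustrated with a minimal sketch of one stochastic gradient descent update with weight decay and of dropout at rate 0.5. The learning rate 0.002 and the dropout ratio 0.5 come from the text. The weight-decay coefficient is a hypothetical value, since it is not reported here, and the "inverted" dropout scaling is an assumed implementation detail.

```python
import random

def sgd_step(weights, grads, lr=0.002, weight_decay=1e-4):
    """One SGD update; weight decay adds weight_decay * w to each gradient.

    weight_decay=1e-4 is a hypothetical value, not taken from the paper.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout: zero each unit with probability `rate` and
    rescale the surviving units by 1 / (1 - rate)."""
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

updated = sgd_step([1.0, -2.0], [0.5, 0.1])
dropped = dropout([1.0] * 8)
```

At test time, dropout is normally switched off, and the rescaling during training keeps the expected activation unchanged.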

Evaluation metrics
To evaluate the performance of the model, we apply three well-known evaluation metrics: sensitivity ($Se$), specificity ($Sp$), and accuracy ($Acc$), defined as:

$$Se = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

In these metrics, true positive ($TP$) is the number of correctly identified strabismus images, and true negative ($TN$) is the number of correctly identified normal images. False positive ($FP$) is the number of normal images incorrectly identified as strabismus, and false negative ($FN$) is the number of strabismus images incorrectly identified as normal. On the one hand, $Se$ and $Sp$ measure the ability of the algorithm to identify strabismus and normal images, respectively. On the other hand, $Acc$ reflects the overall classification performance [30].
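As a worked illustration of the three metrics, the sketch below computes them from confusion-matrix counts using the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), accuracy = (TP + TN) / total). The counts themselves are hypothetical: they are not reported in this paper and are chosen only because they reproduce the sensitivity, specificity, and accuracy values quoted in the results section.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)            # share of strabismus cases found
    specificity = tn / (tn + fp)            # share of normal cases found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts, selected to match the reported ratios.
se, sp, acc = classification_metrics(tp=125, tn=28, fp=4, fn=3)
# se == 0.9765625, sp == 0.875, acc == 0.95625
```

Because sensitivity and specificity are computed on different subsets, a high accuracy alone does not guarantee that both are high, which is why all three values are reported.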

RESULTS AND DISCUSSION
In this section, after training the CNN models, we conduct two experiments to evaluate the quality of the network outcomes. The first experiment monitors how well the CNN models detect the iris position, using two classification classes (strabismus or no strabismus). The second experiment evaluates the classification of eye images into three classes, reporting the CNN accuracy and mean square error; it measures how well the classes predicted by the CNN match the classes of the labeled images. Finally, we compare the proposed method with other studies.

Training of CNN models
The CNN models for the left and right eye regions are trained on the training dataset and monitored with two statistical measurements: the accuracy and the mean square error (MSE). For the left eye, the training accuracy reaches 100% and the MSE is 0.0201, as shown in Table 1. For the right eye, the accuracy reaches 100% and the MSE equals 0.0328, as illustrated in Table 2. Figure 6 shows the accuracy with respect to the number of iterations for the left and right eyes in the training stage. In the first epoch, the accuracy was 20.31%. In epoch 50, the accuracy rose to 96.88%. In epoch 100, the accuracy reached 99.22%, and in the final epoch it reached 100%. Figure 7 shows the MSE in the training stage with respect to the number of iterations. For the left eye, the MSE started at 1.5670 in epoch 1, decreased to 0.0657 in epoch 50 and to 0.0316 in epoch 100, and was 0.0201 in the final epoch. For the right eye, the MSE was 1.2926 in epoch 1, 0.1237 in epoch 50, 0.0596 in epoch 100, and 0.0328 in the final epoch. We observed that the accuracy of both CNNs increases as the number of training samples increases. With fewer than 100 training patterns, the detection results improve significantly as training patterns are added. With more than 100 training patterns, the detection results change only marginally as more patterns are added. Based on these observations, 205 training patterns are selected for training the CNNs. The CNN architecture is given in Table 3.
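The two training statistics can be computed as in the sketch below. How the errors are aggregated in this work (for example, over scalar class numbers or over per-class network outputs) is not stated, so the flat-list formulation and the toy values are assumptions for illustration only.

```python
def accuracy(targets, predictions):
    """Fraction of predictions that equal their target class."""
    correct = sum(1 for t, p in zip(targets, predictions) if t == p)
    return correct / len(targets)

def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

# Toy example with four samples and iris classes 0, 1, and 2.
toy_targets = [0, 1, 2, 1]
toy_predictions = [0, 1, 1, 1]
toy_accuracy = accuracy(toy_targets, toy_predictions)       # 0.75
toy_mse = mean_squared_error(toy_targets, toy_predictions)  # 0.25
```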

Model testing: first experiment
In this experiment, the test dataset is applied to the trained CNN models to capture the position of the iris. The trained models achieve a sensitivity of 0.97656 and a specificity of 0.875, which indicates good detection performance in the testing stage for classifying normal and strabismus images. The accuracy is 0.95625, which means that the CNN models can predict the classification class as shown in Table 4. In this table, left eye and right eye indicate the class number predicted by each CNN eye model from the iris position. That is, each eye region (left or right eye image) is divided into three equal partitions (left, center, and right), and each partition is mapped to a specific class number (1: left, 0: normal, and 2: right). In addition, the target class indicates the strabismus class number (0: no strabismus, and 1: strabismus), based on the iris class numbers of the left and right eyes. The statistical measurements on the testing dataset are given in Table 5.
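The partition rule described above can be written as a small labeling function. In this work the class is predicted by the CNN from the eye image; the sketch only illustrates how a horizontal iris position maps to the three class numbers. The `iris_x` input (a pixel column inside the eye region) and the region width of 42 pixels are assumptions for illustration.

```python
def iris_position_class(iris_x, region_width=42):
    """Map the horizontal iris position to a class number.

    The eye region is divided into three equal partitions:
    left third -> 1, central third -> 0 (normal), right third -> 2.
    """
    if not 0 <= iris_x < region_width:
        raise ValueError("iris_x must lie inside the eye region")
    third = region_width / 3
    if iris_x < third:
        return 1  # left partition
    if iris_x < 2 * third:
        return 0  # central partition (normal)
    return 2      # right partition

classes = [iris_position_class(x) for x in (5, 20, 35)]
# classes == [1, 0, 2]
```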

Model testing: second experiment
The trained CNN models are used to classify the images of both eyes into three classes (normal, exotropia, and esotropia), as shown in Table 6. In this table, the decision classes represent three types of horizontal strabismus problems. As mentioned earlier, the images from the three datasets are labeled by ophthalmologists. These images are classified into three classes (1, 2, and 3) that indicate the direction in which the person's eye looks. Class 1 indicates an eye looking straight at the camera. Class 2 indicates an eye looking to the right side of the camera. Lastly, class 3 indicates an eye looking to the left side of the camera. The classification accuracy is 95.62%, obtained by comparing the CNN output with the label of each eye image. Table 6 shows the strabismus status of the person's eyes derived from the labeled image.

Left eye   Right eye   Target class   Decision
1          1           0              No strabismus
1          2           1              Strabismus
1          3           1              Strabismus
2          1           1              Strabismus
2          2           0              No strabismus
2          3           1              Strabismus
3          1           1              Strabismus
3          2           1              Strabismus
3          3           0              No strabismus
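The nine rows listed above follow a single rule: the target class is 0 (no strabismus) when both eyes receive the same class number and 1 (strabismus) otherwise. A minimal sketch of this rule is given below; the function and variable names are illustrative and do not come from this work.

```python
DECISION_LABELS = {0: "No strabismus", 1: "Strabismus"}

def strabismus_target(left_class, right_class):
    """Return 0 when both eyes have the same class number, else 1."""
    return 0 if left_class == right_class else 1

# Rebuild all nine (left, right, target, decision) combinations.
decision_rows = [
    (left, right, strabismus_target(left, right),
     DECISION_LABELS[strabismus_target(left, right)])
    for left in (1, 2, 3)
    for right in (1, 2, 3)
]
```

Only the three combinations with matching class numbers yield "No strabismus"; the remaining six yield "Strabismus".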

Comparisons with other studies
The performance of the proposed method was compared experimentally with other studies in the related field. The comparison results are given in Table 7. In this table, the study column lists various studies that use different datasets. Some of these studies report an accuracy of 94% on 45 patient images. Our system achieves an accuracy of 95.62% on three datasets comprising 285 images.

CONCLUSIONS
Strabismus has become an influential ophthalmologic disease in human life, and its detection plays an important role in prognosis and treatment. Automated detection is an effective way to achieve suitable and rapid detection of strabismus: the medical data are collected and then sent to specialists for further diagnosis and examination. Three datasets on strabismus are considered in this article, and all of the collected images are labeled by specialists in ophthalmology. A deep learning technique based on CNN models is applied for the automatic detection of strabismus. The method first uses the Viola-Jones algorithm to segment the eye region, and then classifies the segmented regions using CNNs. The resulting CNN classes serve as inputs to two experiments, both of which rely on the estimated position of the iris. On the one hand, the first experiment uses a neural network trained on the segmented eye regions to predict whether strabismus is present, based on matching the iris positions of both eyes. On the other hand, the second experiment predicts the type of strabismus (i.e., normal, exotropia, and esotropia). The results obtained with the proposed method are promising, and the experiments indicate that it is well suited to the automatic detection of strabismus in medical applications. In future work, we plan to investigate other types of strabismus, such as V-pattern and vertical strabismus, and to study more clinical cases.