Image-based Glaucoma Classification Using Fundus Images and Deep Learning

Glaucoma is an eye disease that gradually damages the optic nerve. Elevated intraocular pressure can be controlled to prevent total vision loss, but early glaucoma detection is crucial. The optic disc has long been a notable landmark for finding abnormalities in the retina. The rapid development of computer vision techniques has made it possible to analyze eye conditions from fundus images, helping a specialist make a diagnosis with a technique that is non-invasive from its initial stage. We propose a methodology for glaucoma detection using deep learning. A convolutional neural network (CNN) is trained to extract multiple features and classify fundus images. The accuracy, sensitivity, and area under the curve obtained on the ORIGA database are 93.22%, 94.14%, and 93.98%, respectively. The use of an algorithm for automatic region-of-interest detection in conjunction with our CNN structure considerably increases the glaucoma detection accuracy on the ORIGA database.


INTRODUCTION
The World Health Organization (WHO) states that glaucoma is the second leading cause of blindness in the world, after cataracts [1]. Since there is no cure for glaucoma [2], it is necessary to diagnose the disease early to delay its development [3].
Glaucoma is characterized by optic nerve damage due to progressive degeneration of nerve fibers [4]. Because symptoms do not appear until the disease is severe, glaucoma is called the silent thief of sight [5]. Typically, aqueous humor drains out of the eye through the trabecular meshwork, but when this passage is obstructed, aqueous humor accumulates. The resulting increase in fluid causes pressure to grow and damages ganglion cells [6]. However, pressure measurement has been found to be neither specific nor sensitive enough to serve as the only indicator for detecting glaucoma, because visual impairment can occur without increased pressure. Therefore, a comprehensive examination should also include images and visual field tests to analyze the retina [7].
Fundus imaging is the process of obtaining a two-dimensional projection of the retinal tissue using reflected light; the intensity of the resulting 2D image indicates the amount of light reflected. Color fundus imaging is used to detect disease: the image has R, G, and B bands, and their intensity changes highlight different parts of the retina [8].
Due to their non-invasiveness, fundoscopy and optical coherence tomography (OCT) have become the imaging methods of choice for the detection and evaluation of glaucoma [9]. Nevertheless, due to its cost, OCT is not readily available.
There are different types of glaucoma, such as open-angle, closed-angle, secondary, normal-tension, pigmentary, and congenital glaucoma [10] . Some types require surgical treatment [11] . The optic disc and cup, peripapillary atrophy, and retinal nerve fiber layer [12] are four structures that are considered essential for detecting glaucoma. Family history has also been shown to be genetically related to the onset of the disease [13] .
In most cases, treatment in the early stage of the disease can prevent total vision loss in glaucoma patients [14], so a system that helps ophthalmologists make a diagnosis could increase the chances of saving people's vision. However, designing a system that provides reliable tests for glaucoma diagnosis is a complicated and challenging task in clinical practice [15]. Figure 1 shows a fundus image of a healthy eye in which the most important structures can be seen, such as the optic nerve, on which this work focuses. The outer part of the optic nerve is called the optic disc (OD), and the smaller, blurrier inner circle is called the optic cup (OC). In the OD area we can see the main arteries and veins, where the veins are darker in color than the arteries. Veins are usually larger in caliber than arteries, with an average artery-to-vein ratio (AVR) of 2:3 [17].
FIGURE 1. Image of the optic nerve [16].
The region of interest (ROI) is usually less than 11% of the total size of the retinal fundus image. By reducing the image to the detected ROI, the computational resources required can be lowered [18]. Optic disc and cup segmentation are both essential components of optic nerve segmentation and together form the basis of a glaucoma evaluation. In [19], a reference data set for evaluating cup segmentation methods was published; however, it is not freely accessible to the general public.
There are relatively few public data sets for glaucoma evaluation compared with those available for diabetic retinopathy [20] and vascular segmentation [21]. The ORIGA database contains 650 retinal images labeled by retina specialists from the Singapore Eye Research Institute; an extensive collection of image signs usually considered in the diagnosis of glaucoma is annotated. In the Drishti-GS dataset [22], all images are annotated by experts. Usually, fundus image analysis with artificial intelligence follows one of two approaches: 1) classification at the image level and 2) classification at the pixel level. In image-level classification, the learning model is trained with images previously classified by an expert, in this case a retina specialist.
This classification generally reflects disease progression [23]. For example, the severity of diabetic retinopathy (DR) is classified into five grades, each associated with a number, and the learning model learns to associate the patterns in the image with their labels. The second approach is anatomical and lesion segmentation, such as separating the blood vessels from the rest of the retina in order to measure their caliber. For example, in the case of hypertensive retinopathy, the vein/artery ratio plays an essential role in the diagnosis of the disease [24].
Several works have addressed the detection of glaucoma, using both image-based and pixel-based classification. We follow an image-based classification approach in this research.
A six-layer CNN architecture was proposed by Chen et al. [5], where four layers are convolutional and the final two are fully connected. Acharya et al. [6] proposed using a Support Vector Machine for classification together with the Gabor transform, which captures subtle changes in the image background.
The database used was a private database of Kasturba Medical College, Manipal, India, with 510 images. 90% of the images were used for training, while the remaining 10% for testing. The results obtained were 93.10% accuracy, 89.75% sensitivity, and a specificity of 96.20%.
Raghavendra et al. [1] propose automatic recognition of glaucoma using an 18-layer convolutional neural network. This work consists of a standard CNN, with convolution and max-pooling layers, and a fully connected layer where classification is performed.
Initially, 70% of the randomly selected samples were used for training and 30% for testing. 589 healthy images and 837 with glaucoma from a private database were used.
The process was repeated fifty times with random training and test partitions. The results obtained were 98.13% accuracy, 98% sensitivity, and 98.3% specificity.
Gour et al. [8] propose an automatic glaucoma detection system that uses a Support Vector Machine (SVM) for classification, combining GIST and PHOG descriptors to extract image features. This technique eliminates the need for image segmentation; instead, the diagnostic system uses characteristics such as texture and shape to detect the disease. It yielded an accuracy of 83.4% on the Drishti-GS1 and High-Resolution Fundus (HRF) databases.
Gheisari et al. [25] implement two architectures, the VGG16 and ResNet, concatenating LSTM blocks. To determine the best one, they carry out several experiments varying the number of epochs and learning rate.
The best results are achieved with the VGG16 network, achieving 95% sensitivity and 96% specificity.
Gómez-Valverde et al. [26] evaluate several CNN architectures with transfer learning. Pinto et al. [27] use 5 databases totaling 1707 images. They experiment with each database separately, but the best results were obtained by combining all the available images.
They achieve an AUC of 96.05, a specificity of 85.8, and a sensitivity of 93.46 using the Xception architecture.
In the present work, we propose an image-based classification approach using a deep neural network for glaucoma diagnosis in fundus images. A preprocessing step extracts the region of interest, specifically the region where the optic disc is located. Once that region is obtained, the cropped images are fed to a neural network that classifies whether or not the image shows glaucoma.

Preprocessing
This section introduces a method to locate the optic disc of the retina, which contains the features needed to diagnose glaucoma. These image areas will be the input to the neural network proposed in this work. Since the images are 3072 x 2048 pixels, each dimension is reduced by a factor of four, leaving the images at 768 x 512. The input images are then converted to grayscale in a way that yields higher contrast of the optic disc than the original image. For this, the red and green channels are used, which have the most significant impact on the optic disc. To do this, Equation 1 is applied.
Img_gray = 0.9 * R + 0.5 * G        (1)

where Img_gray is the grayscale image, and R and G are the corresponding red and green channels. To determine which channels to use, the per-channel histograms of the image were obtained, from which the channels with the greatest impact on image contrast were determined. Conventional grayscale conversion uses the luminance coefficients in ITU-R BT.601-7, a recommendation that specifies digital video signal encoding methods [28]. The next step is to slide a kernel over the image to divide it into sub-images and determine where the optic disc is located, as shown in Figure 4.
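Equation 1 can be sketched in a few lines of numpy; the 2x2 toy image below is purely illustrative, and the result is deliberately left unclipped because it is only used to locate the brightest region.

```python
import numpy as np

def to_weighted_gray(img_rgb):
    """Weighted grayscale conversion (Equation 1): 0.9 * R + 0.5 * G.

    The red and green channels carry the most contrast for the optic
    disc, so the blue channel is ignored entirely.
    """
    r = img_rgb[..., 0].astype(np.float64)
    g = img_rgb[..., 1].astype(np.float64)
    return 0.9 * r + 0.5 * g

# Toy 2x2 RGB "image" just to exercise the function.
img = np.array([[[100, 50, 10], [200, 100, 10]],
                [[10, 10, 10], [255, 200, 10]]], dtype=np.uint8)
gray = to_weighted_gray(img)
```

Note that, unlike the ITU-R BT.601-7 luminance weights, these coefficients sum to more than 1, so bright disc pixels can exceed 255; that is harmless for the brightness search that follows.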
The optic disc is the brightest part of the retina, so excess brightness can be eliminated in parts where the optic disc is not located.
promPix = (1/n) * Σ p_i

where promPix is the average of the pixel values, p_i is the current pixel value, and n is the total number of pixels. With the brightest sub-image selected, the average pixel value is taken again, now for each column and row. These averages give the x and y coordinates that represent the new image center, around which the image is cropped to obtain the ROI. An example of this can be found in Figure 5.
The result after cropping an image can be found in Figure 6. The entire process explained above for the ROI detection is shown in the pseudo-code in Algorithm 1.
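The ROI search described above can be sketched as follows. This is a minimal reading of Algorithm 1, not the paper's exact implementation: the window size, stride, crop size, and the use of the brightest row/column of the winning sub-image as the refined center are all illustrative assumptions.

```python
import numpy as np

def locate_roi(gray, win=64, crop=48):
    """Slide a win x win window over a grayscale image, keep the
    brightest window (the optic disc is the brightest region), refine
    the center with per-row/per-column brightness profiles, and crop
    a (2*crop) x (2*crop) ROI around that center."""
    h, w = gray.shape
    best, by, bx = -1.0, 0, 0
    for y in range(0, h - win + 1, win // 2):        # half-window stride
        for x in range(0, w - win + 1, win // 2):
            m = gray[y:y + win, x:x + win].mean()
            if m > best:
                best, by, bx = m, y, x
    sub = gray[by:by + win, bx:bx + win]
    cy = by + int(np.argmax(sub.mean(axis=1)))       # brightest row
    cx = bx + int(np.argmax(sub.mean(axis=0)))       # brightest column
    y0 = min(max(cy - crop, 0), h - 2 * crop)        # clamp crop to image
    x0 = min(max(cx - crop, 0), w - 2 * crop)
    return gray[y0:y0 + 2 * crop, x0:x0 + 2 * crop]

# Synthetic test image: a bright square "disc" on a dark background.
gray = np.zeros((256, 256))
gray[170:190, 50:70] = 200.0
roi = locate_roi(gray)
```

On the synthetic image, the returned crop contains the bright square, mimicking how the optic disc ends up centered in the ROI.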
The preprocessing described in Algorithm 1 is applied to the 650 images of the ORIGA database [16] to obtain the ROI. With the region of interest located, we proceed to normalize the images. The goal is to keep the values within a smaller range, since CNNs do not perform well when the input numerical attributes span a very large range. The normalization used (Equation 4) was min-max [23].
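Min-max normalization is a one-liner over the pixel array; the three-value input below is a toy example.

```python
import numpy as np

def min_max(img):
    """Min-max normalization (Equation 4): rescale values to [0, 1]
    so the CNN does not see a very large input range."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo)

x = np.array([0.0, 127.5, 255.0])
scaled = min_max(x)
```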
x_norm = (x - x_min) / (x_max - x_min)        (4)

The layer ordering of convolution, ReLU, and max-pooling is used in several well-known architectures, such as VGG16 [29]. The idea of using two fully connected layers came from AlexNet [30], which uses three fully connected layers. Based on these two articles, an architecture is proposed.
The sigmoid function was used as the activation at the network output. It was chosen because we are interested in a binary classification, so it gives the probability that the input belongs to each class. This activation function should not be confused with the one used between convolutional layers: ReLU adds non-linearity to the network, while the sigmoid regularizes the output. The first fully connected layer has 128 neurons and the second has 2 neurons for classification.
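A PyTorch sketch of a network with this layer ordering is shown below. Only the ordering (conv → ReLU → max-pool blocks, then a 128-unit and a 2-unit fully connected layer with a sigmoid output) comes from the text; the channel counts, kernel sizes, number of blocks, and the 64x64 input are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Hypothetical instantiation of the described layer ordering; filter
# counts and input size are assumptions for the sake of a runnable sketch.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128), nn.ReLU(),  # first FC layer: 128 neurons
    nn.Linear(128, 2),                        # second FC layer: 2 neurons
    nn.Sigmoid(),                             # per-class output probability
)

out = model(torch.zeros(1, 3, 64, 64))  # one dummy 64x64 RGB input
```

With a 2-neuron sigmoid output, each neuron independently emits a probability in [0, 1], which pairs with the binary cross-entropy loss described next.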
As the loss function we use binary cross-entropy, chosen because a binary classification is sought. This function gives the prediction error, which indicates to the network how correct its classification is and helps it improve as the epochs pass.

Examples where the optic disc was not correctly detected are shown in Figure 8.
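Binary cross-entropy can be written directly from its definition; the clipping constant is a standard numerical safeguard, not part of the paper.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """BCE loss: mean of -[y*log(p) + (1-y)*log(1-p)].

    y_true holds 0/1 labels, y_pred holds predicted probabilities;
    eps clipping avoids log(0).
    """
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# A confident, correct pair of predictions gives a small loss.
loss = binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
```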

RESULTS AND DISCUSSION
To evaluate the results, metrics such as accuracy, sensitivity, and area under the curve were used. The sensitivity tells us how many of the positive cases the model was able to predict correctly.
The ROC-AUC is a metric based on a probability curve obtained by plotting the true positive rate against the false positive rate at various threshold values. The results obtained with different learning rates without the ROI are shown in Table 1.
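The three metrics can be sketched from their definitions; the confusion-matrix counts and scores below are made-up illustration values.

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all cases (positive and negative) classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    # Fraction of actual positives (glaucoma cases) predicted correctly.
    return tp / (tp + fn)

def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 1/2);
    this equals the area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

acc = accuracy(40, 50, 5, 5)                 # 90 correct out of 100
sens = sensitivity(40, 10)                   # 40 of 50 positives found
auc = roc_auc([0.9, 0.8], [0.3, 0.85])       # 3 of 4 pairs ranked right
```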

TABLE 2.
The results obtained with different learning rates and the ROI are shown in Table 2. Table 3 compares our method with previous work:

TABLE 3. Comparison with previous methods.
Method                                              Performance
Six-layer CNN [5]                                   ROC-AUC: 88.7
Different features from Gabor transform (SVM) [6]   Accuracy: 93.10%

To check which part of the image the network focused on to perform the classification, we used a method for visualizing neuron activations. There are several options for this task. Erhan et al. [31] propose a method that produces an image showing the activation of the specific neuron being visualized. Zeiler et al. [32] propose a method to find the patterns of the input image that activate a specific neuron in a layer of a CNN.
Dosovitskiy et al. [33] propose a method that consists of training a neural network that performs the steps in the opposite direction to the original one so that the output of this new network is the reconstructed image.
The method used in this work is the one proposed by Selvaraju et al. [34], which adopts a CAM architecture to generate a class activation map indicating the image regions the network uses most; retraining is required for this purpose. In Figure 11 we can visualize, for some images, the regions that carry the most weight in the classification.
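The core of the Grad-CAM computation [34] reduces to a weighted sum of feature maps followed by a ReLU. The toy numpy version below shows only that final step, assuming the per-map weights (the channel-averaged gradients of the class score) have already been computed by a deep learning framework.

```python
import numpy as np

def grad_cam(feature_maps, pooled_grads):
    """Toy Grad-CAM combination step.

    feature_maps: (K, H, W) activations of the last conv layer.
    pooled_grads: (K,) channel-averaged gradients of the class score.
    Returns a heatmap in [0, 1]; ReLU keeps only regions that push
    the class score up.
    """
    cam = np.tensordot(pooled_grads, feature_maps, axes=1)  # weighted sum
    cam = np.maximum(cam, 0.0)                              # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                               # scale to [0, 1]
    return cam

# Two made-up 4x4 feature maps with made-up weights.
maps = np.stack([np.eye(4), np.ones((4, 4))])
cam = grad_cam(maps, np.array([1.0, 0.5]))
```

The resulting heatmap is what gets resized and overlaid on the fundus image with a colormap, as in Figure 11.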

FIGURE 11. Class activation map display: original image, heatmap, and Grad-CAM overlay for healthy and glaucoma cases.
For the color map, the Jet configuration was used, which returns a color map as a three-column matrix with the same number of rows as the color map of the original image. Each row has different intensities of red, green, and blue.

CONCLUSIONS
As we stated, glaucoma is one of the principal causes of blindness globally. A tool capable of supporting ophthalmologists in diagnosing this condition more quickly could be vitally important.
The proposed method achieves excellent metrics with a not-so-deep neural network: an accuracy, sensitivity, and area under the curve of 93.22%, 94.14%, and 93.98%, respectively. To corroborate the performance of our approach, we ran our analysis on the ORIGA database, which is public and one of the most used databases for glaucoma analysis.
Preprocessing the images to obtain the ROI also helps the algorithm be more effective: it finds the region where the optic disc is located in almost all the images of the database used, making it a method worth reusing in future work.
We will further examine different alternatives to increase the classification performance, either by preprocessing or modifying the network structure.
The purpose of this investigation is to obtain a high-performing classification method that could be deployed for automatic glaucoma detection; this would save specialists time and speed up the diagnostic process.
By obtaining the feature maps, we were able to visualize which parts of the image carry the most weight in the network's decision. We therefore conclude that the decision depends on the characteristics of the optic disc and cup, as shown in the results.
The implementation code is available from the corresponding author upon request.

ACKNOWLEDGMENT
We would like to thank CONACyT for partially funding this project.

ETHICAL STATEMENT
The database used in this work is public; therefore, all ethical considerations are met.