Applying convolutional neural networks for limited-memory application

ABSTRACT


INTRODUCTION
Object detection is a computer technology related to computer vision and image processing that combines object classification and object localization. Modern advances in deep learning [1][2][3] have led to significant progress in object detection. Most recent research has focused on designing complex neural networks for object detection to enhance accuracy, such as the single shot detector (SSD) [4] and Faster R-CNN [5].
Many researchers are devoted to applying computer technology and deep learning to modern life because of their outstanding advantages. Convolutional neural networks (CNNs) were applied to an image dataset (specifically lung X-rays) [3] to classify pneumonia, obtaining an accuracy of 97%. AlexNet, a deep convolutional neural network pre-trained on 1000 image categories, was used in [6] to detect and geotag advertisement billboards in real-time conditions, and the experiments achieved 92.7% training accuracy for billboard detection. Using convolutional neural networks, Z. Rustam, et al. [7] proposed a method to assist doctors in providing appropriate beliefs and predictions to patients; the results showed that CNNs can accurately identify patients' X-ray test images. According to the results published in [8], a CNN model with a 64x64 input shape, a learning rate of 0.0001, a 3x3 filter size, 100 epochs, 160 training samples, and 40 testing samples attained 100% training and testing accuracy in classifying golek puppet images. This is an ideal result that demonstrates the effectiveness of CNNs in object classification. Transfer learning with a CNN based on the Inception-v3 architecture was applied in [9] for early detection of Terry's nail. With 90% of the data used for training, the accuracy, precision, and recall were 95.24%, 100%, and 90.91%, respectively. In particular, we consider you only look once (YOLO), a unified model for object detection. The YOLO model [10] is simple to construct and can be trained directly on full images. Fast YOLO is the fastest general-purpose object detector in the literature, and YOLO pushes the state of the art in real-time object detection. YOLO also generalizes well to new domains, making it ideal for applications that rely on fast, robust object detection. However, all of these algorithms require a large amount of system resources, and deploying them on limited hardware requires that they be streamlined and compiled for that hardware.
To ensure maritime safety, the main objective comprises two tasks. The first is ensuring the safety of life and property at sea from geographic and operational hazards (underwater obstacles, collisions, and harm and damage caused by unfavorable weather conditions). The second is ensuring safe ship control by the sailor throughout the journey; if a navigational officer is not capable of handling an emergency situation, it can lead to a maritime collision. For the first task, many studies aim to improve and upgrade current systems that have shortcomings in availability, integrity, monitoring, and system life expectancy, such as the global navigation satellite system [11] and the regional satellite augmentation system for maritime applications [12], or the design of a satellite constellation for Indonesian maritime surveillance using AIS data acquired by the LAPAN-A2 and LAPAN-A3 satellites [13], in which eight satellites in an equatorial orbit provide near real-time AIS monitoring of Indonesia and other equatorial regions, improving global maritime awareness and maritime safety. The second task is to design and manufacture on-board systems that ensure safe ship operation by using new computer technologies such as neural networks, fuzzy-neural systems, or genetic algorithms.
In this paper, we apply the modified SSDLite_MobileNetV2 bounded CNN algorithm to the bridge navigational watch and alarm system (BNWAS). Extensive experiments showed that the proposed method achieves state-of-the-art results compared with the best current method based on hand-crafted features [14], three other related CNN-based methods [15][16][17], and our previous work [18] on image analysis. Moreover, we validated the rationality and robustness of the proposed model with supplementary results. The inverted residual bottleneck layers allow a particularly memory-efficient implementation, which is very important for mobile applications. A standard efficient implementation of inference, for instance in TensorFlow [19] or Caffe [20], builds a directed acyclic compute hypergraph G. For a small hardware system, we used the SSDLite MobileNetV2 structure because it is fast and accurate. The system not only meets the requirements for image processing, object detection, and classification but also complies with the IMO [21,22], IEC [23], and [24,25] regulations, so it can be tested and operated directly on board. We carefully designed a new CNN-based method for detecting various typical image-processing operations. The main contributions of this paper are as follows:
− We first converted the input image into residuals to suppress the influence of image contents, and then used a convolutional layer to increase the number of channels.
− We employed six similar layer groups to obtain the high-level features of the input image.
− Finally, we fed the resulting features into the fully connected layer for classification. We also proposed a solution that always keeps the total memory capacity within a robust bound, and applied it to the BNWAS.
The rest of the paper is organized as follows: section 2 presents related work and the proposed method for reducing memory while ensuring image quality for object detection; section 3 describes the structure of the proposed BNWAS based on convolutional neural networks and presents the experimental results and discussion. Finally, concluding remarks are given in section 4.

CNNs BASED SSD LITE-MOBILE NET METHOD FOR OBJECT DETECTION WITH LIMITED-MEMORY
CNN models are highly accurate, but they share a common drawback: they are not suitable for mobile applications or embedded systems with low computing power. In the literature, the authors of [26] introduce resource-frugal quantized convolutional neural networks, which reduce model size without adversely affecting classification capability when segmenting hyperspectral satellite images, focusing especially on the memory savings of quantized CNNs. Moreover, an approach that uses object class clustering to lower bit precision beyond quantization limits, proposed by Prateeth Nayak, et al. [27], used three schemes: uniform-ASYMM, uniform-SYMM, and power-of-2. All of the quantization schemes achieved near-original model accuracy for every tested model.
Developing these models for real-time applications on embedded systems (Raspberry Pi, nano PC) or smartphones would require an extremely powerful configuration (GPU/CPU). Therefore, we need to build a hybrid model such as SSDLite-MobileNet. The main factor that helps such a model achieve high accuracy at low computation time is its hybrid structure.
In SSD, $x_{ij}^{p} \in \{1, 0\}$ is an indicator for matching the i-th default box to the j-th ground truth box of category $p$. If m feature maps are used for prediction, the scale of the default boxes for each feature map is computed as:

$s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$

Based on [24], we set $s_{\min}$ to 0.2 and $s_{\max}$ to 0.9 (the scales $s_k$ are 0.1, 0.2, 0.375, 0.55, 0.725, and 0.9, corresponding to 30, 60, 112.5, 165, 217.5, and 270 pixels for a 300x300 input image).
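The listed scales can be checked numerically. The following is a minimal sketch, assuming the linear SSD rule s_k = s_min + (s_max − s_min)(k − 1)/(m − 1) with m = 5 feature maps, plus one extra scale of 0.1 prepended for the first prediction layer (an assumption that reproduces the six values quoted above). The function name `default_box_scales` is illustrative and does not come from the original work.

```python
def default_box_scales(s_min=0.2, s_max=0.9, m=5):
    """Return s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k = 1..m."""
    if m < 2:
        raise ValueError("m must be at least 2")
    step = (s_max - s_min) / (m - 1)
    # Round to 4 decimals to hide floating-point noise in the printed values.
    return [round(s_min + step * (k - 1), 4) for k in range(1, m + 1)]


# Assumption: an extra 0.1 scale is used for the first prediction layer.
scales = [0.1] + default_box_scales()

# Default box sizes in pixels for a 300x300 input image.
sizes_px = [round(s * 300, 1) for s in scales]

print(scales)
print(sizes_px)
```

Under these assumptions the six scales map to box sizes between 30 and 270 pixels, which matches the values quoted in the text.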
The structure contains an initial full convolution layer with 32 filters, followed by 19 bottleneck layers. The detailed structure of MobileNetV2 is described by M. Sandler [25]. The inverted residual bottleneck layers allow a particularly memory-efficient implementation, which is very important for mobile applications. A standard efficient implementation of inference, as used in TensorFlow [19] or Caffe [20], schedules the computation to minimize the total number of tensors that need to be stored in memory. In the most general case, it searches over all plausible computation orders Σ(G) and picks the one that minimizes

$M(G) = \min_{\pi \in \Sigma(G)} \max_{i \in 1..n} \left[ \sum_{A \in R(i, \pi, G)} |A| \right] + \mathrm{size}(\pi_i)$

where R(i, π, G) is the list of intermediate tensors that are connected to any of the π_i ... π_n nodes, |A| represents the size of the tensor A, and size(π_i) is the total amount of memory needed for internal storage during operation i. For graphs that have only a trivial parallel structure (such as a residual connection), there is only one nontrivial feasible computation order, and thus the total amount of, and a bound on, the memory M(G) needed for inference on the compute graph G can be simplified to:

$M(G) = \max_{op \in G} \left[ \sum_{A \in op_{inp}} |A| + \sum_{B \in op_{out}} |B| + |op| \right]$
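To make the simplified bound concrete, the sketch below evaluates it for a toy graph. It assumes the form given in [25]: the maximum, over all operations, of the total input size plus the total output size plus the internal storage of the operation. The `Op` class, the `memory_bound` function, and the tensor shapes (a 56x56x24 bottleneck with expansion factor 6) are hypothetical illustrations, not values from the designed system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    input_sizes: List[int]   # sizes |A| of the input tensors, in elements
    output_sizes: List[int]  # sizes |B| of the output tensors, in elements
    internal_size: int = 0   # |op|, internal storage needed during the operation


def memory_bound(ops):
    """Maximum over operations of (inputs + outputs + internal storage)."""
    return max(
        sum(op.input_sizes) + sum(op.output_sizes) + op.internal_size
        for op in ops
    )


# Hypothetical bottleneck block: 56x56x24 in and out, expanded to 56x56x144 inside.
small = 56 * 56 * 24
large = 56 * 56 * 144

# Scheduling every inner layer as its own operation.
per_layer = [
    Op("expand_1x1", [small], [large]),
    Op("depthwise_3x3", [large], [large]),
    Op("project_1x1", [large], [small]),
]

# Treating the whole block as one operation whose inner tensors are disposable
# (internal storage is idealized as zero in this sketch).
fused = [Op("bottleneck_block", [small], [small])]

print(memory_bound(per_layer))
print(memory_bound(fused))
```

In this toy case the per-layer schedule is dominated by the large inner tensors, while the fused view is bounded only by the small bottleneck tensors.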
Following [25], the amount of memory is simply the maximum total size of the combined inputs and outputs across all operations. This means that if we treat a bottleneck residual block as a single operation (and treat the inner convolution as a disposable tensor), the total amount of memory is dominated by the size of the bottleneck tensors rather than by the size of the (much larger) tensors internal to the bottleneck. In a TensorFlow graph, each node has zero or more inputs and zero or more outputs and represents the instantiation of an operation. Values that flow along normal edges in the graph (from outputs to inputs) are tensors: arrays of arbitrary dimensionality whose underlying element type is specified or inferred at graph-construction time. For small applications, reducing memory while preserving image quality is valuable. However, pushing this too far can easily lead to instability in image processing, such as reduced image quality, which relates to the marginal limit of the total memory capacity. In this paper, we propose a solution that always keeps the total memory capacity within the robust bound of OP given in (7). Then, for the hybrid of SSD and MobileNetV2, we replaced all regular convolutions with separable convolutions in the SSD network's prediction layers [2] to reduce the number of parameters and decrease the total memory capacity, as shown in (8), while still maintaining the bound on the computing steps. In particular, the output is labeled with the object, and the confidence level is given as a percentage. In the experiments of this paper, the improved SSD-MobileNetV2 method also showed higher efficiency than the method of [25], especially when applied to the BNWAS.
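The parameter saving from replacing regular convolutions with separable ones can be illustrated with a short count. This is a minimal sketch that ignores biases and batch normalization; the layer dimensions (3x3 kernel, 256 input and 256 output channels) are hypothetical and are not taken from the network used in this paper.

```python
def regular_conv_params(k, c_in, c_out):
    """Weights of a regular k x k convolution."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights of a depthwise k x k convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical prediction layer: 3x3 kernel, 256 -> 256 channels.
regular = regular_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)

print(regular, separable, round(regular / separable, 2))
```

For these illustrative dimensions the separable layer needs roughly 8.7 times fewer weights, which is the kind of reduction that makes the hybrid model fit on limited-memory hardware.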

APPLYING CNNs TO DESIGN THE BRIDGE NAVIGATIONAL WATCH AND ALARM SYSTEM

BNWAS design based on regulations of IMO MSC.128(75)
In recent years, ships have operated in complex and vulnerable environments, so ship development remains a significant challenge for researchers. Ships have been studied [27][28][29][30] with the aim of meeting the IMO standards. Recently, the authors of [18] studied and applied the modified SSDLite_MobileNetV2 hybrid algorithm to BNWAS, using hardware based on the Raspberry Pi 3, to meet the requirements of IMO MSC.128(75) and SOLAS Chapter V, Reg. 19, MSC.282(86) [23], revised on June 5, 2009 [20], which apply to ships classified by size:
− July 2011: new vessels in excess of 150 tonnes.
− July 2011: all passenger vessels.
− July 2012: all vessels in excess of 3,000 tonnes.
− July 2013: all vessels between 500 and 3,000 tonnes.
− July 2014: all vessels between 150 and 500 tonnes.
BNWAS is a monitoring and alarm system that notifies other officers or the captain if the officer on watch (OOW) does not respond or is incapable of performing the watch duties efficiently, which can lead to maritime accidents. The system monitors the awareness of the OOW and automatically alerts the Master or another qualified OOW if for any reason the OOW becomes incapable of performing duties. This is achieved through a mix of alarms and indications that alert backup OOWs as well as the Master. BNWAS warnings are given in the case of incapacity of the watchkeeping officer due to accident or sickness, or in the event of a security breach, e.g. piracy and/or hijacking. Unless the Master decides otherwise, the BNWAS shall remain operational at all times. Outputs of the system should be available for connecting additional bridge visual indications, audible alarms, and remote audible alarms, as in [9]. These requirements were applied to the actual system design in Figure 2. To compare the effectiveness of the solution with other applications under the hardware and practical conditions on the bridge of the Saigon Millennium ship, we deployed four solutions and collected their results. In this work, we focused on two factors, processing speed and output reliability, when applying object detectors on the designed system using the modified SSDLite_MobileNetV2 bounded CNN algorithm.

Testing the designed BNWAS on Saigon Millennium Vessel in Saigon River
The image was recorded on the Saigon Millennium ship at Son Hai Shipyard, Ho Chi Minh City, Vietnam. It was captured with a Logitech C270 camera and processed by the hybrid network-based object identification algorithm SSD-MobileNetV2. The output is the processed image with the frame of each detected object and its reliability expressed as a percentage. With the technique used in this paper, the system can identify multiple officers on the bridge, up to a maximum of 20 people in the detection frame at a time. When identifying officers on the bridge, the system allows customized functions via the touch screen or push-buttons on the bridge. The designed BNWAS was tested on the Saigon Millennium vessel on the Saigon River as follows:
− Case 1: if the system determines that there is no officer on the bridge, a timer is turned on and counts down while waiting for the officer to appear. While the timer is active, the mode switch and countdown timer functions are disabled. If an officer appears on the bridge during the countdown (no physical interaction with the system is needed), the timer is reset and the system returns to its normal state, so officers can operate the system function keys.
− Case 2: if no officer returns and the timer has counted down to zero (timeout), a flashing warning signal is activated on the bridge; this stage is called the primary alarm stage. This signal can be seen from anywhere on the bridge, in accordance with IMO standards. The alarm level appears on the display screen, all system parameters are saved to the history file, and a further timer is then started to move to the next alarm stage.
Subsequent alarm stages were tested, and the final results are consistent with IMO requirements. The system not only recognized the presence of officers on the bridge but also analyzed their actions and issued warnings when it found officers standing still for too long or sleeping while on duty. In the experiment, the test detected an officer who sat still for too long or showed signs of drowsiness, as in Figure 3.
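The two cases above amount to a small state machine driven by the detector output. The following is a minimal sketch under stated assumptions: the class and stage names are hypothetical, the timeout value is arbitrary, and the alarm is latched once raised because how it is cleared is not covered here. It is not the firmware of the designed system.

```python
from enum import Enum


class Stage(Enum):
    NORMAL = "normal"
    COUNTDOWN = "countdown"
    PRIMARY_ALARM = "primary_alarm"


class WatchTimer:
    """Countdown that resets on officer detection and raises a primary alarm on timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s      # hypothetical value, chosen by the operator
        self.remaining_s = timeout_s
        self.stage = Stage.NORMAL

    def update(self, officer_present, dt_s):
        """Advance the timer by dt_s seconds given the latest detection result."""
        if self.stage is Stage.PRIMARY_ALARM:
            # Assumption: the alarm stays latched; clearing it is out of scope.
            return self.stage
        if officer_present:
            # Case 1: an officer is detected, so the timer is reset.
            self.remaining_s = self.timeout_s
            self.stage = Stage.NORMAL
            return self.stage
        # No officer detected: keep counting down.
        self.remaining_s = max(0.0, self.remaining_s - dt_s)
        if self.remaining_s <= 0:
            # Case 2: timeout reached, enter the primary alarm stage.
            self.stage = Stage.PRIMARY_ALARM
        else:
            self.stage = Stage.COUNTDOWN
        return self.stage


timer = WatchTimer(timeout_s=3.0)
print(timer.update(officer_present=False, dt_s=1.0))  # countdown starts
print(timer.update(officer_present=True, dt_s=1.0))   # officer returns, reset
for _ in range(3):
    stage = timer.update(officer_present=False, dt_s=1.0)
print(stage)  # timeout reached
```

A real system would also log the parameters to the history file and start the next-stage timer at the point where the sketch switches to `PRIMARY_ALARM`.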
The test was recorded when we asked an officer to sit still in the driver's seat (for at least 20 seconds) to check whether the system detects an officer who stays still for too long or is drowsy. At the same time, a counter running in the background analyzes the relative position of the officer and gives a relative error. Based on the analysis of each frame, if the relative position error does not exceed 10% after 20 seconds, the primary alarm is set and the next alarm timer starts counting down.
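One way to realize such a stillness check is sketched below. The text does not define the relative position error, so this sketch assumes it is the shift of the bounding-box center from the first frame of the window, divided by the diagonal of the first box. The function names and the example boxes are hypothetical; only the 10% threshold comes from the description above.

```python
import math


def relative_position_error(ref_box, box):
    """Center shift of `box` from `ref_box`, relative to the diagonal of `ref_box`.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    """
    ref_cx = (ref_box[0] + ref_box[2]) / 2
    ref_cy = (ref_box[1] + ref_box[3]) / 2
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    diagonal = math.hypot(ref_box[2] - ref_box[0], ref_box[3] - ref_box[1])
    if diagonal == 0:
        raise ValueError("reference box has zero size")
    return math.hypot(cx - ref_cx, cy - ref_cy) / diagonal


def is_motionless(boxes, threshold=0.10):
    """True if every box in the observation window stays within the threshold."""
    if len(boxes) < 2:
        return False  # not enough frames to judge
    ref_box = boxes[0]
    return all(relative_position_error(ref_box, b) <= threshold for b in boxes[1:])


# Hypothetical detections collected over the 20-second window.
still_officer = [(100, 100, 200, 300), (103, 104, 203, 304)]
moving_officer = [(100, 100, 200, 300), (160, 100, 260, 300)]

print(is_motionless(still_officer))
print(is_motionless(moving_officer))
```

When `is_motionless` returns true for the whole window, the primary alarm would be set and the next alarm timer started.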

Summary of experimental results
High-end configurations running on TITAN X GPUs produced processing speeds between 17 and 37 frames per second. However, in experiments on the COCO dataset, with mAP calculated over all object classes, the results only reached 21-28%. Processing speed on our system was tested directly on the bridge under normal working conditions, and the results showed high performance, from 76% to 97%, as in Table 1. This impressive result was achieved with the camera installed in a convenient position on the bridge, while the hardware is a mobile device with only an ARM CPU and no integrated GPU. The highest processing speed is only approximately 1 FPS. The experimental results are detailed in Table 1. The results showed that the 4 models tested on our hardware (Raspberry Pi 3B+) with our method performed better in speed (ms) and mAP than on the TITAN X GPU hardware (different hardware). The FPS of the tested methods is shown in Figure 4, and this is a good response rate for a monitoring system.
The output reliability is highest with the Faster R-CNN detector; however, at 0.08 FPS (about 12.5 seconds to process a frame), it cannot meet the needs of a monitoring system. Object detectors based on the SSD_MobileNet structure (in brown) produce highly reliable results and meet the processing speed requirements. Meanwhile, the results of SSD_MobileNetV1 (yellow) and SSD_MobileNetV2 (green) are almost equivalent, but their model load time is slow because of their large size, and the actual output still shows certain deviations. Thus, the improved SSDLite_MobileNetV2 solution gives good results in quality, processing speed, and model load time (running stably on the Raspberry Pi 3) and has higher accuracy than the other solutions.
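The relation between frame rate and per-frame latency quoted above is a simple reciprocal. The helper below is an illustrative sketch; the function name is hypothetical.

```python
def seconds_per_frame(fps):
    """Time needed to process one frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1.0 / fps


# Frame rates mentioned in the text: Faster R-CNN on the BNWAS hardware,
# the fastest detector on the same hardware, and the TITAN X range.
for fps in (0.08, 1.0, 17.0, 37.0):
    print(f"{fps} FPS -> {seconds_per_frame(fps):.3f} s per frame")
```

At 0.08 FPS a single frame takes about 12.5 seconds, whereas a detector running at about 1 FPS refreshes its result every second.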

CONCLUSION
In this paper, we studied and applied the modified SSDLite_MobileNetV2 bounded CNN algorithm to BNWAS-GTS.V1. The hardware was based on the Raspberry Pi 3, an embedded single-board computer with a smartphone-level CPU, limited RAM, and no CUDA GPU. Processing speed on BNWAS-GTS.V1 was tested directly on the bridge under normal working conditions. The impressive results were achieved with the camera installed in a convenient position on the bridge and with a mobile device as the hardware. The improved SSD-MobileNetV2 based on the bounded CNN algorithm also showed higher efficiency, especially when applied to the BNWAS.


ISSN: 1693-6930 TELKOMNIKA Telecommun Comput El Control, Vol. 19, No. 1, February 2021: 244-251
SSDLite-MobileNet achieves high accuracy at low computation time thanks to its hybrid structure, which combines the SSD and MobileNet structures. SSD (single shot multibox detector) is an object detector (Figure 1) that performs two main steps: extracting feature maps and applying convolution filters to detect objects.


The actual system is shown in Figure 2 (a) and the design diagram in Figure 2 (b). The connected computer works in tandem with the Raspberry Pi 3 (which plays the role of the central processing board in Figure 2 (b)) to collect the input and output data of the testing process. The hardware is designed to perform the alarm functions.

Figure 2. The designed BNWAS GTS.V1 system tested on HCM City University of Transport; (a) BNWAS-GTS.V1 system tested on HCM City University of Transport, and (b) Structure of designed BNWAS

Figure 3. Testing the designed BNWAS on Saigon Millennium Vessel in Saigon River; (a) testing no alarm stage, and (b) testing alarm stages

Figure 4. The FPS speed of the test methods; (a) comparison of the processing speed of object detectors on BNWAS hardware, and (b) comparison of the output reliability of object detectors on BNWAS hardware

Table 1. Testing performance results of the 4 models in the experiments