Human activity recognition for static and dynamic activity using convolutional neural network

Recognizing activity from the details of human physical movement has become a leading research subject. Activity recognition is applied in several areas, such as daily living, health, gaming, medicine, rehabilitation, and smart home systems. The accelerometer is a popular sensor for recognizing activity, as is the gyroscope; both can be embedded in a smartphone. The signal generated by the accelerometer is time-series data that reflects human activity patterns. Motion data were acquired from 30 volunteers. Dynamic activities (walking, walking upstairs, walking downstairs), denoted DA, and static activities (laying, standing, sitting), denoted SA, were collected from the volunteers. Recognizing SA and DA is challenging because their signal patterns differ: SA signals overlap between activities but have a clear threshold, whereas DA signals are clearly separated but have adjacent upper thresholds. The proposed network structure achieves a best overall accuracy of 97%. The results indicate that the model is suitable for human activity recognition.


INTRODUCTION
Human activity recognition (HAR) is the use of knowledge and image models for modeling activity and sensor data [1]. With a sufficient set of sensors, HAR systems can recognize human activities such as sitting, walking, sleeping, running, and standing. HAR can be used as a tool for diagnosing disease [2], for general activity recognition [3], [4], and in the military field [5]. Pioneering HAR research using an accelerometer was published in the 1990s [6]. However, the most widely cited work, by Bao and Intille [7], produced satisfactory results using many sensors simultaneously and various algorithms. Classifying human activities from sensors that vary between devices is a classic problem. It is, therefore, important to find a method for the proper recognition of human activity from device sensors [8].
HAR using smartphone sensors is a classic multivariate time-series classification problem: 1D sensor signals are used to extract features, and a classifier recognizes activities from those features. Very little HAR research uses deep learning techniques and automatic feature extraction. Recent breakthroughs in image and sound recognition have given rise to a field that attracts many researchers, called deep learning [9]. The convolutional neural network (CNN), in particular, is a suitable algorithm for image and sound recognition. CNNs are not limited to images and sound, however; HAR data is also a good fit when processed as time-series data.
Besides cameras and accelerometers, previous HAR studies have used gyroscopes, electromyography, infrared, audio, and other sensors [10]. An accelerometer has several advantages, including low cost. With its small dimensions, it can be embedded in a smartphone and easily measure human movements. It can be used in a variety of positions, such as the arms, waist, head, shoulders, and pockets [11].
Fuentes and colleagues, in a study entitled "Online motion recognition using an accelerometer in a mobile device", used neural networks to recognize human body motion [12], while Khan used a decision tree to recognize human body movements from Wii Remote data [13]. Other studies that combine CNNs with machine learning methods are listed in Table 1. Among the researchers using the University of California Irvine (UCI) HAR dataset, which has 128 features, is Ronao, whose 2016 study "Human activity recognition with smartphone sensors using deep learning neural networks" (see Table 2) achieved an accuracy of 93.75% using a CNN and a multilayer perceptron (MLP) [14].
In this work, we propose convolutional neural network approaches for human body motion recognition on static and dynamic activity datasets. The main contribution of this paper is a CNN model that achieves higher accuracy than previous research. The model uses tuned parameters that yield high accuracy on the dynamic activities (DA) and static activities (SA) datasets. The DA and SA models are then combined to classify HAR into 6 classes. The remainder of this paper is organized as follows: the materials and methods, including data acquisition and the proposed methods, are given in section 2. The results, and a discussion of how the proposed method solves the problem, are given in section 3. The conclusion is given in section 4.

The dataset used in this study is the UCI HAR dataset [22]. It was obtained from 30 volunteers who wore the recording device on their waist while performing six activities (standing, sitting, laying, walking, walking downstairs, walking upstairs), as shown in Figure 1. The device records activity using its built-in accelerometer and gyroscope sensors; alongside the sensor recording, the sessions were recorded on video so that the data could be labeled manually. The accelerometer provides three-axis linear acceleration (XYZ), and the gyroscope provides three-axis angular velocity (XYZ). The sensor signals are processed with noise filters and then sampled in fixed-width sliding windows of 2.56 seconds with 50% overlap. The processed dataset is divided into 70% training data and 30% test data, as shown in Table 2.
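The windowing step can be illustrated with a minimal sketch. It assumes a sampling rate of 50 Hz, which is not stated in the text above; under that assumption a 2.56-second window holds 128 samples, which is consistent with the 128 features mentioned earlier. The function name `sliding_windows` and the synthetic signal are hypothetical and used only for illustration.

```python
def sliding_windows(signal, window_size, overlap=0.5):
    """Split a sequence of samples into fixed-width, overlapping windows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Step between window starts; 50% overlap means half a window.
    step = max(1, int(window_size * (1 - overlap)))
    windows = []
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(signal[start:start + window_size])
    return windows


# Assumption: 50 Hz sampling, so 2.56 s corresponds to 128 samples.
SAMPLING_RATE_HZ = 50
WINDOW_SECONDS = 2.56
window_size = round(SAMPLING_RATE_HZ * WINDOW_SECONDS)

# Ten seconds of a synthetic one-axis signal (500 samples).
signal = list(range(500))
windows = sliding_windows(signal, window_size, overlap=0.5)
```

Each window starts half a window after the previous one, so every sample away from the signal edges appears in two windows. Trailing samples that do not fill a complete window are dropped in this sketch.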
Signal data from dynamic and static activities differ significantly, as seen in Figure 2, which shows the 6 static and dynamic activities. Figure 3 (a) shows a problem that occurs in HAR: the static signal data for standing and sitting are similar. This similarity causes the deep learning model to misclassify these activities, which can lower the overall HAR accuracy. In this article, we use t-SNE, which visualizes the spread of high-dimensional data by reducing it to two dimensions. We use 1000 iterations and perplexities of 2, 5, 10, 20, and 50, as shown in Figure 3 (b). The t-SNE results in Figure 3 (b) with perplexities 2 and 5 and 1000 iterations show that each activity forms a group of similar data, except that the standing and sitting activities overlap.

Hyperparameter
Hyperparameter tuning is a procedure for neural networks that lets users find the combination of parameters with the best accuracy across a number of previous training runs [23]. The parameters explored by hyperparameter tuning include the number of layers, the number of feature maps, the convolution filter size, and the pooling size [24]-[30]. The parameters of the proposed CNN model before tuning are shown in Table 3.
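A random search over such a parameter space might look like the following sketch. The dropout range (0.45-0.7), the choice between Adam and RMSprop, the 0.00065-0.004 range, and the 100 trials are the values reported in the results section; treating the 0.00065-0.004 range as the learning rate is an assumption. The candidate lists for filters, kernel size, and batch size, and the `dummy_evaluate` scoring function, are hypothetical placeholders. A real run would train a CNN for each configuration and return its validation accuracy.

```python
import random


def sample_configuration(rng):
    """Draw one random hyperparameter combination."""
    return {
        # Hypothetical candidate lists, chosen only for illustration.
        "filters": rng.choice([16, 32, 64]),
        "kernel_size": rng.choice([3, 5, 7]),
        "batch_size": rng.choice([16, 32, 64]),
        # Ranges reported in the text.
        "dropout": rng.uniform(0.45, 0.7),
        "optimizer": rng.choice(["adam", "rmsprop"]),
        # Assumption: the 0.00065-0.004 range is the learning rate.
        "learning_rate": rng.uniform(0.00065, 0.004),
    }


def random_search(evaluate, n_trials=100, seed=0):
    """Return the best (score, config) pair over n_trials random draws."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_configuration(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config


def dummy_evaluate(config):
    """Placeholder score; stands in for training a CNN and validating it."""
    return -abs(config["dropout"] - 0.5) - abs(config["learning_rate"] - 0.001)


best_score, best_config = random_search(dummy_evaluate, n_trials=100)
```

Seeding the generator makes the 100 sampled configurations reproducible, which helps when comparing runs on the SA and DA datasets.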

RESULTS AND DISCUSSION
The predicted results from each model are evaluated using a confusion matrix. The confusion matrix is a method for calculating the accuracy of a predictive system; it contains the actual and predicted labels of the classification system. Accuracy, precision, and recall are computed using (3)-(5), respectively.
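As an illustration, the sketch below computes the three metrics from a multi-class confusion matrix using their standard definitions, which are assumed to match (3)-(5). The function name `classification_metrics` is hypothetical, and the counts are made up for the example; they are not results from this study.

```python
def classification_metrics(confusion):
    """Compute accuracy plus per-class precision and recall.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    accuracy = correct / total if total else 0.0

    precision, recall = [], []
    for k in range(n_classes):
        true_positive = confusion[k][k]
        # Column sum: everything predicted as class k.
        predicted_k = sum(confusion[i][k] for i in range(n_classes))
        # Row sum: everything that truly belongs to class k.
        actual_k = sum(confusion[k])
        precision.append(true_positive / predicted_k if predicted_k else 0.0)
        recall.append(true_positive / actual_k if actual_k else 0.0)
    return accuracy, precision, recall


# Made-up counts for three classes (not results from the paper).
confusion = [
    [50, 2, 0],
    [5, 45, 0],
    [0, 0, 48],
]
accuracy, precision, recall = classification_metrics(confusion)
```

In this example the first two classes are confused with each other while the third is separated perfectly, which mirrors the kind of pattern described for sitting and standing versus the other activities.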
In this article, the authors first test a CNN on the HAR dataset split into 2 classes, namely the dynamic class and the static class, with the parameters shown in Table 4. Hyperparameter tuning is used to find the parameter combination that gives the highest accuracy on each of the static and dynamic datasets, shown in Table 5. Hyperparameter tuning provides the configuration needed by the CNN model for the selected dataset by randomly generating combinations of parameters. For the first layer, the tuner selects the number of filters from (28, 43, or 42), and similarly selects values for the kernel size, max pooling, batch size, epochs, and dense-layer parameters. The dropout rate is chosen from the range 0.45-0.7. The optimizer is chosen between Adam and RMSprop, with a value between 0.00065 and 0.004. Hyperparameter tuning generates 100 models. The overall configuration can be seen in Table 4.

After the 100 combinations are run, the SA dataset reaches an accuracy of 97% on the training data and 96% on the validation data, as shown in Figure 4, while the DA dataset reaches 100% on the training data and 97.4% on the validation data, as shown in Figure 5. The models obtained from the three datasets are combined into a main model using the divide and conquer method, so that it can identify all human activities (walking, walking up, walking down, standing, laying, and sitting). This model reaches an accuracy of 98.3% on the training data and 97% on the validation data. The model architecture is shown in Figure 6. The previous study "Human activity recognition with smartphone sensors using deep learning neural networks" [14], which used the same dataset and a tuned learning rate of 0.006, reported an accuracy of 93.75%.
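One plausible reading of the divide and conquer combination is a two-stage prediction: a first model decides whether a window is dynamic or static, and the matching specialist model then picks the activity within that group. The sketch below illustrates only this control flow. The three functions standing in for the trained CNNs, and their thresholds, are hypothetical and exist only to make the example runnable; the text does not specify how the models are actually combined.

```python
def classify_activity(window, is_dynamic, classify_dynamic, classify_static):
    """Two-stage prediction: pick the group first, then the activity."""
    if is_dynamic(window):
        return classify_dynamic(window)
    return classify_static(window)


# Hypothetical stand-ins for the three trained CNNs; the thresholds
# below are arbitrary and only make the sketch runnable.
def is_dynamic(window):
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    return variance > 0.05


def classify_dynamic(window):
    return "walking"


def classify_static(window):
    mean = sum(window) / len(window)
    return "laying" if mean < 0.5 else "standing"


moving = [0.0, 1.0, 0.0, 1.0]  # high variance
still = [1.0, 1.0, 1.0, 1.0]   # zero variance, mean 1.0
labels = [
    classify_activity(w, is_dynamic, classify_dynamic, classify_static)
    for w in (moving, still)
]
```

Splitting the problem this way lets each specialist model be tuned separately, which matches the per-dataset hyperparameter tuning described for SA and DA.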
In comparison, the proposed CNN model achieves 97% accuracy, as shown in Table 6.

Table 6. Comparison of previous studies
Method                  Accuracy
HCF+NB [14]             79.43%
HCF+J48 [14]            82.62%
HCF+ANN [14]            82.27%
HCF+SVM [14]            77.66%
Convnet [14]            93.75%
CNN (proposed method)   97%

Human activity recognition for static and dynamic activity using … (Agus Eko Minarno)

Figure 6. SA and DA combined models

CONCLUSION
This paper proposed a human activity recognition method using a CNN with 6 classes (walking, walking upstairs, walking downstairs, sitting, standing, and laying). Based on the results and discussion, the divide and conquer method combined with CNN hyperparameter tuning on each category of dataset achieved 97% accuracy. The proposed model also addressed the similarity problem between the easily confused static activities (sitting and standing). The highest accuracy in the study, 100%, was obtained for walking downstairs and laying, while the lowest accuracy, 94%, was obtained for sitting and standing.

ACKNOWLEDGMENTS
We thank the Informatics Department and Informatics Laboratory, Universitas Muhammadiyah Malang, for providing a place for the researchers to carry out this research. We hope this research makes a major contribution to the advancement of technology in Indonesia.