Detecting community on social networks with fast and optimal online clustering algorithms

Social networks, or social media, have become an essential part of our lives, at least in their virtual dimension, and the web is almost impossible to imagine without this pervasive phenomenon. Networks such as Twitter and Facebook are important components of the information infrastructure. One of the important issues in the analysis of social networks is community detection. A community is a group of network nodes whose connections with each other are denser than their connections with the rest of the network. Various methods have been proposed for community detection, and one of them is based on data stream clustering. The output data of a social network can be modeled as a data stream, and fast and accurate clustering of this stream can be very effective in detecting communities. To increase the accuracy of the clustering process, the crow search optimization algorithm is used. In this research, communities are therefore detected with a fast and accurate online clustering algorithm. The simulation results indicate that the proposed method can calculate the number of clusters optimally and performs better than similar methods. The proposed algorithm can be used in many other applications.


INTRODUCTION
Today, the number of internet connections and users is increasing exponentially. The fast spread of internet, digital, and satellite technologies has made real-time communication between large parts of the world possible. Consequently, several national information controls in different countries have become ineffective. Nowadays, the role of the media and their influence on the political construction of the world is no secret. Communication theorists believe that the world is in the hands of those who own the media. It is this key role in influencing public opinion that has given the media such importance.
Today, social networks are the rudder of the unsettled sea of the internet. Networks with a virtual social orientation that are based on technology play an essential part in the media equations of the world. These networks offer many opportunities in the internet space, including reading, sharing, and searching news, uploading videos and photos, writing posts, membership in different groups, and political mobility. This has caused internet users to favor social networks. Social networks are generally composed of organizational or individual groups that are connected through one or more sorts of dependencies. In the context of a complex information society, these networks show the effective functioning of the convergent network and its popularity and success. Their growing number is due to their social character [1]-[3].

ISSN: 1693-6930, TELKOMNIKA Telecommun Comput El Control, Vol. 22, No. 2, April 2024: 321-328
Social networks, or in other words social media, are built on data related to humans, usually produced by them and often including their social characteristics. Social network analysis, sometimes abbreviated as SNA and sometimes called the analysis of complex dynamic networks, is the process of examining and evaluating the structure of a graph of human interactions whose members are connected to each other by communication lines. In other words, social network analysis is the process of researching and examining social structures using network theory and graph theory [4], [5].
Graphs, as mathematical structures that show the relationships between objects at a suitable level of abstraction, have been widely used in modeling various problems. For this reason, suitable tools for their analysis have become a necessity, and many researchers in various fields have provided methods for this work. One of the important analyses performed on graphs is the clustering of graph nodes. The clustering of nodes in a graph is the same as the problem of recognizing graph communities, provided that the graph nodes correspond to the data points and the shortest-path distance between nodes is taken as the distance between the points. One of the most practical problems in graph-based social network analysis is the data clustering problem [6].
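The shortest-path distance mentioned above can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary; the function name `shortest_path_lengths` and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                # First visit in BFS order gives the shortest hop count.
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by edge c-d.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}

distances_from_a = shortest_path_lengths(toy_graph, "a")
print(distances_from_a)
```

Nodes in the same triangle as `a` are one hop away, while nodes in the other triangle are two or three hops away, which is the kind of distance structure a clustering method can exploit.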
The purpose of clustering is to extract parts of the data that are very similar. In other words, clustering splits the data into groups in such a way that the data in each group have similar or identical characteristics, and the data from separate groups have different characteristics. For example, if we take into account people's political characteristics (regardless of how they are socially connected), the clustering results will ideally divide people into groups whose members have similar political leanings, while people from different groups are less similar in political attitude. Clustering methods are used in various applications, including data simplification, data analysis, data similarity measurement, and finding patterns. The problem of clustering in graphs is also called community detection. Figure 1 illustrates this: Figure 1(a) shows a graph whose nodes represent users in a social network, and Figure 1(b) shows two communities based on the prediction of users' professions.

There are many problems with community detection. One of them is the large amount of data: the data generated by social networks is huge. In addition, community detection is an NP-hard problem and has not yet been solved to a satisfactory level [7], [8]. Recent research has therefore focused on algorithms that have low computational overhead and high scalability. Our contribution in this research is a fast and accurate algorithm for the detection of communities in social networks with large data volumes. The proposed algorithm is based on fast data stream clustering. To increase the speed of the proposed algorithm, the crow search optimization algorithm [9] is used. Such a design helps to create an intelligent algorithm with a low computational load. The rest of this study is organized as follows. In the second part, related works are reviewed. In the third part, the proposed method is presented; it is based on the optimization of clustering parameters to create a fast algorithm. In the fourth part, the simulation results are presented.


Detecting community on social networks with fast and optimal … (Muneer Sameer Gheni Mansoor) 323

RELATED WORKS
Many researchers have been attracted to the stream clustering field in recent years, and many algorithms have been proposed, for example CluStream [10], STREAM [11], D-Stream [12], and DenStream [13]. Some algorithms "transfer" conventional clustering algorithms to the data stream scenario; for example, the STREAM algorithm [11] adapts k-means to data streams. Other algorithms, such as CluStream [10], propose methods that are designed specifically for data streams. Because of the interest in this field, many surveys exist in the related literature. Early research by Guha et al. [14] presented an overview of the field and its challenges. There are also more recent surveys, such as those of Aggarwal [15] and Silva et al. [16]. A survey by Amini et al. [17] concentrates on density- and grid-based stream clustering algorithms. In Gu and Angelov [18], AD_clustering is presented, a novel autonomous data-driven clustering method for processing live data streams. This method is based entirely on the data samples and their ensemble properties and is fully unsupervised; it needs no problem-specific or user-predefined parameters and assumptions, which are a limitation of most current clustering methods. Some deep learning methods for community detection are reviewed in [19]-[21].

In short, due to the importance of clustering in data analysis and the extensive use of graphs in problem modeling, the clustering of graphs has received particular attention from researchers, and various methods have been presented for it. Since researchers from different disciplines have worked on this issue, there are different approaches to it, but most of these methods share the goal of finding subgraphs that have many internal connections and few external connections. In this research, a fast and accurate algorithm for the detection of communities in social networks with large data volumes is presented. This method is an improvement on the algorithm of [18], obtained by intelligently choosing its parameters for big data.
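The notion of a subgraph with many internal and few external connections can be made concrete with a small sketch. The helper `count_internal_external` and the toy edge list below are hypothetical illustrations and are not part of any of the cited methods.

```python
def count_internal_external(edges, community):
    """Count edges inside a candidate community and edges crossing its boundary."""
    members = set(community)
    # Internal edge: both endpoints belong to the community.
    internal = sum(1 for u, v in edges if u in members and v in members)
    # External edge: exactly one endpoint belongs to the community.
    external = sum(1 for u, v in edges if (u in members) != (v in members))
    return internal, external


# Hypothetical toy graph: two triangles joined by the single edge (c, d).
toy_edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]

internal, external = count_internal_external(toy_edges, ["a", "b", "c"])
print(internal, external)  # 3 internal edges, 1 external edge
```

A candidate group such as {a, b, c} is a plausible community because its internal edge count is high relative to its external edge count.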

PROPOSED METHOD IN COMMUNITY DETECTION
The proposed method of this research is suitable for big data. In the community detection problem, the aim is to place nodes that are strongly connected to each other in one community and nodes that are weakly connected to each other in separate communities. Graph representation algorithms, like community detection methods, try to provide a two-dimensional or three-dimensional image of the graph in which nodes that are connected to each other are located in the same region and nodes that are not connected to each other are located in distant regions. We assume that the data received from the social network is modeled in the form of a data stream (1).
$$\{x_1, x_2, x_3, x_4, x_5, \dots, x_k, \dots\} \qquad (1)$$

The distance between two of these data points, for example $x_1$ and $x_2$, is denoted $d(x_1, x_2)$. With the Euclidean distance, the shortest distance between two points is calculated according to the Pythagorean relation. If the absolute value of the difference between the components of the points is used instead of the square of the difference, the distance function is called the Manhattan distance. A more general form, of which the Euclidean and Manhattan distances are special cases, is called the Minkowski distance. A suitable distance should be chosen in each case according to the nature of the multivariate data. If the data are continuous and follow a multivariate distribution, the Euclidean distance usually gives the best results, but if there are outliers, the Manhattan distance is more suitable for quantitative data. One of the appropriate methods for clustering the data stream (1) is presented in [18] (Figure 2). The input of this algorithm is the data stream (1), and its output is the set of clusters, which we model with relation (2).
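The three distance functions described above can be sketched as follows. This is an illustrative sketch only: the function names and the sample points are assumptions, and the Minkowski distance of order p is taken in its standard form, with p = 1 giving the Manhattan distance and p = 2 the Euclidean distance.

```python
def minkowski_distance(point_p, point_q, order):
    """Minkowski distance of the given order between two equal-length points."""
    if len(point_p) != len(point_q):
        raise ValueError("points must have the same number of components")
    total = sum(abs(a - b) ** order for a, b in zip(point_p, point_q))
    return total ** (1.0 / order)


def manhattan_distance(point_p, point_q):
    """Order 1: sum of absolute component differences."""
    return minkowski_distance(point_p, point_q, 1)


def euclidean_distance(point_p, point_q):
    """Order 2: square root of the sum of squared component differences."""
    return minkowski_distance(point_p, point_q, 2)


# Hypothetical sample points forming a 3-4-5 right triangle with the origin.
point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

print(euclidean_distance(point_a, point_b))  # 5.0
print(manhattan_distance(point_a, point_b))  # 7.0
```

Because a single large component difference is squared in the Euclidean case but only added in the Manhattan case, the Manhattan distance is less sensitive to outliers, in line with the remark above.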
The method proposed in [18] can recursively update its self-defined parameters using only the current data sample while discarding all previous data samples, and it evolves its structure automatically according to the observed streaming data. Figure 3 shows an example of the process of forming new clusters.
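The idea of updating cluster parameters recursively from the current sample alone can be illustrated with a deliberately simplified sketch. It is not the algorithm of [18]: the fixed `radius` threshold, the class `OnlineCluster`, and the function `cluster_stream` are assumptions introduced here only to show how a cluster center can be maintained without storing past samples.

```python
import math


class OnlineCluster:
    """Keeps only a running center and a sample count, never the samples themselves."""

    def __init__(self, first_sample):
        self.center = list(first_sample)
        self.count = 1

    def update(self, sample):
        # Recursive mean: mu_k = mu_{k-1} + (x_k - mu_{k-1}) / k
        self.count += 1
        self.center = [c + (s - c) / self.count for c, s in zip(self.center, sample)]


def cluster_stream(stream, radius):
    """Assign each arriving sample to the nearest cluster or open a new one."""
    clusters = []
    for sample in stream:
        nearest = min(
            clusters,
            key=lambda cluster: math.dist(cluster.center, sample),
            default=None,
        )
        if nearest is not None and math.dist(nearest.center, sample) <= radius:
            nearest.update(sample)  # sample is absorbed, then discarded
        else:
            clusters.append(OnlineCluster(sample))  # structure evolves
    return clusters


# Hypothetical toy stream with two well-separated groups.
toy_stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
clusters = cluster_stream(toy_stream, radius=1.0)
print([(cluster.count, cluster.center) for cluster in clusters])
```

Each sample is processed once and then dropped, so memory use depends on the number of clusters rather than on the length of the stream.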
As mentioned, there are many problems with community detection. One of them is the large amount of data: the data generated by social networks is huge. The integration process in [18] is time-consuming for large data, which takes the algorithm out of real-time mode. With a proper definition of the problem, an optimal selection of the otherwise fixed parameters, and the use of the crow search algorithm, this algorithm can be customized for large data. The crow search algorithm (CSA) [3] is a population-based technique built on the idea that crows store their surplus food in hidden places and retrieve it when food is needed [19]-[29]. According to this algorithm, crows determine their new and updated positions in the search space. For this purpose, each crow i randomly chooses one of the crows in the flock (for example, crow j) and follows it to discover the location of the food hidden by this crow ($m^{j}$); this process is repeated for all crows. The new position of crow i is obtained using (3).

$$x^{i,\mathrm{iter}+1} = \begin{cases} x^{i,\mathrm{iter}} + r_i \times fl^{i,\mathrm{iter}} \times \left(m^{j,\mathrm{iter}} - x^{i,\mathrm{iter}}\right), & r_j \geq AP^{j,\mathrm{iter}} \\ \text{a random position}, & \text{otherwise} \end{cases} \qquad (3)$$

Here $x^{i,\mathrm{iter}}$ is the position of crow i at iteration iter, $m^{j,\mathrm{iter}}$ is the memory (hiding place) of crow j, $fl^{i,\mathrm{iter}}$ is the flight length of crow i, $AP^{j,\mathrm{iter}}$ is the awareness probability of crow j, and $r_i$ and $r_j$ are uniformly distributed random numbers between 0 and 1. Various parameters can be optimally selected with the crow search algorithm. In this study, the value of the clustering parameter n is chosen intelligently and optimally.
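The crow search update can be sketched for a one-dimensional search, such as the choice of a single scalar parameter. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function `crow_search`, its default settings, and the stand-in objective `toy_cost` are hypothetical, and the paper's real objective (clustering quality and run time as a function of n) is not reproduced here.

```python
import random


def crow_search(objective, lower, upper, flock_size=10, iterations=100,
                flight_length=2.0, awareness_prob=0.1, seed=0):
    """Minimize a scalar objective on [lower, upper] with the crow search update."""
    rng = random.Random(seed)
    positions = [rng.uniform(lower, upper) for _ in range(flock_size)]
    memories = positions[:]  # best hiding place found so far by each crow
    memory_costs = [objective(p) for p in memories]

    for _ in range(iterations):
        for i in range(flock_size):
            j = rng.randrange(flock_size)  # crow i picks a crow j to follow
            if rng.random() >= awareness_prob:
                # Crow j is unaware: move toward its memory.
                step = rng.random() * flight_length * (memories[j] - positions[i])
                candidate = positions[i] + step
            else:
                # Crow j is aware: crow i is sent to a random position.
                candidate = rng.uniform(lower, upper)
            if lower <= candidate <= upper:  # accept only feasible moves
                positions[i] = candidate
                cost = objective(candidate)
                if cost < memory_costs[i]:  # memory keeps the best position
                    memories[i] = candidate
                    memory_costs[i] = cost

    best = min(range(flock_size), key=memory_costs.__getitem__)
    return memories[best], memory_costs[best]


def toy_cost(value):
    """Hypothetical stand-in objective with its minimum at value = 3."""
    return (value - 3.0) ** 2


best_value, best_cost = crow_search(toy_cost, lower=0.0, upper=10.0)
print(best_value, best_cost)
```

In a setting like the one described above, `toy_cost` would be replaced by a function that runs the clustering with a given n and returns a cost to minimize; that function is left out because the source does not specify it.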

SIMULATION OF THE PROPOSED METHOD
MATLAB 2022 is used to simulate the proposed method. To measure the accuracy and speed of the method, large volumes of random data have been generated. In our proposed method, the remaining parameters are left at their default values. Table 1 shows the selected parameters of the crow search algorithm. Figure 4 shows the clustering results for 20,000 random data points. The left panel shows the output clusters of the algorithm without the CSA algorithm. In this case, n is equal to 2 (default), the code execution time is about 5.44 seconds, and the number of clusters is equal to 6. The right panel shows the output clusters of the algorithm in the presence of the CSA algorithm. In this case, n is equal to 2.35, the code execution time is about 2.51 seconds, and the number of clusters is equal to 3.

One of the goals of this research is to address the problem of growing data volumes. Figure 5 shows the clustering results for 100,000 random data points. The left panel shows the output clusters of the algorithm without the CSA algorithm. In this case, n is equal to 2 (default), the code execution time is about 17.82 seconds, and the number of clusters is equal to 3. The right panel shows the output clusters of the algorithm in the presence of the CSA algorithm. In this case, n is equal to 2.15, the code execution time is about 7.51 seconds, and the number of clusters is equal to 2.
Figure 5. Results of the proposed method for 100,000 data, left: without n-optimization, right: with n-optimization

It is clear from the results that a smart choice of n can make clustering more than twice as fast (about 2.2 times for 20,000 data points and about 2.4 times for 100,000 data points). The proposed method shows its ability as the amount of data increases, since the execution time is significantly reduced. In this case, the number of outlier data will increase, but the accuracy of the proposed algorithm will not decrease. Therefore, the proposed method is well suited to big data. The idea of this research can be expanded further in the future. One interesting direction is to choose other parameters to optimize, such as different distance criteria or smart thresholds, in order to find a better algorithm. A summary of the results is presented in Table 2.
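The speed-up can be checked directly from the execution times reported for Figures 4 and 5. The short sketch below only repeats that arithmetic; the dictionary layout and variable names are illustrative choices.

```python
# Execution times in seconds, as reported for the two experiments.
timings_seconds = {
    20_000: {"default_n": 5.44, "optimized_n": 2.51},
    100_000: {"default_n": 17.82, "optimized_n": 7.51},
}

# Speed-up = time with default n / time with CSA-optimized n.
speedups = {
    size: times["default_n"] / times["optimized_n"]
    for size, times in timings_seconds.items()
}

for size, speedup in speedups.items():
    print(f"{size} points: {speedup:.2f}x faster")
```

The ratios come out at roughly 2.17 for 20,000 points and 2.37 for 100,000 points, so the gain grows slightly with the data volume in these two runs.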

CONCLUSION
During the last decade, more and more attention has been paid to communication in modern society. Today, social networks with hundreds of millions of members are considered a powerful tool for guiding the flow of information, and researchers have therefore studied different aspects of these networks. One of the important issues in social network analysis is community detection, for which clustering is used. There are many problems with community detection. One of them is the large amount of data: the data generated by social networks is huge. Recent research has therefore focused on algorithms that have low computational overhead and high scalability. Our contribution in this research was a fast and accurate algorithm for community detection in social networks with large data volumes. The proposed algorithm is based on fast data stream clustering, and the crow search optimization algorithm was used to increase its speed. Such a design helps to create an intelligent algorithm with a low computational load. The simulation results showed that the proposed algorithm works more than twice as fast (about 2.2 to 2.4 times in the reported experiments). With further improvement in future research, this customized algorithm may also solve the clustering problem for sparse data.

Figure 1. The detection uses users' similarity in online activities (topology) and their profiles (attributes) for: (a) a graph where nodes represent users in a social network and (b) two communities based on the prediction of users' professions [5]

Figure 2. The clustering algorithm transforms the data stream into specific clusters
Figure 3. An example of the process of forming new clusters

Figure 4. Results of the proposed method for 20,000 data, left: without n-optimization, right: with n-optimization

Table 1. Selected parameters of the crow search algorithm