Large Crowd Count Based on Improved SURF Algorithm

This paper presents a high-density crowd counting method based on Speeded Up Robust Features (SURF), combined with linear interpolation weights for camera distortion calibration. The feature vector is built from Gray Level Co-occurrence Matrix (GLCM) features and SURF features. Weight values are linearly interpolated to reduce the error caused by camera distortion, which yields an optimized crowd feature vector. A model trained with support vector regression then predicts the number of people in the crowd. Experimental results show that the proposed method is more accurate than previous methods.


Introduction
With the growth of the world population, accidents in public places caused by high-density crowds have occurred frequently in recent years. At the same time, video surveillance systems are ubiquitous [1]. By making use of these existing resources, intelligent systems can help us forewarn of and avoid disasters. Compared with traditional approaches, intelligent crowd counting and density estimation can also improve the utilization of public facilities and help allocate manpower and material resources effectively.
Crowd counting algorithms can be divided into two categories: direct and indirect. Direct methods use people's individual characteristics, such as color and shape, to obtain the count; heads, faces, and other characteristics can be selected as the statistical feature vector. These methods are usually very complex and are better suited to monitoring low-density crowds. The main approach to counting high-density crowds is the indirect one [2], in which the count is obtained by regression on features extracted from the whole crowd [3]. However, the precision of this approach is currently not high enough, and it still needs further in-depth research.
This paper uses a statistical regression method. First, the crowd's foreground image is extracted from the input image. Then, the GLCM features and SURF features of the foreground image are extracted [4]-[7]. Through linear interpolation, weight values are interpolated to reduce the error caused by camera distortion. Finally, the crowd count is predicted by a model trained with support vector regression. Figure 1 shows the block diagram of the whole algorithm.

SURF Feature Extraction
The research object of this paper is high-density crowds, and the SURF algorithm is used to describe crowd characteristics.
In 2006, Herbert Bay proposed SURF, a more practical feature detection algorithm. SURF is a highly robust local feature point detector that also runs fast. Because of its good invariance to scale and perspective transformations, it has become an important feature extraction algorithm in many applications.
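(1) The integral image is computed. The value of the integral image at point (i, j) is the sum of the gray values of all pixels in the rectangle between the image origin and (i, j):
$$I_{\Sigma}(i,j)=\sum_{i'=0}^{i}\sum_{j'=0}^{j} I(i',j')$$
The sum of all pixels' gray values in any window W can then be obtained from the integral-image values at its four corner points (Figure 2). For a window covering rows $i_0$ to $i_1$ and columns $j_0$ to $j_1$:
$$\Sigma_W = I_{\Sigma}(i_1,j_1) - I_{\Sigma}(i_0-1,j_1) - I_{\Sigma}(i_1,j_0-1) + I_{\Sigma}(i_0-1,j_0-1)$$
(2) The extreme points of the scale space are obtained through an approximation of the Hessian matrix; these extreme points are the feature points we need. The Hessian matrix $H(\mathbf{x},\sigma)$ of point $\mathbf{x}=(x,y)$ in image I at scale $\sigma$ is defined as:
$$H(\mathbf{x},\sigma)=\begin{bmatrix}L_{xx}(\mathbf{x},\sigma) & L_{xy}(\mathbf{x},\sigma)\\ L_{xy}(\mathbf{x},\sigma) & L_{yy}(\mathbf{x},\sigma)\end{bmatrix}$$
where $L_{xx}(\mathbf{x},\sigma)$ is the convolution of the Gaussian second-order derivative $\partial^2 g(\sigma)/\partial x^2$ with image I at point $\mathbf{x}$, and $L_{xy}$ and $L_{yy}$ are defined similarly. A feature point is identified by computing the determinant of the Hessian at each pixel of the image [9]. For convenience in applications, Herbert Bay proposed approximating the second-order derivatives with box filters: the box-filter responses $D_{xx}$, $D_{yy}$, and $D_{xy}$ replace the convolutions $L_{xx}$, $L_{yy}$, and $L_{xy}$, and the different scales of the scale space are formed by enlarging the box filters. The determinant is then simplified as:
$$\det(H_{\text{approx}}) = D_{xx}D_{yy} - (w\,D_{xy})^2$$
In theory the weight w changes with the scale; in practice Bay et al. keep it constant at 0.9.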
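The four-corner window sum and the approximated Hessian determinant described above can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' code: the helper names are hypothetical, the window is given by inclusive row and column bounds, and the box-filter responses are passed in as numbers because the filter sizes are not specified in this section. The default `w=0.9` is Bay et al.'s constant.

```python
import numpy as np

# Hypothetical helpers; a sketch, not the paper's implementation.

def integral_image(img):
    """I_sigma(i, j): sum of img[0..i, 0..j] (inclusive) for every (i, j)."""
    return np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, i0, j0, i1, j1):
    """Sum of pixels in rows i0..i1 and columns j0..j1 (inclusive),
    using at most four integral-image lookups."""
    total = ii[i1, j1]
    if i0 > 0:
        total -= ii[i0 - 1, j1]          # strip above the window
    if j0 > 0:
        total -= ii[i1, j0 - 1]          # strip left of the window
    if i0 > 0 and j0 > 0:
        total += ii[i0 - 1, j0 - 1]      # corner subtracted twice, add back
    return int(total)

def hessian_det_approx(dxx, dyy, dxy, w=0.9):
    """det(H_approx) = Dxx * Dyy - (w * Dxy)^2 from box-filter responses."""
    return dxx * dyy - (w * dxy) ** 2
```

Because each box sum costs four lookups regardless of window size, enlarging the box filters to build larger scales does not increase the per-pixel cost.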
The candidate feature points must then be confirmed. To locate the extrema in scale space, each sampling point is compared with all of its neighbours: the 8 adjacent points at the same scale and the 9 points at each of the two adjacent scales (18 points), 26 points in total. Only if the point's response is greater than all 26 neighbours, or less than all of them, is it kept as a final feature point.
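A minimal sketch of the 26-neighbour test, assuming the determinant responses are stored in a NumPy array indexed as (scale, row, column). The response threshold and sub-pixel interpolation that SURF also applies are omitted, and the function names are illustrative.

```python
import numpy as np

def is_scale_space_extremum(stack, s, r, c):
    """True if stack[s, r, c] is strictly greater than, or strictly less than,
    all 26 neighbours in its 3x3x3 scale-space neighbourhood."""
    cube = stack[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
    centre = stack[s, r, c]
    neighbours = np.delete(cube.ravel(), 13)  # flat index 13 is the cube centre
    return bool(np.all(centre > neighbours) or np.all(centre < neighbours))

def find_extrema(stack):
    """Scan every interior (scale, row, column) position of the response stack."""
    points = []
    n_scales, height, width = stack.shape
    for s in range(1, n_scales - 1):
        for r in range(1, height - 1):
            for c in range(1, width - 1):
                if is_scale_space_extremum(stack, s, r, c):
                    points.append((s, r, c))
    return points
```

Boundary scales and border pixels are skipped because they lack a full set of 26 neighbours.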
(3) The principal orientation of each feature point is determined, and then the 64-dimensional descriptor vector is formed.
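In Bay et al.'s descriptor, the 64 dimensions come from a 4 × 4 grid of subregions around the feature point, each contributing four sums of Haar-wavelet responses (Σdx, Σdy, Σ|dx|, Σ|dy|). The sketch below assumes the responses have already been sampled on a 20 × 20 grid aligned with the principal orientation; it omits the Gaussian weighting of the original descriptor, and the function name is hypothetical.

```python
import numpy as np

def surf_descriptor(dx, dy):
    """Assemble a 64-D SURF-style descriptor from Haar-wavelet responses.

    dx, dy: 20x20 arrays of responses around the feature point, assumed
    already rotated to its principal orientation.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    if dx.shape != (20, 20) or dy.shape != (20, 20):
        raise ValueError("expected two 20x20 response arrays")
    feats = []
    for bi in range(4):                  # 4x4 subregions of 5x5 samples each
        for bj in range(4):
            sx = dx[5 * bi:5 * bi + 5, 5 * bj:5 * bj + 5]
            sy = dy[5 * bi:5 * bi + 5, 5 * bj:5 * bj + 5]
            feats.extend([sx.sum(), sy.sum(), np.abs(sx).sum(), np.abs(sy).sum()])
    v = np.array(feats)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v   # unit length for contrast invariance
```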
In this paper, SURF feature extraction begins by extracting the binary foreground image from the input image with background subtraction and a sliding average background model. The SURF features are then extracted from the binary foreground image, which requires less computation than extracting them from the whole image. Figure 3 shows the result of SURF feature extraction; the white part of the binary foreground image is the moving region. Following the principle of the SURF algorithm, most SURF feature points lie in the moving region, but a few also appear around it [10]. As Figure 3 shows, the raw number of feature points therefore does not reflect the crowd characteristics effectively, so the feature points in the non-interest region should be rejected. This selection only requires a single scan over all feature points, each of which is classified by the pixel value of the binary foreground image at its location.
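The selection rule is:
$$surf(x,y)=\begin{cases}1, & i(x,y)\neq 0\\ 0, & i(x,y)=0\end{cases}$$
where $i(x,y)$ is the pixel value of the binary foreground image at feature point $(x,y)$. When $surf(x,y)$ is 1, the point is kept; when it is 0, the point is removed. Figure 4 shows that the retained SURF feature points reflect the crowd characteristics faithfully and effectively.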
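The preprocessing described above (a sliding-average background, a binary foreground mask, and the surf(x, y) rule for rejecting points) can be sketched as follows. The update rate `alpha` and the threshold `thresh` are hypothetical values because the paper does not give them, and feature points are assumed to be (x, y) pairs with x as the column index.

```python
import numpy as np

# Hypothetical helpers and parameter values; a sketch under stated assumptions.

def update_background(bg, frame, alpha=0.05):
    """Sliding (running) average: B_t = (1 - alpha) * B_{t-1} + alpha * F_t."""
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=30):
    """Binary foreground: 1 where |F_t - B_t| exceeds thresh, else 0."""
    return (np.abs(frame.astype(float) - bg) > thresh).astype(np.uint8)

def keep_foreground_points(points, mask):
    """Apply surf(x, y): keep a point only if the mask is non-zero there."""
    return [(x, y) for (x, y) in points if mask[y, x] != 0]
```

In a video loop, `update_background` would be called once per frame, and the SURF points detected on the foreground image would then be filtered with `keep_foreground_points`.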

Feature Vector Construction with SURF Feature and GLCM Feature
The SURF feature has good invariance to scale and perspective transformations and can reflect the characteristics of a crowd. However, for large crowds with heavy occlusion, the SURF feature does not reflect these characteristics well. Because GLCM features can effectively overcome the occlusion problem, this paper combines GLCM features with the SURF feature. The feature vector is composed of four uncorrelated GLCM features (energy, entropy, contrast, and homogeneity) and the number of SURF feature points [11].
The four GLCM features are defined below, where $P(i,j,d)$ is the normalized entry of the gray level co-occurrence matrix for gray levels i and j at displacement d, and L is the number of gray levels.
(1) Energy: a statistic that reflects the consistency of the gray-level distribution.
$$f_{\text{energy}}=\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}P(i,j,d)^2$$
Energy reflects the coarseness of the texture and the uniformity of the gray-level distribution. When the texture is coarse, the energy is high; otherwise, it is low.
(2) Contrast:
$$f_{\text{contrast}}=\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}(i-j)^2\,P(i,j,d)$$
Contrast reflects the sharpness of the image. When the texture is coarse, the contrast is small; otherwise, it is large.
(3) Entropy: a parameter that measures the randomness of the gray-level distribution.
$$f_{\text{entropy}}=-\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}P(i,j,d)\log P(i,j,d)$$
Entropy indicates the non-uniformity of the texture, or the complexity of the image. When the texture is coarse, the entropy is small; otherwise, it is large.
(4) Homogeneity: homogeneity reflects the direction of the texture and shows the degree of similarity between rows or columns. The larger the gray-level difference between paired elements, the smaller the homogeneity.
From the above, a 6-dimensional feature vector is formed in this paper, with the SURF feature as the main component: $(num_{surf}, s, feature_{entropy}, feature_{energy}, feature_{contrast}, feature_{homogeneity})$. Here $num_{surf}$ is the number of SURF points, s is the area of moving people in the binary foreground image, and $feature_{entropy}$, $feature_{energy}$, $feature_{contrast}$, and $feature_{homogeneity}$ are the entropy, energy, contrast, and homogeneity of the GLCM, respectively.
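A sketch of the GLCM features and the 6-dimensional vector, assuming a symmetric, normalized GLCM for a single displacement on an image already quantized to a small number of gray levels. The paper's displacement, number of gray levels, and homogeneity formula are not given in this section. The inverse-difference-moment form 1/(1 + (i − j)²) used here is an assumption, entropy uses the natural logarithm, and all function names are hypothetical.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Symmetric, normalized GLCM P(i, j, d) for displacement d = (dx, dy).
    img must already be quantized to integers in [0, levels)."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    P = np.zeros((levels, levels))
    for r in range(max(0, -dy), min(h, h - dy)):
        for c in range(max(0, -dx), min(w, w - dx)):
            i, j = img[r, c], img[r + dy, c + dx]
            P[i, j] += 1
            P[j, i] += 1                     # count each pair in both orders
    return P / P.sum()

def glcm_features(P):
    """Energy, entropy, contrast, and homogeneity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    energy = np.sum(P ** 2)
    contrast = np.sum((i - j) ** 2 * P)
    nz = P > 0                               # 0 * log 0 is taken as 0
    entropy = -np.sum(P[nz] * np.log(P[nz]))
    homogeneity = np.sum(P / (1.0 + (i - j) ** 2))  # assumed IDM form
    return energy, entropy, contrast, homogeneity

def crowd_feature_vector(num_surf, fg_area, P):
    """(num_surf, s, entropy, energy, contrast, homogeneity), in the paper's order."""
    energy, entropy, contrast, homogeneity = glcm_features(P)
    return np.array([num_surf, fg_area, entropy, energy, contrast, homogeneity], dtype=float)
```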

Linear Interpolation Weights for Camera Distortion Calibration
Camera distortion arises because an object's image size shrinks as its distance from the camera increases: a person near the camera occupies a larger image area than a person far from it. To reduce the influence of the depth information lost in the image, this paper adopts linear interpolation weights for camera distortion calibration. The method adapts well, runs in real time, and in practical applications does not require the researchers to measure the environment. In Figure 5, the image is divided into four regional grids. The width and height of the object's minimum enclosing rectangle (MER) are measured where the object enters and where it exits each grid, and the weight of each grid is obtained from the rate of change of the MER area. For each of the grids a, b, c, and d, the area change rate is computed from $(w, h)$, the width and height of the object's MER at the grid entrance, and $(w', h')$, the width and height of the object's MER at the grid exit.
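Because the exact area-change-rate formula is not recoverable here, the following is only a hypothetical sketch of how per-grid weights could be linearly interpolated and applied. It assumes that each grid's weight is a reference MER area divided by the mean of the entrance and exit MER areas, that the grids are stacked vertically with known centre rows, and that weights are interpolated row by row with `np.interp`. The weighted SURF count and weighted foreground area are the kind of corrected quantities compared in Figures 7 and 8.

```python
import numpy as np

# Hypothetical weighting scheme; not the paper's exact formula.

def grid_weight(w_in, h_in, w_out, h_out, ref_area):
    """Assumed per-grid weight: reference MER area divided by the mean of the
    MER areas measured at the grid entrance and exit."""
    mean_area = 0.5 * (w_in * h_in + w_out * h_out)
    return ref_area / mean_area

def row_weights(grid_centres, grid_weights, height):
    """Linearly interpolate per-grid weights over every image row.
    grid_centres must be increasing; rows outside take the nearest end value."""
    return np.interp(np.arange(height), grid_centres, grid_weights)

def corrected_surf_count(points, weights):
    """Weighted SURF count: each kept point (x, y) contributes weights[y]."""
    return float(sum(weights[y] for (_, y) in points))

def corrected_area(mask, weights):
    """Weighted foreground area: each foreground pixel in row y counts weights[y]."""
    return float((mask.sum(axis=1) * weights).sum())
```

Interpolating by row rather than using one constant weight per grid avoids jumps in the corrected count when a person crosses a grid boundary.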
Figure 6 shows a video sequence of a single person walking, taken from the PETS video sequences [12]. In each video frame, the monitored space is divided into four parts to improve the accuracy of camera distortion calibration. The calibration becomes more accurate as more parts are used, but the more weights are interpolated, the more complex it becomes to determine them. Figure 7 shows the clear difference in the number of SURF points before and after correction; the abscissa is the frame number and the ordinate is the number of SURF points. Figure 7(a) shows that the SURF count decreases gradually as the pedestrian walks away from the camera. Figure 7(b) shows that the SURF count stays within a stable range after the four interpolated weights are applied for correction.
Figure 8 shows the difference in foreground area before and after correction; the abscissa is the frame number and the ordinate is the foreground area [13]. Figure 8(a) shows that the foreground area decreases gradually as the pedestrian walks away from the camera. Figure 8(b) shows that the foreground area stays within a stable range after the four interpolated weights are applied for correction.
Figures 7 and 8 show that linear interpolation weights correct camera distortion quickly and effectively.

The Test Results and Analysis
The experiments use Microsoft Visual C++ 6.0 as the software development environment and OpenCV 1.0 as the image processing library, running on Windows XP. The hardware platform is a PC with 2 GB of memory. MATLAB 7.0 is used to analyze the results.
The crowd count is estimated with ε-SVR: a regression model is trained on the corrected feature vectors and then used to predict the number of people. This paper tests three video sequences from the PETS2009 video library [12]. Figure 9 shows the real crowd count, the estimated count without correction, and the estimated count after correction.
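The paper does not specify the kernel, ε, or penalty used for ε-SVR. As a self-contained illustration of ε-insensitive regression only, the sketch below trains a linear ε-SVR by subgradient descent on the primal objective λ/2·‖w‖² + mean(max(0, |y − f(x)| − ε)), which is the usual C-form up to rescaling. In practice a dedicated solver such as LIBSVM, usually with a nonlinear kernel, would be used; all parameter values here are hypothetical.

```python
import numpy as np

def fit_linear_svr(X, y, epsilon=0.1, lam=1e-3, lr=0.5, epochs=20000):
    """Minimise lam/2 * ||w||^2 + mean(max(0, |y - (Xw + b)| - epsilon))
    by subgradient descent with a 1/sqrt(t) step size."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(epochs):
        r = y - (X @ w + b)                           # residuals
        # d(loss)/d(prediction): -1 above the tube, +1 below it, 0 inside
        g = np.where(r > epsilon, -1.0, np.where(r < -epsilon, 1.0, 0.0))
        step = lr / np.sqrt(t + 1.0)
        w -= step * (lam * w + X.T @ g / n)
        b -= step * g.mean()
    return w, b

def predict(X, w, b):
    """Predicted crowd count for each row of X."""
    return np.asarray(X, dtype=float) @ w + b
```

Residuals inside the ε-tube contribute no gradient, which is what makes the fitted model tolerant of small per-frame counting noise.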

   
$$E=\frac{1}{n}\sum_{i=1}^{n}\left|\hat{N}(i)-N(i)\right|$$
where n is the number of video frames, $\hat{N}(i)$ is the estimated count of frame i, and $N(i)$ is the real count of frame i.
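The average per-frame error can be computed directly. A minimal sketch, assuming the estimated and real counts are given as equal-length lists indexed by frame; the function name is hypothetical.

```python
def mean_absolute_error(test_counts, true_counts):
    """E = (1/n) * sum_i |N_hat(i) - N(i)| over n frames."""
    if len(test_counts) != len(true_counts) or len(test_counts) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(est - real) for est, real in zip(test_counts, true_counts))
    return total / len(test_counts)
```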
Table 1 shows the experimental results of this paper before and after correction.

Compared with the uncorrected method, the accuracy after correction is much higher, which shows that linear interpolation weights correct camera distortion quickly and effectively. In Table 2, the GLCM method is a crowd counting method based on GLCM features only: its crowd feature vector consists of just the four GLCM features. The SURF method is a counting method based on the SURF algorithm, whose crowd feature vector is dominated by the number of SURF points.
The analysis shows that the proposed method estimates the number of people in the test region quickly and accurately. Compared with the GLCM method, its accuracy is clearly higher, which shows that the number of SURF points is an important feature for crowd counting. Compared with the SURF method, the accuracy on video 3 is clearly higher, which shows that the GLCM features can effectively mitigate the occlusion problem.

Compared With The Pixel Statistic Feature Method
In many cases, the pixel statistic feature, which mainly includes foreground features and edge features, can describe crowd characteristics effectively [15]. The method introduces a perspective correction parameter to weight each pixel. The comparison in Figure 10 shows that the SURF method performs better than the pixel statistic feature method: overall, the results of the pixel statistic feature method deviate more from the actual count, whereas the result curve of the SURF method stays close to the actual-count curve throughout.

Conclusion
To address the problem of counting high-density crowds, this paper proposes an improved crowd counting method based on the SURF algorithm and GLCM features. The experiments show that the linear interpolation weights correction is a simple and effective method for camera distortion calibration. The algorithm adapts well and can accurately estimate the number of people in each frame, with an average error of less than 2 people per frame. Given the variety and complexity of real environments, the method still needs further in-depth research.

Figure 1. The block diagram of the whole algorithm

Figure 2. The calculation of the integral image and the sum of all pixels' gray values in window W

Figure 3. SURF feature extraction of the binary foreground image

Figure 4. After rejecting feature points in the non-interest region

Figure 5. The theory of linear interpolation weights

Figure 8. The correction effect of foreground area

Figure 9. The test result and analysis

Based on Improved SURF Algorithm (Haining Zhang)

In the pixel statistic feature method, r denotes the foreground pixels or edge pixels used as the pixel statistic feature, $w_i$ is the impulse response function, and $\omega(i)$ is the perspective correction parameter of pixel i. The impulse response takes the value 1 at a foreground or edge pixel and 0 otherwise. The SURF algorithm and the pixel statistic feature method were compared on video 3; the results are shown in Figure 10.

Figure 10. The result of comparison

Table 1. The test results analysis before correction

Table 2. The results of GLCM method