Augmented Reality Prototype for Visualising Large Sensors' Datasets

This paper addresses the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain of each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. The necessary architectural system support and the interface requirements for such visualizations are also presented.


Introduction
Interactive data exploration and analysis relies heavily on visualisation tools to assist users in discovering and interpreting hidden patterns within a large dataset. Scientists and engineers find it easy to identify patterns in long data streams by using visual cues such as shapes and colours [1]. Despite the exponential increase in computing performance, the capabilities for visual analysis are still challenged by the scale of data acquisition enabled by modern sensing and simulation technologies. The vast amount of data generated by modern sensors tends to make most oil companies data-rich but information-poor. A computerised system that provides users with easy access to the sensor datasets captured at various instances on the pipeline network, and that could make identification and localisation of leakages more efficient, is therefore necessary. Such a system is intended to locate the necessary data and significantly reduce the time needed to spot 'danger zones' on the pipeline network. There are many software systems that support localisation and identification of leakage points along the network, popularly called leakage detection systems (LDSs). Examples of technologies embedded in LDSs include volume or mass balance, the rate of change in flow or pressure, hydraulic modelling, pressure point analysis and the ATMOS PIPE currently being used by Shell. In addition, some private companies provide customised software solutions along with the necessary hardware devices so that oil companies can effectively monitor pipeline network performance. Earlier detection relied on manual inspection, which has been found to be labour-intensive and cost-ineffective. This paper presents an AR prototype that automatically detects leakages on the pipeline network and superimposes the computer-generated data onto a visual display device, which can help oil companies reduce the time spent on identification and localisation of leakages along a network of known dimension without compromising data reliability.
Researchers have tried over the years to develop AR-based applications to support various types of datasets in several domains such as architecture, engineering, and education. For example, [2] developed a method for visualizing terabytes of texture data from digital photographs and later presented tools for analysing the data in real time using the Atelier3D framework. This framework, used mainly by visualisation applications such as those in movies, was developed over many years. It became popular because many tools used to analyze 3D data are dispersed and are often inappropriate for the specific context of large datasets like the ones obtained from sensors. For most sensor datasets, developing a model is the usual accepted approach. These models usually have relatively simple structures, so visibility culling of the data is rarely the main issue [2]. Model simplification using quadric error metrics [3] was a popular approach. Other methods, such as rendering hierarchies of points instead of meshes [4] and the mesh vertex ordering technique [5], were also developed, although this has been found disadvantageous when rendering quality is of utmost significance [6]. The main challenge in large data visualization is therefore to adapt, in real time, the resolution of the rendered data to the resolution of the display [2].
Visualization techniques generally focus on reducing voluminous data into manageable 'data figments' that can be displayed by common display devices and that allow extensive exploration without compromising image resolution and/or quality when it is desired. Visualisation of large datasets, especially those linked to information browsing, was developed by [7]. However, because their model could only be used with static datasets, it is inapplicable to sensor applications. In a position paper on sensor data visualization in outdoor augmented reality, [8] presented the visualisation of carbon monoxide (CO) sensor data as an example of a visualisation approach that supports dynamic sensor data with inputs from a 'data importer', using the SiteLens system they had proposed earlier. The method involves mobile sensors which can be used for displaying datasets as sensed on the scene in real time. Problems such as information overload, detecting the presence of hidden information, and filtering visualizations often follow from such a visualisation method. Some researchers, such as [9] and [10], have attempted solutions to the problem, although the latter's approach was more one of data selection and filtering. The benefit of this is real-time interactivity, which is something the industries that use pipelines to transport materials really need. In architecture, the PDA-based augmented reality integrating speech (PARIS) system, which allows interaction with the system by voice, was developed by [11]. This paper presents a model for computing the information gain of each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network.

AR-Based Visualisation Approach
Sensor datasets are prone to errors for many reasons, which are classified as either systematic or random errors [12]. Building an AR visualisation system for sensor datasets requires that a number of processes be strictly followed. The sensors must be properly calibrated [13], and the generated datasets must be screened for noise, inaccuracies, imprecision, emptiness and redundancies, because sensors are not consistent in their measurement of the same phenomenon under the same conditions. Further, the 'clean' datasets must undergo dimensionality reduction [14], and the reduced datasets must be rendered using appropriate tools. ARToolkit® and FusionCharts® have been popular tools for such visualisations.

ISSN: 1693-6930
Augmented Reality Prototype for Visualising Large Sensors' … (Folorunso Olufemi A.)

Many manufacturers and designers of sensors have developed techniques for cleaning and error correction of wireless sensor datasets; examples are found in [13] and [15]. Some systems use models of sensor datasets to accurately and efficiently answer wireless sensor network queries with defined confidence intervals [16]. For example, TinyDB provides a declarative means of acquiring data from a sensor network. Application Level Events (ALE) defines an interface for building middleware for some applications; ALE defines concepts similar to what is described as temporal and spatial data granules. The Context Toolkit (CT) advocates an architectural approach to hiding the details of sensor devices [17]. Visual data mining (VDM) incorporates the users in the data extraction process [18]. There is also the regression method applied to sensor networks for inference purposes by [19], which was basically an approach that involves building and maintaining complex models.
Current hardware technologies have developed means of storing these vast amounts of sensor data. Even if all the display pixels are used for displaying the datasets, only a limited fraction of the data can be rendered, which greatly limits the capacity for intensive visual exploration of the entire database. Hidden data patterns are a problem both to the data user and to visualisation experts. Over the last decade, data exploration became a research problem in its own right; people now seek patterns in datasets more intuitively and wish to have a certain level of interaction. Scholars have proposed techniques for representing datasets and interacting with them [21]-[22]. Immersive Virtual Reality (IVR) emerged as a way forward that gives any user the ability to interactively explore and understand complex phenomena by mapping the numeric datasets into virtual shapes, providing means to interact with these data shapes by manipulating their orientations, and gaining insight into the hidden structures within the datasets during analysis [23]. The limitation of IVR technology is that it requires arbitrary vectors in the datasets before the transformation into 3D glyphs [24]. If the exploration of very large datasets is to be successful, it is therefore necessary to integrate humans into the data analysis process. It will be important to combine the best features of humans and computers [14]. Hence, the AR-based visualisation approach proposed in this paper seems to be a way forward.

Research Method
During a typical flow in pipelines, the sensors capture a vast amount of data that represents the instantaneous state of the flowing material. Computerised systems such as leakage detection systems (LDSs) can help to visualise the flowing material, but not the associated data attributes. The data attributes and their respective orders are very important for such visualizations. Generally, some data points in real life have no inherent order, while for others order is significant. All vectors of a typical pipeline sensor dataset are ordered and aggregated, or simply ordinal. They are sometimes continuous and most of the time closely interconnected, with missing and erroneous data points (see the example in Table 1), described in this paper as outliers. The questions are: what combination of these dataset attributes represents what situation on the network? And which data attribute could be used as the test attribute for such classification without compromising data integrity? To answer these questions, the data classification method using the decision tree algorithm proposed by [25] is applied.

As shown in Figure 1, the proposed data model supports five activities that occur in different phases during a typical visualization of sensor flow datasets. The five activities are data capture, data cleaning and classification, dimensionality reduction, data aggregation, and data visualization. Data captured from various sensors positioned at different locations along the pipeline network are injected into the system directly. Data cleaning and classification begins by pruning the data of empty and repeated values and generating a decision tree for the data using the basic algorithm for inducing a decision tree by [25]. To address the uncertainties in the captured datasets in the developed interface, the uncertainty treatment algorithm (UTa-algorithm) developed earlier [26] is applied. Human involvement is part of the third to fifth activities: reducing the data dimension, aggregating the data, and visualising the ROI particular to a user.
Let S be a set of s sensor data samples captured at different locations on the network, and let $s_i$ be the number of samples of S in class $C_i$. If there are m distinct classes $C_i$ (i = 1, ..., m) into which the data could be classified, then to compute the information gain of each sensor data attribute it is necessary to first compute the expected information needed to classify the given data. The information needed is defined by:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad (1)$$

where $p_i = s_i/s$ is the probability that an arbitrary sensor data sample belongs to class $C_i$. Log base 2 is used because the data are encoded in bits. Using Table 1, m = 5 since there are five possible distinct classes/states. Let $C_1$ = Steady, $C_2$ = Leakage, $C_3$ = Turbulence, $C_4$ = Normal and $C_5$ = Nil states. Then, from Table 1, the class counts are $s_1 = 1$, $s_2 = 3$, $s_3 = 3$, $s_4 = 3$ and $s_5 = 1$.
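As a minimal sketch of Eq. (1), assuming only the five class counts quoted above, the expected information can be evaluated as follows. The function name `expected_information` is illustrative and is not taken from the interface described in this paper.

```python
from math import log2


def expected_information(class_counts):
    """Return I(s_1, ..., s_m) = -sum(p_i * log2(p_i)) in bits.

    Classes with a zero count contribute nothing, following the usual
    convention that 0 * log2(0) is taken as 0.
    """
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


# Class counts quoted in the text for Steady, Leakage, Turbulence, Normal, Nil.
counts = [1, 3, 3, 3, 1]
info = expected_information(counts)
print(f"I(s_1..s_5) = {info:.3f} bits")
```

With these counts the sketch evaluates to roughly 2.163 bits; this figure comes from the sketch and is not reported in the original text.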

The next step is to compute the entropy of each sensor data attribute, beginning with pressure and continuing with the other attributes such as temperature, volume, flow velocity and the external body force. (Note that time has been ignored; its impact will be discussed later.)
When pressure = "p ≤ 0.97", $s_{11} = 0$, $s_{21} = 1$, $s_{31} = 1$, $s_{41} = 0$ and $s_{51} = 0$. Therefore, using Eq. (1), $I(s_{11}, s_{21}, \ldots, s_{51}) = 1.000$.

When pressure = "0.98 ≤ p ≤ 1.3", $s_{12} = 1$, $s_{22} = 0$, $s_{32} = 1$, $s_{42} = 3$ and $s_{52} = 0$. Similarly, $I(s_{12}, s_{22}, \ldots, s_{52}) = 1.371$.

When pressure = "p ≥ 1.4", $s_{13} = 0$, $s_{23} = 0$, $s_{33} = 0$, $s_{43} = 0$ and $s_{53} = 0$. Similarly, $I(s_{13}, s_{23}, \ldots, s_{53}) = 0.000$.

Therefore, if the captured sensor data is partitioned based on the pressure attribute, the expected information needed, E(p), could be computed using:

$$E(p) = \sum_{j=1}^{k} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s}\, I(s_{1j}, s_{2j}, \ldots, s_{mj})$$

with k = 3 (the number of boundaries for the pressure attribute) and m = 5 (the number of classes). E(p) therefore follows by substituting the three values of I computed above. Finally, the gain in information from partitioning on the pressure attribute is computed using:

$$\mathrm{Gain}(p) = I(s_1, s_2, \ldots, s_m) - E(p)$$

The Gains computed for the other attributes are presented in Table 3. The data attribute volume (v), with the greatest Gain (0.944), is then chosen as the test attribute for classifying the sensor data. This approach to data classification is affected by noise and outliers. Statistical methods are often used to deal with these problems [25]. In the programming logic of the developed AR-based interface described in this paper, a slightly different approach, the Minimum Description Length (MDL) principle in [25], is adopted to prune the attributes because of its insensitivity to the sampled sensor datasets.
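The partition step can be sketched in the same way. The example below is an illustration only: it uses the per-band class counts quoted above for pressure and, as an assumption, takes the total s and the overall class counts from those same bands, so its gain value is not expected to match Table 3. The function names are hypothetical.

```python
from math import log2


def expected_information(class_counts):
    """I(s_1, ..., s_m) in bits; zero counts are skipped."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def partition_entropy(partitions):
    """E(A) = sum_j (|S_j| / |S|) * I(s_1j, ..., s_mj) over the k bands."""
    total = sum(sum(band) for band in partitions)
    return sum(
        (sum(band) / total) * expected_information(band)
        for band in partitions
        if sum(band) > 0
    )


def information_gain(partitions):
    """Gain(A) = I(s_1, ..., s_m) - E(A), with class totals taken from the bands."""
    n_classes = len(partitions[0])
    class_totals = [sum(band[i] for band in partitions) for i in range(n_classes)]
    return expected_information(class_totals) - partition_entropy(partitions)


# Rows are pressure bands, columns are Steady, Leakage, Turbulence, Normal, Nil.
pressure_partitions = [
    [0, 1, 1, 0, 0],  # p <= 0.97
    [1, 0, 1, 3, 0],  # 0.98 <= p <= 1.3
    [0, 0, 0, 0, 0],  # p >= 1.4
]

per_band = [round(expected_information(band), 3) for band in pressure_partitions]
gain_pressure = information_gain(pressure_partitions)
print(per_band)
```

The per-band values printed by this sketch agree with the three values of I quoted in the text (1.000, 1.371 and 0.000).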

Results and Discussions
Using AR in visualisation is still in the development phase, and such visualization solutions generally depend on design decisions. There are three things that must be kept in mind when designing an AR application for visualization: combination of both the real and the virtual data, real-time data interactivity, and registration in 3D [27]. One of the challenges in the modelling was to reduce the inefficiency caused by the following factors: noise from external sources, sensor measurement inaccuracies, random errors generated by the hardware, and computing imprecision included in the model. These exceptions or outliers are generally ignored during visualisation, but their effects could become very significant if appropriate measures are not taken. One way to meet this challenge is to develop an AR-based data visualization interface which can automatically detect a selected location along the network and visualize only the datasets relevant to that particular location. In this section, the interface requirements and the architecture required for such an interface are also discussed.

Interface Requirement 1:
The visualization interface shall model the pipeline network system in a way that lets users specify the overall length of the entire network (an assumption here is that the pipeline has no bends or Ts, although these occur in real-life pipeline networks). Visualization of the sensor datasets requires that the whole length of the network be known, displayed and calibrated. In this model, a user specifies the length of the pipeline network to be modelled, and the interface automatically displays and calibrates the pipeline in kilometres (km), as shown in Figure 2(a) and (b).
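The calibration in this requirement can be pictured with a short sketch. It assumes a straight pipeline, as stated above, and a fixed spacing between calibration marks; the function `calibrate_pipeline` and the `step_km` parameter are hypothetical and are not part of the interface described in this paper.

```python
def calibrate_pipeline(length_km, step_km=10.0):
    """Return calibration marks (in km from the origin) for a straight pipeline.

    Marks are placed every step_km, and the pipeline end is always included.
    """
    if length_km <= 0 or step_km <= 0:
        raise ValueError("length_km and step_km must be positive")
    n_steps = int(length_km // step_km)
    marks = [i * step_km for i in range(n_steps + 1)]
    if marks[-1] < length_km:
        marks.append(float(length_km))  # close the scale at the pipe end
    return marks


# Pipeline length used in the illustration reported later in the paper.
marks = calibrate_pipeline(290)
print(len(marks), marks[0], marks[-1])
```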

Interface Requirement 2:
The interface shall select and visualize only the necessary attributes, the region of interest (ROI), of the captured sensor datasets. Different types of sensors generate different kinds of attributes. No single sensor device can generate all the required attributes, so the user selects, through the interface, the attributes particular to the governing sensor device. The interface shown in Figure 2(b) is designed to select only the required attribute for the visualisation. For a typical visualization, not all the attributes are needed; the test attribute is determined by the procedure "find_test_attribute" embedded in the program logic, which makes use of the basic algorithm for inducing a decision tree by [25] discussed earlier in this paper.
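The source does not list the body of "find_test_attribute", so the following is only a sketch of how such a procedure could pick the attribute with the greatest information gain. The rows, band labels and helper names are hypothetical and chosen purely for illustration; they are not the data of Table 1.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target="state"):
    """Information gain obtained by partitioning rows on one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def find_test_attribute(rows, attributes, target="state"):
    """Return the attribute with the greatest gain (hypothetical sketch)."""
    return max(attributes, key=lambda a: gain(rows, a, target))


# Hypothetical, already-banded readings used only to exercise the sketch.
rows = [
    {"pressure": "low", "volume": "low", "state": "Leakage"},
    {"pressure": "low", "volume": "mid", "state": "Turbulence"},
    {"pressure": "mid", "volume": "mid", "state": "Turbulence"},
    {"pressure": "mid", "volume": "high", "state": "Normal"},
    {"pressure": "mid", "volume": "high", "state": "Normal"},
    {"pressure": "mid", "volume": "low", "state": "Leakage"},
]

lead = find_test_attribute(rows, ["pressure", "volume"])
print(lead)
```

In this toy data the volume bands separate the states completely, so volume is returned; with real readings the choice depends on the gains actually computed.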

Interface Requirement 3:
The interface shall be built in such a way that it allows users to select the ROI and to interact with the last three activities of the data model. In order to know the precise state of the pipeline network at a specified region, all the captured and processed data for the region must be rendered. To achieve this, the interface shall be structured to provide interactivity with the data reduction, data aggregation and data visualization stages of the model. This will ensure that only the ROI is displayed and the rest of the data are hidden from the scene. The interface shall be made in such a way that it can detect and report the exact leakage location on the network. Leakage localisation is at the heart of this kind of visualization; the interface shall be built to use the selected ROI sensor datasets to effectively predict and localize the leakage point on the pipeline based on the available datasets. Automatic identification and localisation of the leakage point on the pipeline network can facilitate the process of correcting the pipeline network anomaly as well as guiding the user throughout the data exploration process. Currently the methods available for

TELKOMNIKA ISSN: 1693-6930

Because of the (maximum rows) limitation of Excel, the captured data are automatically and periodically handled by a data screening module before a data spill occurs. The native limit set in the interface was 20,000 data inputs. The retrieved data are then processed and visualized by the appropriate modules, which build the necessary pipeline network models and superimpose them in the 3D AR space of the OpenGL-based rendering engine so that users can interact intuitively with the entire network.
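The periodic screening described above can be sketched as batch processing under the 20,000-input limit. The screening rule used here (dropping rows with empty fields and consecutive repeats) is an assumption based on the cleaning activity of the data model, and the names `screen` and `screened_batches` are hypothetical.

```python
MAX_INPUTS = 20_000  # native limit quoted in the text


def screen(rows):
    """Drop rows with empty fields and exact consecutive repeats (assumed rule)."""
    cleaned, previous = [], None
    for row in rows:
        if row is None or any(value is None for value in row):
            continue  # skip empty or partially empty readings
        if row == previous:
            continue  # skip a reading identical to the last one kept
        cleaned.append(row)
        previous = row
    return cleaned


def screened_batches(rows, limit=MAX_INPUTS):
    """Yield screened batches of at most `limit` raw rows each."""
    for start in range(0, len(rows), limit):
        yield screen(rows[start:start + limit])


# Hypothetical (pressure, temperature) readings; a tiny limit is used for the demo.
readings = [(1.0, 20.0), (1.0, 20.0), (None, 21.0), (1.1, 20.5)]
result = list(screened_batches(readings, limit=3))
print(result)
```

Keeping the limit as a parameter lets the same sketch be exercised on a handful of rows while defaulting to the limit stated in the text.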

Results
A separate module, 'leakage_detection_R1', is designed to report leakage datasets if one is encountered. Figures 3(a) and (b) show an illustration with a pipeline length of 290 km. After rendering the calibrated pipeline and the data captured through the appropriate modules, the interface reports a leak at kilometre 218.06 (±1.05) from the pipe origin. Manual analysis of the data placed the leak at kilometre 220. The margin of error is generally considered insignificant because repeated trials yielded results within the data boundaries (216.85, 221.23). To clear any doubts, a separate module, 'leakage_predictor_R1', activated by a "Predict Leakage" button on the interface, is incorporated to predict the leakage points based on the 'means of nearby points' method commonly used for predictions by most statistical packages, including SPSS v16. The 'leakage_predictor_R1' module returned values within the range 217.20 to 219.30, which falls within the acceptable data boundaries for the specific set of data.
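The 'means of nearby points' idea can be illustrated with a short sketch. It is one plausible reading of the method, not the code of 'leakage_predictor_R1': a missing estimate is replaced by the mean of the valid values within a fixed span on either side. The trial values below are hypothetical; only the data boundaries are taken from the text.

```python
def mean_of_nearby_points(values, index, span=2):
    """Mean of the valid values within `span` positions either side of `index`."""
    lo = max(0, index - span)
    hi = min(len(values), index + span + 1)
    neighbours = [
        v for i, v in enumerate(values[lo:hi], start=lo)
        if i != index and v is not None
    ]
    if not neighbours:
        raise ValueError("no valid neighbouring points")
    return sum(neighbours) / len(neighbours)


LOWER, UPPER = 216.85, 221.23  # data boundaries quoted in the text

# Hypothetical leak-location estimates (km); the third trial is missing.
estimates = [217.4, 218.9, None, 218.1, 219.2]
predicted = mean_of_nearby_points(estimates, index=2)
within_bounds = LOWER <= predicted <= UPPER
print(round(predicted, 2), within_bounds)
```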

Conclusions
This paper presents a model for computing the information gain of each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. The interface requirements necessary to support efficient data capture, rendering and visualization were also discussed. An AR-based data visualization interface can improve localisation of leakage points along the pipeline network as well as reduce the operational costs associated with 'pigging'. This AR-based approach improves access time to data without compromising data reliability. An interface to retrieve data stored only in 2D data formats, using native Microsoft Excel worksheets, was also designed.
Future directions of this work are the incorporation of 3D data input formats and its extension to handling atmospheric and geo-data.

Figure 1. AR-based data model for sensor datasets

Figure 2. Interface snapshots for capturing (a) and displaying (b) the pipeline length

Figure 3. Interface snapshots for reading the captured datasets from the Excel worksheets (a) and displaying the localised leakage point (b) on the pipeline network.

Table 1. Showing typical readings extracted from a sensor (Velocity Volume Vane Thermo Anemometer manufactured by KOREC DIRECT®)

With the decision tree classification method of [25] applied, the data in Table 1 are classified into five possible states on the pipeline network: Steady, Leakage, Turbulence, Normal and Nil (the unclassified state). Further, boundaries (courtesy of the 2005-2006 Annual Report of the Nigerian National Petroleum Corporation, NNPC) are set for each state with respect to each attribute, as shown in Table 2.

Table 2. Showing typical attribute boundaries

Table 3. Showing computed Gains for all the listed attributes