Classification of Non-Functional Requirements Using Semantic-FSKNN Based ISO/IEC 9126

Non-functional requirements are one of the important factors in the success of software development, yet they are often overlooked by developers, which causes adverse effects. Obtaining the non-functional requirements therefore calls for a system that identifies them automatically. This research proposes a system that automatically identifies non-functional requirements from requirement sentences, based on the FSKNN classification algorithm with added semantic factors: term enrichment by hypernyms, and measurement of the semantic relatedness between each term and every category of quality aspect in ISO/IEC 9126. The test data comprise 1342 sentences from six different datasets. The results show that the Semantic-FSKNN method reduces the hamming loss (error rate) by 21.9% and raises accuracy by 43.7%, with the precision value amounting to 73.9%, compared with the FSKNN method without the added semantic factors.


Introduction
Knowing the non-functional requirements, which are closely related to software quality aspects, is essential because the quality aspect of non-functional requirements is one of the factors that plays a role in the success of system development. In practice, industry is too focused on functional requirements and often forgets non-functional requirements in order to reach the design and development phase [1]. If the quality aspect of non-functional requirements is not considered and not known, harmful effects follow, both for the success of the software system development and for stakeholders.
One case is the London Ambulance System (LAS), which failed in performance because of uncertainty and inconsistency in the specification of its non-functional requirements [2]. Another is an Electronic Health Record (EHR) health system, which failed in performance because it lacked the quality aspect of usability [3]. Stakeholders are affected as well: in the development of the US Army Intelligence Sharing System, up to $2.7 billion in development cost was wasted because the system was considered useless owing to problems with capacity, performance, and usability [4]. These cases illustrate that non-functional requirements are an important factor that must be known and identified before software development enters its later phases. However, identifying non-functional requirements is not easy, for several reasons: there is a lack of standards for identifying them; they are often incomplete and often hidden or mixed into functional requirement sentences [5] [6]; and requirement sentences written in natural language are usually ambiguous, which makes it difficult to identify the quality aspects of the non-functional requirements they contain [7]. Therefore, we need ways to identify the quality aspects of non-functional requirements, one of which is to classify the requirement sentences written in the requirement document.
This research aims to design a system that can automatically classify non-functional requirements from various requirement documents. The system has three main phases: automation of training data labeling, measurement of semantic relatedness, and a classification process comprising training and classification steps. Training data labeling is automated to save time compared with manual labeling; it is done by TF-IDF weighting with a search for the degree of similarity using the cosine measure, as done by Suharso and Rochimah [8]. Semantic relatedness is measured with the HSO method [9] [10] to obtain the semantic relationship between each class and each term to be processed. The classification process uses the FSKNN method introduced by Jiang et al. [11], with a new variable added in the training phase so that the semantic relatedness values obtained in the previous process can be taken into account during training. The classes used for classification are based on the international standard ISO/IEC 9126.
Many methods have been developed for classification, both of documents in general and of requirement documents to identify the non-functional requirements of software. They include the NFR Locator system, whose classification uses KNN with the Levenshtein distance function [12]; Naïve Bayes with some developments [6] [13] [14]; SVM with a renewed feature selection phase [15]; and TF-IDF weighting with the cosine measure of similarity [8]. Classification is either single-label or multi-label. The methods mentioned above are still limited to single-label classification, whereas one requirement sentence in a requirement document may contain more than one aspect of non-functional requirement (multi-label) [7] [16]. These methods also do not consider the semantic relationship between each category of non-functional requirement and the terms processed during training, even though semantic factors can improve classification results [17] [18].
One study that performs multi-label classification of documents is by Jung-Yi Jiang et al., who proposed a method called Fuzzy Similarity based K-Nearest Neighbor (FSKNN) [11]. FSKNN has been shown to perform better than other multi-label classification methods, but it does not take semantic factors into account. Therefore, this research uses the FSKNN method with added semantic factors: term enrichment of the training data based on a combined correlation between hypernyms and synonyms from WordNet, and semantic relatedness measurement between each term and each category of non-functional requirement, which is the renewal factor introduced in this research.

Research Method
Overall, the system design is based on the research by Jiang et al. [11] and is shown in Figure 1, where the dark-colored parts are the contributions of this research. As Figure 1 shows, the research method consists of four main phases: automation of training data labeling, semantic relatedness measurement, the Semantic-FSKNN training phase (training pattern grouping and calculation of prior probability and likelihood values), and the Semantic-FSKNN classification phase.

Automation Labeling Training Data
The automation of training data labeling consists of four phases: preprocessing, term enrichment of the training data based on a combined correlation between hypernyms and synonyms, TF-IDF weighting, and similarity measurement. All phases are based on the research by Suharso and Rochimah [8], except the second, which is the first contribution of this research: enriching the training data with relevant terms using a combined pattern of hypernyms and synonyms, as in Figure 2.
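The TF-IDF weighting and cosine similarity steps of the labeling phase can be sketched as follows. This is a minimal illustration rather than the procedure of [8]: the tokenized sentence, the per-category term lists, the smoothed IDF formula, the threshold, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) so that terms present in every
        # document keep a nonzero weight.
        vectors.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine measure between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def label_sentence(sentence, category_terms, threshold=0.1):
    """Assign every category whose similarity reaches the (assumed) threshold."""
    names = list(category_terms)
    docs = [sentence] + [category_terms[c] for c in names]
    vecs = tfidf_vectors(docs)
    scores = {c: cosine(vecs[0], vecs[i + 1]) for i, c in enumerate(names)}
    return [c for c, s in scores.items() if s >= threshold], scores

# Hypothetical category term lists and a preprocessed requirement sentence.
category_terms = {
    "usability": ["easy", "learn", "use", "interface"],
    "efficiency": ["response", "time", "fast", "resource"],
}
sentence = ["system", "response", "time", "fast"]
labels, scores = label_sentence(sentence, category_terms)
print(labels, scores)
```

Because every category above the threshold is kept, a sentence can receive more than one label, which matches the multi-label setting discussed in the introduction.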

Semantic Relatedness Measurement
This phase is the second contribution of this research. Semantic relatedness is measured with the method of Hirst & St-Onge (HSO). In the HSO method, semantic relatedness is measured on the graph of the vocabulary contained in WordNet, which consists of nodes that represent words and relations among nodes that describe different relationships. On this graph, the HSO method measures semantic relatedness using the path distance between the two word nodes, the number of changes of direction of the path connecting them, and whether the path is allowable. The basic idea of the HSO method is to determine the semantic relatedness between two compared words by using cohesion relations to find the allowable paths between them. The HSO method has three types of cohesion relation that directly connect two words [9] [10].

Extra Strong Relation
An extra strong relation is the relation between two words that has the highest weight of all relation types and yields a high correlation. An example of an extra strong relation is the comparison of identical words, such as "man" and "man" [10]. The semantic relatedness value for an extra strong relation is obtained from equation 5, in which C is a fixed constant equal to 8.

Strong Relation
A strong relation occurs between two words that have the same parent word or are derived from the same parent. The parent-word concept is based on the IS-A relation. Two words have a strong relation under the following conditions: (a) the two words share the same parent concept; (b) there is an association relation in the form of a horizontal link (antonym, similar to, see also, attribute) between the parent words of the two words, for example, the words "man" and "woman" have a strong relation because they have a horizontal link in the form of an antonym; (c) there is any link between the parent words of the two words and one word is a compound word or phrase that includes the other, for example, the words "color" and "water-color". Semantic relatedness for a strong relation is measured in the same way as for an extra strong relation, using equation 5.

Medium Strong Relation
In a medium strong relation, semantic relatedness is measured by considering the allowable paths and the number of changes of direction. The allowed and disallowed paths are detailed in Figure 3. The HSO method provides two rules to ensure that a path is appropriate to the relationship between a source word and a target word, as follows:
Rule 1: no other link may precede an upward link. Once a word has been narrowed down using a downward or horizontal link, it may not be generalized again using an upward link.
Rule 2: at most one change of direction is allowed. A change of direction is a big step in determining the semantic relatedness between two words, so changes of direction are limited. There are, however, exceptions to this rule, under which a horizontal link may be used to make the transition from an upward to a downward direction.
In the medium strong relation, the HSO method measures the semantic relatedness between two words with an equation over the following quantities: C is a fixed constant equal to 8, k is a fixed constant equal to 1, Pl is the path length, and Nd is the number of changes of direction. The value (2 * C) is used to normalize the semantic relatedness value into the range 0 to 1, as required by the Semantic-FSKNN method.
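The three relation types can be combined into one scoring function. The sketch below is a reconstruction under assumptions, not the equations of this paper: it uses the weights commonly given for the Hirst and St-Onge measure (2 * C for extra strong and strong relations, C - Pl - k * Nd for a medium strong relation) and divides by 2 * C to reach the 0 to 1 range mentioned above. The function names and the path encoding as a list of "up", "down", and "horizontal" links are hypothetical, and the path check encodes only the two rules as stated here.

```python
C = 8  # fixed constant given in the text
K = 1  # fixed constant given in the text

def count_direction_changes(path):
    """Count changes of direction along a list of 'up'/'down'/'horizontal' links."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)

def is_allowable(path):
    """Check a path against Rule 1 and Rule 2 as stated above."""
    # Rule 1: once a downward or horizontal link is used, no upward link may follow.
    seen_non_up = False
    for link in path:
        if link == "up" and seen_non_up:
            return False
        if link != "up":
            seen_non_up = True
    # Rule 2: at most one change of direction, except that a horizontal
    # link may bridge an upward run and a downward run.
    collapsed = [link for i, link in enumerate(path) if i == 0 or link != path[i - 1]]
    if collapsed == ["up", "horizontal", "down"]:
        return True
    return len(collapsed) - 1 <= 1

def hso_relatedness(relation, path=None):
    """Return a relatedness value normalized by 2 * C (assumed normalization)."""
    if relation in ("extra_strong", "strong"):
        weight = 2 * C
    elif relation == "medium_strong":
        if not path or not is_allowable(path):
            return 0.0
        # Path length Pl is taken as the number of links (an assumption).
        weight = C - len(path) - K * count_direction_changes(path)
    else:
        return 0.0
    return max(weight, 0) / (2 * C)

print(hso_relatedness("extra_strong"))
print(hso_relatedness("medium_strong", ["up", "up", "down"]))
print(hso_relatedness("medium_strong", ["down", "up"]))
```

Under these assumptions, identical words score 1, a three-link path with one change of direction scores (8 - 3 - 1) / 16, and a path that violates Rule 1 scores 0.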

Semantic-FSKNN Training Phase
The Semantic-FSKNN training phase consists of two phases: grouping the patterns of the training data, and calculating the prior probability and likelihood values [11].

Training Pattern Grouping
The training documents are grouped into p clusters based on a fuzzy similarity measure. Two distributions of each term t_i in each category c_j are computed first, as defined in [11], for 1 ≤ i ≤ number of terms and 1 ≤ j ≤ number of classes or categories; both are guaranteed to lie between 0 and 1. The next step is to calculate the degree of membership of each term in each category. This research modifies the formula for that degree of membership so that the value obtained from the semantic relatedness measurement is taken into account. The added factor is Rel(t_i, c_j), for 1 ≤ i ≤ number of terms and 1 ≤ j ≤ number of classes or categories, where Rel(t_i, c_j) is the semantic relatedness between term t_i and category c_j obtained in the semantic relatedness measurement phase. Every value of Rel(t_i, c_j) is divided by the highest value among all values of Rel(t_u, c_v).
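For orientation, one way to write the modified degree of membership is shown below. It is a sketch that assumes the FSKNN form of the membership from [11], in which each of the two term distributions is normalized by its maximum over terms, and multiplies in the normalized relatedness described above. The symbols d_t, d_d, and μ_R are notation chosen for this sketch, and the exact form should be checked against the original equations.

```latex
\mu_R(t_i, c_j) =
  \frac{d_t(t_i, c_j)}{\max_{u} d_t(t_u, c_j)}
  \times
  \frac{d_d(t_i, c_j)}{\max_{u} d_d(t_u, c_j)}
  \times
  \frac{\mathrm{Rel}(t_i, c_j)}{\max_{u,v} \mathrm{Rel}(t_u, c_v)}
```

Since each factor lies between 0 and 1, the product does as well, so a term with no semantic relatedness to a category contributes nothing to that category under this form.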
The next phase is to determine the fuzzy similarity of each document d, represented by its vector of term weights, to each category. The similarity uses a fuzzy t-norm ⨂ and t-conorm ⨁, applied to the degree of membership of each term in the category and the degree of membership of each term in the document. The final phase is to define the degree of membership of a document d in each category, for 1 ≤ j ≤ number of classes or categories. For each training document d_i, 1 ≤ i ≤ number of documents, this degree of membership is calculated for every category c_j, 1 ≤ j ≤ number of classes or categories, using equation 15. The p clusters are then defined, one for each v with 1 ≤ v ≤ number of classes or categories, using α, a threshold defined by the user for the training process. For every document d_i, 1 ≤ i ≤ number of documents, a search set G_i is defined such that cluster v is contained in G_i if and only if d_i belongs to cluster v, 1 ≤ v ≤ number of classes or categories. The process is described in the pseudo-code shown in Figure 4, in which every cluster, 1 ≤ v ≤ number of classes or categories, and every search set, 1 ≤ u ≤ number of documents, is initially empty (∅).
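The grouping steps can be sketched in code. This is a minimal illustration under stated assumptions, since the defining equations are not reproduced here: the t-norm is taken as the algebraic product, the t-conorm as the probabilistic sum, the similarity as the ratio of their sums over terms, the document membership as the similarity divided by its maximum over categories, and a document joins a cluster when that membership is at least α. All function names and the toy data are hypothetical.

```python
def t_norm(x, y):
    """Algebraic product (assumed choice of fuzzy t-norm)."""
    return x * y

def t_conorm(x, y):
    """Probabilistic sum (assumed choice of fuzzy t-conorm)."""
    return x + y - x * y

def fuzzy_similarity(doc_weights, term_membership):
    """Similarity of one document to one category.

    doc_weights: term weights of the document.
    term_membership: degree of membership of each term in the category.
    """
    top = max(doc_weights)
    if top == 0:
        return 0.0
    # Degree of membership of each term in the document (assumed: weight / max weight).
    mu_d = [w / top for w in doc_weights]
    num = sum(t_norm(r, m) for r, m in zip(term_membership, mu_d))
    den = sum(t_conorm(r, m) for r, m in zip(term_membership, mu_d))
    return num / den if den else 0.0

def document_memberships(doc_weights, memberships_by_category):
    """Degree of membership of a document in every category."""
    sims = [fuzzy_similarity(doc_weights, mu) for mu in memberships_by_category]
    top = max(sims)
    return [s / top if top else 0.0 for s in sims]

def build_clusters(docs, memberships_by_category, alpha):
    """One cluster of document indices per category, thresholded by alpha."""
    clusters = [set() for _ in memberships_by_category]
    for i, weights in enumerate(docs):
        for v, degree in enumerate(document_memberships(weights, memberships_by_category)):
            if degree >= alpha:
                clusters[v].add(i)
    return clusters

# Toy data: two terms, two categories, three documents.
memberships_by_category = [[1.0, 0.0], [0.0, 1.0]]
docs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
clusters = build_clusters(docs, memberships_by_category, alpha=0.5)
print(clusters)
```

In the toy data the third document uses both terms equally, so it lands in both clusters, which is how a multi-label document ends up with neighbors from more than one category.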
Figure 4. Pseudo-code of the grouping process and of the process of defining the search set G_i [11]
The output, in the form of the search sets G_i, one for each training document, is then used to determine the nearest neighbors, which support the calculation of the prior probability and likelihood values in the next phase.
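The search set construction can be sketched as follows, assuming that the search set of a document is the union of all clusters that contain it, as described in the text. The Euclidean distance used to pick the nearest neighbors, the exclusion of the document itself, and the function names are assumptions made for this illustration and are not taken from Figure 4.

```python
import math

def build_search_sets(clusters, n_docs):
    """Search set of document i = union of every cluster that contains i."""
    search_sets = [set() for _ in range(n_docs)]
    for cluster in clusters:
        for i in cluster:
            search_sets[i] |= cluster
    return search_sets

def k_nearest(i, search_set, vectors, k):
    """Pick the k nearest neighbors of document i inside its search set.

    Euclidean distance is an assumption; the document itself is excluded.
    """
    candidates = sorted(search_set - {i})
    candidates.sort(key=lambda j: math.dist(vectors[i], vectors[j]))
    return candidates[:k]

# Toy clusters of document indices (one per category) and document vectors.
clusters = [{0, 2}, {1, 2}]
vectors = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
search_sets = build_search_sets(clusters, n_docs=3)
neighbors = k_nearest(2, search_sets[2], vectors, k=1)
print(search_sets, neighbors)
```

Restricting the neighbor search to the search set means a document is compared only with documents that share at least one cluster with it, rather than with the whole training set.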

Calculation of Prior Probability and Likelihoods Value
The prior probability is the probability whose value must be known before any observation is made, while the class likelihood is the conditional probability associated with an observation E. These probabilities are calculated from the training patterns obtained previously, using s, a smoothing constant that is usually a small positive real number. The next phase is to calculate the class likelihoods. E can be 0, 1, ..., or k. For every training document d_i, 1 ≤ i ≤ number of documents, the k nearest neighbors are obtained from its search set G_i, and a label quantity vector is defined:


ISSN: 1693-6930. TELKOMNIKA Vol. 13, No. 4, December 2015: 1456-1465