Requirements Engineering: Ambiguity Detection with NLP

Questions and Answers

What is the primary challenge that stakeholders from different technical domains face during requirements elicitation?

Linguistic ambiguity due to terminological discrepancies

What is the potential consequence of not addressing ambiguity during requirements elicitation meetings?

Frustration and distrust, and problems at later stages of development

How does the proposed approach identify ambiguous terms between different domains?

By building domain-specific language models and comparing word embeddings to measure differences in term usage

What is the purpose of ranking ambiguous terms by ambiguity score in the proposed approach?

To estimate the potential ambiguity of a term across the domains of interest

How many potential elicitation scenarios were used to evaluate the proposed approach?

Seven

How many domains were involved in the evaluation of the proposed approach?

Five

What types of terms were found to have a significant impact on the results, and how do they affect the meaning of words in domain-specific corpora?

High-level terms, such as 'consequence', and named entities, such as proper nouns. Their meaning does not vary with their linguistic context.

What are some problems related to the evaluation process of the method that might affect the results?

The interpretation of instructions by the human assessor and their lack of domain knowledge.

Where can the source code of the approach, the evaluation data, and the support algorithms be found?

In the authors' repository.

What are the three main contributions of the current paper compared to the original work?

An improved approach to compute the set of dominant shared terms, a systematic evaluation of the proposed method, and a thorough review of related work.

What is implied by a tie in the ground-truth ranking?

A similar degree of ambiguity for two terms as observed by annotators

What is the structure of the remainder of the paper, and what topics are covered in each section?

The paper is structured into sections: Sect. 2 provides background on ambiguity and word embeddings, Sect. 3 outlines the proposed approach, Sect. 4 presents the research design and results, and Sect. 5 reviews related work.

Why should pairs of terms that are ties in the ground-truth ranking be excluded from the evaluation?

Because the relative order in which the two terms appear in the sample ranking should not be relevant

What is the relationship between the current paper and the AIRE'18 workshop, and what is the reference for the original work?

The current paper extends a contribution to the AIRE'18 workshop, and the original work is referenced as Ferrari et al. (2018a).

How should ties in the sample ranking be handled when ties in the ground-truth ranking are excluded?

As errors of the method, since it has not made a ranking decision in a case where one exists in the ground truth

What is the analogy drawn from in the evaluation of translation ranking in machine translation?

The semantic closeness of translation candidates to the original sentence

What variant of the Kendall's Tau measure is adopted for evaluating ranked data with ties?

The Kendall's Tau (τ) formula with penalization of ties produced by the automatic system and exclusion of ties from the ground-truth

Who first proposed the use of the Kendall's Tau measure for evaluating ranked data with ties?

Avramidis (2013)

What is the common meaning of 'consequence' in the three domains of CS, EEN, and MED?

The effect of some phenomenon

What is the difference in the most similar words associated with the term 'consequence' in the three domains?

The most similar words in CS are unintended, circumstance, imposed, situation; in EEN, they are invariance, existence, arises, chaotic; and in MED, they are detrimental, vulnerability, stressor, implication.

What do the lists of 200 words associated with the term 'consequence' in the three domains have in common?

Only 9 words

Why can terms indicating high-level abstract concepts appear ambiguous even though they do not change their meaning across different domains?

Although their meaning is stable, these terms are generally accompanied by different words in different domains.

What is the limitation of the current approach in handling ambiguous terms?

It is not able to automatically discard terms that do not change their meaning across domains.

What is the case of the term 'ion' in the Medical Robot scenario?

It has domain-specific usages that were not noticed by the annotators.

What was the order in which the sentence sets were provided to annotators during the experiment?

random order

What do high ranks in the ambiguity score indicate?

higher chance of ambiguity

What type of ambiguity is exhibited by the term 'argument' in the CS and MED domains?

polysemy

What is the synonym of 'argument' in the MED domain?

dispute

What is the meaning of 'argument' in the CS domain?

logical argument

What is the purpose of reporting the top-20 and bottom-20 in the ranked lists?

to discuss notable cases

Study Notes

Ambiguity Detection in Requirements Engineering

The paper discusses an NLP approach to identify ambiguous terms between different domains and rank them by ambiguity score.
The approach is based on building domain-specific language models, comparing word embeddings to measure differences in term usage and estimating potential ambiguity across domains.
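
The idea of comparing how a term is used in two domain-specific models can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional toy vectors, the helper names (cosine, nearest_neighbours, ambiguity_score, rank_shared_terms) and the choice of scoring a term by how little its nearest-neighbour sets overlap are all assumptions made for this example. The overlap idea mirrors the observation in the quiz above that the most similar words for "consequence" share very little across domains.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(model, term, k):
    """Return the k words most similar to `term` inside one domain model."""
    target = model[term]
    scored = [(w, cosine(target, vec)) for w, vec in model.items() if w != term]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # similarity desc, then name
    return [w for w, _ in scored[:k]]


def ambiguity_score(model_a, model_b, term, k=2):
    """Hypothetical score: 1 - Jaccard overlap of the two neighbour sets.

    0.0 means identical neighbours in both domains, 1.0 means disjoint ones.
    """
    na = set(nearest_neighbours(model_a, term, k))
    nb = set(nearest_neighbours(model_b, term, k))
    union = na | nb
    return 1.0 - len(na & nb) / len(union) if union else 0.0


def rank_shared_terms(model_a, model_b, k=2):
    """Rank terms present in both models by decreasing ambiguity score."""
    shared = sorted(set(model_a) & set(model_b))
    scored = [(t, ambiguity_score(model_a, model_b, t, k)) for t in shared]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy "domain models" (made-up vectors, for illustration only).
cs_model = {
    "argument": [1.0, 0.0], "parameter": [0.9, 0.1], "function": [0.8, 0.2],
    "dispute": [0.0, 1.0], "conflict": [0.1, 0.9],
}
med_model = {
    "argument": [0.0, 1.0], "parameter": [0.9, 0.1], "function": [0.8, 0.2],
    "dispute": [0.1, 0.9], "conflict": [0.2, 0.8],
}

ranking = rank_shared_terms(cs_model, med_model)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

In this toy setup "argument" sits next to "parameter" and "function" in the CS vectors and next to "dispute" and "conflict" in the MED vectors, so it lands at the top of the ranking. In practice the vectors would come from embeddings trained on each domain's corpus rather than being written by hand.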

Challenges and Limitations

High-level terms like "consequence" keep the same meaning across domains but are accompanied by different words in each domain, so the approach can wrongly rank them as ambiguous.
Named entities, such as proper nouns, likewise do not vary their meaning with linguistic context and have a significant impact on the results.
The evaluation itself may be affected by how the human assessors interpret the instructions and by their lack of domain knowledge.

Methodology

The approach involves building domain-specific language models, comparing word embeddings, and computing a set of dominant shared terms.
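The method is evaluated with a variant of Kendall's Tau (τ), first proposed by Avramidis (2013), that excludes pairs tied in the ground-truth ranking and penalizes ties produced by the automatic system.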
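
The tie handling described in these notes can be sketched as below. This is a hedged illustration rather than the exact formula from the paper: it assumes the common form τ = (concordant - discordant) / (concordant + discordant), skips pairs that are tied in the ground truth, and counts a tie in the system ranking as a discordant pair. The function name kendall_tau_penalized and the terms t1 to t4 with their ranks are invented for the example.

```python
from itertools import combinations


def kendall_tau_penalized(gold, predicted):
    """Kendall's Tau variant with tie handling (illustrative sketch).

    gold, predicted: dicts mapping each term to its rank (1 = most ambiguous).
    - Pairs tied in `gold` are excluded from the count.
    - Pairs tied in `predicted` are counted as discordant (assumption).
    """
    concordant = discordant = 0
    for a, b in combinations(gold, 2):
        gold_diff = gold[a] - gold[b]
        if gold_diff == 0:
            continue  # ground-truth tie: any relative order is acceptable
        pred_diff = predicted[a] - predicted[b]
        if gold_diff * pred_diff > 0:
            concordant += 1  # same relative order in both rankings
        else:
            discordant += 1  # opposite order, or a tie produced by the system
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Toy rankings (made up): t2 and t3 are tied in the ground truth,
# t3 and t4 are tied in the system output.
gold_ranks = {"t1": 1, "t2": 2, "t3": 2, "t4": 4}
system_ranks = {"t1": 1, "t2": 2, "t3": 3, "t4": 3}

tau = kendall_tau_penalized(gold_ranks, system_ranks)
print(f"tau = {tau:.2f}")
```

Here the pair (t2, t3) is ignored because the annotators tied it, while the system's tie on (t3, t4) is penalized because the ground truth does order that pair. Five pairs are counted, four concordant and one discordant.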

Evaluation and Results

The approach is evaluated on seven potential elicitation scenarios involving five domains.
The results show that some terms indicating high-level abstract concepts are ranked as ambiguous even though they do not change their meaning across domains.
Some terms, like "ion" in the Medical Robot scenario, have domain-specific usages that were not noticed by the annotators.

Key Findings

The term "argument" exhibits polysemy between CS and MED: it denotes a logical argument in CS and is a synonym of "dispute" in MED.
The approach cannot automatically discard terms that do not change their meaning across domains, such as those indicating high-level abstract concepts.
