Semantic Clone Detection via Probabilistic Software Modeling (bibtex)
by Hannes Thaller, Lukas Linsbauer, Brent van Bladel, Alexander Egyed
Abstract:
Semantic clone detection is the process of finding program elements with similar or equal runtime behavior, for example, detecting the semantic equality between the recursive and iterative implementations of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors, and this boundary has been tested in recent years by interesting new approaches. This work contributes a semantic clone detection approach that detects clones with 0% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements to find behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. SCD-PSM uses the likelihood between model elements as a distance metric. It then employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM consists of pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones. SCD-PSM yields excellent results, with a Matthews Correlation Coefficient greater than 0.9. These results are obtained on classical semantic clone detection problems, such as detecting recursive and iterative versions of an algorithm, and also on complex problems used in coding competitions.
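The likelihood-ratio decision described in the abstract can be illustrated with a minimal sketch. This is not the SCD-PSM implementation: it assumes, purely for illustration, that each method's observed runtime outputs are summarized by a single normal distribution, whereas PSM builds richer probabilistic models. The hypothetical function lr_clone_test compares H0 (one shared model explains both methods) against H1 (each method has its own model). The statistic plays the role of the distance, and alpha is the significance level. Under Wilks' theorem the statistic is asymptotically chi-squared with 2 degrees of freedom, which is only a rough approximation for small samples.

```python
import math
from typing import Sequence, Tuple


def gaussian_max_loglik(xs: Sequence[float]) -> float:
    """Maximised log-likelihood of xs under a normal distribution fitted by MLE."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance (divides by n, not n - 1)
    if var <= 0.0:
        raise ValueError("samples must not be constant")
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lr_clone_test(
    a: Sequence[float], b: Sequence[float], alpha: float = 0.05
) -> Tuple[float, float, bool]:
    """Hypothetical helper: return (statistic, p_value, is_clone) for two output samples."""
    ll_shared = gaussian_max_loglik(list(a) + list(b))  # H0: one model for both methods
    ll_separate = gaussian_max_loglik(a) + gaussian_max_loglik(b)  # H1: one model each
    # Clamp tiny negative values caused by floating-point rounding.
    stat = max(0.0, 2.0 * (ll_separate - ll_shared))
    # H1 has 2 extra parameters (mean and variance), so stat ~ chi2(2) asymptotically;
    # the chi2(2) survival function is exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    # Failing to reject H0 at level alpha is treated as "behaviorally equal".
    return stat, p_value, p_value >= alpha


# Usage: outputs of a recursive and an iterative factorial on inputs 1..5.
recursive_out = [1.0, 2.0, 6.0, 24.0, 120.0]
iterative_out = [1.0, 2.0, 6.0, 24.0, 120.0]
shifted_out = [x + 1000.0 for x in recursive_out]
print(lr_clone_test(recursive_out, iterative_out))
print(lr_clone_test(recursive_out, shifted_out))
```

Identical output samples give a statistic near zero and are reported as clones. Samples whose means differ by far more than their spread give a very small p-value and are not.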
Reference:
Semantic Clone Detection via Probabilistic Software Modeling (Hannes Thaller, Lukas Linsbauer, Brent van Bladel, Alexander Egyed), In CoRR, volume abs/2008.04891, 2020.
Bibtex Entry:
@Article{DBLP:journals/corr/abs-2008-04891,
  author        = {Hannes Thaller and Lukas Linsbauer and Brent van Bladel and Alexander Egyed},
  journal       = {CoRR},
  title         = {Semantic Clone Detection via Probabilistic Software Modeling},
  year          = {2020},
  volume        = {abs/2008.04891},
  abstract      = {Semantic clone detection is the process of finding program elements with similar or equal runtime behavior, for example, detecting the semantic equality between the recursive and iterative implementations of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors, and this boundary has been tested in recent years by interesting new approaches. This work contributes a semantic clone detection approach that detects clones with 0\% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements to find behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. SCD-PSM uses the likelihood between model elements as a distance metric. It then employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM consists of pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones. SCD-PSM yields excellent results, with a Matthews Correlation Coefficient greater than 0.9. These results are obtained on classical semantic clone detection problems, such as detecting recursive and iterative versions of an algorithm, and also on complex problems used in coding competitions.},
  archiveprefix = {arXiv},
  bibsource     = {dblp computer science bibliography, https://dblp.org},
  biburl        = {https://dblp.org/rec/journals/corr/abs-2008-04891.bib},
  eprint        = {2008.04891},
  file          = {:Journals/CORR 2020 - Semantic Clone Detectionvia Probabilistic Software Modeling/Semantic Clone Detection via Probabilistic Software Modeling-preprint.pdf:PDF},
  keywords      = {FWF P25513, SCCH},
  timestamp     = {Sun, 16 Aug 2020 17:19:29 +0200},
  url           = {https://arxiv.org/abs/2008.04891},
}