Predicting vulnerable software components through N-gram analysis and statistical feature selection

Yulei Pang, Xiaozhen Xue, Akbar Siami Namin

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

17 Scopus citations

Abstract

Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques for deciding whether software components are vulnerable, improving the accuracy and effectiveness of these techniques remains a challenging research question. This paper proposes a hybrid technique that combines N-gram analysis with feature selection algorithms to predict vulnerable software components. Features are defined as contiguous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrated that it could predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
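The pipeline outlined in the abstract (token N-grams as features, followed by statistical feature selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The regex tokenizer is a crude stand-in for a real Java lexer, the function names (`tokenize`, `ngrams`, `rank_sum_p_value`, `select_features`) are hypothetical, and the use of a Wilcoxon rank-sum test with a normal approximation is an assumption suggested only by the "Wilcoxon test" keyword listed for this paper.

```python
import re
from collections import Counter
from math import sqrt
from statistics import NormalDist

# Assumption: a crude tokenizer (identifiers, integers, single symbols),
# standing in for a real Java lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def ngrams(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(average_ranks(combined)[:n1])  # rank sum of the first sample
    mean = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:  # every observation tied: no evidence of a difference
        return 1.0
    z = (w - mean) / sqrt(var)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_features(vulnerable_sources, clean_sources, n=2, alpha=0.05):
    """Keep the N-grams whose counts differ between the two groups."""
    vuln = [ngrams(tokenize(s), n) for s in vulnerable_sources]
    clean = [ngrams(tokenize(s), n) for s in clean_sources]
    vocabulary = set().union(*vuln, *clean)
    scored = []
    for gram in vocabulary:
        p = rank_sum_p_value([c[gram] for c in vuln], [c[gram] for c in clean])
        if p < alpha:
            scored.append((p, gram))
    return [gram for _, gram in sorted(scored)]
```

In this sketch, the retained N-grams would serve as the reduced feature set for a downstream classifier. The normal approximation is rough for small samples, so the threshold `alpha` here is illustrative rather than a recommended setting.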

Original language: English
Title of host publication: Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 543-548
Number of pages: 6
ISBN (Electronic): 9781509002870
DOIs: https://doi.org/10.1109/ICMLA.2015.99
State: Published - Mar 2 2016
Event: IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015 - Miami, United States
Duration: Dec 9 2015 to Dec 11 2015

Publication series

Name: Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015

Conference

Conference: IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Country: United States
City: Miami
Period: 12/9/15 to 12/11/15

Keywords

  • Feature selection
  • N-gram
  • Vulnerability prediction
  • Wilcoxon test

Cite this

Pang, Y., Xue, X., & Namin, A. S. (2016). Predicting vulnerable software components through N-gram analysis and statistical feature selection. In Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015 (pp. 543-548). [7424372] (Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICMLA.2015.99