TY - GEN
T1 - Predicting vulnerable software components through N-gram analysis and statistical feature selection
AU - Pang, Yulei
AU - Xue, Xiaozhen
AU - Namin, Akbar Siami
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/3/2
Y1 - 2016/3/2
AB - Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated the usefulness of prediction techniques for identifying vulnerable software components, improving the accuracy and effectiveness of these techniques remains a major research challenge. This paper proposes a hybrid technique that combines N-gram analysis with feature selection algorithms to predict vulnerable software components, where features are defined as continuous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrated that it could predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
KW - Feature selection
KW - N-gram
KW - Vulnerability prediction
KW - Wilcoxon test
UR - http://www.scopus.com/inward/record.url?scp=84969673989&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2015.99
DO - 10.1109/ICMLA.2015.99
M3 - Conference contribution
AN - SCOPUS:84969673989
T3 - Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
SP - 543
EP - 548
BT - Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Y2 - 9 December 2015 through 11 December 2015
ER -