Cyber Code Intelligence

This repository provides papers, code and tools that a beginner needs to start exploring the field of Cyber Code Intelligence (CyberCI).

Content

Introduction

CyberCI is data-driven code analysis based on pattern recognition and machine learning (ML). It offers alternative approaches to automated, and potentially more intelligent and efficient, code analysis and processing. In particular, the growth of the open-source software community has made vast amounts of software code available, which allows machine learning and data mining techniques to exploit the abundant patterns within that code. This repository lists the technical papers, tools, and surveys produced by CyberCI research at NSCLab, Swinburne University of Technology, Australia. It is intended for newcomers who are interested in applying state-of-the-art ML techniques to code analysis and processing.

Fig. 1: The Cyber Code Intelligence (CyberCI)

Technical Papers

Surveys

Tools

Fig. 2: The deep-learning-based function-level vulnerability detection framework.

Data

| Open-source project | # of non-vulnerable files collected | # of vulnerable files collected | # of non-vulnerable functions collected | # of vulnerable functions collected |
|---|---|---|---|---|
| Asterisk | 862 | 84 | 17,755 | 94 |
| FFmpeg | 553 | 293 | 5,552 | 249 |
| HTTPD | 248 | 141 | 3,850 | 57 |
| LibPNG | 34 | 44 | 577 | 45 |
| LibTIFF | 94 | 151 | 731 | 123 |
| OpenSSL | 867 | 150 | 7,068 | 159 |
| Pidgin | 448 | 42 | 8,626 | 29 |
| VLC Player | 616 | 45 | 6,115 | 44 |
| Xen | 738 | 370 | 9,023 | 671 |
| Total | 4,460 | 1,320 | 59,297 | 1,471 |

| Dataset | # of test cases | # of vulnerable C functions | # of non-vulnerable C functions |
|---|---|---|---|
| The SARD project | 64,099 | 83,710 | 52,290 |

| Dataset | # of vulnerable samples | # of non-vulnerable samples | # of total samples | Compiled environment |
|---|---|---|---|---|
| CWE-119 | 7,916 | 7,474 | 15,390 | Windows |
| LibTIFF | 26 | 776 | 802 | Windows |
| VLC Player | 36 | 3,895 | 3,931 | Windows |
| VLC Player (updated) | 36 | 5,242 | 5,278 | Windows |
| Asterisk | 50 | 9,964 | 10,014 | Windows |
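
The function-level counts for the nine open-source projects listed above show far fewer vulnerable functions than non-vulnerable ones. The following is a minimal sketch of how that share could be computed per project. The numbers are copied from the data listed above; the names `FUNCTION_COUNTS`, `vulnerable_share`, and `summarize` are hypothetical helpers introduced for illustration and are not part of the repository's code.

```python
# Function-level counts copied from the open-source project data above:
# project -> (non-vulnerable functions, vulnerable functions).
# All names in this block are illustrative, not from the repository.
FUNCTION_COUNTS = {
    "Asterisk": (17755, 94),
    "FFmpeg": (5552, 249),
    "HTTPD": (3850, 57),
    "LibPNG": (577, 45),
    "LibTIFF": (731, 123),
    "OpenSSL": (7068, 159),
    "Pidgin": (8626, 29),
    "VLC Player": (6115, 44),
    "Xen": (9023, 671),
}


def vulnerable_share(non_vulnerable: int, vulnerable: int) -> float:
    """Return the fraction of functions that are labeled vulnerable."""
    return vulnerable / (non_vulnerable + vulnerable)


def summarize(counts):
    """Return per-project vulnerable shares and the overall totals."""
    shares = {name: vulnerable_share(nv, v) for name, (nv, v) in counts.items()}
    total_nv = sum(nv for nv, _ in counts.values())
    total_v = sum(v for _, v in counts.values())
    return shares, total_nv, total_v


if __name__ == "__main__":
    shares, total_nv, total_v = summarize(FUNCTION_COUNTS)
    # Print projects from the lowest to the highest vulnerable share.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1]):
        print(f"{name:<12} {share:6.2%}")
    print(f"{'Total':<12} {vulnerable_share(total_nv, total_v):6.2%}")
```

The computed totals can be checked against the Total row (59,297 non-vulnerable and 1,471 vulnerable functions). A share this small suggests that anyone training a classifier on this data may want to account for class imbalance, although how to do so is left to the papers listed in this repository.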

Contact

Researchers are welcome to use our code and data. Please cite the relevant paper listed in this repository if you use the code or data in your work. Bug reports and suggestions for improving the code and data are appreciated. For more information, inquiries, or bug reports, please contact junzhang@swin.edu.au.

Thanks!