Figures

Static Experiments

Visualizations

We used plotly to visualize, in 2-D and 3-D, the relationship between malicious and benign apps in both the Malgenome+GPlay and Piggybacking datasets. For each type of static feature, we projected the feature vectors into two and three dimensions using the t-SNE visualization algorithm. The plotly figures are interactive: you can rotate them, zoom in and out, filter out a given type of data point using the legend in the top right, and download them in HTML/PNG format.
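The projection step described above can be sketched as follows. This is a minimal illustration rather than the code used to produce the figures: it assumes scikit-learn's TSNE implementation, and the feature matrix, the labels, the perplexity value, and the helper name project are hypothetical stand-ins.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in data: 40 apps, each described by 20 binary static features.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(40, 20)).astype(float)
# Labels are not used by t-SNE itself; they would only color the plotted points.
labels = np.array(["malicious"] * 20 + ["benign"] * 20)

def project(vectors, dimensions):
    """Project feature vectors into `dimensions` (2 or 3) using t-SNE."""
    # The perplexity is an assumed value; it must be smaller than the number of samples.
    tsne = TSNE(n_components=dimensions, perplexity=10, init="random", random_state=0)
    return tsne.fit_transform(vectors)

points_2d = project(features, 2)  # one (x, y) row per app
points_3d = project(features, 3)  # one (x, y, z) row per app
```

The resulting coordinates, together with the labels, could then be handed to a plotting library such as plotly (for example its scatter and scatter_3d functions) to obtain interactive figures.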

Malgenome+GPlay Dataset

Malgenome+GPlay All Static Features in 2-D View
Malgenome+GPlay All Static Features in 3-D
Malgenome+GPlay API Static Features in 2-D
Malgenome+GPlay API Static Features in 3-D
Malgenome+GPlay Basic Static Features in 2-D
Malgenome+GPlay Basic Static Features in 3-D
Malgenome+GPlay Permission-based Static Features in 2-D
Malgenome+GPlay Permission-based Static Features in 3-D

Piggybacking Dataset

Piggybacking All Static Features in 2-D View
Piggybacking All Static Features in 3-D
Piggybacking API Static Features in 2-D
Piggybacking API Static Features in 3-D
Piggybacking Basic Static Features in 2-D
Piggybacking Basic Static Features in 3-D
Piggybacking Permission-based Static Features in 2-D
Piggybacking Permission-based Static Features in 3-D

Line Plots

Like the visualization figures, the following figures are interactive. They are line plots that compare the performance (in terms of the median F1 and specificity scores) of different classifiers and static feature types on both datasets after 25 runs. The figures mimic those in the paper and are based on the data downloadable from the Stats page.

Malgenome+GPlay line plot on training datasets View
Malgenome+GPlay line plot on test datasets View
Piggybacking line plot on training datasets View
Piggybacking line plot on test datasets View
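For readers unfamiliar with the two scores, the sketch below shows how the median F1 and specificity values plotted in such figures could be derived from per-run confusion counts. It assumes that malicious apps are the positive class; the counts are made-up examples, not results from the experiments.

```python
from statistics import median

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def specificity(tn, fp):
    """Specificity = TN / (TN + FP), the share of negative (benign) apps labeled correctly."""
    negatives = tn + fp
    return tn / negatives if negatives else 0.0

# Hypothetical confusion counts (tp, fp, tn, fn), one tuple per run.
runs = [(90, 10, 85, 15), (80, 20, 90, 10), (95, 5, 80, 20)]

# Each point on a line plot would be a median like these, taken over all runs.
median_f1 = median(f1_score(tp, fp, fn) for tp, fp, tn, fn in runs)
median_specificity = median(specificity(tn, fp) for tp, fp, tn, fn in runs)
```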

Box Plots

Since we ran the static experiment 25 times per dataset, we also generated box-and-whisker plots to show how the F1 and specificity scores vary across runs, especially because each run randomly splits the dataset into training and test datasets.

Malgenome+GPlay box plot on training datasets View
Malgenome+GPlay box plot on test datasets View
Piggybacking box plot on training datasets View
Piggybacking box plot on test datasets View
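To illustrate what these box plots summarize, the sketch below draws one random training/test split and computes a five-number summary over a list of per-run scores. The split ratio, the app identifiers, and the scores are hypothetical, and plotting libraries may place whiskers differently (for example at 1.5 times the interquartile range).

```python
import random
from statistics import quantiles

def random_split(items, test_fraction, rng):
    """Shuffle a copy of `items` and split it into (training, test) lists."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def box_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of scores."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, q2, q3, max(scores)

rng = random.Random(0)
apps = list(range(100))  # hypothetical app identifiers
training, test = random_split(apps, 0.25, rng)  # the 0.25 test share is an assumption

# Hypothetical F1 scores from 25 runs, not results from the experiments.
scores = [0.80 + 0.005 * i for i in range(25)]
summary = box_summary(scores)
```

Because every run draws a fresh split, the scores differ from run to run, and that spread is what the box and whiskers display.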

Dynamic Experiments

The following figures summarize the performance of different classifiers on the very first iteration of the active learning experiments using dynamic and hybrid features. We also plot the performance of the best-scoring static features (permission-based) to compare all feature types.

Piggybacking scatter plot on training datasets View
Piggybacking scatter plot on test datasets View
Piggybacking box plot on training datasets View
Piggybacking box plot on test datasets View

Active Learning Experiments

We wrote a tool that generates line and box plots for each of the 13 classifiers and each score type (TRAIN and TEST). Listing all of these plots on this website would be tedious, so we instead provide a link to the Python script that generates them. The script needs access to the SQLite database and the static permission-based scores available here. Here is an example of how the script can be invoked:

python generateActiveFigs.py --database aion_piggy_25runs.db --datasetname Piggybacking --plotter plotly --color rgb --width 1024 --height 600 --figure line --learner Ensemble --scoretype TEST
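Since one plot is produced per invocation, the remaining command lines could be assembled with a small helper like the hypothetical one below. It only reuses the flags from the example above and prints the commands without running them. Note that "box" as a value for --figure is an assumption: the example only shows "line".

```python
import shlex

# Options copied from the example invocation above.
BASE_OPTIONS = {
    "--database": "aion_piggy_25runs.db",
    "--datasetname": "Piggybacking",
    "--plotter": "plotly",
    "--color": "rgb",
    "--width": "1024",
    "--height": "600",
}

def build_command(figure, learner, score_type):
    """Assemble one generateActiveFigs.py command line as a list of arguments."""
    command = ["python", "generateActiveFigs.py"]
    for flag, value in BASE_OPTIONS.items():
        command += [flag, value]
    command += ["--figure", figure, "--learner", learner, "--scoretype", score_type]
    return command

# "box" is an assumed --figure value; TRAIN and TEST are the two score types.
commands = [
    build_command(figure, "Ensemble", score_type)
    for figure in ("line", "box")
    for score_type in ("TRAIN", "TEST")
]

for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or each argument list can be passed to subprocess.run to execute it directly.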

The following figures, however, are similar to the ones in the paper, i.e., they are line and box plots of the F1 and specificity scores achieved by the 500-NN, Ensemble, and 100-Tree Random Forest classifiers on Piggybacking's test datasets after 25 runs.

K-NN (K=500) line plot on train+test datasets View
K-NN (K=500) box plot on train+test datasets View
Ensemble line plot on train+test datasets View
Ensemble box plot on train+test datasets View
Random Forest (T=100) line plot on train+test datasets View
Random Forest (T=100) box plot on train+test datasets View
