Malware labels
Euphony
Euphony is a command-line tool we developed to infer a single label per malicious application.
It takes VirusTotal reports as input and requires Java 1.6 or later to be installed.
You can use Euphony to:
- create a single target class prior to your machine-learning experiments
- gather knowledge about your dataset, including malware families and other tokens
- find syntactic and semantic associations between malware labels (e.g. basebridge, basebrid)
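For the first two uses, the inferred labels can be counted per family and turned into integer class ids for a classifier. The Python sketch below is only an illustration and is not part of Euphony. It assumes the labels have already been loaded into a dictionary that maps each application to its inferred family, such as the content of a proposed.json file described in the Labels section. The helper names family_counts and encode_targets are hypothetical, and the short identifiers "a1", "b2", "c3" stand in for real application hashes.

```python
from collections import Counter


def family_counts(labels):
    """Count how many applications carry each inferred family label."""
    return Counter(labels.values())


def encode_targets(labels):
    """Map each application to an integer class id, one id per family.

    Families are sorted alphabetically so that the encoding is deterministic.
    Returns the {application: class_id} mapping and the ordered class names.
    """
    classes = sorted(set(labels.values()))
    class_id = {family: i for i, family in enumerate(classes)}
    targets = {app: class_id[family] for app, family in labels.items()}
    return targets, classes


if __name__ == "__main__":
    # Hypothetical example data: made-up identifiers mapped to families.
    labels = {"a1": "dowgin", "b2": "basebridge", "c3": "dowgin"}
    print(family_counts(labels).most_common())  # [('dowgin', 2), ('basebridge', 1)]
    targets, classes = encode_targets(labels)
    print(targets)  # {'a1': 1, 'b2': 0, 'c3': 1}
    print(classes)  # ['basebridge', 'dowgin']
```

Sorting the family names keeps class ids stable across runs, which helps when comparing experiments on the same dataset.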
More information is available in the GitHub repository: https://github.com/fmind/euphony
Labels
Using Euphony, we created an index of malware labels for our set of Android applications.
The list of malware labels can be downloaded from this link: https://androzoo.uni.lu/static/lists/labels.tar.gz
The archive contains Euphony's output and is structured as follows:
- names/*: information on malware names (e.g. Dowgin)
- types/* (experimental): information on malware types (e.g. trojan)
- */proposed.json: mapping between applications and inferred labels
- */election.json: mapping between applications and label frequencies
- */parse-rules.json: mapping between raw labels and extracted tokens
- */cluster-rules.json: mapping between extracted and clustered tokens
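Since these files are JSON, they can be explored with a few lines of Python. The sketch below rests on assumed schemas that the description above does not confirm: proposed.json maps an application to a label string, election.json maps an application to a {label: frequency} object, parse-rules.json maps a raw label to a list of tokens, and cluster-rules.json maps a token to its clustered token. Check the actual files before relying on it. All function names are hypothetical, and the raw label in the example is made up.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one of the archive's JSON files into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def load_names(root):
    """Load the names/* files from an archive extracted under `root`.

    Assumes the four file names listed in the archive description.
    """
    names = Path(root) / "names"
    return {
        name: load_json(names / f"{name}.json")
        for name in ("proposed", "election", "parse-rules", "cluster-rules")
    }


def top_label(frequencies):
    """Return the most frequent label from an assumed {label: frequency} map.

    Ties are broken alphabetically here; Euphony's own rule may differ.
    """
    return min(frequencies, key=lambda label: (-frequencies[label], label))


def normalize(raw_label, parse_rules, cluster_rules):
    """Split a raw label into tokens, then map each token to its cluster.

    Assumes parse_rules is {raw_label: [tokens]} and cluster_rules is
    {token: clustered_token}; tokens without a cluster rule are kept as-is.
    """
    tokens = parse_rules.get(raw_label, [])
    return [cluster_rules.get(token, token) for token in tokens]


if __name__ == "__main__":
    # Hypothetical in-memory data shaped like the assumed file contents.
    election = {"a1": {"basebridge": 5, "dowgin": 1}}
    parse_rules = {"Android.Trojan.BaseBrid": ["trojan", "basebrid"]}
    cluster_rules = {"basebrid": "basebridge"}
    print(top_label(election["a1"]))  # basebridge
    print(normalize("Android.Trojan.BaseBrid", parse_rules, cluster_rules))
    # ['trojan', 'basebridge']
```

Under these assumptions, chaining parse-rules and cluster-rules reproduces the kind of association mentioned earlier (basebrid grouped with basebridge), and top_label over election.json would be expected to match proposed.json in most cases.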
Citation
If you use Euphony or its labels in a scientific publication, we would appreciate citations to the following paper:

@inproceedings{hurier2017euphony,
  title={Euphony: harmonious unification of cacophonous anti-virus vendor labels for Android malware},
  author={Hurier, M{\'e}d{\'e}ric and Suarez-Tangil, Guillermo and Dash, Santanu Kumar and Bissyand{\'e}, Tegawend{\'e} F and Traon, Yves Le and Klein, Jacques and Cavallaro, Lorenzo},
  booktitle={Proceedings of the 14th International Conference on Mining Software Repositories},
  pages={425--435},
  year={2017},
  organization={IEEE Press}
}