Malware labels

Euphony

Euphony is a command-line tool we developed to infer a single label per malicious application.

Euphony derives its results from VirusTotal reports and requires Java 1.6 or later.

You can use Euphony to:

  • create a single target class prior to your machine-learning experiments
  • gather knowledge about your dataset, including malware families and other tokens
  • find syntactic and semantic associations between malware labels (e.g. basebridge, basebrid)

More information is available in the GitHub repository: https://github.com/fmind/euphony

Labels

We created an index of malware labels using Euphony from our set of Android applications.

The list of malware labels is available at: https://androzoo.uni.lu/static/lists/labels.tar.gz

The archive contains Euphony's output and is structured as follows:

  • names/*: information on malware names (e.g. Dogwin)
  • types/* (experimental): information on malware types (e.g. trojan)
  • */proposed.json: mapping between applications and inferred label
  • */election.json: mapping between applications and label frequencies
  • */parse-rules.json: mapping between raw labels and extracted tokens
  • */cluster-rules.json: mapping between extracted and clustered tokens
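
As a rough illustration of how the archive might be consumed, the Python sketch below reads names/proposed.json from a locally downloaded copy and counts how many applications received each label. It rests on assumptions that are not confirmed by the description above: that proposed.json is a JSON object keyed by an application identifier with the inferred label as a string value, and that the archive may or may not wrap its content in a top-level folder. The function names load_proposed and label_counts are hypothetical helpers, not part of Euphony.

```python
import json
import tarfile
from collections import Counter


def load_proposed(archive_path, category="names"):
    """Return the parsed <category>/proposed.json from a labels archive.

    Assumption: the file is a JSON object mapping an application
    identifier to its inferred label. The real layout may differ.
    """
    target = f"{category}/proposed.json"
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Accept the file at the archive root or under a top-level folder.
            at_root = member.name == target
            nested = member.name.endswith("/" + target)
            if member.isfile() and (at_root or nested):
                with archive.extractfile(member) as handle:
                    return json.loads(handle.read().decode("utf-8"))
    raise FileNotFoundError(f"{target} not found in {archive_path}")


def label_counts(proposed):
    """Count how many applications were assigned each label.

    Assumption: every value in the mapping is hashable (e.g. a string).
    """
    return Counter(proposed.values())


# Example usage, assuming the archive was saved as labels.tar.gz:
#   proposed = load_proposed("labels.tar.gz")
#   for label, count in label_counts(proposed).most_common(10):
#       print(label, count)
```

Passing category="types" would read the experimental type labels instead, under the same assumptions about the file layout.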

Citation

If you use Euphony or its labels in a scientific publication, we would appreciate citations to the following paper:

    @inproceedings{hurier2017euphony,
        title={Euphony: harmonious unification of cacophonous anti-virus vendor labels for Android malware},
        author={Hurier, M{\'e}d{\'e}ric and Suarez-Tangil, Guillermo and Dash, Santanu Kumar and Bissyand{\'e}, Tegawend{\'e} F and Traon, Yves Le and Klein, Jacques and Cavallaro, Lorenzo},
        booktitle={Proceedings of the 14th International Conference on Mining Software Repositories},
        pages={425--435},
        year={2017},
        organization={IEEE Press}
    }