Other Projects Related to AndroZoo
This page lists other projects related to AndroZoo.
Repackaged Apps
Repackaging is a significant threat in the Android ecosystem. It deprives app developers of the benefits of their efforts, contributes to the spread of malware on users' devices, and increases the workload of market maintainers. Over four years, research in this area has produced approximately 41 works. However, the proposed approaches often do not scale, are impractical or poorly evaluated, or come without tool support for the community.
Through a systematic literature review, we argue that research in this field is slowing down. Many state-of-the-art approaches report high performance rates on closed datasets. In this work, we aim to revitalize research in repackaged app detection by identifying real challenges, providing a large benchmark, and implementing a new practical and scalable detection approach with reasonable performance scores. We hope these contributions will inspire innovative approaches beyond improving the scalability of pairwise comparisons.
The systematic literature review (SLR) data on the reviewed papers is available at: https://github.com/serval-snt-uni-lu/RepackageRepo
Malware Labels
Euphony is a command-line tool we developed to infer a single label for each malicious application. The results are derived from VirusTotal reports. The tool requires Java 1.6+ to be installed.
More information is available in the GitHub repository: https://github.com/fmind/euphony
You can use Euphony to:
- Create a single target class for your machine-learning experiments
- Gather insights about your dataset, including malware families and other tokens
- Identify syntactic and semantic associations between malware labels (e.g., basebridge, basebrid)
We created malware labels using Euphony from a subset of AndroZoo: labels.tar.gz
⚠️ Note ⚠️: These labels do not cover all the apps in AndroZoo; they serve only as an example.
The archive contains Euphony's output and is structured as follows:
- names/*: Information on malware names (e.g., Dogwin)
- types/* (experimental): Information on malware types (e.g., trojan)
- */proposed.json: Mapping between applications and inferred labels
- */election.json: Mapping between applications and label frequencies
- */parse-rules.json: Mapping between raw labels and extracted tokens
- */cluster-rules.json: Mapping between extracted and clustered tokens
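As an illustration of how the archive could be used to build a single target class, the following Python sketch reads one JSON file directly from labels.tar.gz and counts how often each label occurs. It is not part of Euphony. The helper names (read_json_member, label_distribution) are hypothetical, and the code assumes that proposed.json is a JSON object mapping each application identifier to one label string; check the actual file and adapt the code if its schema differs.

```python
import json
import tarfile
from collections import Counter


def read_json_member(archive_path, suffix):
    """Parse the first regular file in the archive whose path ends with `suffix`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                with archive.extractfile(member) as handle:
                    return json.load(handle)
    raise FileNotFoundError(f"no member ending with {suffix!r} in {archive_path}")


def label_distribution(proposed):
    """Count how many applications received each label.

    ASSUMPTION: `proposed` is a dict of {application id: label string}.
    """
    return Counter(proposed.values())
```

Under these assumptions, calling label_distribution(read_json_member("labels.tar.gz", "names/proposed.json")).most_common(10) would list the ten most frequent inferred names, which gives a quick view of how balanced the target classes are.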
If you use Euphony or its labels in a scientific publication, we would appreciate citations to the following paper:
@inproceedings{hurier2017euphony,
  title={Euphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware},
  author={Hurier, Médéric and Suarez-Tangil, Guillermo and Dash, Santanu Kumar and Bissyandé, Tegawendé F and Traon, Yves Le and Klein, Jacques and Cavallaro, Lorenzo},
  booktitle={Proceedings of the 14th International Conference on Mining Software Repositories},
  pages={425--435},
  year={2017},
  organization={IEEE Press}
}
Publications Relying on AndroZoo
- Li Li, J. Martinez, T. Ziadi, T. Bissyandé, J. Klein, and Y. Le Traon. Mining Families of Android Applications for Extractive SPL Adoption. In Proceedings of the International Conference on Software Product Lines, SPLC 2016.
- M. Hurier, K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. On the Lack of Consensus in Anti-Virus Decisions: Metrics and Insights on Building Ground Truths of Android Malware with VirusTotal. In Proceedings of the 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Spain, July 2016.
- Li Li, Tegawendé F. Bissyandé, Damien Octeau, and Jacques Klein. DroidRA: Taming Reflection to Support Whole-Program Analysis of Android Apps. International Symposium on Software Testing and Analysis (ISSTA), July 2016, Saarbrücken, Germany.
- K. Allix, T. F. Bissyandé, Q. Jérôme, J. Klein, R. State, and Y. Le Traon. Empirical Assessment of Machine Learning-Based Malware Detectors for Android: Measuring the Gap Between In-the-Lab and In-the-Wild Validation Scenarios. Empirical Software Engineering, pages 1–29, 2014.
- K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. Are Your Training Datasets Yet Relevant? An Investigation into the Importance of Timeline in Machine Learning-Based Malware Detection. In Engineering Secure Software and Systems, volume 8978 of LNCS, pages 51–67. Springer International Publishing, 2015.
- K. Allix, Q. Jérôme, T. F. Bissyandé, J. Klein, R. State, and Y. Le Traon. A Forensic Analysis of Android Malware: How Is Malware Written and How It Could Be Detected? In Computer Software and Applications Conference (COMPSAC), 2014.
- G. Hecht, O. Benomar, R. Rouvoy, N. Moha, and L. Duchien. Tracking the Software Quality of Android Applications Along Their Evolution. In Automated Software Engineering (ASE), 2015 30th IEEE/ACM International Conference on, pages 236–247, Nov 2015.
- L. Li, A. Bartel, T. F. Bissyandé, J. Klein, Y. Le Traon, S. Arzt, S. Rasthofer, E. Bodden, D. Octeau, and P. McDaniel. IccTA: Detecting Inter-Component Privacy Leaks in Android Apps. In Software Engineering (ICSE), 2015 IEEE/ACM 37th IEEE International Conference on, volume 1, pages 280–291, May 2015.