Lists of APKs


A large (>2.1 GB compressed) CSV file, updated every night (before 6 am Luxembourg/Paris time), containing the following fields (not listed here in file order):

sha256, sha1, md5, apk_size: The SHA-256, SHA-1 and MD5 hashes of the APK file, and the size of the APK file in bytes.

dex_size: The size of the classes.dex file (i.e., ignoring all other dex files)

dex_date: The date attached to the dex file inside the zip (sometimes invalid and/or manipulated).
WARNING: dex_date is mostly unusable nowadays: the vast majority of apps from Google Play have a 1980 dex_date.

pkg_name, vercode: The name of the Android package and the version code (as reported in the manifest file). Note: within one market, a pkg_name may be unique to a developer (i.e., two APKs with the same pkg_name in Google Play may come from the same developer).
WARNING: There is one bogus APK (BC564D52C6E79E1676C19D9602B1359A33B8714A1DC5FCB8ED602209D0B70266) whose pkg_name contains a ",". Use grep -v ',snaggamea' to get rid of it.
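
A comma inside pkg_name shifts every later column, so it can be worth checking for rows with an unexpected number of fields before filtering. The following is only a sketch: find_malformed_rows and EXPECTED_FIELDS are hypothetical names, and the default of 11 fields is an assumption taken from the highest column index ($11) used in the filtering examples.

```shell
# Print every row whose comma-separated field count differs from the expected one.
# EXPECTED_FIELDS defaults to 11 (assumption: $11 is the last column).
find_malformed_rows() {
  awk -F, -v expected="${EXPECTED_FIELDS:-11}" 'NF != expected'
}

# Usage: zcat latest.csv.gz | find_malformed_rows
```

Any row this prints should be excluded or handled separately, since its column positions cannot be trusted.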

vt_detection, vt_scan_date: The number of antivirus (AV) engines from VirusTotal (VT) that flagged this APK as malware on vt_scan_date (if available)

markets: a '|' separated list of the markets where we saw this APK. Note: The absence of a market does NOT mean that an APK was not published on this market. It means we did not see it there.
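
Because markets is a '|' separated list, one row can belong to several markets. As a rough sketch, the list can be split with awk to count rows per market. count_per_market is a hypothetical helper name; the code assumes markets is the 11th field (as in the filtering examples) and skips a header row in case the file has one, which the description above does not state.

```shell
# Count how many rows mention each market, most frequent first.
count_per_market() {
  awk -F, '
    $1 == "sha256" { next }            # skip a header row, if any (assumption)
    {
      n = split($11, m, "|")           # markets field: "market1|market2|..."
      for (i = 1; i <= n; i++) count[m[i]]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -rn
}

# Usage: zcat latest.csv.gz | grep -v ',snaggamea' | count_per_market
```

These counts only reflect where an APK was seen, not every market where it was published.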

Examples of filtering this list:
- Select only APKs that come from Google Play Store:
zcat latest.csv.gz | grep -v ',snaggamea' | awk -F, '{if ($11 ~ /play\.google\.com/) {print} }'
- Whose size is over 10 000 000 Bytes:
| awk -F, '{if ($5 >10000000 ) {print} }'
- Detected by at least 2 AntiVirus engines:
| awk -F, '{if ($8 >=2 ) {print} }'
- To filter on dex_date, we can use the fact that the timestamp string used is sortable, i.e. date_1_str > date_2_str only when date_1 is after date_2. example: only dex_date starting from 2018-12-01:
| awk -F, '{if ( $4 >= "2018-12" ) {print} }'
example 2: only dex_date before 2019-11-30
| awk -F, '{if ( $4 < "2019-11-30" ) {print} }'
- To get only the list of selected sha256:
| cut -d',' -f1 > list_of_selected_sha256
So the whole command would be:
zcat latest.csv.gz | grep -v ',snaggamea' | awk -F, '{if ($11 ~ /play\.google\.com/) {print} }' | awk -F, '{if ($5 >10000000 ) {print} }' | awk -F, '{if ($8 >=2 ) {print} }' | awk -F, '{if ( $4 >= "2018-12" ) {print} }' | awk -F, '{if ( $4 < "2019-11-30" ) {print} }' | cut -d',' -f1 > list_of_selected_sha256
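
The chained awk calls can also be folded into a single awk program, which avoids re-splitting each line several times. This is a sketch rather than the recommended form: select_sha256 is a hypothetical function name, and the column positions ($1, $4, $5, $8, $11) are the ones used in the examples above.

```shell
# Apply all of the example filters in one pass and print only the sha256.
select_sha256() {
  awk -F, '
    $11 ~ /play\.google\.com/ &&       # seen on Google Play
    $5 > 10000000 &&                   # apk_size over 10 000 000 bytes
    $8 >= 2 &&                         # detected by at least 2 AV engines
    $4 >= "2018-12" &&                 # dex_date from 2018-12-01 onwards
    $4 <  "2019-11-30" { print $1 }    # dex_date before 2019-11-30
  '
}

# Usage:
# zcat latest.csv.gz | grep -v ',snaggamea' | select_sha256 > list_of_selected_sha256
```

The date tests compare the field with a string constant, so awk performs a string comparison, which relies on the timestamp format being sortable as text.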