Google Play Metadata


Starting December 2023, we provide our collection of Google Play Metadata. We began acquiring this metadata in May 2020, using an (independent, incomplete) protobuf implementation of the protocol that the official Google Play app uses to communicate with Google Play services.

Note: The metadata is only for Google Play.

If you use this metadata, please cite the following paper:

@inproceedings{alecci2024androzoo,
    title={AndroZoo: A Retrospective with a Glimpse into the Future},
    author={Alecci, Marco and Jim{\'e}nez, Pedro Jes{\'u}s Ruiz and Allix, Kevin and Bissyand{\'e}, Tegawend{\'e} F and Klein, Jacques},
    booktitle={Proceedings of the 21st International Conference on Mining Software Repositories},
    pages={389--393},
    year={2024}
}

What Are We Collecting?

Some metadata elements relate to an app as a whole (e.g., 'com.chrome.canary'), while others are specific to one version of an app. Not all metadata is obtained close to the release date: this dataset commonly contains newly acquired metadata for old versions, for example metadata obtained in 2023 for a version released in 2020. In such cases, app-related elements reflect the state at the date the metadata was acquired (2023), not at the release date of the version (2020). The number of downloads (among other fields) will therefore correspond to 2023, not 2020.

We include a field az_metadata_date that contains the date we acquired this piece of metadata. This is the only modification made to the metadata.
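To obtain app-related values as close as possible to a version's release, one option is to keep, for each version, the record with the earliest az_metadata_date. The sketch below illustrates this on made-up sample records. It assumes that az_metadata_date sits at the top level of a record and is an ISO 8601 string; neither is confirmed here, so adjust the parsing to the format you actually observe.

```python
from datetime import datetime


def earliest_record_per_version(records):
    """Return {versionCode: record}, keeping the earliest-acquired record per version."""
    best = {}
    for rec in records:
        version = rec["details"]["appDetails"]["versionCode"]
        # Assumption: az_metadata_date is an ISO 8601 string at the top level.
        acquired = datetime.fromisoformat(rec["az_metadata_date"])
        if version not in best or acquired < best[version][0]:
            best[version] = (acquired, rec)
    return {version: rec for version, (_, rec) in best.items()}


# Made-up records: the same version seen in 2023 and in 2020.
sample = [
    {"az_metadata_date": "2023-06-01T00:00:00",
     "details": {"appDetails": {"versionCode": 65, "numDownloads": "1,000,000+"}}},
    {"az_metadata_date": "2020-03-15T00:00:00",
     "details": {"appDetails": {"versionCode": 65, "numDownloads": "10,000+"}}},
]

closest = earliest_record_per_version(sample)
print(closest[65]["details"]["appDetails"]["numDownloads"])
```

Even the earliest record may have been acquired long after the release, so comparing az_metadata_date with the version's upload date remains advisable.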

Examples of elements related to an app (and hence: to the date of collection):
  • descriptionHtml
  • descriptionShort
  • details.appDetails.numDownloads
  • details.appDetails.recentChangesHtml
  • aggregateRating (and everything inside)
  • And probably others...

Examples of elements related to a specific version:
  • details.appDetails.versionCode
  • details.appDetails.installationSize
  • details.appDetails.uploadDate
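The dotted names above can be read as paths into the nested JSON record. As a minimal sketch, the hypothetical helpers get_path and split_fields below separate a record into app-level and version-level values. The sample record and its values are made up, and the path lists only cover some of the examples given above.

```python
def get_path(record, dotted_path, default=None):
    """Follow a dotted path such as 'details.appDetails.versionCode' into nested dicts."""
    node = record
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Paths taken from the examples listed above (not exhaustive).
APP_LEVEL = ["descriptionShort", "details.appDetails.numDownloads"]
VERSION_LEVEL = [
    "details.appDetails.versionCode",
    "details.appDetails.installationSize",
    "details.appDetails.uploadDate",
]


def split_fields(record):
    """Return (app_level, version_level) dicts keyed by dotted path."""
    app_level = {path: get_path(record, path) for path in APP_LEVEL}
    version_level = {path: get_path(record, path) for path in VERSION_LEVEL}
    return app_level, version_level


# Made-up record for illustration only.
sample_record = {
    "descriptionShort": "A sample app",
    "details": {"appDetails": {"versionCode": 65, "numDownloads": "10,000+"}},
}

app_level, version_level = split_fields(sample_record)
print(app_level)
print(version_level)
```

Missing fields come back as None rather than raising, which is convenient because not every record necessarily carries every field.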

How to Download the Metadata?

There are two options:

  1. Download the metadata for a specific app (or version) using the Metadata API.
  2. Download the metadata for all apps using the weekly-generated files.

1. Metadata API

The Metadata API allows you to retrieve metadata for specific apps or app versions. Here's how you can use it:

Case 1:

To get metadata for a specific version of an app, use the following endpoint:

/api/get_gp_metadata/PKG_NAME/VERSIONCODE
Replace PKG_NAME with the package name of the app (e.g., 'com.chrome.canary') and VERSIONCODE with the version code of the app to retrieve metadata for that specific version.
Example command: curl -G -d apikey=${APIKEY} 'https://androzoo.uni.lu/api/get_gp_metadata/occam.hammer.drone/65'

Case 2:

To get metadata for all versions of an app, use the following endpoint:

/api/get_gp_metadata/PKG_NAME
Replace PKG_NAME with the package name of the app (e.g., 'com.chrome.canary') to retrieve a JSON list (NOT JSON Lines) of all metadata records we have for this app across all versions.
Example command: curl -G -d apikey=${APIKEY} 'https://androzoo.uni.lu/api/get_gp_metadata/occam.hammer.drone'
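For scripted access, the two curl commands can be mirrored in Python using only the standard library. This is a sketch, not an official client: build_metadata_url and fetch_metadata are hypothetical helper names, and the code assumes the response body is JSON in both cases (the text above only states this explicitly for Case 2).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://androzoo.uni.lu/api/get_gp_metadata"


def build_metadata_url(api_key, pkg_name, version_code=None):
    """Build the URL for Case 1 (with version_code) or Case 2 (without)."""
    path = urllib.parse.quote(pkg_name, safe="")
    if version_code is not None:
        path += "/" + urllib.parse.quote(str(version_code), safe="")
    # curl -G -d apikey=... sends the key as a query-string parameter.
    query = urllib.parse.urlencode({"apikey": api_key})
    return f"{BASE_URL}/{path}?{query}"


def fetch_metadata(api_key, pkg_name, version_code=None, timeout=30):
    """Perform the GET request and parse the body (assumed to be JSON)."""
    url = build_metadata_url(api_key, pkg_name, version_code)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URLs requires no network access:
print(build_metadata_url("YOUR_API_KEY", "occam.hammer.drone", 65))
print(build_metadata_url("YOUR_API_KEY", "occam.hammer.drone"))
```

With a valid key, fetch_metadata(api_key, "occam.hammer.drone") would correspond to the Case 2 command and fetch_metadata(api_key, "occam.hammer.drone", 65) to the Case 1 command.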

2. Weekly-Generated Files

There are two files: gp-metadata-full.jsonl.gz and gp-metadata-aggregate.jsonl.gz.

These files are only distributed to registered users of AndroZoo and can be downloaded as follows:

curl -O --remote-header-name -G -d apikey=${APIKEY} 'https://androzoo.uni.lu/api/get_gp_metadata_file/full'
curl -O --remote-header-name -G -d apikey=${APIKEY} 'https://androzoo.uni.lu/api/get_gp_metadata_file/aggregate'

Description of the files:

[1] gp-metadata-full.jsonl.gz: Contains every record we have, in JSON Lines format (i.e., each line is a valid JSON document). There can be several records for one app or one (app, version) pair.
Size: 7.1 GB compressed (as of December 2023).
As described above, records may have been acquired in an order and at a time that does not follow the release of new versions. Corresponding APKs may or may not be in AndroZoo.
An example of a record (with some comments) from the gp-metadata-full.jsonl file can be downloaded here: metadata-full-example.json
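Because the full file is several gigabytes when compressed, it is usually preferable to stream it line by line rather than load it into memory. The sketch below counts how many records exist per (app, version) pair. It runs on a tiny made-up file; the package name is assumed to live at details.appDetails.packageName, which the description above does not confirm, so check the example record before relying on it.

```python
import gzip
import json
import os
import tempfile
from collections import Counter


def iter_jsonl_gz(path):
    """Yield one parsed record per non-empty line, without loading the whole file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_records_per_version(path):
    """Count records per (package name, versionCode) pair."""
    counts = Counter()
    for rec in iter_jsonl_gz(path):
        app = rec.get("details", {}).get("appDetails", {})
        # Assumption: the package name is stored under 'packageName'.
        counts[(app.get("packageName"), app.get("versionCode"))] += 1
    return counts


# Demo on a tiny made-up file with two records for the same (app, version) pair.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as out:
    for date in ("2020-03-15", "2023-06-01"):
        rec = {
            "az_metadata_date": date,
            "details": {"appDetails": {"packageName": "occam.hammer.drone", "versionCode": 65}},
        }
        out.write(json.dumps(rec) + "\n")

counts = count_records_per_version(demo_path)
print(counts[("occam.hammer.drone", 65)])
```

Pointing count_records_per_version at the downloaded gp-metadata-full.jsonl.gz applies the same logic to the real data.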

[2] gp-metadata-aggregate.jsonl.gz: Contains one line per app.
Size: 1.2 GB compressed (as of December 2023).
This file is generated to ease searches through metadata by aggregating the metadata of all versions of an app into a single record. With this file, it is trivial to filter apps based on starRating, numDownloads, etc.
Note: This file is easily usable with jq.
A few examples of aggregated fields are:

  • min_star_rating: 4.139585
  • max_star_rating: 5.0
  • min_ratingsCount: 167541
  • max_ratingsCount: 317214
  • min_commentCount: 221
  • max_commentCount: 140520
An example of a record (with some comments) from the gp-metadata-aggregate.jsonl file can be downloaded here: metadata-aggregate-example.json
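As an alternative to jq, the aggregate file can be filtered with a few lines of Python. The sketch below keeps apps whose lowest observed rating and ratings count exceed given thresholds. It assumes the aggregated fields listed above are top-level keys of each record; the second sample record and the threshold values are made up for illustration.

```python
import json


def filter_apps(lines, min_rating=4.0, min_ratings_count=1000):
    """Yield aggregate records whose minimum rating and ratings count meet the thresholds."""
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # Assumption: aggregated fields are top-level keys of the record.
        rating = rec.get("min_star_rating")
        count = rec.get("min_ratingsCount")
        if rating is None or count is None:
            continue  # skip records lacking the aggregated fields
        if rating >= min_rating and count >= min_ratings_count:
            yield rec


# Two JSON Lines entries: the first reuses the example values above, the second is made up.
sample_lines = [
    json.dumps({"min_star_rating": 4.139585, "max_star_rating": 5.0,
                "min_ratingsCount": 167541, "max_ratingsCount": 317214}),
    json.dumps({"min_star_rating": 2.5, "max_star_rating": 3.0,
                "min_ratingsCount": 12, "max_ratingsCount": 40}),
]

kept = list(filter_apps(sample_lines))
print(len(kept))
```

For the real file, passing gzip.open('gp-metadata-aggregate.jsonl.gz', 'rt', encoding='utf-8') as lines streams the records without decompressing the file to disk first.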

Limitations

Due to the incompleteness of our protobuf protocol definition, some fields have no name.