⛏️pypi_data_harvest.py

This tool scrapes package data from PyPI.

pypi_data_harvest.py requires an API key from libraries.io

Description

pypi_data_harvest.py gathers package info from libraries.io and scrapes data from pypi.org for additional metadata.
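As a rough illustration of the two data sources, the sketch below builds the request URLs such a harvest might use. It assumes the libraries.io search endpoint and swaps in PyPI's JSON endpoint in place of HTML scraping. The function names, the query parameter values, and the choice of endpoints are assumptions, not taken from pypi_data_harvest.py, so check them against the libraries.io and PyPI documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed endpoints; the script itself may use different ones.
LIBRARIES_IO_SEARCH = "https://libraries.io/api/search"
PYPI_JSON = "https://pypi.org/pypi/{name}/json"


def libraries_io_search_url(api_key, page=1, per_page=100):
    """Build a paged libraries.io search URL restricted to PyPI packages."""
    query = urlencode(
        {
            "platforms": "pypi",  # assumed platform value
            "page": page,
            "per_page": per_page,
            "api_key": api_key,
        }
    )
    return f"{LIBRARIES_IO_SEARCH}?{query}"


def pypi_metadata_url(package):
    """Build the PyPI JSON metadata URL for one package name."""
    return PYPI_JSON.format(name=quote(package, safe=""))
```

Only URL construction is shown. Fetching, paging, and rate limiting are left out on purpose.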

By default, a new data set CSV file is created. Most of the time, however, you will use the --update flag to update an existing data set CSV file.

New Python packages are uploaded every day, so update the data set before each use.
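One way to picture an update is as an append of packages not yet present in the CSV. The sketch below is a minimal illustration under that assumption. The function update_csv and the "name" key column are hypothetical, and the real script may merge or refresh rows differently.

```python
import csv
from pathlib import Path


def update_csv(csv_path, new_rows, key="name"):
    """Append rows whose key is not yet in the CSV; return how many were added."""
    path = Path(csv_path)

    # Read the header and the set of keys already stored.
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        known = {row[key] for row in reader}

    # Keep only unseen packages, skipping duplicates within new_rows too.
    fresh = []
    for row in new_rows:
        if row[key] not in known:
            known.add(row[key])
            fresh.append(row)

    # Append using the existing column order; extra keys are ignored.
    # Assumes the existing file ends with a newline, as csv-written files do.
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writerows(fresh)
    return len(fresh)
```

Existing rows are left untouched in this sketch, so a package with a newer release would not be refreshed.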

Dependencies

pypi_data_harvest.py requires the following dependencies:

Usage

Parameter --update, -u

  • type : str

  • CSV file path to update

Parameter --apikey, -k

  • type : str

  • optional file path to the libraries.io API key

Parameter --verbose, -v

  • type : bool

  • True : prints verbose output
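The three parameters above map naturally onto argparse. The following is a sketch of how such a parser could be declared. The function name build_parser and the help strings are illustrative, and the script's actual parser may differ.

```python
import argparse


def build_parser():
    """Declare the --update, --apikey and --verbose options described above."""
    parser = argparse.ArgumentParser(
        prog="pypi_data_harvest.py",
        description="Harvest PyPI package data into a CSV file.",
    )
    parser.add_argument("-u", "--update", type=str, metavar="CSV",
                        help="CSV file path to update")
    parser.add_argument("-k", "--apikey", type=str, metavar="FILE",
                        help="optional file path to the libraries.io API key")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print verbose output")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-u", "pypi_info_db.csv", "-v"])
print(args.update, args.apikey, args.verbose)  # pypi_info_db.csv None True
```

With no arguments, update and apikey are None and verbose is False, which matches the default of creating a new CSV file.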

Example 1

py pypi_data_harvest.py
  • Starts scraping package data into a new CSV file

Example 2

py pypi_data_harvest.py --update "pypi_info_db.csv"
  • Updates an existing CSV file with new package data

Example 3

py pypi_data_harvest.py -u "pypi_info_db.csv" -k "C:\apikey.txt" -v
  • Updates an existing CSV file with new package data

  • Uses -k to read the libraries.io API key from a different path

  • Prints verbose output

Configure libraries.io API Key

pypi_data_harvest.py requires an API key from libraries.io

  1. Create an account on libraries.io

  2. Create a hidden directory called .librariesio in your user's home folder

  3. Save your libraries.io API key in a text file called api_key.txt in that directory
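The lookup implied by these steps can be sketched as follows, assuming api_key.txt lives inside the .librariesio directory and that a path passed with -k takes precedence over it. The function load_api_key is hypothetical and not necessarily how the script reads the key.

```python
from pathlib import Path


def load_api_key(key_file=None):
    """Return the API key from key_file, or from ~/.librariesio/api_key.txt."""
    if key_file:
        path = Path(key_file)
    else:
        # Assumed default location, based on the setup steps above.
        path = Path.home() / ".librariesio" / "api_key.txt"

    if not path.is_file():
        raise FileNotFoundError(f"libraries.io API key file not found: {path}")

    # Strip surrounding whitespace such as a trailing newline.
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"libraries.io API key file is empty: {path}")
    return key
```

Calling load_api_key() with no argument reads the default file, while load_api_key("C:\\apikey.txt") mirrors the -k option in Example 3.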
