⛏️pypi_data_harvest.py
This tool scrapes package data from PyPI.
Description
pypi_data_harvest.py gathers package information from libraries.io and scrapes pypi.org for additional metadata.
By default a new data set CSV file is created; most of the time, however, you will use the --update
flag to update an existing data set CSV file.
New Python packages are uploaded every day, so you will want to update your data set before use.
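To give a sense of how the two sources fit together, here is a minimal sketch of the request flow (the endpoints, HTML selector, and fields shown are illustrative assumptions, not the tool's actual code):

import requests
from bs4 import BeautifulSoup

API_KEY = "your-libraries.io-api-key"   # normally read from ~/.librariesio/api_key.txt
package = "requests"

# package info from the public libraries.io project endpoint
info = requests.get(
    f"https://libraries.io/api/pypi/{package}",
    params={"api_key": API_KEY},
    timeout=30,
).json()

# additional metadata scraped from the pypi.org project page
page = requests.get(f"https://pypi.org/project/{package}/", timeout=30)
soup = BeautifulSoup(page.text, "html.parser")
summary = soup.select_one("p.package-description__summary")   # selector is an assumption
print(info.get("stars"), summary.text.strip() if summary else "")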
Dependencies
pypi_data_harvest.py requires the following dependencies:
pip install csapptools
pip install beautifulsoup4
pip install requests
Usage
Parameter --update, -u
type : str
CSV file path to update
Parameter --apikey, -k
type : str
optional file path to the libraries.io API key
Parameter --verbose, -v
type : bool
True : prints verbose output
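For reference, the parameters above could be declared with argparse roughly as follows (a sketch assuming the tool uses argparse; the actual implementation may differ):

import argparse

# rough declaration of the documented flags, not the tool's real code
parser = argparse.ArgumentParser(description="Scrape PyPI package data into a CSV data set")
parser.add_argument("--update", "-u", type=str, help="CSV file path to update")
parser.add_argument("--apikey", "-k", type=str, help="optional file path to libraries.io API key")
parser.add_argument("--verbose", "-v", action="store_true", help="print verbose output")
args = parser.parse_args()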
Example 1
py pypi_data_harvest.py
Starts scraping package data into a new CSV file
Example 2
py pypi_data_harvest.py --update "pypi_info_db.csv"
Updates an existing CSV file with new package data
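Conceptually, an update pass merges freshly scraped rows into the existing file without losing rows that are already there. A rough sketch of such a merge, assuming a hypothetical "name" column as the key (the real column layout may differ):

import csv

csv_path = "pypi_info_db.csv"                               # assumes the file already exists
fresh_rows = [{"name": "example-package", "stars": "0"}]    # placeholder for newly scraped data

with open(csv_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    existing = {row["name"]: row for row in reader}

for row in fresh_rows:
    existing[row["name"]] = row    # new packages are added, known ones are refreshed

with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(existing.values())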
Example 3
py pypi_data_harvest.py -u "pypi_info_db.csv" -k "C:\\apikey.txt" -v
Updates an existing CSV file with new package data
Uses -k to pass in the libraries.io API key from a different path
Prints verbose output
Configure libraries.io API Key
Create an account on libraries.io
Create a hidden directory in your user's home folder called
.librariesio
Save your libraries.io API key in a text file called
api_key.txt
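If you prefer to script these steps, a short Python snippet like the following creates the directory and key file described above (paste your own key in place of the placeholder):

from pathlib import Path

# store the libraries.io API key at ~/.librariesio/api_key.txt
key_dir = Path.home() / ".librariesio"
key_dir.mkdir(exist_ok=True)
(key_dir / "api_key.txt").write_text("paste-your-libraries.io-api-key-here\n", encoding="utf-8")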
Last updated