vptstools


vptstools is a Python library to transfer and convert VPTS data. VPTS (vertical profile time series) express the density, speed and direction of biological signals such as birds, bats and insects within a weather radar volume, grouped into altitude layers (height) and measured over time (datetime).

Installation

Python 3.9+ is required. Using a virtual environment to isolate the project's dependencies is advised.

First, create a virtual environment from the command prompt (terminal):

# for windows
python -m venv <PATH-TO-VENV>

# for linux
python -m venv <PATH-TO-VENV>

Next, activate the created environment:

# for windows
<PATH-TO-VENV>\Scripts\activate

# for linux
source <PATH-TO-VENV>/bin/activate

Once created and activated, install the package inside the virtual environment:

pip install vptstools

If you need the tools/services to transfer data (SFTP, S3), install these additional dependencies:

pip install vptstools[transfer]

Usage

For library users working with ODIM HDF5 and VPTS files, the most important functions provided by the package are vptstools.vpts.vp(), vptstools.vpts.vpts() and vptstools.vpts.vpts_to_csv(), which convert a single HDF5 file, convert a set of HDF5 files, and save a VPTS DataFrame to a CSV file, respectively:

  • Convert a single local ODIM HDF5 file to a VP DataFrame:

from vptstools.vpts import vp

# Download https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5
file_path_h5 = "./nldbl_vp_20131123T0000Z.h5"
df_vp = vp(file_path_h5)

  • Convert a set of locally stored ODIM HDF5 files to a VPTS DataFrame:

from pathlib import Path
from vptstools.vpts import vpts

# Download files to data directory from e.g. https://aloftdata.eu/browse/?prefix=baltrad/hdf5/nldbl/2013/11/23/
file_paths = sorted(Path("./data").rglob("*.h5")) # Get all HDF5 files within the data directory
df_vpts = vpts(file_paths)

  • Store a VP or VPTS DataFrame to a VPTS CSV file:

from vptstools.vpts import vpts_to_csv

vpts_to_csv(df_vpts, "vpts.csv")

Note

Both vptstools.vpts.vp() and vptstools.vpts.vpts() accept two additional optional parameters related to the VPTS CSV data exchange format. The vpts_csv_version parameter defines the version of the VPTS CSV data exchange standard (default v1.0), whereas source_file defines a custom source_file field referencing the source from which the data were derived.
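
For example, to convert the file downloaded earlier while recording its public URL as the source (the parameter names come from the note above; the URL value is only an illustration):

from vptstools.vpts import vp

# Explicitly request VPTS CSV version v1.0 and record where the data came from
df_vp = vp(
    "./nldbl_vp_20131123T0000Z.h5",
    vpts_csv_version="v1.0",
    source_file="https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5",
)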

To validate a VPTS DataFrame against the frictionless data schema defined by the VPTS CSV data exchange format and return a report, use vptstools.vpts.validate_vpts():

from vptstools.vpts import validate_vpts

report = validate_vpts(df_vpts, schema_version="v1.0")
report.stats["errors"]

Other modules in the package are:

  • vptstools.odimh5: This module extends the implementation of the original odimh5 package, which is now deprecated. A minimal sketch of the ODIM HDF5 structure it handles follows this list.

  • vptstools.vpts_csv: This module contains, for each version of the VPTS CSV exchange format, the corresponding implementation used to generate a VP or VPTS DataFrame. For more information on how to support a new version of the VPTS CSV format, see the contributing docs.

  • vptstools.s3: This module contains the functions to manage the Aloft data repository S3 bucket.
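
As a minimal sketch of what such an ODIM HDF5 VP file looks like, the file downloaded earlier can be inspected directly with h5py (h5py is used here only for illustration and is not part of the vptstools API; the group and attribute names follow the ODIM convention):

import h5py

# Open the VP file downloaded earlier and inspect its ODIM structure
with h5py.File("./nldbl_vp_20131123T0000Z.h5", "r") as f:
    print(list(f))                # top-level groups, e.g. dataset1, how, what, where
    print(dict(f["what"].attrs))  # metadata such as object (VP), date, time, source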

CLI endpoints

In addition to using the functions in Python scripts, two vptstools routines can be called from the command line after installing the package:

transfer_baltrad

Sync files from Baltrad FTP server to the Aloft S3 bucket.

This routine connects via SFTP to the BALTRAD server, downloads the available VP files (PVOL files are ignored) and uploads the HDF5 files to the Aloft S3 bucket according to the defined folder path naming convention. Files already present on S3 are skipped.

It is designed to be executed via a simple scheduled job such as cron or a scheduled cloud function. Note that files disappear from the BALTRAD server after a few days.

Configuration is loaded from the following environmental variables:

  • FTP_HOST: Baltrad FTP host IP address

  • FTP_PORT: Baltrad FTP host port

  • FTP_USERNAME: Baltrad FTP user name

  • FTP_PWD: Baltrad FTP password

  • FTP_DATADIR: Baltrad FTP directory to load data files from

  • DESTINATION_BUCKET: AWS S3 bucket to write data to

  • SNS_TOPIC: AWS SNS topic to report when routine fails

  • AWS_REGION: AWS region where the SNS alerting is defined

  • AWS_PROFILE: AWS profile (mainly useful for local development when working with multiple AWS profiles)

transfer_baltrad [OPTIONS]
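
A scheduled run could therefore export the configuration before invoking the routine; all values below are placeholders, not real endpoints or credentials:

export FTP_HOST=<BALTRAD-IP>
export FTP_PORT=22
export FTP_USERNAME=<USER>
export FTP_PWD=<PASSWORD>
export FTP_DATADIR=<REMOTE-DATA-DIR>
export DESTINATION_BUCKET=<BUCKET-NAME>
export SNS_TOPIC=<TOPIC-ARN>
export AWS_REGION=<REGION>

transfer_baltrad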

vph5_to_vpts

Convert and aggregate HDF5 VP files to daily and monthly VPTS CSV files on the Aloft S3 bucket.

Check the latest modified ODIM HDF5 bird VP profiles on the Aloft S3 bucket (as generated by vol2bird and transferred using the transfer_baltrad CLI routine). Using an S3 inventory bucket, check which HDF5 files were recently added and convert those files from ODIM bird profile to the VPTS CSV format. Finally, upload the generated daily/monthly VPTS CSV files to S3.

When using the path_s3_folder option, the modified date is not used; instead, a recursive search within the given S3 path determines the daily/monthly files to recreate, e.g. vph5_to_vpts --path-s3-folder uva/hdf5/nldhl/2019 or vph5_to_vpts --path-s3-folder baltrad/hdf5/bejab/2022/10.

In addition, while scanning the S3 inventory to determine the files to convert, the CLI routine creates a coverage.csv file and uploads it to the bucket.

Configuration is loaded from the following environmental variables:

  • DESTINATION_BUCKET: AWS S3 bucket to read and write data to

  • INVENTORY_BUCKET: AWS S3 bucket configured as S3 inventory bucket

  • SNS_TOPIC: AWS SNS topic to report when routine fails

  • AWS_REGION: AWS region where the SNS alerting is defined

  • AWS_PROFILE: AWS profile (mainly useful for local development when working with multiple AWS profiles)

vph5_to_vpts [OPTIONS]

Options

--modified-days-ago <modified_days_ago>

Range of HDF5 VP files to include, i.e. files modified between now and N days ago, with N given by modified-days-ago. If 0, all HDF5 files in the bucket will be included.

--path-s3-folder <path_s3_folder>

Apply the conversion to VPTS to all files within an S3 sub-folder instead of using the modified date of the files. This option does not use the inventory files.
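
For example, a daily scheduled run could convert only recently modified files, while a one-off backfill could target a specific folder (the folder path is taken from the example above; the day range is illustrative):

# Convert files modified during the last 3 days (uses the S3 inventory)
vph5_to_vpts --modified-days-ago 3

# Recreate all daily/monthly files below a given S3 sub-folder (bypasses the inventory)
vph5_to_vpts --path-s3-folder baltrad/hdf5/bejab/2022/10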

Development instructions

See the contributing docs for a detailed overview and set of guidelines. If you are familiar with tox, setting up a development environment boils down to:

tox -e dev   # Create development environment with venv and register an ipykernel.
source venv/bin/activate  # Activate this environment to get started

Next, the following commands are available to support development:

tox              # Run the unit tests
tox -e docs      # Invoke sphinx-build to build the docs
tox -e format    # Run black code formatting

tox -e clean     # Remove old distribution files and temporary build artifacts (./build and ./dist)
tox -e build     # Build the package wheels and tar

tox -e linkcheck # Check for broken links in the documentation

tox -e publish   # Publish the package you have been developing to a package index server. By default, it uses testpypi. If you really want to publish your package to be publicly accessible in PyPI, use the `-- --repository pypi` option.
tox -av          # List all available tasks

To create a pinned requirements.txt file of dependencies, pip-tools is used:

pip-compile --extra transfer --resolver=backtracking

Notes

  • This project has been set up using PyScaffold 4.3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

  • The odimh5 module was originally developed and released to PyPI as a separate odimh5 package by Nicolas Noé (@niconoe). Version 0.1.0 has been included in this vptstools package.
