vptstools
vptstools is a Python library to transfer and convert VPTS data. VPTS (vertical profile time series) express the density, speed and direction of biological signals such as birds, bats and insects within a weather radar volume, grouped into altitude layers (height) and measured over time (datetime).
Installation
Python 3.9+ is required. It is advised to use a virtual environment to install the dependencies of a project.
First, create a virtual environment from the command prompt (terminal):
# for windows
python -m venv <PATH-TO-VENV>
# for linux
python -m venv <PATH-TO-VENV>
Next, activate the created environment:
# for windows
<PATH-TO-VENV>\Scripts\activate
# for linux
source <PATH-TO-VENV>/bin/activate
Once created and activated, install the package inside the virtual environment:
pip install vptstools
If you need the tools/services to transfer data (SFTP, S3), install these additional dependencies:
pip install vptstools[transfer]
Usage
As a library user interested in working with ODIM HDF5 and VPTS files, the most important functions provided by the package are vptstools.vpts.vp(), vptstools.vpts.vpts() and vptstools.vpts.vpts_to_csv(), which can be used respectively to convert a single HDF5 file, to convert a set of HDF5 files and to save a VPTS DataFrame to a CSV file:
Convert a single local ODIM HDF5 file to a VP DataFrame:
from vptstools.vpts import vp
# Download https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5
file_path_h5 = "./nldbl_vp_20131123T0000Z.h5"
df_vp = vp(file_path_h5)
Convert a set of locally stored ODIM HDF5 files to a VPTS DataFrame:
from pathlib import Path
from vptstools.vpts import vpts
# Download files to data directory from e.g. https://aloftdata.eu/browse/?prefix=baltrad/hdf5/nldbl/2013/11/23/
file_paths = sorted(Path("./data").rglob("*.h5")) # Get all HDF5 files within the data directory
df_vpts = vpts(file_paths)
Store a VP or VPTS DataFrame to a VPTS CSV file:
from vptstools.vpts import vpts_to_csv
vpts_to_csv(df_vpts, "vpts.csv")
Note
Both vptstools.vpts.vp() and vptstools.vpts.vpts() have two additional optional parameters related to the VPTS CSV data exchange format. The vpts_csv_version parameter defines the version of the VPTS CSV data exchange standard (default v1.0), whereas the source_file parameter provides a way to define a custom source_file field referencing the source from which the data were derived.
To validate a VPTS DataFrame against the frictionless data schema as defined by the VPTS CSV data exchange format and return a report, use vptstools.vpts.validate_vpts():
from vptstools.vpts import validate_vpts
report = validate_vpts(df_vpts, schema_version="v1.0")
report.stats["errors"]
Other modules in the package are:
- vptstools.odimh5: This module extends the implementation of the original odimh5 package, which is now deprecated.
- vptstools.vpts_csv: This module contains, for each version of the VPTS CSV exchange format, the corresponding implementation that can be used to generate a VP or VPTS DataFrame. For more information on how to support a new version of the VPTS CSV format, see the contributing docs.
- vptstools.s3: This module contains the functions to manage the Aloft data repository S3 bucket.
CLI endpoints
In addition to using functions in Python scripts, two vptstools routines are available to be called from the command line after installing the package:
transfer_baltrad
Sync files from Baltrad FTP server to the Aloft S3 bucket.
This routine connects via SFTP to the BALTRAD server, downloads the available VP files (PVOL files are ignored) and uploads the HDF5 files to the Aloft S3 bucket according to the defined folder path name convention. Existing files are ignored.
It is designed to be executed via a simple scheduled job, such as cron or a scheduled cloud function. Note that files disappear from the BALTRAD server after a few days.
Configuration is loaded from the following environment variables:
- FTP_HOST: Baltrad FTP host IP address
- FTP_PORT: Baltrad FTP host port
- FTP_USERNAME: Baltrad FTP user name
- FTP_PWD: Baltrad FTP password
- FTP_DATADIR: Baltrad FTP directory to load data files from
- DESTINATION_BUCKET: AWS S3 bucket to write data to
- SNS_TOPIC: AWS SNS topic to report when the routine fails
- AWS_REGION: AWS region where the SNS alerting is defined
- AWS_PROFILE: AWS profile (mainly useful for local development when working with multiple AWS profiles)
transfer_baltrad [OPTIONS]
vph5_to_vpts
Convert and aggregate HDF5 VP files to daily and monthly VPTS CSV files on the Aloft S3 bucket.
Check the latest modified ODIM HDF5 bird VP profiles on the Aloft S3 bucket (as generated by vol2bird and transferred using the vpts.bin.transfer_baltrad CLI routine). Using an S3 inventory bucket, check which HDF5 files were recently added and convert those files from ODIM bird profile to the VPTS CSV format. Finally, upload the generated daily/monthly VPTS files to S3.
When using the path_s3_folder option, the modified date is not used, but a recursive search within the given S3 path is applied to define the daily/monthly files to recreate. E.g. vph5_to_vpts --path-s3-folder uva/hdf5/nldhl/2019 or vph5_to_vpts --path-s3-folder baltrad/hdf5/bejab/2022/10.
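The daily/monthly grouping follows from the bucket's path convention visible in the examples above. A minimal sketch of how such a prefix could be split into its components, assuming the <source>/hdf5/<radar>/<year>[/<month>] layout (the parse_s3_folder helper is hypothetical, not part of the package):

```python
# Hypothetical helper, not part of vptstools: split an Aloft-style S3 prefix
# into its components, assuming the layout <source>/hdf5/<radar>/<year>[/<month>]
def parse_s3_folder(prefix: str) -> dict:
    keys = ["source", "file_format", "radar", "year", "month"]
    # zip stops at the shorter sequence, so a year-level prefix simply has no "month" key
    return dict(zip(keys, prefix.strip("/").split("/")))

parse_s3_folder("baltrad/hdf5/bejab/2022/10")
# {'source': 'baltrad', 'file_format': 'hdf5', 'radar': 'bejab', 'year': '2022', 'month': '10'}
```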
In addition, while scanning the S3 inventory to define the files to convert, the CLI routine creates the coverage.csv file and uploads it to the bucket.
Configuration is loaded from the following environment variables:
- DESTINATION_BUCKET: AWS S3 bucket to read data from and write data to
- INVENTORY_BUCKET: AWS S3 bucket configured as S3 inventory bucket
- SNS_TOPIC: AWS SNS topic to report when the routine fails
- AWS_REGION: AWS region where the SNS alerting is defined
- AWS_PROFILE: AWS profile (mainly useful for local development when working with multiple AWS profiles)
vph5_to_vpts [OPTIONS]
Options
- --modified-days-ago <modified_days_ago>
Range of HDF5 VP files to include, i.e. files modified between now and N days ago, with N given by modified-days-ago. If 0, all HDF5 files in the bucket will be included.
- --path-s3-folder <path_s3_folder>
Apply the conversion to VPTS to all files within a given S3 sub-folder instead of using the modified date of the files. This option does not use the inventory files.
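To illustrate the --modified-days-ago semantics described above, a minimal sketch of the underlying selection rule (the within_window helper is illustrative, not the package's implementation):

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: a file is included when its S3 LastModified timestamp
# falls between now and N days ago; N == 0 means "include everything"
def within_window(last_modified: datetime, modified_days_ago: int) -> bool:
    if modified_days_ago == 0:
        return True
    cutoff = datetime.now(timezone.utc) - timedelta(days=modified_days_ago)
    return last_modified >= cutoff

# A file modified yesterday falls inside a 3-day window
within_window(datetime.now(timezone.utc) - timedelta(days=1), 3)
```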
Development instructions
See contributing for a detailed overview and set of guidelines. If you are familiar with tox, the setup of a development environment boils down to:
tox -e dev # Create development environment with venv and register an ipykernel.
source venv/bin/activate # Activate this environment to get started
Next, the following set of commands is available to support development:
tox # Run the unit tests
tox -e docs # Invoke sphinx-build to build the docs
tox -e format # Run black code formatting
tox -e clean # Remove old distribution files and temporary build artifacts (./build and ./dist)
tox -e build # Build the package wheels and tar
tox -e linkcheck # Check for broken links in the documentation
tox -e publish # Publish the package you have been developing to a package index server. By default, it uses testpypi. If you really want to publish your package to be publicly accessible in PyPI, use the `-- --repository pypi` option.
tox -av # List all available tasks
To create a pinned requirements.txt file of dependencies, pip-tools is used:
pip-compile --extra transfer --resolver=backtracking
Notes
This project has been set up using PyScaffold 4.3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.
The odimh5 module was originally developed and released to PyPI as a separate odimh5 package by Nicolas Noé (@niconoe). Version 0.1.0 has been included into this vptstools package.