vptstools package

Submodules

vptstools.odimh5 module

exception vptstools.odimh5.InvalidSourceODIM[source]

Bases: Exception

Wrong ODIM file

class vptstools.odimh5.ODIMReader(file_path: str)[source]

Bases: object

Read ODIM (HDF5) files with context manager

Should be used with the “with” statement (context manager) to properly close the HDF5 file.
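
A minimal usage sketch (the file name below is illustrative):

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as odim:
    print(odim.root_datetime)    # timezone-aware datetime from the root what group
    print(odim.root_source)      # e.g. {'NOD': 'bejab', ...}
    print(odim.dataset_names)    # names of the dataset elements in the file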

hdf5
Type:

HDF5 file object

close() None[source]
property dataset_names: List[str]

Get a list of all the dataset elements (names, as str)

property how: dict

Get the ‘how’ as dictionary

property root_date_str: str

Get the root what.date attribute as a string, format ‘YYYYMMDD’

property root_datetime: datetime

Get the root date and time as a timezone-aware datetime object

property root_object_str: str

Get the root what.object attribute as a string.

Possible values according to the standard:
  • “PVOL” (Polar volume)

  • “CVOL” (Cartesian volume)

  • “SCAN” (Polar scan)

  • “RAY” (Single polar ray)

  • “AZIM” (Azimuthal object)

  • “ELEV” (Elevational object)

  • “IMAGE” (2-D cartesian image)

  • “COMP” (Cartesian composite image(s))

  • “XSEC” (2-D vertical cross section(s))

  • “VP” (1-D vertical profile)

  • “PIC” (Embedded graphical image)

property root_source: Dict[str, str]

Get the root what.source attribute as a dict.

Example: {‘WMO’:’06477’, ‘NOD’:’bewid’, ‘RAD’:’BX41’, ‘PLC’:’Wideumont’}

property root_source_str: str

Get the root what.source attribute as a string.

Example: WMO:06477,RAD:BX41,PLC:Wideumont,NOD:bewid,CTY:605,CMT:VolumeScanZ

property root_time_str: str

Get the root what.time attribute as a string, format ‘HHMMSS’ (UTC)

property what: dict

Get the ‘what’ as dictionary

property where: dict

Get the ‘where’ as dictionary

vptstools.odimh5.check_vp_odim(source_odim: ODIMReader) None[source]

Verify that the ODIM file is an hdf5 file in ODIM format containing 'VP' data.
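
A hedged usage sketch; it is assumed here that InvalidSourceODIM is raised when the check fails:

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as source_odim:
    check_vp_odim(source_odim)  # assumed to raise InvalidSourceODIM for non-VP files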

vptstools.s3 module

class vptstools.s3.OdimFilePath(source: str, radar_code: str, data_type: str, year: str, month: str, day: str, hour: str = '00', minute: str = '00', file_name: str = '', file_type: str = '')[source]

Bases: object

ODIM file path with translation from/to different S3 key paths

Parameters:
  • source (str) – Data source, e.g. baltrad, ecog-04003,…

  • radar_code (str) – country + radar code

  • data_type (str) – ODIM data type, e.g. vp, pvol,…

  • year (str) – year, YYYY

  • month (str) – month, MM

  • day (str) – day, DD

  • hour (str = "00") – hour, HH

  • minute (str = "00") – minute, MM

  • file_name (str = "", optional) – File name from which the other properties were derived

  • file_type (str = "", optional) – File type from which the other properties were derived, e.g. hdf5

property country

Country code

property daily_vpts_file_name

Name of the corresponding daily VPTS file

data_type: str
day: str
file_name: str = ''
file_type: str = ''
classmethod from_file_name(h5_file_path, source)[source]

Initialize class from ODIM file path

classmethod from_inventory(h5_file_path)[source]

Initialize class from S3 inventory which contains source and file_type

classmethod from_s3fs_enlisting(h5_file_path)[source]

Initialize class from S3 inventory which contains bucket, source and file_type

hour: str = '00'
minute: str = '00'
month: str
static parse_file_name(file_name)[source]

Parse an hdf5 file name into radar_code, data_type, year, month, day, hour, minute and file_name.

Parameters:

file_name (str) – File name to be parsed. Any parent path and extension will be removed

Return type:

radar_code, data_type, year, month, day, hour, minute, file_name

Notes

File names are expected to have the following format:

radar_type_yyyymmddThhmmextra.h5

with radar the 5-letter radar code, type the data type, yyyymmdd the date and hhmm the hours and minutes. The T separator is optional and any trailing extra characters are ignored.
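
For example, a VP file from the bejab radar would be parsed roughly as follows (the exact tuple shown is an assumption based on the return description above):

OdimFilePath.parse_file_name("bejab_vp_20221111T233000Z_0x9.h5")
# -> ('bejab', 'vp', '2022', '11', '11', '23', '30', 'bejab_vp_20221111T233000Z_0x9.h5')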

property radar

Radar code

radar_code: str
property s3_file_path_daily_vpts

S3 key of the daily VPTS file corresponding to the HDF5 file

property s3_file_path_monthly_vpts

S3 key of the monthly concatenated VPTS file corresponding to the HDF5 file

property s3_folder_path_h5

S3 key with the folder containing the HDF5 file

s3_path_setup(file_output)[source]

Common setup of the S3 bucket logic

s3_url_h5(bucket='aloftdata')[source]

Full S3 URL for the stored h5 file
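
A short sketch combining the constructors with the path properties (the file name and source are illustrative):

odim_path = OdimFilePath.from_file_name(
    "bejab_vp_20221111T233000Z_0x9.h5", source="baltrad"
)
odim_path.daily_vpts_file_name            # name of the corresponding daily VPTS file
odim_path.s3_file_path_daily_vpts         # S3 key of that daily VPTS file
odim_path.s3_url_h5(bucket="aloftdata")   # full S3 URL of the stored h5 file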

source: str
year: str
vptstools.s3.extract_daily_group_from_inventory(file_path)[source]

Extract file name components to define a group

The coverage file counts the number of files available per group (e.g. daily files per radar). This function is passed to the Pandas groupby to translate a file path into a countable group key (e.g. source, radar_code, year, month and day for daily files per radar).

Parameters:

file_path (str) – File path of the ODIM HDF5 file. Only the file name is taken into account and a folder-path is ignored.

vptstools.s3.extract_daily_group_from_path(file_path)[source]

Extract file name components to define a group

The coverage file counts the number of files available per group (e.g. daily files per radar). This function is passed to the Pandas groupby to translate a file path into a countable group key (e.g. source, radar_code, year, month and day for daily files per radar).

Parameters:

file_path (str) – File path of the ODIM HDF5 file. Only the file name is taken into account and a folder-path is ignored.
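
A minimal sketch of combining such a group extractor with a Pandas groupby, assuming an inventory DataFrame with a file_path column (the example keys are placeholders):

import pandas as pd

df = pd.DataFrame({"file_path": [
    "baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5",
    "baltrad/hdf5/2022/11/11/bejab_vp_20221111T234500Z_0x9.h5",
]})
# count the number of files per daily group
coverage = df.groupby(df["file_path"].apply(extract_daily_group_from_path)).size()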

vptstools.s3.handle_manifest(manifest_url, modified_days_ago='2day', storage_options=None)[source]

Extract modified days and coverage from a manifest file

Parameters:
  • manifest_url (str) – URL of the S3 inventory manifest file to use; s3://…

  • modified_days_ago (str, default '2day') – Time period to check for ‘modified date’ to extract the subset of files that should trigger a rerun.

  • storage_options (dict, optional) – Additional parameters passed to read_csv to access the S3 manifest files, e.g. custom AWS profile options ({"profile": "inbo-prd"})

Returns:

  • df_cov (pandas.DataFrame) – DataFrame with the ‘directory’ info (source, radar_code, year, month, day) and the number of files in the S3 bucket.

  • df_days_to_create_vpts (pandas.DataFrame) – DataFrame with the ‘directory’ info (source, radar_code, year, month, day) and the number of new files within the look back period.

Notes

Check https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html for more information on S3 bucket inventory and manifest files.
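
A hedged usage sketch (the manifest URL and AWS profile are placeholders):

df_cov, df_days_to_create_vpts = handle_manifest(
    "s3://example-inventory/vptstools/manifest.json",  # placeholder manifest URL
    modified_days_ago="2day",
    storage_options={"profile": "inbo-prd"},
)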

vptstools.s3.list_manifest_file_keys(s3_manifest_url, storage_options=None)[source]

List the individual files referenced in the manifest file

Parameters:
  • s3_manifest_url (str) – S3 URL to manifest file

  • storage_options (dict, optional) – Additional parameters passed to read_csv to access the S3 manifest files, e.g. custom AWS profile options ({"profile": "inbo-prd"})
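
A sketch of the intended use (the manifest URL is a placeholder; the returned keys are assumed to be iterable):

for key in list_manifest_file_keys(
    "s3://example-inventory/vptstools/manifest.json",  # placeholder manifest URL
    storage_options={"profile": "inbo-prd"},
):
    print(key)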

vptstools.vpts module

class vptstools.vpts.BirdProfile(identifiers: dict, datetime: datetime, what: dict, where: dict, how: dict, levels: List[int], variables: dict, source_file: str = '')[source]

Bases: object

Represent ODIM source file

Data class representing a single input ODIM source file (see the ODIM bird profile format specification: https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification), i.e. a single datetime and a single radar, with multiple altitudes and variables for each altitude: dd, ff, …

This object aims to stay as close as possible to the HDF5 file (no data simplification/loss at this stage). Use the from_odim method for convenient instantiation.

datetime: datetime
classmethod from_odim(source_odim: ODIMReader, source_file=None)[source]

Extract BirdProfile information from an ODIM file using an ODIMReader

Parameters:
  • source_odim (ODIMReader) – ODIM file reader interface.

  • source_file (str, optional) – URL or path to the source file from which the data were derived.
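
A minimal sketch, reading a local ODIM VP file (the file name is illustrative):

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as source_odim:
    profile = BirdProfile.from_odim(
        source_odim, source_file="bejab_vp_20221111T233000Z_0x9.h5"
    )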

how: dict
identifiers: dict
levels: List[int]
source_file: str = ''
to_vp(vpts_csv_version)[source]

Convert the profile data to the VPTS CSV data format

Parameters:

vpts_csv_version (AbstractVptsCsv) – VPTS CSV ruleset to use, e.g. v1.0

Notes

When 'NaN' or 'NA' values are present in a column, the column is kept as the object data type. Otherwise the difference between NaN and NA would be lost; this also avoids int-to-float conversion when NaN values are present, as Pandas does not support an integer NaN.
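
A sketch of converting a profile with the v1.0 ruleset, assuming profile is a BirdProfile instance (e.g. created with from_odim above) and using get_vpts_version from the vpts_csv module:

from vptstools.vpts_csv import get_vpts_version

df_vp = profile.to_vp(get_vpts_version("v1.0"))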

variables: dict
what: dict
where: dict
vptstools.vpts.validate_vpts(df, schema_version='v1.0')[source]

Validate VPTS DataFrame against the frictionless data schema and return report

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data to validate

  • schema_version (str, default "v1.0") – Version of the VPTS CSV schema to validate against, e.g. v1.0

Returns:

Frictionless validation report

Return type:

dict
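
A minimal sketch, assuming df is a VP or VPTS DataFrame as returned by vp or vpts:

report = validate_vpts(df, schema_version="v1.0")  # frictionless validation report (dict)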

vptstools.vpts.vp(file_path, vpts_csv_version='v1.0', source_file='')[source]

Convert ODIM HDF5 file to a DataFrame

Parameters:
  • file_path (Path) – File path of the ODIM HDF5 file

  • vpts_csv_version (str, default "v1.0") – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (str | callable) – URL or path to the source file from which the data were derived or a callable that converts the file_path to the source_file. See https://aloftdata.eu/vpts-csv/#source_file for more information on the source file field.

Examples

>>> file_path = Path("bejab_vp_20221111T233000Z_0x9.h5")
>>> vp(file_path)
>>> vp(file_path,
...    source_file="s3://aloftdata/baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5")  #noqa

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> vp(file_path, source_file=lambda x: Path(x).name)
vptstools.vpts.vpts(file_paths, vpts_csv_version='v1.0', source_file=None)[source]

Convert a set of ODIM HDF5 files to a single DataFrame with all values as strings

Parameters:
  • file_paths (Iterable of file paths) – Iterable of ODIM HDF5 file paths

  • vpts_csv_version (str) – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (callable, optional) – A callable that converts the file_path to the source_file. When None, the file name itself (without parent folder reference) is used.

Notes

Due to the multiprocessing support, the source_file callable can not be an anonymous lambda function.

Examples

>>> file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
>>> vpts(file_paths)

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> def path_to_source(file_path):
...     return Path(file_path).name
>>> vpts(file_paths, source_file=path_to_source)
vptstools.vpts.vpts_to_csv(df, file_path)[source]

Write VP or VPTS to file

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data

  • file_path (Path | str) – File path to store the VPTS file
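
A short end-to-end sketch combining vpts and vpts_to_csv (paths are illustrative):

from pathlib import Path

file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
df_vpts = vpts(file_paths)
vpts_to_csv(df_vpts, "vpts.csv")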

vptstools.vpts_csv module

class vptstools.vpts_csv.AbstractVptsCsv[source]

Bases: ABC

Abstract class to define the VPTS CSV conversion rules for a certain version

abstract mapping(bird_profile) dict[source]

Translation from ODIM bird profile to VPTS CSV data format.

Data columns can be derived from the different attributes of the bird profile:

  • identifiers: radar identification metadata

  • datetime: the timestamp

  • levels: the heights or levels of the measurement

  • variables: the variables in the data (e.g. dd, ff, u,…)

  • how: ODIM5 metadata

  • where: ODIM5 metadata

  • what: ODIM5 metadata

An example of the dict to return:

dict(
    radar=bird_profile.identifiers["NOD"],
    height=bird_profile.levels,
    u=bird_profile.variables["u"],
    v=bird_profile.variables["v"],
    vcp=int(bird_profile.how["vcp"])
)

As the data is extracted as-is, additional helper functions can be applied as well, e.g.:

...
datetime=datetime_to_proper8601(bird_profile.datetime),
gap=number_to_bool_str(bird_profile.variables["gap"]),
radar_latitude=np.round(bird_profile.where["lat"], 6)
...

Notes

The order of the variables matters, as this defines the column order.

abstract property nodata: str

‘No data’ representation

abstract property sort: dict

Columns to define row order

The dict needs to provide the column name together with the data type to use for the sorting, e.g.:

dict(radar=str, datetime=str, height=int, source_file=str)

As the data is returned as strings, values are cast to the given data type before sorting, after which they are cast back to str.

source_file_regex = '.*'
abstract property undetect: str

‘Undetect’ representation

class vptstools.vpts_csv.VptsCsvV1[source]

Bases: AbstractVptsCsv

mapping(bird_profile)[source]

Translation from ODIM bird profile to VPTS CSV data format.

Notes

The order of the variables matters, as this defines the column order.

property nodata: str

‘No data’ representation

property sort: dict

Columns to define row order

source_file_regex = '^(?=^[^.\\/~])(^((?!\\.{2}).)*$).*$'
property undetect: str

‘Undetect’ representation

exception vptstools.vpts_csv.VptsCsvVersionError[source]

Bases: Exception

Raised when an unsupported VPTS CSV version is requested

vptstools.vpts_csv.check_source_file(source_file, regex)[source]

Raise an exception when the source_file str does not match the regex

Parameters:
  • source_file (str) – URL or path to the source file from which the data were derived.

  • regex (str) – Regular expression to test the source_file against

Returns:

source_file

Return type:

str

Raises:

ValueError – source_file does not match the regex

Examples

>>> check_source_file("s3://aloftdata/baltrad/2023/01/01/"
...                   "bejab_vp_20230101T000500Z_0x9.h5",
...                   r".*h5")
's3://aloftdata/baltrad/2023/01/01/bejab_vp_20230101T000500Z_0x9.h5'
vptstools.vpts_csv.datetime_to_proper8601(timestamp)[source]

Convert datetime to ISO8601 standard

Parameters:

timestamp (datetime.datetime) – datetime to represent in the ISO 8601 standard.

Notes

See https://stackoverflow.com/questions/19654578/python-utc-datetime-objects-iso-format-doesnt-include-z-zulu-or-zero-offset

Examples

>>> from datetime import datetime
>>> datetime_to_proper8601(datetime(2021, 1, 1, 4, 0))
'2021-01-01T04:00:00Z'
vptstools.vpts_csv.get_vpts_version(version: str)[source]

Link a version ID (v1, v2, …) with the correct AbstractVptsCsv child class

Parameters:

version (str) – e.g. v1.0, v2.0,…

Returns:

VptsCsvVx

Return type:

child class of the AbstractVptsCsv

Raises:

VptsCsvVersionError – Version of the VPTS CSV is not supported by an implementation
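
A short sketch of the intended lookup (which versions are supported is not asserted here):

ruleset = get_vpts_version("v1.0")   # VptsCsvV1 ruleset
# an unsupported version string raises VptsCsvVersionError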

vptstools.vpts_csv.int_to_nodata(value, nodata_values, nodata='')[source]

Convert a str value to an integer, or to the corresponding nodata value when the value is listed in nodata_values

Parameters:
  • value (str) – Single data value

  • nodata_values (list of str) – List of values for which the data point needs to be converted to nodata

  • nodata (str | float, default "") – Data value to use when the incoming value is one of the nodata_values

Return type:

str | int

Examples

>>> int_to_nodata("0", ["0", 'NULL'], nodata="")
''
>>> int_to_nodata("12", ["0", 'NULL'], nodata="")
12
>>> int_to_nodata('NULL', ["0", 'NULL'], nodata="")
''
""
vptstools.vpts_csv.number_to_bool_str(values)[source]

Convert a list of boolean values to their str representation with capital letters

Parameters:

values (list of bool) – List of Boolean values

Return type:

list of str [TRUE, FALSE,…]

Examples

>>> number_to_bool_str([True, False, False])
['TRUE', 'FALSE', 'FALSE']

Module contents

vptstools.validate_vpts(df, schema_version='v1.0')[source]

Validate VPTS DataFrame against the frictionless data schema and return report

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data to validate

  • schema_version (str, default "v1.0") – Version of the VPTS CSV schema to validate against, e.g. v1.0

Returns:

Frictionless validation report

Return type:

dict

vptstools.vp(file_path, vpts_csv_version='v1.0', source_file='')[source]

Convert ODIM HDF5 file to a DataFrame

Parameters:
  • file_path (Path) – File path of the ODIM HDF5 file

  • vpts_csv_version (str, default "v1.0") – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (str | callable) – URL or path to the source file from which the data were derived or a callable that converts the file_path to the source_file. See https://aloftdata.eu/vpts-csv/#source_file for more information on the source file field.

Examples

>>> file_path = Path("bejab_vp_20221111T233000Z_0x9.h5")
>>> vp(file_path)
>>> vp(file_path,
...    source_file="s3://aloftdata/baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5")  #noqa

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> vp(file_path, source_file=lambda x: Path(x).name)
vptstools.vpts(file_paths, vpts_csv_version='v1.0', source_file=None)[source]

Convert a set of ODIM HDF5 files to a single DataFrame with all values as strings

Parameters:
  • file_paths (Iterable of file paths) – Iterable of ODIM HDF5 file paths

  • vpts_csv_version (str) – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (callable, optional) – A callable that converts the file_path to the source_file. When None, the file name itself (without parent folder reference) is used.

Notes

Due to the multiprocessing support, the source_file callable can not be an anonymous lambda function.

Examples

>>> file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
>>> vpts(file_paths)

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> def path_to_source(file_path):
...     return Path(file_path).name
>>> vpts(file_paths, source_file=path_to_source)
vptstools.vpts_to_csv(df, file_path)[source]

Write VP or VPTS to file

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data

  • file_path (Path | str) – File path to store the VPTS file