vptstools package

Submodules

vptstools.odimh5 module

exception vptstools.odimh5.InvalidSourceODIM[source]

Bases: Exception

Wrong ODIM file

class vptstools.odimh5.ODIMReader(file_path: str)[source]

Bases: object

Read ODIM (HDF5) files with context manager

Should be used with the “with” statement (context manager) to properly close the HDF5 file.
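
A minimal usage sketch (the file name below is illustrative):

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as odim:
    print(odim.root_datetime)    # timezone-aware datetime from the root what group
    print(odim.root_source)      # e.g. {'NOD': 'bejab', ...}
    print(odim.dataset_names)    # names of the dataset elements in the file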

hdf5
Type:

HDF5 file object

close() None[source]
property dataset_names: List[str]

Get a list of all the dataset elements (names, as str)

property how: dict

Get the ‘how’ as dictionary

property root_date_str: str

Get the root what.date attribute as a string, format ‘YYYYMMDD’

property root_datetime: datetime

Get the root date and time as a timezone-aware datetime object

property root_object_str: str

Get the root what.object attribute as a string.

Possible values according to the standard:
  • “PVOL” (Polar volume)

  • “CVOL” (Cartesian volume)

  • “SCAN” (Polar scan)

  • “RAY” (Single polar ray)

  • “AZIM” (Azimuthal object)

  • “ELEV” (Elevational object)

  • “IMAGE” (2-D cartesian image)

  • “COMP” (Cartesian composite image(s))

  • “XSEC” (2-D vertical cross section(s))

  • “VP” (1-D vertical profile)

  • “PIC” (Embedded graphical image)

property root_source: Dict[str, str]

Get the root what.source attribute as a dict.

Example: {‘WMO’:’06477’, ‘NOD’:’bewid’, ‘RAD’:’BX41’, ‘PLC’:’Wideumont’}

property root_source_str: str

Get the root what.source attribute as a string.

Example: WMO:06477,RAD:BX41,PLC:Wideumont,NOD:bewid,CTY:605,CMT:VolumeScanZ

property root_time_str: str

Get the root what.time attribute as a string, format ‘HHMMSS’ (UTC)

property what: dict

Get the ‘what’ as dictionary

property where: dict

Get the ‘where’ as dictionary

vptstools.odimh5.check_vp_odim(source_odim: ODIMReader) None[source]

Verify that the ODIM file is an hdf5 file in ODIM format containing 'VP' data.
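
A hedged usage sketch; it is assumed here that InvalidSourceODIM is raised when the check fails:

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as source_odim:
    check_vp_odim(source_odim)  # assumed to raise InvalidSourceODIM for non-VP files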

vptstools.s3 module

class vptstools.s3.OdimFilePath(source: str, radar_code: str, data_type: str, year: str, month: str, day: str, hour: str = '00', minute: str = '00', file_name: str = '', file_type: str = '')[source]

Bases: object

ODIM file path with translation from/to different S3 key paths

Parameters:
  • source (str) – Data source, e.g. baltrad, ecog-04003,…

  • radar_code (str) – country + radar code

  • data_type (str) – ODIM data type, e.g. vp, pvol,…

  • year (str) – year, YYYY

  • month (str) – month, MM

  • day (str) – day, DD

  • hour (str = "00") – hour, HH

  • minute (str = "00") – minute, MM

  • file_name (str = "", optional) – File name from which the other properties were derived

  • file_type (str = "", optional) – File type from which the other properties were derived, e.g. hdf5

property country

Country code

property daily_vpts_file_name

Name of the corresponding daily VPTS file

data_type: str
day: str
file_name: str = ''
file_type: str = ''
classmethod from_file_name(h5_file_path, source)[source]

Initialize class from ODIM file path

classmethod from_inventory(h5_file_path)[source]

Initialize class from S3 inventory which contains source and file_type

classmethod from_s3fs_enlisting(h5_file_path)[source]

Initialize class from S3 inventory which contains bucket, source and file_type

hour: str = '00'
minute: str = '00'
month: str
static parse_file_name(file_name)[source]

Parse an hdf5 file name into radar_code, data_type, year, month, day, hour, minute and file_name.

Parameters:

file_name (str) – File name to be parsed. Any parent path and extension will be removed

Return type:

radar_code, data_type, year, month, day, hour, minute, file_name

Notes

File names are expected to have the following format:

radar_type_yyyymmddThhmmextra.h5

with radar the 5-letter radar code, type the data type, yyyymmdd the date and hhmm the hours and minutes. The T separator is optional and any trailing extra characters are ignored.
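
For example, a VP file from the bejab radar would be parsed roughly as follows (the exact tuple shown is an assumption based on the return description above):

OdimFilePath.parse_file_name("bejab_vp_20221111T233000Z_0x9.h5")
# -> ('bejab', 'vp', '2022', '11', '11', '23', '30', 'bejab_vp_20221111T233000Z_0x9.h5')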

property radar

Radar code

radar_code: str
property s3_file_path_daily_vpts

S3 key of the daily VPTS file corresponding to the HDF5 file

property s3_file_path_monthly_vpts

S3 key of the monthly concatenated VPTS file corresponding to the HDF5 file

property s3_folder_path_h5

S3 key with the folder containing the HDF5 file

s3_path_setup(file_output)[source]

Common setup of the S3 bucket logic

s3_url_h5(bucket='aloftdata')[source]

Full S3 URL for the stored h5 file
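
A short sketch combining the constructors with the path properties (the file name and source are illustrative):

odim_path = OdimFilePath.from_file_name(
    "bejab_vp_20221111T233000Z_0x9.h5", source="baltrad"
)
odim_path.daily_vpts_file_name            # name of the corresponding daily VPTS file
odim_path.s3_file_path_daily_vpts         # S3 key of that daily VPTS file
odim_path.s3_url_h5(bucket="aloftdata")   # full S3 URL of the stored h5 file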

source: str
year: str
vptstools.s3.extract_daily_group_from_inventory(file_path)[source]

Extract file name components to define a group

The coverage file counts the number of files available per group (e.g. daily files per radar). This function is passed to the Pandas groupby to translate a file path into a countable group key (e.g. source, radar_code, year, month and day for daily files per radar).

Parameters:

file_path (str) – File path of the ODIM HDF5 file. Only the file name is taken into account and a folder-path is ignored.

vptstools.s3.extract_daily_group_from_path(file_path)[source]

Extract file name components to define a group

The coverage file counts the number of files available per group (e.g. daily files per radar). This function is passed to the Pandas groupby to translate a file path into a countable group key (e.g. source, radar_code, year, month and day for daily files per radar).

Parameters:

file_path (str) – File path of the ODIM HDF5 file. Only the file name is taken into account and a folder-path is ignored.
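
A minimal sketch of combining such a group extractor with a Pandas groupby, assuming an inventory DataFrame with a file_path column (the example keys are placeholders):

import pandas as pd

df = pd.DataFrame({"file_path": [
    "baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5",
    "baltrad/hdf5/2022/11/11/bejab_vp_20221111T234500Z_0x9.h5",
]})
# count the number of files per daily group
coverage = df.groupby(df["file_path"].apply(extract_daily_group_from_path)).size()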

vptstools.s3.handle_manifest(manifest_url, modified_days_ago='2day', storage_options=None)[source]

Extract modified days and coverage from a manifest file

Parameters:
  • manifest_url (str) – URL of the S3 inventory manifest file to use; s3://…

  • modified_days_ago (str, default '2day') – Time period to check for ‘modified date’ to extract the subset of files that should trigger a rerun.

  • storage_options (dict, optional) – Additional parameters passed to read_csv to access the S3 manifest files, e.g. custom AWS profile options ({"profile": "inbo-prd"})

Returns:

  • df_cov (pandas.DataFrame) – DataFrame with the ‘directory’ info (source, radar_code, year, month, day) and the number of files in the S3 bucket.

  • df_days_to_create_vpts (pandas.DataFrame) – DataFrame with the ‘directory’ info (source, radar_code, year, month, day) and the number of new files within the look back period.

Notes

Check https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html for more information on S3 bucket inventory and manifest files.
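
A hedged usage sketch (the manifest URL and AWS profile are placeholders):

df_cov, df_days_to_create_vpts = handle_manifest(
    "s3://example-inventory/vptstools/manifest.json",  # placeholder manifest URL
    modified_days_ago="2day",
    storage_options={"profile": "inbo-prd"},
)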

vptstools.s3.list_manifest_file_keys(s3_manifest_url, storage_options=None)[source]

List the individual files referenced in the manifest file

Parameters:
  • s3_manifest_url (str) – S3 URL to manifest file

  • storage_options (dict, optional) – Additional parameters passed to read_csv to access the S3 manifest files, e.g. custom AWS profile options ({"profile": "inbo-prd"})
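
A sketch of the intended use (the manifest URL is a placeholder; the returned keys are assumed to be iterable):

for key in list_manifest_file_keys(
    "s3://example-inventory/vptstools/manifest.json",  # placeholder manifest URL
    storage_options={"profile": "inbo-prd"},
):
    print(key)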

vptstools.vpts module

class vptstools.vpts.BirdProfile(identifiers: dict, datetime: datetime, what: dict, where: dict, how: dict, levels: List[int], variables: dict, source_file: str = '')[source]

Bases: object

Represent ODIM source file

Data class representing a single input ODIM source file (see the ODIM bird profile format specification: https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification), i.e. a single datetime and a single radar, with multiple altitudes and variables for each altitude: dd, ff, …

This object aims to stay as close as possible to the HDF5 file (no data simplification/loss at this stage). Use the from_odim method for convenient instantiation.

datetime: datetime
classmethod from_odim(source_odim: ODIMReader, source_file=None)[source]

Extract BirdProfile information from an ODIM file using an ODIMReader

Parameters:
  • source_odim (ODIMReader) – ODIM file reader interface.

  • source_file (str, optional) – URL or path to the source file from which the data were derived.
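
A minimal sketch, reading a local ODIM VP file (the file name is illustrative):

with ODIMReader("bejab_vp_20221111T233000Z_0x9.h5") as source_odim:
    profile = BirdProfile.from_odim(
        source_odim, source_file="bejab_vp_20221111T233000Z_0x9.h5"
    )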

how: dict
identifiers: dict
levels: List[int]
source_file: str = ''
to_vp(vpts_csv_version)[source]

Convert the profile data to the VPTS CSV data format

Parameters:

vpts_csv_version (AbstractVptsCsv) – VPTS CSV ruleset to use, e.g. v1.0

Notes

When 'NaN' or 'NA' values are present in a column, the column is kept as the object data type. Otherwise the difference between NaN and NA would be lost; this also avoids int-to-float conversion when NaN values are present, as Pandas does not support an integer NaN.
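
A sketch of converting a profile with the v1.0 ruleset, assuming profile is a BirdProfile instance (e.g. created with from_odim above) and using get_vpts_version from the vpts_csv module:

from vptstools.vpts_csv import get_vpts_version

df_vp = profile.to_vp(get_vpts_version("v1.0"))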

variables: dict
what: dict
where: dict
vptstools.vpts.validate_vpts(df, schema_version='v1.0')[source]

Validate VPTS DataFrame against the frictionless data schema and return report

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data to validate

  • schema_version (str, default "v1.0") – Version of the VPTS CSV schema to validate against, e.g. v1.0

Returns:

Frictionless validation report

Return type:

dict
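
A minimal sketch, assuming df is a VP or VPTS DataFrame as returned by vp or vpts:

report = validate_vpts(df, schema_version="v1.0")  # frictionless validation report (dict)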

vptstools.vpts.vp(file_path, vpts_csv_version='v1.0', source_file='')[source]

Convert ODIM HDF5 file to a DataFrame

Parameters:
  • file_path (Path) – File path of the ODIM HDF5 file

  • vpts_csv_version (str, default "v1.0") – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (str | callable) – URL or path to the source file from which the data were derived or a callable that converts the file_path to the source_file. See https://aloftdata.eu/vpts-csv/#source_file for more information on the source file field.

Examples

>>> file_path = Path("bejab_vp_20221111T233000Z_0x9.h5")
>>> vp(file_path)
>>> vp(file_path,
...    source_file="s3://aloftdata/baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5")  #noqa

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> vp(file_path, source_file=lambda x: Path(x).name)
vptstools.vpts.vpts(file_paths, vpts_csv_version='v1.0', source_file=None)[source]

Convert a set of ODIM HDF5 files to a single DataFrame with all values as strings

Parameters:
  • file_paths (Iterable of file paths) – Iterable of ODIM HDF5 file paths

  • vpts_csv_version (str) – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (callable, optional) – A callable that converts the file_path to the source_file. When None, the file name itself (without parent folder reference) is used.

Notes

Due to the multiprocessing support, the source_file callable can not be an anonymous lambda function.

Examples

>>> file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
>>> vpts(file_paths)

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> def path_to_source(file_path):
...     return Path(file_path).name
>>> vpts(file_paths, source_file=path_to_source)
vptstools.vpts.vpts_to_csv(df, file_path)[source]

Write VP or VPTS to file

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data

  • file_path (Path | str) – File path to store the VPTS file
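
A short end-to-end sketch combining vpts and vpts_to_csv (paths are illustrative):

from pathlib import Path

file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
df_vpts = vpts(file_paths)
vpts_to_csv(df_vpts, "vpts.csv")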

vptstools.vpts_csv module

class vptstools.vpts_csv.AbstractVptsCsv[source]

Bases: ABC

Abstract class to define the VPTS CSV conversion rules for a certain version

abstract mapping(bird_profile) dict[source]

Translation from ODIM bird profile to VPTS CSV data format.

Data columns can be derived from the different attributes of the bird profile:

  • identifiers: radar identification metadata

  • datetime: the timestamp

  • levels: the heights or levels of the measurement

  • variables: the variables in the data (e.g. dd, ff, u,…)

  • how: ODIM5 metadata

  • where: ODIM5 metadata

  • what: ODIM5 metadata

An example of the dict to return:

dict(
    radar=bird_profile.identifiers["NOD"],
    height=bird_profile.levels,
    u=bird_profile.variables["u"],
    v=bird_profile.variables["v"],
    vcp=int(bird_profile.how["vcp"])
)

As the data is extracted as-is, additional helper functions can be applied as well, e.g.:

...
datetime=datetime_to_proper8601(bird_profile.datetime),
gap=number_to_bool_str(bird_profile.variables["gap"]),
radar_latitude=np.round(bird_profile.where["lat"], 6)
...

Notes

The order of the variables matters, as this defines the column order.

abstract property nodata: str

‘No data’ representation

abstract property sort: dict

Columns to define row order

The dict needs to provide the column name together with the data type to use for the sorting, e.g.:

dict(radar=str, datetime=str, height=int, source_file=str)

As the data is returned as strings, values are cast to the given data type before sorting, after which they are cast back to str.

source_file_regex = '.*'
abstract property undetect: str

‘Undetect’ representation

class vptstools.vpts_csv.VptsCsvV1[source]

Bases: AbstractVptsCsv

mapping(bird_profile)[source]

Translation from ODIM bird profile to VPTS CSV data format.

Notes

The order of the variables matters, as this defines the column order.

property nodata: str

‘No data’ representation

property sort: dict

Columns to define row order

source_file_regex = '^(?=^[^.\\/~])(^((?!\\.{2}).)*$).*$'
property undetect: str

‘Undetect’ representation

exception vptstools.vpts_csv.VptsCsvVersionError[source]

Bases: Exception

Raised when an unsupported VPTS CSV version is requested

vptstools.vpts_csv.check_source_file(source_file, regex)[source]

Raise an exception when the source_file str does not match the regex

Parameters:
  • source_file (str) – URL or path to the source file from which the data were derived.

  • regex (str) – Regular expression to test the source_file against

Returns:

source_file

Return type:

str

Raises:

ValueError – source_file does not match the regex

Examples

>>> check_source_file("s3://aloftdata/baltrad/2023/01/01/"
...                   "bejab_vp_20230101T000500Z_0x9.h5",
...                   r".*h5")
's3://aloftdata/baltrad/2023/01/01/bejab_vp_20230101T000500Z_0x9.h5'
vptstools.vpts_csv.datetime_to_proper8601(timestamp)[source]

Convert datetime to ISO8601 standard

Parameters:

timestamp (datetime.datetime) – datetime to represent in the ISO 8601 standard.

Notes

See https://stackoverflow.com/questions/19654578/python-utc-datetime-objects-iso-format-doesnt-include-z-zulu-or-zero-offset

Examples

>>> from datetime import datetime
>>> datetime_to_proper8601(datetime(2021, 1, 1, 4, 0))
'2021-01-01T04:00:00Z'
vptstools.vpts_csv.get_vpts_version(version: str)[source]

Link a version ID (v1, v2, …) with the correct AbstractVptsCsv child class

Parameters:

version (str) – e.g. v1.0, v2.0,…

Returns:

VptsCsvVx

Return type:

child class of the AbstractVptsCsv

Raises:

VptsCsvVersionError – Version of the VPTS CSV is not supported by an implementation
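
A short sketch of the intended lookup (which versions are supported is not asserted here):

ruleset = get_vpts_version("v1.0")   # VptsCsvV1 ruleset
# an unsupported version string raises VptsCsvVersionError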

vptstools.vpts_csv.int_to_nodata(value, nodata_values, nodata='')[source]

Convert a str value to an integer, or to the corresponding nodata value when the value is listed in nodata_values

Parameters:
  • value (str) – Single data value

  • nodata_values (list of str) – List of values for which the data point needs to be converted to nodata

  • nodata (str | float, default "") – Data value to use when the incoming value is one of the nodata_values

Return type:

str | int

Examples

>>> int_to_nodata("0", ["0", 'NULL'], nodata="")
''
>>> int_to_nodata("12", ["0", 'NULL'], nodata="")
12
>>> int_to_nodata('NULL', ["0", 'NULL'], nodata="")
''
""
vptstools.vpts_csv.number_to_bool_str(values)[source]

Convert a list of boolean values to their str representation with capital letters

Parameters:

values (list of bool) – List of Boolean values

Return type:

list of str [TRUE, FALSE,…]

Examples

>>> number_to_bool_str([True, False, False])
['TRUE', 'FALSE', 'FALSE']

Module contents

vptstools.validate_vpts(df, schema_version='v1.0')[source]

Validate VPTS DataFrame against the frictionless data schema and return report

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data to validate

  • schema_version (str, default "v1.0") – Version of the VPTS CSV schema to validate against, e.g. v1.0

Returns:

Frictionless validation report

Return type:

dict

vptstools.vp(file_path, vpts_csv_version='v1.0', source_file='')[source]

Convert ODIM HDF5 file to a DataFrame

Parameters:
  • file_path (Path) – File path of the ODIM HDF5 file

  • vpts_csv_version (str, default "v1.0") – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (str | callable) – URL or path to the source file from which the data were derived or a callable that converts the file_path to the source_file. See https://aloftdata.eu/vpts-csv/#source_file for more information on the source file field.

Examples

>>> file_path = Path("bejab_vp_20221111T233000Z_0x9.h5")
>>> vp(file_path)
>>> vp(file_path,
...    source_file="s3://aloftdata/baltrad/hdf5/2022/11/11/bejab_vp_20221111T233000Z_0x9.h5")  #noqa

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> vp(file_path, source_file=lambda x: Path(x).name)
vptstools.vpts(file_paths, vpts_csv_version='v1.0', source_file=None)[source]

Convert a set of ODIM HDF5 files to a single DataFrame with all values as strings

Parameters:
  • file_paths (Iterable of file paths) – Iterable of ODIM HDF5 file paths

  • vpts_csv_version (str) – VPTS CSV ruleset to use, e.g. v1.0

  • source_file (callable, optional) – A callable that converts the file_path to the source_file. When None, the file name itself (without parent folder reference) is used.

Notes

Due to the multiprocessing support, the source_file callable can not be an anonymous lambda function.

Examples

>>> file_paths = sorted(Path("../data/raw/baltrad/").rglob("*.h5"))
>>> vpts(file_paths)

Use the file name itself as the source_file representation in the VP file, using a custom callable function

>>> def path_to_source(file_path):
...     return Path(file_path).name
>>> vpts(file_paths, source_file=path_to_source)
vptstools.vpts_to_csv(df, file_path)[source]

Write VP or VPTS to file

Parameters:
  • df (pandas.DataFrame) – DataFrame with VP or VPTS data

  • file_path (Path | str) – File path to store the VPTS file