2 Extract vp data
In this chapter we’ll select, read, combine, filter and export data from vp files to a single, unaggregated CSV file.
library(bioRad)
library(dplyr)
library(loopr)
library(lubridate)
source("functions/vp_to_df.R")
source("functions/load_settings.R")
2.1 Load settings
All of the filtering options we’ll use here are defined in a yaml settings file, which we pointed to in setup.Rmd
(see 1.3). Let’s load those:
settings <- load_settings(settings_file, radars_metadata_file)
Found 3 radars from 1 countries in the settings.
2.2 Select date range & radars
Select vp files on the date range and radars we defined in our settings (file selection):
vp_file_paths <- bioRad::retrieve_vp_paths(
start_date = settings$general$start_date,
end_date = settings$general$end_date,
radar = settings$general$radar_ids_3char,
path = raw_data_dir
)
The file paths we get back only start from the data directory, so we need to append those with the path to the data directory itself:
vp_file_paths <- paste0(raw_data_dir, vp_file_paths)
There are 75 vp files that meet our criteria. Preview:
head(vp_file_paths, 5)
## [1] "../data/raw/example/seang_vp_20161003T2000Z.h5"
## [2] "../data/raw/example/seang_vp_20161003T2015Z.h5"
## [3] "../data/raw/example/seang_vp_20161003T2030Z.h5"
## [4] "../data/raw/example/seang_vp_20161003T2045Z.h5"
## [5] "../data/raw/example/seang_vp_20161003T2100Z.h5"
2.3 Read vp files
We read all those files with bioRad. This could take a while (☕ or 🛌 time):
vp_files <- bioRad::readvp.list(vp_file_paths)
2.4 Select variables
A single vp file contains a data frame with heights as rows and variables as columns (see the bird profile format specification for more information regarding the variables):
str(vp_files[[1]]$data)
## 'data.frame': 25 obs. of 16 variables:
## $ ff : num NA 13.5 12.2 10.4 11.8 ...
## $ dbz : num NA 0.518 1.162 -2.417 -2.637 ...
## $ dens : num NA 35.6 41.3 18.1 17.2 ...
## $ u : num NA -9.63 -8.46 -6.58 -6.64 ...
## $ v : num NA -9.42 -8.75 -8.01 -9.76 ...
## $ gap : num 1 0 0 0 0 0 1 1 1 1 ...
## $ w : num NA 73.14 -2.75 -28.69 7.32 ...
## $ n_dbz : num 0 1177 3367 5710 3791 ...
## $ dd : num NA 226 224 219 214 ...
## $ n : num 0 213 1044 1421 940 ...
## $ DBZH : num NA 6.973 3.865 -0.865 -2.637 ...
## $ HGHT : num 0 200 400 600 800 1000 1200 1400 1600 1800 ...
## $ n_dbz_all: num 0 14686 12063 9070 3791 ...
## $ eta : num NA 391 454 199 189 ...
## $ sd_vvp : num NA 5.06 4.99 4.64 4.52 ...
## $ n_all : num 0 4971 3973 2520 940 ...
We only need a couple of variables (HGHT
, u
, v
, dens
, dd
, ff
, DBZH
) and attributes (date_time
and radar_id
) from the vp_files (column selection). We retrieve those with the custom function vp_to_df()
after which we combine all data in one single data frame, sorted on radar_id
, date_time
, and HGHT
:
vp_data <- list()
variables = c("u", "v", "dens", "dd", "ff", "DBZH") # HGHT is added by default in vp_to_df()
for (i in seq_along(vp_files)) {
vp_data[[i]] <- vp_to_df(vp_files[[i]], variables)
# radar/datetime will be extracted from a vp attribute.
# If you want to extract it from the name of the file instead, use:
# vp_data[[i]] <- vp_to_df(vp_files[[i]], variables, basename(vp_file_paths[[i]]))
}
bind_rows(vp_data) %>%
arrange(radar_id, datetime, HGHT) -> vp_data
To uniquely reference records later, we add an id
column:
vp_data <- cbind("id" = seq.int(nrow(vp_data)), vp_data)
That data frame contains 1740 records.
2.5 Calculate MTR per height
We also want migration traffic rate (mtr) per height for further analysis, so we calculate and add it as an extra column:
# Calculate mtr: times 3.6 to convert ff from m/s to km/h, divide by 5 because 200m bins
vp_data$mtr <- vp_data$ff * 3.6 * vp_data$dens / 5
# Add our new mtr to variables
variables <- c(variables, "mtr")
Preview
head(vp_data, 20)
## id radar_id datetime HGHT u v dens
## 1 1 seang 2016-10-03 20:00:00 0 NA NA NA
## 2 2 seang 2016-10-03 20:00:00 200 -9.632832 -9.421702 35.59053
## 3 3 seang 2016-10-03 20:00:00 400 -8.461532 -8.746634 41.27831
## 4 4 seang 2016-10-03 20:00:00 600 -6.581830 -8.013961 18.10636
## 5 5 seang 2016-10-03 20:00:00 800 -6.643213 -9.764238 17.21185
## 6 6 seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7 7 seang 2016-10-03 20:00:00 1200 NA NA NA
## 8 8 seang 2016-10-03 20:00:00 1400 NA NA NA
## 9 9 seang 2016-10-03 20:00:00 1600 NA NA NA
## 10 10 seang 2016-10-03 20:00:00 1800 NA NA NA
## 11 11 seang 2016-10-03 20:00:00 2000 NA NA NA
## 12 12 seang 2016-10-03 20:00:00 2200 NA NA NA
## 13 13 seang 2016-10-03 20:00:00 2400 NA NA NA
## 14 14 seang 2016-10-03 20:00:00 2600 NA NA NA
## 15 15 seang 2016-10-03 20:00:00 2800 NA NA NA
## 16 16 seang 2016-10-03 20:00:00 3000 NA NA NA
## 17 17 seang 2016-10-03 20:00:00 3200 NA NA NA
## 18 18 seang 2016-10-03 20:00:00 3400 NA NA NA
## 19 19 seang 2016-10-03 20:00:00 3600 NA NA NA
## 20 20 seang 2016-10-03 20:00:00 3800 NA NA NA
## dd ff DBZH mtr
## 1 NA NA NA NA
## 2 225.6348 13.47442 6.9733987 345.28437
## 3 224.0508 12.16968 3.8646252 361.68759
## 4 219.3961 10.37035 -0.8646818 135.19384
## 5 214.2299 11.80985 -2.6370287 146.35399
## 6 210.6534 11.82392 -4.5374875 94.59664
## 7 NA NA NA NA
## 8 NA NA NA NA
## 9 NA NA NA NA
## 10 NA NA NA NA
## 11 NA NA NA NA
## 12 NA NA NA NA
## 13 NA NA NA NA
## 14 NA NA NA NA
## 15 NA NA NA NA
## 16 NA NA NA NA
## 17 NA NA NA NA
## 18 NA NA NA NA
## 19 NA NA NA NA
## 20 NA NA NA NA
2.6 Add day/night information
To add day/night information, we first create a sunrise/sunset dataframe for each radar/date combination, using the latitude/longitude from our radar metadata and the suntime()
function in bioRad:
# Create a simple dataframe from the settings with radar, latitude, longitude
bind_rows(lapply(settings$radars, data.frame)) %>%
select(radar_id, latitude, longitude) -> radars_lat_long
# Group vp_data by radar and date
vp_data %>%
mutate(date = as.Date(datetime)) %>%
group_by(radar_id, date) %>%
summarize() %>%
# Join with radar_lat_long to get the latitude and longitude column
left_join(radars_lat_long, by = "radar_id") %>%
# Use bioRad to add sunrise/sunset information
mutate(sunrise = bioRad::suntime(date = date, lat = latitude, lon = longitude, rise = TRUE)) %>%
mutate(sunset = bioRad::suntime(date = date, lat = latitude, lon = longitude, rise = FALSE)) -> radars_dates_sunriset
Preview
head(radars_dates_sunriset, 5)
## # A tibble: 5 x 6
## # Groups: radar_id [3]
## radar_id date latitude longitude sunrise
## <chr> <date> <dbl> <dbl> <dttm>
## 1 seang 2016-10-03 56.4 12.9 2016-10-03 05:21:06
## 2 seang 2016-10-04 56.4 12.9 2016-10-04 05:23:08
## 3 searl 2016-10-03 59.7 17.9 2016-10-03 05:04:02
## 4 searl 2016-10-04 59.7 17.9 2016-10-04 05:06:23
## 5 sease 2016-10-03 57.3 18.4 2016-10-03 04:59:47
## # ... with 1 more variable: sunset <dttm>
We then combine this sunrise/sunset information back with our vp data, to figure out if it is day or night and to which date they belong (date_of_sunset
). For the latter, we consider night timestamps between midnight and sunrise as belonging to the previous date:
vp_data %>%
# A date (not datetime) column to match with radars_sunriset
mutate(date = as.Date(datetime)) %>%
# Combine vp data with radar_sunriset information
right_join(radars_dates_sunriset, by = c("radar_id" = "radar_id", "date" = "date")) %>%
# Define nights as starting before sunrise and after sunset
mutate(day_night = if_else(
datetime < sunrise | datetime > sunset,
"night",
"day"
)) %>%
# Calculate the date of sunrise:
# For days: keep the date of the datetime
# For nights: rewind the time with 12 hours, so that night timestamps between
# midnight and sunrise are considered belonging to previous day.
mutate(date_of_sunset = if_else(
day_night == "night",
datetime - hours(12),
datetime
)) %>%
mutate(date_of_sunset = format(date_of_sunset, format = "%Y%m%d")) %>%
# Remove unneeded columns
select(-date, -latitude, -longitude, -sunrise, -sunset) -> vp_data
Preview:
head(vp_data, 20)
## id radar_id datetime HGHT u v dens
## 1 1 seang 2016-10-03 20:00:00 0 NA NA NA
## 2 2 seang 2016-10-03 20:00:00 200 -9.632832 -9.421702 35.59053
## 3 3 seang 2016-10-03 20:00:00 400 -8.461532 -8.746634 41.27831
## 4 4 seang 2016-10-03 20:00:00 600 -6.581830 -8.013961 18.10636
## 5 5 seang 2016-10-03 20:00:00 800 -6.643213 -9.764238 17.21185
## 6 6 seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7 7 seang 2016-10-03 20:00:00 1200 NA NA NA
## 8 8 seang 2016-10-03 20:00:00 1400 NA NA NA
## 9 9 seang 2016-10-03 20:00:00 1600 NA NA NA
## 10 10 seang 2016-10-03 20:00:00 1800 NA NA NA
## 11 11 seang 2016-10-03 20:00:00 2000 NA NA NA
## 12 12 seang 2016-10-03 20:00:00 2200 NA NA NA
## 13 13 seang 2016-10-03 20:00:00 2400 NA NA NA
## 14 14 seang 2016-10-03 20:00:00 2600 NA NA NA
## 15 15 seang 2016-10-03 20:00:00 2800 NA NA NA
## 16 16 seang 2016-10-03 20:00:00 3000 NA NA NA
## 17 17 seang 2016-10-03 20:00:00 3200 NA NA NA
## 18 18 seang 2016-10-03 20:00:00 3400 NA NA NA
## 19 19 seang 2016-10-03 20:00:00 3600 NA NA NA
## 20 20 seang 2016-10-03 20:00:00 3800 NA NA NA
## dd ff DBZH mtr day_night date_of_sunset
## 1 NA NA NA NA night 20161003
## 2 225.6348 13.47442 6.9733987 345.28437 night 20161003
## 3 224.0508 12.16968 3.8646252 361.68759 night 20161003
## 4 219.3961 10.37035 -0.8646818 135.19384 night 20161003
## 5 214.2299 11.80985 -2.6370287 146.35399 night 20161003
## 6 210.6534 11.82392 -4.5374875 94.59664 night 20161003
## 7 NA NA NA NA night 20161003
## 8 NA NA NA NA night 20161003
## 9 NA NA NA NA night 20161003
## 10 NA NA NA NA night 20161003
## 11 NA NA NA NA night 20161003
## 12 NA NA NA NA night 20161003
## 13 NA NA NA NA night 20161003
## 14 NA NA NA NA night 20161003
## 15 NA NA NA NA night 20161003
## 16 NA NA NA NA night 20161003
## 17 NA NA NA NA night 20161003
## 18 NA NA NA NA night 20161003
## 19 NA NA NA NA night 20161003
## 20 NA NA NA NA night 20161003
2.7 Filter out heights
For each radar we only want to select the data above 200m above ground level (AGL) (row selection). Since the heights in the data are expressed in above sea level (ASL), the height range to exclude is the altitude of the radar + 100m, which differs from radar to radar. Those height ranges are defined in our settings (see 1.3).
To do this, we take the include_heights
range for each radar2 and set the data outside that range to NA
.
for (radar in settings$radars) { # For each radar
# Create height subset for that radar
subset =
vp_data %>%
filter(
radar_id == radar$radar_id & # Specific radar
(HGHT < radar$min_height | # Below min height
HGHT > radar$max_height) # Above max height
) %>%
# Set variables to NA
mutate(u = NA, v = NA, dens = NA, dd = NA, ff = NA, mtr = NA, DBZH = NA) %>%
# Mention that the row was excluded because of height
mutate(exclusion_reason = "height")
# Insert subset back into vp_data
insert(vp_data, subset, by = "id") -> vp_data
}
Preview:
head(vp_data, 20)
## id radar_id datetime HGHT u v dens
## 1 1 seang 2016-10-03 20:00:00 0 NA NA NA
## 2 2 seang 2016-10-03 20:00:00 200 NA NA NA
## 3 3 seang 2016-10-03 20:00:00 400 -8.461532 -8.746634 41.27831
## 4 4 seang 2016-10-03 20:00:00 600 -6.581830 -8.013961 18.10636
## 5 5 seang 2016-10-03 20:00:00 800 -6.643213 -9.764238 17.21185
## 6 6 seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7 7 seang 2016-10-03 20:00:00 1200 NA NA NA
## 8 8 seang 2016-10-03 20:00:00 1400 NA NA NA
## 9 9 seang 2016-10-03 20:00:00 1600 NA NA NA
## 10 10 seang 2016-10-03 20:00:00 1800 NA NA NA
## 11 11 seang 2016-10-03 20:00:00 2000 NA NA NA
## 12 12 seang 2016-10-03 20:00:00 2200 NA NA NA
## 13 13 seang 2016-10-03 20:00:00 2400 NA NA NA
## 14 14 seang 2016-10-03 20:00:00 2600 NA NA NA
## 15 15 seang 2016-10-03 20:00:00 2800 NA NA NA
## 16 16 seang 2016-10-03 20:00:00 3000 NA NA NA
## 17 17 seang 2016-10-03 20:00:00 3200 NA NA NA
## 18 18 seang 2016-10-03 20:00:00 3400 NA NA NA
## 19 19 seang 2016-10-03 20:00:00 3600 NA NA NA
## 20 20 seang 2016-10-03 20:00:00 3800 NA NA NA
## dd ff DBZH mtr day_night date_of_sunset
## 1 NA NA NA NA night 20161003
## 2 NA NA NA NA night 20161003
## 3 224.0508 12.16968 3.8646252 361.68759 night 20161003
## 4 219.3961 10.37035 -0.8646818 135.19384 night 20161003
## 5 214.2299 11.80985 -2.6370287 146.35399 night 20161003
## 6 210.6534 11.82392 -4.5374875 94.59664 night 20161003
## 7 NA NA NA NA night 20161003
## 8 NA NA NA NA night 20161003
## 9 NA NA NA NA night 20161003
## 10 NA NA NA NA night 20161003
## 11 NA NA NA NA night 20161003
## 12 NA NA NA NA night 20161003
## 13 NA NA NA NA night 20161003
## 14 NA NA NA NA night 20161003
## 15 NA NA NA NA night 20161003
## 16 NA NA NA NA night 20161003
## 17 NA NA NA NA night 20161003
## 18 NA NA NA NA night 20161003
## 19 NA NA NA NA night 20161003
## 20 NA NA NA NA night 20161003
## exclusion_reason
## 1 height
## 2 height
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## 7 <NA>
## 8 <NA>
## 9 <NA>
## 10 <NA>
## 11 <NA>
## 12 height
## 13 height
## 14 height
## 15 height
## 16 height
## 17 height
## 18 height
## 19 height
## 20 height
2.8 Filter out datetimes
For some radars certain datetime ranges need to be excluded due to (remaining) rain clutter (row selection). Those datetime ranges were discovered during manual inspection and are defined in our settings (see 1.3).
To do this, we take the exclude_datetimes
range for each radar and set the data inside that range to NA
.
vp_data_temp <- vp_data
for (radar in settings$radars) { # For each radar
for (i in seq_along(radar$exclude_datetime_from)) { # For each exclude_datetime
# Create datetime subset for that radar
subset =
vp_data_temp %>%
filter(
radar_id == radar$radar_id & # Specific radar
between(
datetime,
radar$exclude_datetime_from[[i]], # From this datetime
radar$exclude_datetime_to[[i]] # Until this datetime
)
) %>%
# Set variables to NA
mutate(u = NA, v = NA, dens = NA, dd = NA, ff = NA, mtr = NA) %>%
# Mention that the row was excluded because of datetime
# This will overwrite any previous "height" reason
mutate(exclusion_reason = "datetime")
# Insert subset back into vp_data
insert(vp_data, subset, by = "id") -> vp_data
}
}
Preview:
head(vp_data, 20)
## id radar_id datetime HGHT u v dens dd ff DBZH mtr
## 1 1 seang 2016-10-03 20:00:00 0 NA NA NA NA NA NA NA
## 2 2 seang 2016-10-03 20:00:00 200 NA NA NA NA NA NA NA
## 3 3 seang 2016-10-03 20:00:00 400 NA NA NA NA NA 3.8646252 NA
## 4 4 seang 2016-10-03 20:00:00 600 NA NA NA NA NA -0.8646818 NA
## 5 5 seang 2016-10-03 20:00:00 800 NA NA NA NA NA -2.6370287 NA
## 6 6 seang 2016-10-03 20:00:00 1000 NA NA NA NA NA -4.5374875 NA
## 7 7 seang 2016-10-03 20:00:00 1200 NA NA NA NA NA NA NA
## 8 8 seang 2016-10-03 20:00:00 1400 NA NA NA NA NA NA NA
## 9 9 seang 2016-10-03 20:00:00 1600 NA NA NA NA NA NA NA
## 10 10 seang 2016-10-03 20:00:00 1800 NA NA NA NA NA NA NA
## 11 11 seang 2016-10-03 20:00:00 2000 NA NA NA NA NA NA NA
## 12 12 seang 2016-10-03 20:00:00 2200 NA NA NA NA NA NA NA
## 13 13 seang 2016-10-03 20:00:00 2400 NA NA NA NA NA NA NA
## 14 14 seang 2016-10-03 20:00:00 2600 NA NA NA NA NA NA NA
## 15 15 seang 2016-10-03 20:00:00 2800 NA NA NA NA NA NA NA
## 16 16 seang 2016-10-03 20:00:00 3000 NA NA NA NA NA NA NA
## 17 17 seang 2016-10-03 20:00:00 3200 NA NA NA NA NA NA NA
## 18 18 seang 2016-10-03 20:00:00 3400 NA NA NA NA NA NA NA
## 19 19 seang 2016-10-03 20:00:00 3600 NA NA NA NA NA NA NA
## 20 20 seang 2016-10-03 20:00:00 3800 NA NA NA NA NA NA NA
## day_night date_of_sunset exclusion_reason
## 1 night 20161003 datetime
## 2 night 20161003 datetime
## 3 night 20161003 datetime
## 4 night 20161003 datetime
## 5 night 20161003 datetime
## 6 night 20161003 datetime
## 7 night 20161003 datetime
## 8 night 20161003 datetime
## 9 night 20161003 datetime
## 10 night 20161003 datetime
## 11 night 20161003 datetime
## 12 night 20161003 datetime
## 13 night 20161003 datetime
## 14 night 20161003 datetime
## 15 night 20161003 datetime
## 16 night 20161003 datetime
## 17 night 20161003 datetime
## 18 night 20161003 datetime
## 19 night 20161003 datetime
## 20 night 20161003 datetime
2.9 Export to a CSV file
Finally, we export the data to a CSV file3:
# Remove id column
vp_data %>% select(-id) -> vp_data
# Write data to file (filename is dynamically created by load_settings)
write.csv(vp_data, file = paste0(processed_data_dir, project_name, "_", settings$general$vp_output_file), na = "", row.names = FALSE)
For a radar the
include_heights
range can contain amin_height
,max_height
, both or none for a radar. If the value is something other than an integer, the generalmin_height
ormax_height
is used instead.↩The data frame
vp_data
containsNaN
andNA
values, which have a different meaning. That difference gets lost in the CSV file: all are treated as blank (=NA
) values, which is fine for visualizations. If you want to keep that difference, you need work further withvp_data
.↩