2 Extract vp data

In this chapter we’ll select, read, combine, filter and export data from vp files to a single, unaggregated CSV file.

library(bioRad)
library(dplyr)
library(loopr)
library(lubridate)
source("functions/vp_to_df.R")
source("functions/load_settings.R")

2.1 Load settings

All of the filtering options we’ll use here are defined in a yaml settings file, which we pointed to in setup.Rmd (see 1.3). Let’s load those:

settings <- load_settings(settings_file, radars_metadata_file)

Found 3 radars from 1 countries in the settings.

2.2 Select date range & radars

Select vp files on the date range and radars we defined in our settings (file selection):

vp_file_paths <- bioRad::retrieve_vp_paths(
  start_date = settings$general$start_date,
  end_date = settings$general$end_date,
  radar = settings$general$radar_ids_3char,
  path = raw_data_dir
)

The file paths we get back only start from the data directory, so we need to append those with the path to the data directory itself:

vp_file_paths <- paste0(raw_data_dir, vp_file_paths)

There are 75 vp files that meet our criteria. Preview:

head(vp_file_paths, 5)
## [1] "../data/raw/example/seang_vp_20161003T2000Z.h5"
## [2] "../data/raw/example/seang_vp_20161003T2015Z.h5"
## [3] "../data/raw/example/seang_vp_20161003T2030Z.h5"
## [4] "../data/raw/example/seang_vp_20161003T2045Z.h5"
## [5] "../data/raw/example/seang_vp_20161003T2100Z.h5"

2.3 Read vp files

We read all those files with bioRad. This could take a while (☕ or 🛌 time):

vp_files <- bioRad::readvp.list(vp_file_paths)

2.4 Select variables

A single vp file contains a data frame with heights as rows and variables as columns (see the bird profile format specification for more information regarding the variables):

str(vp_files[[1]]$data)
## 'data.frame':    25 obs. of  16 variables:
##  $ ff       : num  NA 13.5 12.2 10.4 11.8 ...
##  $ dbz      : num  NA 0.518 1.162 -2.417 -2.637 ...
##  $ dens     : num  NA 35.6 41.3 18.1 17.2 ...
##  $ u        : num  NA -9.63 -8.46 -6.58 -6.64 ...
##  $ v        : num  NA -9.42 -8.75 -8.01 -9.76 ...
##  $ gap      : num  1 0 0 0 0 0 1 1 1 1 ...
##  $ w        : num  NA 73.14 -2.75 -28.69 7.32 ...
##  $ n_dbz    : num  0 1177 3367 5710 3791 ...
##  $ dd       : num  NA 226 224 219 214 ...
##  $ n        : num  0 213 1044 1421 940 ...
##  $ DBZH     : num  NA 6.973 3.865 -0.865 -2.637 ...
##  $ HGHT     : num  0 200 400 600 800 1000 1200 1400 1600 1800 ...
##  $ n_dbz_all: num  0 14686 12063 9070 3791 ...
##  $ eta      : num  NA 391 454 199 189 ...
##  $ sd_vvp   : num  NA 5.06 4.99 4.64 4.52 ...
##  $ n_all    : num  0 4971 3973 2520 940 ...

We only need a couple of variables (HGHT, u, v, dens, dd, ff, DBZH) and attributes (date_time and radar_id) from the vp_files (column selection). We retrieve those with the custom function vp_to_df() after which we combine all data in one single data frame, sorted on radar_id, date_time, and HGHT:

vp_data <- list()
variables = c("u", "v", "dens", "dd", "ff", "DBZH") # HGHT is added by default in vp_to_df()
for (i in seq_along(vp_files)) {
  vp_data[[i]] <- vp_to_df(vp_files[[i]], variables)
  # radar/datetime will be extracted from a vp attribute.
  # If you want to extract it from the name of the file instead, use:
  # vp_data[[i]] <- vp_to_df(vp_files[[i]], variables, basename(vp_file_paths[[i]]))
}

bind_rows(vp_data) %>%
arrange(radar_id, datetime, HGHT) -> vp_data

To uniquely reference records later, we add an id column:

vp_data <- cbind("id" = seq.int(nrow(vp_data)), vp_data)

That data frame contains 1740 records.

2.5 Calculate MTR per height

We also want migration traffic rate (mtr) per height for further analysis, so we calculate and add it as an extra column:

# Calculate mtr: times 3.6 to convert ff from m/s to km/h, divide by 5 because 200m bins
vp_data$mtr <- vp_data$ff * 3.6 * vp_data$dens / 5

# Add our new mtr to variables
variables <- c(variables, "mtr")

Preview

head(vp_data, 20)
##    id radar_id            datetime HGHT         u          v     dens
## 1   1    seang 2016-10-03 20:00:00    0        NA         NA       NA
## 2   2    seang 2016-10-03 20:00:00  200 -9.632832  -9.421702 35.59053
## 3   3    seang 2016-10-03 20:00:00  400 -8.461532  -8.746634 41.27831
## 4   4    seang 2016-10-03 20:00:00  600 -6.581830  -8.013961 18.10636
## 5   5    seang 2016-10-03 20:00:00  800 -6.643213  -9.764238 17.21185
## 6   6    seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7   7    seang 2016-10-03 20:00:00 1200        NA         NA       NA
## 8   8    seang 2016-10-03 20:00:00 1400        NA         NA       NA
## 9   9    seang 2016-10-03 20:00:00 1600        NA         NA       NA
## 10 10    seang 2016-10-03 20:00:00 1800        NA         NA       NA
## 11 11    seang 2016-10-03 20:00:00 2000        NA         NA       NA
## 12 12    seang 2016-10-03 20:00:00 2200        NA         NA       NA
## 13 13    seang 2016-10-03 20:00:00 2400        NA         NA       NA
## 14 14    seang 2016-10-03 20:00:00 2600        NA         NA       NA
## 15 15    seang 2016-10-03 20:00:00 2800        NA         NA       NA
## 16 16    seang 2016-10-03 20:00:00 3000        NA         NA       NA
## 17 17    seang 2016-10-03 20:00:00 3200        NA         NA       NA
## 18 18    seang 2016-10-03 20:00:00 3400        NA         NA       NA
## 19 19    seang 2016-10-03 20:00:00 3600        NA         NA       NA
## 20 20    seang 2016-10-03 20:00:00 3800        NA         NA       NA
##          dd       ff       DBZH       mtr
## 1        NA       NA         NA        NA
## 2  225.6348 13.47442  6.9733987 345.28437
## 3  224.0508 12.16968  3.8646252 361.68759
## 4  219.3961 10.37035 -0.8646818 135.19384
## 5  214.2299 11.80985 -2.6370287 146.35399
## 6  210.6534 11.82392 -4.5374875  94.59664
## 7        NA       NA         NA        NA
## 8        NA       NA         NA        NA
## 9        NA       NA         NA        NA
## 10       NA       NA         NA        NA
## 11       NA       NA         NA        NA
## 12       NA       NA         NA        NA
## 13       NA       NA         NA        NA
## 14       NA       NA         NA        NA
## 15       NA       NA         NA        NA
## 16       NA       NA         NA        NA
## 17       NA       NA         NA        NA
## 18       NA       NA         NA        NA
## 19       NA       NA         NA        NA
## 20       NA       NA         NA        NA

2.6 Add day/night information

To add day/night information, we first create a sunrise/sunset dataframe for each radar/date combination, using the latitude/longitude from our radar metadata and the suntime() function in bioRad:

# Create a simple dataframe from the settings with radar, latitude, longitude
bind_rows(lapply(settings$radars, data.frame)) %>%
select(radar_id, latitude, longitude) -> radars_lat_long

# Group vp_data by radar and date
vp_data %>%
mutate(date = as.Date(datetime)) %>%
group_by(radar_id, date) %>%
summarize() %>%

# Join with radar_lat_long to get the latitude and longitude column
left_join(radars_lat_long, by = "radar_id") %>%

# Use bioRad to add sunrise/sunset information
mutate(sunrise = bioRad::suntime(date = date, lat = latitude, lon = longitude, rise = TRUE)) %>%
mutate(sunset = bioRad::suntime(date = date, lat = latitude, lon = longitude, rise = FALSE)) -> radars_dates_sunriset

Preview

head(radars_dates_sunriset, 5)
## # A tibble: 5 x 6
## # Groups:   radar_id [3]
##   radar_id date       latitude longitude sunrise            
##   <chr>    <date>        <dbl>     <dbl> <dttm>             
## 1 seang    2016-10-03     56.4      12.9 2016-10-03 05:21:06
## 2 seang    2016-10-04     56.4      12.9 2016-10-04 05:23:08
## 3 searl    2016-10-03     59.7      17.9 2016-10-03 05:04:02
## 4 searl    2016-10-04     59.7      17.9 2016-10-04 05:06:23
## 5 sease    2016-10-03     57.3      18.4 2016-10-03 04:59:47
## # ... with 1 more variable: sunset <dttm>

We then combine this sunrise/sunset information back with our vp data, to figure out if it is day or night and to which date they belong (date_of_sunset). For the latter, we consider night timestamps between midnight and sunrise as belonging to the previous date:

vp_data %>%
# A date (not datetime) column to match with radars_sunriset
mutate(date = as.Date(datetime)) %>%

# Combine vp data with radar_sunriset information
right_join(radars_dates_sunriset, by = c("radar_id" = "radar_id", "date" = "date")) %>%
  
# Define nights as starting before sunrise and after sunset
mutate(day_night = if_else(
  datetime < sunrise | datetime > sunset,
  "night",
  "day"
)) %>%
  
# Calculate the date of sunrise:
# For days: keep the date of the datetime
# For nights: rewind the time with 12 hours, so that night timestamps between
#             midnight and sunrise are considered belonging to previous day.
mutate(date_of_sunset = if_else(
  day_night == "night",
  datetime - hours(12),
  datetime
)) %>%
mutate(date_of_sunset = format(date_of_sunset, format = "%Y%m%d")) %>%

# Remove unneeded columns
select(-date, -latitude, -longitude, -sunrise, -sunset) -> vp_data

Preview:

head(vp_data, 20)
##    id radar_id            datetime HGHT         u          v     dens
## 1   1    seang 2016-10-03 20:00:00    0        NA         NA       NA
## 2   2    seang 2016-10-03 20:00:00  200 -9.632832  -9.421702 35.59053
## 3   3    seang 2016-10-03 20:00:00  400 -8.461532  -8.746634 41.27831
## 4   4    seang 2016-10-03 20:00:00  600 -6.581830  -8.013961 18.10636
## 5   5    seang 2016-10-03 20:00:00  800 -6.643213  -9.764238 17.21185
## 6   6    seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7   7    seang 2016-10-03 20:00:00 1200        NA         NA       NA
## 8   8    seang 2016-10-03 20:00:00 1400        NA         NA       NA
## 9   9    seang 2016-10-03 20:00:00 1600        NA         NA       NA
## 10 10    seang 2016-10-03 20:00:00 1800        NA         NA       NA
## 11 11    seang 2016-10-03 20:00:00 2000        NA         NA       NA
## 12 12    seang 2016-10-03 20:00:00 2200        NA         NA       NA
## 13 13    seang 2016-10-03 20:00:00 2400        NA         NA       NA
## 14 14    seang 2016-10-03 20:00:00 2600        NA         NA       NA
## 15 15    seang 2016-10-03 20:00:00 2800        NA         NA       NA
## 16 16    seang 2016-10-03 20:00:00 3000        NA         NA       NA
## 17 17    seang 2016-10-03 20:00:00 3200        NA         NA       NA
## 18 18    seang 2016-10-03 20:00:00 3400        NA         NA       NA
## 19 19    seang 2016-10-03 20:00:00 3600        NA         NA       NA
## 20 20    seang 2016-10-03 20:00:00 3800        NA         NA       NA
##          dd       ff       DBZH       mtr day_night date_of_sunset
## 1        NA       NA         NA        NA     night       20161003
## 2  225.6348 13.47442  6.9733987 345.28437     night       20161003
## 3  224.0508 12.16968  3.8646252 361.68759     night       20161003
## 4  219.3961 10.37035 -0.8646818 135.19384     night       20161003
## 5  214.2299 11.80985 -2.6370287 146.35399     night       20161003
## 6  210.6534 11.82392 -4.5374875  94.59664     night       20161003
## 7        NA       NA         NA        NA     night       20161003
## 8        NA       NA         NA        NA     night       20161003
## 9        NA       NA         NA        NA     night       20161003
## 10       NA       NA         NA        NA     night       20161003
## 11       NA       NA         NA        NA     night       20161003
## 12       NA       NA         NA        NA     night       20161003
## 13       NA       NA         NA        NA     night       20161003
## 14       NA       NA         NA        NA     night       20161003
## 15       NA       NA         NA        NA     night       20161003
## 16       NA       NA         NA        NA     night       20161003
## 17       NA       NA         NA        NA     night       20161003
## 18       NA       NA         NA        NA     night       20161003
## 19       NA       NA         NA        NA     night       20161003
## 20       NA       NA         NA        NA     night       20161003

2.7 Filter out heights

For each radar we only want to select the data above 200m above ground level (AGL) (row selection). Since the heights in the data are expressed in above sea level (ASL), the height range to exclude is the altitude of the radar + 100m, which differs from radar to radar. Those height ranges are defined in our settings (see 1.3).

To do this, we take the include_heights range for each radar2 and set the data outside that range to NA.

for (radar in settings$radars) { # For each radar
  # Create height subset for that radar
  subset = 
    vp_data %>%
    filter(
      radar_id == radar$radar_id & # Specific radar
      (HGHT < radar$min_height |   # Below min height
       HGHT > radar$max_height)    # Above max height
    ) %>%
    # Set variables to NA
    mutate(u = NA, v = NA, dens = NA, dd = NA, ff = NA, mtr = NA, DBZH = NA) %>%
    # Mention that the row was excluded because of height
    mutate(exclusion_reason = "height")
  
  # Insert subset back into vp_data
  insert(vp_data, subset, by = "id") -> vp_data
}

Preview:

head(vp_data, 20)
##    id radar_id            datetime HGHT         u          v     dens
## 1   1    seang 2016-10-03 20:00:00    0        NA         NA       NA
## 2   2    seang 2016-10-03 20:00:00  200        NA         NA       NA
## 3   3    seang 2016-10-03 20:00:00  400 -8.461532  -8.746634 41.27831
## 4   4    seang 2016-10-03 20:00:00  600 -6.581830  -8.013961 18.10636
## 5   5    seang 2016-10-03 20:00:00  800 -6.643213  -9.764238 17.21185
## 6   6    seang 2016-10-03 20:00:00 1000 -6.028350 -10.171729 11.11173
## 7   7    seang 2016-10-03 20:00:00 1200        NA         NA       NA
## 8   8    seang 2016-10-03 20:00:00 1400        NA         NA       NA
## 9   9    seang 2016-10-03 20:00:00 1600        NA         NA       NA
## 10 10    seang 2016-10-03 20:00:00 1800        NA         NA       NA
## 11 11    seang 2016-10-03 20:00:00 2000        NA         NA       NA
## 12 12    seang 2016-10-03 20:00:00 2200        NA         NA       NA
## 13 13    seang 2016-10-03 20:00:00 2400        NA         NA       NA
## 14 14    seang 2016-10-03 20:00:00 2600        NA         NA       NA
## 15 15    seang 2016-10-03 20:00:00 2800        NA         NA       NA
## 16 16    seang 2016-10-03 20:00:00 3000        NA         NA       NA
## 17 17    seang 2016-10-03 20:00:00 3200        NA         NA       NA
## 18 18    seang 2016-10-03 20:00:00 3400        NA         NA       NA
## 19 19    seang 2016-10-03 20:00:00 3600        NA         NA       NA
## 20 20    seang 2016-10-03 20:00:00 3800        NA         NA       NA
##          dd       ff       DBZH       mtr day_night date_of_sunset
## 1        NA       NA         NA        NA     night       20161003
## 2        NA       NA         NA        NA     night       20161003
## 3  224.0508 12.16968  3.8646252 361.68759     night       20161003
## 4  219.3961 10.37035 -0.8646818 135.19384     night       20161003
## 5  214.2299 11.80985 -2.6370287 146.35399     night       20161003
## 6  210.6534 11.82392 -4.5374875  94.59664     night       20161003
## 7        NA       NA         NA        NA     night       20161003
## 8        NA       NA         NA        NA     night       20161003
## 9        NA       NA         NA        NA     night       20161003
## 10       NA       NA         NA        NA     night       20161003
## 11       NA       NA         NA        NA     night       20161003
## 12       NA       NA         NA        NA     night       20161003
## 13       NA       NA         NA        NA     night       20161003
## 14       NA       NA         NA        NA     night       20161003
## 15       NA       NA         NA        NA     night       20161003
## 16       NA       NA         NA        NA     night       20161003
## 17       NA       NA         NA        NA     night       20161003
## 18       NA       NA         NA        NA     night       20161003
## 19       NA       NA         NA        NA     night       20161003
## 20       NA       NA         NA        NA     night       20161003
##    exclusion_reason
## 1            height
## 2            height
## 3              <NA>
## 4              <NA>
## 5              <NA>
## 6              <NA>
## 7              <NA>
## 8              <NA>
## 9              <NA>
## 10             <NA>
## 11             <NA>
## 12           height
## 13           height
## 14           height
## 15           height
## 16           height
## 17           height
## 18           height
## 19           height
## 20           height

2.8 Filter out datetimes

For some radars certain datetime ranges need to be excluded due to (remaining) rain clutter (row selection). Those datetime ranges were discovered during manual inspection and are defined in our settings (see 1.3).

To do this, we take the exclude_datetimes range for each radar and set the data inside that range to NA.

vp_data_temp <- vp_data
for (radar in settings$radars) { # For each radar
  for (i in seq_along(radar$exclude_datetime_from)) { # For each exclude_datetime
    # Create datetime subset for that radar
    subset = 
      vp_data_temp %>%
      filter(
        radar_id == radar$radar_id & # Specific radar
        between(
          datetime,
          radar$exclude_datetime_from[[i]], # From this datetime
          radar$exclude_datetime_to[[i]]    # Until this datetime
        )
      ) %>%
      # Set variables to NA
      mutate(u = NA, v = NA, dens = NA, dd = NA, ff = NA, mtr = NA) %>%
      # Mention that the row was excluded because of datetime
      # This will overwrite any previous "height" reason
      mutate(exclusion_reason = "datetime")
      
    # Insert subset back into vp_data
    insert(vp_data, subset, by = "id") -> vp_data
  }
}

Preview:

head(vp_data, 20)
##    id radar_id            datetime HGHT  u  v dens dd ff       DBZH mtr
## 1   1    seang 2016-10-03 20:00:00    0 NA NA   NA NA NA         NA  NA
## 2   2    seang 2016-10-03 20:00:00  200 NA NA   NA NA NA         NA  NA
## 3   3    seang 2016-10-03 20:00:00  400 NA NA   NA NA NA  3.8646252  NA
## 4   4    seang 2016-10-03 20:00:00  600 NA NA   NA NA NA -0.8646818  NA
## 5   5    seang 2016-10-03 20:00:00  800 NA NA   NA NA NA -2.6370287  NA
## 6   6    seang 2016-10-03 20:00:00 1000 NA NA   NA NA NA -4.5374875  NA
## 7   7    seang 2016-10-03 20:00:00 1200 NA NA   NA NA NA         NA  NA
## 8   8    seang 2016-10-03 20:00:00 1400 NA NA   NA NA NA         NA  NA
## 9   9    seang 2016-10-03 20:00:00 1600 NA NA   NA NA NA         NA  NA
## 10 10    seang 2016-10-03 20:00:00 1800 NA NA   NA NA NA         NA  NA
## 11 11    seang 2016-10-03 20:00:00 2000 NA NA   NA NA NA         NA  NA
## 12 12    seang 2016-10-03 20:00:00 2200 NA NA   NA NA NA         NA  NA
## 13 13    seang 2016-10-03 20:00:00 2400 NA NA   NA NA NA         NA  NA
## 14 14    seang 2016-10-03 20:00:00 2600 NA NA   NA NA NA         NA  NA
## 15 15    seang 2016-10-03 20:00:00 2800 NA NA   NA NA NA         NA  NA
## 16 16    seang 2016-10-03 20:00:00 3000 NA NA   NA NA NA         NA  NA
## 17 17    seang 2016-10-03 20:00:00 3200 NA NA   NA NA NA         NA  NA
## 18 18    seang 2016-10-03 20:00:00 3400 NA NA   NA NA NA         NA  NA
## 19 19    seang 2016-10-03 20:00:00 3600 NA NA   NA NA NA         NA  NA
## 20 20    seang 2016-10-03 20:00:00 3800 NA NA   NA NA NA         NA  NA
##    day_night date_of_sunset exclusion_reason
## 1      night       20161003         datetime
## 2      night       20161003         datetime
## 3      night       20161003         datetime
## 4      night       20161003         datetime
## 5      night       20161003         datetime
## 6      night       20161003         datetime
## 7      night       20161003         datetime
## 8      night       20161003         datetime
## 9      night       20161003         datetime
## 10     night       20161003         datetime
## 11     night       20161003         datetime
## 12     night       20161003         datetime
## 13     night       20161003         datetime
## 14     night       20161003         datetime
## 15     night       20161003         datetime
## 16     night       20161003         datetime
## 17     night       20161003         datetime
## 18     night       20161003         datetime
## 19     night       20161003         datetime
## 20     night       20161003         datetime

2.9 Export to a CSV file

Finally, we export the data to a CSV file3:

# Remove id column
vp_data %>% select(-id) -> vp_data

# Write data to file (filename is dynamically created by load_settings)
write.csv(vp_data, file = paste0(processed_data_dir, project_name, "_", settings$general$vp_output_file), na = "", row.names = FALSE)

  1. For a radar the include_heights range can contain a min_height, max_height, both or none for a radar. If the value is something other than an integer, the general min_height or max_height is used instead.

  2. The data frame vp_data contains NaN and NA values, which have a different meaning. That difference gets lost in the CSV file: all are treated as blank (= NA) values, which is fine for visualizations. If you want to keep that difference, you need work further with vp_data.