epivaultr is an R package used to extract data from EpiVault, the Born in Bradford data warehouse. It can be used for quick grabs of specific variables from specific tables. It can also be used to manage an entire data request end to end, from reading a user’s requested variables, to writing output files ready for shipping.

In this guide we will run a complete data request. This involves the following steps:

  1. Read the requested variables
  2. Connect to EpiVault
  3. Fetch the requested data from EpiVault
  4. Disconnect from EpiVault
  5. Write the output data files

1 Read the requested variables

A data request starts from a list of variables, provided in a csv or xlsx file in one of a variety of formats. The first task is to read the variables file into an ev_variables container, which contains all the information epivaultr needs to fetch the required data.

In the example below, we read the file variables.csv into an ev_variables container called vars.

library(epivaultr)

# create an `ev_variables` container called `vars` from the file `variables.csv`
vars <- read_ev_variables("tests/example_inputs/variables.csv")

2 Connect to EpiVault

An EpiVault instance is a SQL Server database. To make a connection you specify the server name and database name of the instance you’re connecting to. These can be set once using options() or can be passed directly each time to the ev_connect() function.

# use `options` to define the connection
options(ev_server = "BHTS-RESRCH22DV")
options(ev_database = "ResearchWarehouse")

# the connection token is returned in `con` to be used in data request functions
con <- ev_connect()
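Because the connection details are held in R's session options, it can be worth confirming that they are set before connecting. The sketch below uses only base R. The option names and values are the ones shown above; the check itself is an illustration and is not part of epivaultr.

```r
# set the connection options once per session (values from the example above)
options(ev_server = "BHTS-RESRCH22DV",
        ev_database = "ResearchWarehouse")

# read the options back; getOption() returns NULL for an option that is not set
server <- getOption("ev_server")
database <- getOption("ev_database")

# stop early with a clear message if either option is missing
if (is.null(server) || is.null(database)) {
  stop("Set `ev_server` and `ev_database` with options() before connecting")
}
```

Options set this way last only for the current R session, so they need to be set again after a restart.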

3 Fetch the requested data

We now have two objects:

  - vars: an ev_variables container with the details of the data request
  - con: an EpiVault connection token

We’re ready to pass these to the next step to fetch the required data. The requested data tables and associated metadata are bundled into an ev_data container.

# create an `ev_data` container called `dat` that will hold all of the data returned
dat <- fetch_ev_data(con, vars)

3.1 Missing variables and things to check

If any of the requested variables are not found in the EpiVault instance, fetch_ev_data() issues a warning listing the variables concerned. Check these and correct any misspelt variable names or incorrect project/table references.
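When a request is scripted, it can be useful to capture the warning text so that the missing variables can be recorded alongside the request. The following is a minimal base R sketch of that pattern. Note that fetch_stub() is a hypothetical stand-in for fetch_ev_data() so that the example runs without an EpiVault connection, and its warning text and variable names are invented for illustration; the real message may be worded differently.

```r
# hypothetical stand-in for fetch_ev_data(): issues a warning, then returns a value
fetch_stub <- function() {
  warning("Variables not found: proj.tab.var_a, proj.tab.var_b")
  "partial data"
}

# collect warning messages while letting the call run to completion
captured <- character(0)
result <- withCallingHandlers(
  fetch_stub(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))
    invokeRestart("muffleWarning")  # suppress the default warning output
  }
)

# the captured messages can now be inspected or written to a log
print(captured)
```

Using withCallingHandlers() rather than tryCatch() means the call is not abandoned when the warning occurs, so the data that was found is still returned.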

The fetch_ev_data() function also takes a visibility parameter. Most data requests are run at visibility level 0, which is the default, so this parameter doesn’t usually need to be specified. Visibility can also cause missing variable warnings: if a variable has a higher visibility level set in EpiVault than the level requested, it is reported as not found. A missing variable warning may therefore mean that the variable exists but requires a higher visibility level than the one you requested. See Variable visibility for more information.
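The visibility rule can be pictured with a small base R example. This illustrates only the rule described above and is not how epivaultr implements it; the variable names and visibility levels are made up.

```r
# made-up variables with the visibility level assigned to each one
catalogue <- data.frame(
  variable = c("var_a", "var_b", "var_c"),
  visibility = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# the level the request is run at (0 is the default described above)
requested_level <- 0

# a variable is found only if its level does not exceed the requested level
found <- catalogue$variable[catalogue$visibility <= requested_level]
not_found <- setdiff(catalogue$variable, found)

print(found)
print(not_found)
```

Here var_c exists but sits above the requested level, so it would be reported as not found until the request is rerun at a higher level through the visibility parameter of fetch_ev_data().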

4 Disconnect from EpiVault

Now that we no longer need the connection to EpiVault, it is good practice to disconnect.

5 Write the output data files

For convenience in processing data requests, you can use write_ev_data() to output all data and metadata files in a chosen file format.

To skip metadata outputs, set metadata to FALSE.

# output all data and metadata tables in Stata format using the file prefix `my_data`
write_ev_data(dat, 
              path = "H:/DataRequests/output",
              name = "my_data",
              format = "stata",
              metadata = TRUE)
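This guide does not say whether write_ev_data() creates the output folder when it is missing, so one cautious option is to make sure the folder exists first. The sketch below uses only base R, with a temporary folder standing in for the real output location; the name out_dir is an assumption made for illustration.

```r
# a temporary folder stands in for the real output location
out_dir <- file.path(tempdir(), "output")

# create the folder (and any missing parent folders) if it is not already there
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
```

The resulting folder can then be supplied as the path argument shown above.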

If more fine-grained control of outputs is needed, the data and metadata tables in dat can be accessed using various get_ functions.