Data requests
data-requests.Rmd
epivaultr
is an R package used to extract data from EpiVault, the Born in Bradford
data warehouse. It can be used for quick grabs of specific variables
from specific tables. It can also be used to manage an entire data
request end to end, from reading a user’s requested variables, to
writing output files ready for shipping.
In this guide we will run a complete data request. This involves the following steps:
- Read the requested variables
- Connect to EpiVault
- Fetch the requested data from EpiVault
- Disconnect from EpiVault
- Write the output data files
1 Read the requested variables
A data request starts from a list of variables, provided in a
csv
or xlsx
file in one of a variety of formats.
The first task is to read the variables file into an ev_variables
container, which contains all the information epivaultr
needs to fetch the required data.
In the example we create an ev_variables
container called vars
that will hold all the information
about the requested variables.
library(epivaultr)
# create an `ev_variables` container called `vars` from the file `variables.csv`
vars <- read_ev_variables("tests/example_inputs/variables.csv")
2 Connect to EpiVault
An EpiVault instance is a SQL Server database. To make a connection
you specify the server name and database name of the instance you’re
connecting to. These can be set once using options()
or can
be passed directly each time to the ev_connect()
function.
# use `options` to define the connection
options(ev_server = "BHTS-RESRCH22DV")
options(ev_database = "ResearchWarehouse")
# the connection token is returned in `con` to be used in data request functions
con <- ev_connect()
3 Fetch the requested data
So now we have:
vars |
An ev_variables
container with the details of the data request |
con |
An EpiVault connection token |
We’re ready to pass these to the next step to fetch the required
data. The requested data tables and associated metadata are bundled into
an ev_data
container.
# create an `ev_data` container called `dat` that will hold all of the data returned
dat <- fetch_ev_data(con, vars)
3.1 Missing variables and things to check
If any of the requested variables are not found in the EpiVault instance, R will return a warning message listing the variables concerned. You should check these and correct any variables with misspelt names or incorrect project/table references.
The fetch_ev_data()
function also takes a
visibility
parameter. Most data requests will be run at
visibility level 0, which is the default, so this parameter doesn’t
usually need to be specified. However, variable visibility can also lead
to missing variable warnings: if a variable has a higher visibility
level set in EpiVault than the level requested, it will be reported as
not found. So, a missing variable warning can also be caused by a
variable requiring a higher visibility privilege than your current
setting. See Variable
visibility for more information.
4 Disconnect from EpiVault
Now we no longer need the connection to EpiVault, it is good practice to disconnect.
ev_disconnect(con)
5 Write the output data files
For convenience in processing data requests, you can use
write_ev_data()
to output all data and metadata files in a
chosen file format.
To skip metadata outputs, set metadata
to
FALSE
.
# output all data and metadata tables in Stata format using the file prefix `my_data`
write_ev_data(dat,
path = "H:/DataRequests/output",
name = "my_data",
format = "stata",
metadata = TRUE)
If more fine-grained control of outputs is needed, the data and
metadata tables in dat
can be accessed using various get_
functions.