Getting started
getting-started.Rmd
epivaultr
is an R package used to extract data from EpiVault, the Born in Bradford
data warehouse. It can be used for quick grabs of specific variables
from specific tables. It can also be used to manage an entire data
request end to end, from reading a user’s requested variables, to
writing output files ready for shipping.
In this guide we will install epivaultr
,
connect to EpiVault, and do a quick extract of a couple of variables
from a single table.
Installation
epivaultr
can be installed from GitHub using the devtools
package:
devtools::install_github("BornInBradford/epivaultr")
If a firewall is blocking the download of the source tarball, you may be able to download and install a source zip manually as follows:
- Navigate to the
epivaultr
GitHub page - Down the right-hand side you’ll see the Releases listed. Click on the Release you wish to install
- Under Assets click
Source code (zip)
to download the source zip and save it locally - Run the following command, substituting the local path to the source zip:
devtools::install_local("path/to/source/zip")
A quick extract
The script below opens an EpiVault connection, extracts a couple of variables, and disconnects from EpiVault.
library(epivaultr)
con <- ev_connect(ev_server = "BHTS-RESRCH22DV", ev_database = "ResearchWarehouse")
dat <- ev_simple_fetch(con,
project = "BiB_CohortInfo",
table = "ethnicity",
variables = c("participant_type", "ethsource"))
ev_disconnect(con)
Bigger data extracts
The ev_simple_fetch()
function provides a convenient way
to quickly grab a small number of variables. If, however, your project
needs a large number of variables from across many tables, you may find
it easier to maintain a variables list in a separate file and read the
data in using this.
The script below reads a variable request specification from a
spreadsheet and extracts the data into a single container called an ev_data
container.
# read the requested variables into an `ev_variables` container called `vars`
vars <- read_ev_variables("tests/example_inputs/variables.xlsx")
# extract the data and metadata tables into an `ev_data` container called `dat`
dat <- fetch_ev_data(con, vars)
Retrieving data from a bigger extract
The data in dat
can be retrieved using the get_
functions.
Get the names of the data tables
get_ev_data_names(dat)
Get a data table
dat_table <- get_ev_data(dat, df_name = "proj1.tab1")
Get metadata
Various types of metadata can also be accessed from the ev_data
container.
# get variable metadata
meta1 <- get_ev_metadata(dat, type = "variable")
# get category metadata
meta2 <- get_ev_metadata(dat, type = "category")
# get table metadata
meta3 <- get_ev_metadata(dat, type = "table")
For more information on retrieving data and metadata from an ev_data
container, see the Reference
section.
For a detailed walk-through of how to run a complete data request see the Data requests article.