Thursday, June 20, 2013

Generating nice-looking RDS networks,

I often see recruitment networks from RDS studies laid out like this:

However, this sort of layout makes it hard to see who are 'seeds', how many waves there are, etc.. I prefer a layered Sugiyama type format like this:

Here's how you can make such a network using R. You can download the data, sim.rdsat, from here.

In order to read RDSAT data into R, we skip over the first couple of rows.

rdsdata <- read.table("sim.rdsat", skip = 2, row.names = NULL)

This misses out on other information, that we can get from the first couple of lines. Our data is tab-delimited, so we split the second line into separate fields based on the tab character.

# Read the second line
rdsdata.headers <- readLines("sim.rdsat", n = 2)[2]
# Split the line into fields
rdsdata.headers <- unlist(strsplit(rdsdata.headers, "\t")[[1]])
# Extract the number of coupons
ncoupons <- as.integer(rdsdata.headers[2])
# Extract the missing code
misscode <- as.integer(rdsdata.headers[3])
# Now generate new headers
newheaders <- c("id", "netsize", "mycoupon")
for (i in 1:ncoupons) {
    newheaders <- c(newheaders, paste("coupon", i, sep = "."))
}
newheaders <- c(newheaders, rdsdata.headers[(4 + ncoupons):length(rdsdata.headers)])
names(rdsdata) <- newheaders

Now we have a table with standardized headers for identifier, network size, and coupons.

We have to convert the RDSAT data, such that we find the recruiter for each individual. First we make a list of coupons, and who they were given to. Note that I turn everything into a character string.

couponList <- list()
couponsGiven <- rdsdata[, grep("coupon.", names(rdsdata))]
# Now collapse coupons into a single column
couponsGiven <- stack(couponsGiven)
couponIDs <- rep(rdsdata$id, ncoupons)
# Cycle through each coupon
for (i in 1:dim(couponsGiven)[[1]]) {
    thiscoupon <- as.character(couponsGiven$values[i])
    if (thiscoupon != as.character(misscode)) {
        couponList[[thiscoupon]] <- as.character(couponIDs[i])
    }
}

Now we can find the recruiter for each individual.

recruiters <- rep("", dim(rdsdata)[[1]])
for (i in 1:length(recruiters)) {
    thiscoupon <- as.character(rdsdata$mycoupon[i])
    if (thiscoupon == as.character(misscode)) {
        recruiters[i] <- NA
    } else {
        recruiters[i] <- couponList[[thiscoupon]]
    }
}

To plot out the network in a nice Sugiyama-style format, I use Rgraphviz. To install this, set the repositories to include BioConductor; you'll also have to have Graphviz installed.

setRepositories()

To get RGraphviz to render the network, I generate an adjacency matrix, then convert to a graph object.

library(graph)
allids <- sort(unique(c(as.character(rdsdata$id), recruiters)))
numids <- length(allids)
adj <- matrix(0, numids, numids)
dimnames(adj) <- list(allids, allids)
for (j in 1:dim(rdsdata)[1]) {
    adj[match(recruiters[j], allids), match(rdsdata$id[j], allids)] <- 1
}
rdsdata.graph <- graphAM(adjMat = adj, edgemode = "directed")

The resulting network can be plotted very easily.

library(Rgraphviz)
plot(rdsdata.graph, "dot")

However, we can improve this plot by adding in covariate information. I set the colour to be red if x2=1, and the width of the symbol to 2 rather than 1 if x1>0. To do this, we make a list of attributes, which we will pass to the plotting routine. A full list of these attributes can be found here.

nodeNames <- nodes(rdsdata.graph)
numNodes <- length(nodeNames)
nAttrs <- list()
nAttrs$label <- rep("", numNodes)
names(nAttrs$label) <- nodeNames
nAttrs$fillcolor <- c()
nAttrs$width <- c()
for (i in 1:numNodes) {
    if (rdsdata$x2[i] == 0) {
        nAttrs$fillcolor <- c(nAttrs$fillcolor, "blue")
    } else {
        nAttrs$fillcolor <- c(nAttrs$fillcolor, "red")
    }
    if (rdsdata$x1[i] > 0) {
        nAttrs$width <- c(nAttrs$width, 2)
    } else {
        nAttrs$width <- c(nAttrs$width, 1)
    }
}
names(nAttrs$fillcolor) <- as.character(nodeNames)
names(nAttrs$width) <- as.character(nodeNames)

Now we can pass the attributes to the plotting routine, to get the network shown at the beginning of the post.

plot(rdsdata.graph, nodeAttrs = nAttrs, "dot")

No comments:

Post a Comment