Using R to Compute Weighted Average Inferences
How to Compute Weighted Average Inferences
Next, identify and save the names of taxa that are found in both data sets.
# Compare taxa names in tolerance value and assessment data.
# Make sure all taxa names are in capital letters only
names.tv <- toupper(names(site.species)[-1])
names.assess <- toupper(names(site.species.or)[-1])
# Find the taxa names that occur in both datasets.
# intersect() returns each shared name once, even if a
# name is repeated within one dataset.
names.match <- intersect(names.tv, names.assess)
print("Taxa in both databases")
print(sort(names.match))
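As a small self-contained illustration of the matching step, the sketch below uses two made-up data frames and base R's intersect() to find the shared names. The objects toy.tv and toy.assess are hypothetical stand-ins for site.species and site.species.or, not part of the original scripts.

```r
# Hypothetical toy data: the first column is a site ID, the rest are taxa
toy.tv <- data.frame(SITE.ID = 1:2, Baetis = c(3, 0), ACARI = c(1, 2),
                     Simulium = c(0, 5))
toy.assess <- data.frame(SITE.ID = 1:2, BAETIS = c(4, 1), Acari = c(0, 2),
                         OPTIOSERVUS = c(7, 0))

# Standardize capitalization, dropping the site ID column
toy.names.tv <- toupper(names(toy.tv)[-1])
toy.names.assess <- toupper(names(toy.assess)[-1])

# Taxa present in both data sets, in alphabetical order
toy.match <- sort(intersect(toy.names.tv, toy.names.assess))
print(toy.match)
```

Note that toupper() changes only the copied names, not the column names of the data frames. If the columns are not already in capital letters, they would also need to be converted (for example, names(toy.assess) <- toupper(names(toy.assess))) before the matched names can be used to select columns.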
To apply assessment tools, we need to compute central tendencies for as many taxa as possible. To do this, expand the list of taxa to include all taxa that occur at 20 or more sites in the EMAP-West data set. (The 20-site minimum is imposed to avoid overfitting a model to a rare taxon.)
# Get names of all taxa in the data set
taxa.names.init <- names(site.species)[-1]
# Compute the number of sites at which each taxon occurs
getocc <- function(x) sum(x > 0)
numocc <- apply(site.species[, taxa.names.init], 2, getocc)
taxa.names <- taxa.names.init[numocc >= 20]
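To see how the occurrence filter behaves, here is a minimal sketch on a made-up abundance table. The toy.* objects and the min.sites threshold of 3 are assumptions chosen so the example fits in a few rows; colSums() on a logical comparison is used as one compact way to count occurrences.

```r
# Hypothetical abundance table: 5 sites, 3 taxa
toy.species <- data.frame(SITE.ID = 1:5,
                          TAXON.A = c(2, 0, 1, 4, 3),
                          TAXON.B = c(0, 0, 5, 0, 0),
                          TAXON.C = c(1, 1, 0, 0, 2))
toy.taxa <- names(toy.species)[-1]

# Number of sites at which each taxon was observed (abundance > 0)
toy.numocc <- colSums(toy.species[, toy.taxa] > 0)

# Keep taxa found at 3 or more sites (the script above uses 20)
min.sites <- 3
toy.keep <- toy.taxa[toy.numocc >= min.sites]
print(toy.numocc)
print(toy.keep)
```

Here TAXON.B occurs at only one site and is dropped, while TAXON.A and TAXON.C are retained.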
Now, recompute central tendencies for the expanded list of taxa by running the central tendencies script again (see Central Tendencies in the Helpful Links box). Make sure you run the script for all taxon names identified above. Depending on the number of taxa selected, this may take some time.
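The central tendencies script is not reproduced here. As a rough sketch of the kind of result it produces, and assuming the weighted average for a taxon is the abundance-weighted mean of the environmental variable across sites (the usual definition of a weighted-average optimum), the code below builds a named vector of weighted averages. The data and all toy.* names are hypothetical.

```r
# Hypothetical data: abundances of two taxa and stream temperature at 4 sites
toy.df <- data.frame(temp = c(10, 14, 18, 22),
                     TAXON.A = c(4, 3, 1, 0),
                     TAXON.B = c(0, 1, 3, 4))
toy.taxa <- c("TAXON.A", "TAXON.B")

# Weighted average for each taxon (an assumed definition):
# sum(abundance * temperature) / sum(abundance)
toy.WA <- sapply(toy.taxa, function(taxon) {
  sum(toy.df[, taxon] * toy.df$temp) / sum(toy.df[, taxon])
})
print(toy.WA)
```

The result is a numeric vector named by taxon, so individual values can be looked up by taxon name.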
Continuous tolerance values (e.g., weighted averages or optima) can be classified into tolerance categories, but it is preferable to use them directly in a mean tolerance value metric, that is, the abundance-weighted mean of the tolerance values of the taxa observed at a site.
The following script assumes that weighted averages have been computed for all taxa listed in names.match and stored in WA, a numeric vector named by taxon. Other tolerance values can be substituted for WA in the line that computes mean.tv.
# Only select taxa for which tolerance values
# have been computed.
mat1 <- as.matrix(dfmerge.or[, names.match])
# First get total abundance
tot.abn <- rowSums(mat1)
# Use matrix multiplication to compute the sum of all
# observed tolerance values, and then divide by total
# abundance to get the mean tolerance value.
mean.tv <- (mat1 %*% WA[names.match])/tot.abn
plot(dfmerge.or$temp, mean.tv, xlab = "Temperature",
     ylab = "Mean tolerance value")
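The arithmetic behind the mean tolerance value can be checked by hand on a small example. The sketch below uses a made-up abundance matrix and made-up tolerance values (all toy.* names are hypothetical). It also marks sites with zero total abundance as missing, a guard that is an addition here and not part of the script above.

```r
# Hypothetical abundances at 3 sites for 2 taxa (rows = sites)
toy.mat <- matrix(c(4, 0,
                    2, 2,
                    0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("S1", "S2", "S3"),
                                  c("TAXON.A", "TAXON.B")))
# Hypothetical tolerance values, named by taxon
toy.WA <- c(TAXON.A = 12.5, TAXON.B = 19.5)

# Total abundance at each site
toy.tot <- rowSums(toy.mat)
# drop() converts the one-column matrix returned by %*% to a named vector
toy.mean.tv <- drop(toy.mat %*% toy.WA[colnames(toy.mat)]) / toy.tot
# Sites with no matched taxa give 0/0 = NaN; mark them as missing
toy.mean.tv[toy.tot == 0] <- NA
print(toy.mean.tv)
```

Site S1 contains only TAXON.A, so its mean tolerance value equals that taxon's value (12.5). Site S2 has equal abundances of both taxa, giving the simple average (16). Site S3 has no individuals and is reported as NA.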