Practical Ecological data analysis with R [part-1]

Hafez Ahmad
6 min readMay 14, 2020

--

Ecology is the study of organisms, the environment, and how the organisms interact with each other and their environment. To find out the insights of an environment/ study area, Valid Data analysis of this site is so important. There are so many analytic techniques for analysis of ecological data such as Ordination. Principal Coordinates Analysis, NMDS, Correspondence Analysis, CCA, geographical distributions (spatial data), Spatial autocorrelation, semivariograms, gridding, point pattern analysis, directional analysis, Trend, autocorrelation, spectral analysis, wavelets…
Statistical tests relevant to ecological data analysis: ANOSIM, PerMANOVA, Mantel test, Correlation, regression, and curve fitting. Multivariate tests (MANOVA), GLM, GLS, (PCA), Two-sample tests, multiple-sample tests, and so on. there are many commercial softwares for ecological data analysis. but I will use open source R for this purpose. I will cover them gradually and wait and set up a notification.

if you wish to broaden your understanding of data analysis and learn to apply it with confidence in ecological research and monitoring, I recommend follow this and explore on the internet.

I have a dataset of approximately 75 marine benthic species were collected at 45 sites on nine beaches. R dataframe will show you like (45*89 ) means The dataset consists of 45 rows (sites) and 89 columns. I am going to answer the following problems set

1. Total abundance per site.

2. Species richness, defined as the number of different species per site.

3. calculation of The Shannon index

4. bar plot showing mean values with standard deviations

5. boxplot per site and an alternative plotting

6. linear regression with species richness against the tide

1. Total abundance per site.

# load dataset
beach_B<-read.table(‘beach.txt’,header = TRUE)

#take a look

head(beach_B)str(beach_B)

# I made subset of speceies data
# rest of them are environmental important paramter such salinity , temperature ..

species<-beach_B[,2:76]
# measure dimension
n<-dim(species)
n # says 45 observations and 75 columns
# make calculation abundance per site ,
# this is only for 1 site , I am ignoring na value # first row
sum(species[1,],na.rm = TRUE)
# we got 143 species in the 1st site

# lets calculate for 2nd site , its easy
sum(species[2,],na.rm = TRUE)

# it is very boring job for every site in this way

# make efficient way
# lets remember trick,if you types n[1] output 45,n[2 ] will be 75

total<-vector(length =n[1])for (i in 1:n[1]){
total[i]<-sum(species[i,],na.rm = TRUE)
}

# we made it for 45 site in two code of
# total contains sum of species

total

# note[ three sites have no species at all [o in the tatal vector]]

# need to do more short and efficiently
# here you go

SR<-rowSums(species,na.rm = TRUE)
SR

# we solved our first problem

# lets begin 2nd one , richnes per site

richness<-vector(length = n[1])
for (i in 1:n[1]){
richness[i]<-sum(species[i,]>0,na.rm = TRUE)

}

richness

# same thin

richness<-rowSums(species>0,na.rm = TRUE)
richness

# finished

# Shannon index per site

RS<-rowSums(species,na.rm = TRUE)
RS
prob<-species/RS
prob

Hindex<-rowSums(prob*log10(prob),na.rm=TRUE)
Hindex #get it per site

# we could do using vegan library

library(vegan)
Hindex<-diversity(species)
Hindex
simple barplot per beach site

4. bar plot showing mean values with standard deviations

MeanValue<-tapply(beach_B$Richness,INDEX = beach_B$Beach,FUN = mean)
MeanValue # mean per beach
#standard diavetion
SD<-tapply(beach_B$Richness,INDEX = beach_B$Beach,FUN = sd)

# combine the data

MSD<-cbind(MeanValue,SD)
head(MSD)

barplot(MeanValue,xlab = ‘beach’,ylab=’richness’,col=rainbow(9))
box()

this is what I am expecting barplot with standard deviations

arrows(bp,MeanValue,bp,MeanValue+SD,lwd=1.5,angle = 90,length = 0.1)
box()

#do we need to show standard error plot

Ble<-tapply(beach_B$Richness,INDEX = beach_B$Beach,FUN = length)
Bse<-SD/sqrt(Ble)
stripchart(beach_B$Richness ~beach_B$Beach,
vert=TRUE,pch=1,method=’jitter’,jit=0.05,xlab=’beach’,ylab=’Species richness’,col=rainbow(9),main=’standard error plot per site’)
#ble (1:9)
points(1:9,MeanValue,pch=16,cex=1.5)
arrows(1:9,MeanValue,1:9,MeanValue+Bse,lwd=1.5,angle = 90,length = 0.1)
arrows(1:9,MeanValue,1:9,MeanValue-Bse,lwd=1.5,angle = 90,length = 0.1)
boxplot
boxplot(Richness~Beach,data=beach_B,xlab=’beach’,ylab=’richness’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))

# you wanna alternative to boxplot , cleveland dotplot is good choice

beach_B$Beach<-factor(beach_B$Beach)
dotchart(beach_B$Richness,groups=beach_B$Beach,xlab=’beach’,main=’dotplot’
,ylab=’richness’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
# make together two plots
par(mfrow=c(1,2))
dotchart(beach_B$Richness,groups=beach_B$Beach,xlab=’beach’,main=’dotplot’
,ylab=’richness’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
dotchart(beach_B$Richness,groups=beach_B$Beach,gdata=MeanValue,gpch=19,xlab=’beach’,main=’dotplot with mean value per site’
,ylab=’richness’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
#par(bty=”n”)
legend(‘bottomright’,c(‘Values’,’Mean’),pch=c(1,19),bg=”transparent”)

6. linear regression with species richness against the tide

# there is column named Nap mean mean high tide
plot(beach_B$Richness,beach_B$NAP,xlab=’Mean high tide(m)’,ylab=’Species Richness’,
main=’Linear regression ploting’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
box()
modellm<-lm(Richness~NAP,data = beach_B)
summary(modellm)
abline(modellm)
there is no significant relationship

Call:
lm(formula = Richness ~ NAP, data = beach_B)

Residuals:
Min 1Q Median 3Q Max
-5.0675 -2.7607 -0.8029 1.3534 13.8723

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.6857 0.6578 10.164 5.25e-13 ***
NAP -2.8669 0.6307 -4.545 4.42e-05 ***
— -
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.16 on 43 degrees of freedom
Multiple R-squared: 0.3245, Adjusted R-squared: 0.3088
F-statistic: 20.66 on 1 and 43 DF, p-value: 4.418e-05

plot(y=beach_B$Richness,x=beach_B$NAP,xlim=c(-3,3),ylim=c(0,10),
xlab=’Mean high tide(m)’,ylab=’Species Richness’,
main=’Linear regression ploting[closer view]’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
box()
modellm<-lm(Richness~NAP,data = beach_B)
abline(modellm)

Extra plotting of our dataset and introduction to the layout setting. this is cool plotting

# do you wanna draw combined plots [point +box]


layout<-matrix(c(2,0,1,3),nrow=2,ncol=2,byrow = TRUE)
layout
demoplot<-layout(mat=layout,widths=c(3,1),heights=c(1,3),respect=TRUE)
layout.show(demoplot)
minmaxx<-c(min(beach_B$NAP),max(beach_B$NAP))
minmaxy<-c(min(beach_B$Richness),max(beach_B$Richness))
par(mar=c(4,4,2,2))
plot(y=beach_B$Richness,x=beach_B$NAP,xlim=c(-3,3),ylim=c(0,10),
xlab=’Mean high tide(m)’,ylab=’Species Richness’,
main=’Species richness vs mean high tide’,col=c(“#458B00”, “#D2691E”, “#FF4040”, “#8B2323”, “#76EEC6”, “#B8860B”, “#BF3EFF”, “#8B0000”, “#556B2F”))
par(mar=c(0,3,1,1))
boxplot(beach_B$NAP,horizontal = TRUE, axes = FALSE,main=’boxplot of Mean high tide’,col=’red’,
frame.plot = FALSE, ylim = minmaxx, space = 0)
par(mar=c(3,0,1,1))
boxplot(beach_B$Richness,axes = FALSE,main=’Species richness’,col=’Green’,
ylim = minmaxy, space = 0, horiz = TRUE)

I will discuss more plotting and analysis of ecological data in the next part. I have attached the R [version 4.0.0] code in the GitHub repository. Please check it. The next post will be [2. Cluster and principal component analysis

2. Analysis of biodiversity data

3. The cross-sectional plot of depth, salinity

4. Temperature extraction [in CSV format], time series generation and visualization from NetCDF file]

If you have any questions, please feel free to ask. And if you can improve the code for me, please do. I really appreciate it. Thank you.

--

--

Hafez Ahmad
Hafez Ahmad

Written by Hafez Ahmad

I am an Oceanographer. I am using Python, R, MATLAB, Julia, and ArcGIS for data analysis, data visualization, Machine learning, and Ocean/ climate modeling.

Responses (1)