Objective of the cleaning procedure using smoothing spline ANOVA

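A smoothing spline analysis of variance (SS ANOVA, fitted with the gss package [2]) is performed on each genotype-scenario of an experiment. An outlier repetition is detected when the TT*Rep (thermal time by repetition) interaction is not negligible according to a Kullback-Leibler (KL) projection. I consider a genotype-scenario as an outlier when its KL projection ratio (the ratio column returned by printGSS()) exceeds the following thresholds:

  • biovolume: ratio > 0.05
  • plantHeight: ratio > 0.05
  • leafArea: ratio > 0.05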
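The criterion can be illustrated without gss. The sketch below is a simplified analogue, not the fitGSS() implementation: it swaps the smoothing splines for natural regression splines (splines::ns()) fitted with lm(), and relies on the fact that, for Gaussian models with a common variance, the KL divergence between two fits is proportional to the squared distance between their fitted values. The full fit (TT*Rep) is projected by least squares onto the additive model (TT + Rep), and the ratio is the share of the full fit's variation lost by dropping the interaction. The helper name klRatio(), the df = 4 default and the simulated data are hypothetical.

```r
library(splines)  # base R package providing ns()

# Hypothetical helper: KL projection ratio of the TT*Rep interaction for one
# genotype-scenario, with regression splines standing in for smoothing splines.
klRatio <- function(datain, trait, df = 4) {
  y     <- datain[[trait]]
  tt    <- datain$thermalTime
  repet <- factor(datain$repetition)
  # full model: one thermal time curve per repetition (TT*Rep)
  etaFull <- fitted(lm(y ~ ns(tt, df = df) * repet))
  # least squares projection of the full fit onto TT + Rep (no interaction)
  etaProj <- fitted(lm(etaFull ~ ns(tt, df = df) + repet))
  etaConst <- mean(etaFull)
  # Gaussian KL divergences, up to a common factor that cancels in the ratios
  klFullProj  <- sum((etaFull - etaProj)^2)
  klFullConst <- sum((etaFull - etaConst)^2)
  klProjConst <- sum((etaProj - etaConst)^2)
  list(ratio = klFullProj / klFullConst,
       # Pythagorean identity: equals 1 up to rounding error
       check = (klFullProj + klProjConst) / klFullConst)
}

# Usage on simulated data (hypothetical): three repetitions of a logistic
# growth curve, the third one reaching a lower plateau.
set.seed(1)
sim <- data.frame(thermalTime = rep(seq(0, 1000, by = 50), times = 3),
                  repetition  = rep(1:3, each = 21))
plateau <- ifelse(sim$repetition == 3, 60, 100)
sim$biovolume <- plateau / (1 + exp(-(sim$thermalTime - 500) / 100)) +
  rnorm(nrow(sim), sd = 2)
res <- klRatio(sim, "biovolume")
res$ratio > 0.05  # TRUE would flag this genotype-scenario as an outlier
```

The check value appears to play the same role as the check column printed by printGSS(): because the constant belongs to the additive model, the two divergences add up to the total, so check should be close to 1.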

The input dataset must contain the following columns:

  • experimentAlias
  • genotypeAlias
  • scenario
  • repetition
  • thermalTime (thermal time)
  • the trait of interest (biovolume, plantHeight, etc.)

The first five column names are the standard names returned by the web service.
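Before fitting, it can help to check that these columns are present. The helper below is hypothetical (it is not part of openSilexStatR) and assumes the trait column is named exactly as passed in `trait`.

```r
# Hypothetical helper: stop with an explicit message if a required column
# of the cleaning procedure is missing; return TRUE invisibly otherwise.
checkColumns <- function(datain, trait) {
  required <- c("experimentAlias", "genotypeAlias", "scenario",
                "repetition", "thermalTime", trait)
  missing <- setdiff(required, names(datain))
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage on a one-row toy table (hypothetical values)
toy <- data.frame(experimentAlias = "manip3", genotypeAlias = "A3_H",
                  scenario = "WW", repetition = 1, thermalTime = 100,
                  biovolume = 1)
checkColumns(toy, trait = "biovolume")
```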

Import of data

  library(ggplot2)
  library(lubridate)
  library(tidyr)
  library(dplyr)
  library(gss)
  library(openSilexStatR)

  myreport<-substr(now(),1,10)
  data(plant3)
  cat("-------------- plant3 dataset ---------------\n")
## -------------- plant3 dataset ---------------
  printExperiment(datain=plant3)
## Experiment: manip3 
## Genotypes: 10 
##  [1] "A3_H"     "A310_H"   "11430_H"  "A554_H"   "A374_H"   "A347_H"  
##  [7] "B100_H"   "A375_H"   "AS5707_H" "A347"    
## Scenario: 2 
## [1] "WW" "WD"
## Repetition-scenario: 6 
## [1] "1-WW" "2-WW" "3-WW" "1-WD" "2-WD" "3-WD"
## Pots (number of plants): 60 
## Line: 25 
## Position: 42
  # Import data: here the plant3 example dataset from the openSilexStatR package is used.
  # Import your own dataset with a read.table() statement or a request to the web service,
  # and add any data management statements you need.
  #------------------------------------------------------------------------
  # Add the 'Ref' and 'Genosce' columns if they do not exist.
  # 'Ref' is the concatenation of experimentAlias-Line-Position-scenario
  # 'Genosce' is the concatenation of experimentAlias-genotypeAlias-scenario
  #------------------------------------------------------------------------

  mydata<-unite(plant3,Genosce,experimentAlias,genotypeAlias,scenario,
                sep="-",remove=FALSE)
  mydata<-arrange(mydata,Genosce)
  # Fit the smoothing spline ANOVA for one trait, for example biovolume
  resbio<-fitGSS(datain=mydata,trait="biovolume",loopId="Genosce")
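The Genosce key above is built with tidyr::unite(). The sketch below does the same in base R and also adds Ref when it is missing. The helper name addKeys() is hypothetical, and it assumes the line and position columns are named Line and Position, which may differ in your dataset.

```r
# Hypothetical helper: add the 'Ref' and 'Genosce' keys when absent, then
# sort by Genosce. Assumes columns Line and Position hold the pot coordinates.
addKeys <- function(datain) {
  if (!"Ref" %in% names(datain)) {
    datain$Ref <- paste(datain$experimentAlias, datain$Line,
                        datain$Position, datain$scenario, sep = "-")
  }
  if (!"Genosce" %in% names(datain)) {
    datain$Genosce <- paste(datain$experimentAlias, datain$genotypeAlias,
                            datain$scenario, sep = "-")
  }
  datain[order(datain$Genosce), , drop = FALSE]
}

# Usage on a two-row toy table (hypothetical values)
toy <- data.frame(experimentAlias = "manip3",
                  genotypeAlias = c("A3_H", "A310_H"),
                  scenario = c("WW", "WD"), Line = c(1, 2), Position = c(5, 7))
out <- addKeys(toy)
out[, c("Ref", "Genosce")]
```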

Curves by genotype-scenario

Biovolume

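  # genotype-scenarios whose KL projection ratio exceeds 0.05 (outliers)
  outlierbio<-printGSS(object=resbio,threshold = 0.05)
  # KL projection results for all genotype-scenarios (no threshold)
  klbio<-printGSS(object=resbio,threshold = NULL)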

  cat("Detection of outlier curve with KL projection:\n")
## Detection of outlier curve with KL projection:
  print(outlierbio)
##              Genosce      ratio        kl     check
## 1  manip3-11430_H-WW 0.15015910 1175.7782 0.9999887
## 2 manip3-AS5707_H-WD 0.07205993  633.0729 0.9999874
  #------------------------------------------------
  # To export these two datasets, uncomment the lines below
  #------------------------------------------------
  #write.table(outlierbio,paste0(myreport,"_outlier_gss_biovolume.csv"),
  #   row.names = FALSE,sep="\t")
  #write.table(klbio,paste0(myreport,"_KLprojection_gss_biovolume.csv"),
  #   row.names = FALSE,sep="\t")

A threshold of 0.05 is used in this example. A lower threshold, such as 0.01 or 0.02, is stricter and flags more curves as outliers.
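To see how the threshold changes the selection, the KL table can be filtered directly. This sketch assumes that printGSS() flags a genotype-scenario when its ratio column exceeds the threshold, which is consistent with the output above but remains an assumption; flagOutliers() is a hypothetical helper. The two ratios are copied from the outlier table printed above. With your own data, apply it to the full klbio table, where lowering the threshold can add curves.

```r
# Hypothetical helper: genotype-scenarios whose KL projection ratio exceeds
# the threshold (assumed to match printGSS(threshold = ...)).
flagOutliers <- function(kltable, threshold) {
  kltable[kltable$ratio > threshold, , drop = FALSE]
}

# Ratios copied from the outlier table printed above
kl <- data.frame(Genosce = c("manip3-11430_H-WW", "manip3-AS5707_H-WD"),
                 ratio   = c(0.15015910, 0.07205993))
thresholds <- c(0.01, 0.02, 0.05, 0.10)
# number of flagged curves per threshold: it can only decrease as the
# threshold grows
nFlagged <- sapply(thresholds, function(t) nrow(flagOutliers(kl, t)))
nFlagged  # 2 2 2 1
```

At 0.10, only manip3-11430_H-WW would remain flagged.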

  # plot the smoothing splines by genotype-scenario, 12 genotype-scenarios per page
  nGenosce<-length(unique(mydata[["Genosce"]]))
  for(i in seq(1,nGenosce,by=12)){
    myvec<-seq(i,min(i+11,nGenosce))
    print(plotGSS(datain=mydata,modelin=resbio[[1]],trait="biovolume",
                  myvec=myvec,lgrid=50))
    cat("\n\n")
  }
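The loop above draws 12 genotype-scenarios per page. The same grouping of indices can be computed in one step with split(); pageIndices() is a hypothetical helper, not part of openSilexStatR.

```r
# Hypothetical helper: split the indices 1..n into consecutive pages of at
# most 'size' elements, as the plotting loop does with myvec.
pageIndices <- function(n, size = 12) {
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / size)))
}

# e.g. 20 genotype-scenarios (10 genotypes x 2 scenarios)
pages <- pageIndices(20)
length(pages)  # 2 pages: 1..12 and 13..20
```

Each element of pages can then be passed as myvec to plotGSS().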

Session info

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] openSilexStatR_1.1.0 gss_2.2-2            dplyr_1.0.2         
## [4] tidyr_1.1.2          lubridate_1.7.9      ggplot2_3.3.2       
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-148       matrixStats_0.56.0 fs_1.4.2           sf_0.9-5          
##  [5] gmodels_2.18.1     RColorBrewer_1.1-2 rprojroot_1.3-2    evd_2.3-3         
##  [9] CARBayesdata_2.2   tools_4.0.2        backports_1.1.9    rgdal_1.5-16      
## [13] R6_2.4.1           KernSmooth_2.23-17 spData_0.3.8       DBI_1.1.0         
## [17] colorspace_1.4-1   raster_3.3-13      withr_2.2.0        CARBayesST_3.1    
## [21] sp_1.4-2           tidyselect_1.1.0   gridExtra_2.3      GGally_2.0.0      
## [25] leaflet_2.0.3      compiler_4.0.2     expm_0.999-5       desc_1.2.0        
## [29] labeling_0.3       scales_1.1.1       classInt_0.4-3     pkgdown_1.5.1     
## [33] stringr_1.4.0      digest_0.6.25      foreign_0.8-80     rmarkdown_2.3     
## [37] pkgconfig_2.0.3    htmltools_0.5.0    htmlwidgets_1.5.1  rlang_0.4.7       
## [41] rstudioapi_0.11    generics_0.0.2     farver_2.0.3       crosstalk_1.1.0.1 
## [45] gtools_3.8.2       spdep_1.1-5        magrittr_1.5       shapefiles_0.7    
## [49] dotCall64_1.0-0    Matrix_1.2-18      Rcpp_1.0.5         munsell_0.5.0     
## [53] lifecycle_0.2.0    truncdist_1.0-2    stringi_1.4.6      yaml_2.2.1        
## [57] MASS_7.3-51.6      plyr_1.8.6         matrixcalc_1.0-3   grid_4.0.2        
## [61] gdata_2.18.0       crayon_1.3.4       deldir_0.1-28      lattice_0.20-41   
## [65] splines_4.0.2      knitr_1.29         pillar_1.4.6       boot_1.3-25       
## [69] codetools_0.2-16   stats4_4.0.2       LearnBayes_2.15.1  glue_1.4.2        
## [73] evaluate_0.14      data.table_1.13.0  vctrs_0.3.4        spam_2.5-1        
## [77] testthat_2.3.2     gtable_0.3.0       purrr_0.3.4        reshape_0.8.8     
## [81] assertthat_0.2.1   xfun_0.16          SpATS_1.0-11       e1071_1.7-3       
## [85] coda_0.19-3        class_7.3-17       truncnorm_1.0-8    tibble_3.0.3      
## [89] memoise_1.1.0      units_0.6-7        ellipsis_0.3.1

References

  1. R Development Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
  2. Chong Gu (2014). Smoothing Spline ANOVA Models: R Package gss. Journal of Statistical Software, 58(5), 1-25. URL http://www.jstatsoft.org/v58/i05/.