PREFERENTIAL: Improving spatial estimation and survey design through preferential sampling in fishery and biological applications

Funding body: FCT – Fundação para a Ciência e a Tecnologia
Project reference: PTDC/MAT-STA/28243/2017
Starting date: October 1, 2018
Duration in months: 36
Principal contractor: Universidade do Minho
Principal investigator: Raquel Menezes (Centro de Biologia Molecular e Ambiental, Universidade do Minho)
Co-Principal investigator: Isabel Natário (Centro de Matemática e Aplicações)
Abstract: Improving knowledge about biodiversity and species abundance has become an important scientific and societal issue. In Portugal, the spatial distribution and abundance of the highly valued black scabbardfish are mostly unknown, and what is known relies mainly on information from commercial fisheries. Since commercial fishing takes place where fishermen expect to find the species, abundance estimates become biased. This is a particular example of the acknowledged fact that, in many real-world situations, sampling locations are not chosen at random but are preferentially selected according to a gradient of the measured variable. This phenomenon is termed Preferential Sampling (PS), meaning stochastic dependence between the sampling locations and the data process, and not merely sampling at preferred locations. Examples of PS data are found in the biological and environmental sciences, where sampling is guided by economic constraints and practical requirements.
Analysis of data obtained under PS using traditional geostatistical methods can produce misleading inferential conclusions. This problem was first acknowledged in the literature in a work co-authored by the PI of this project.
The authors proposed a model-based approach under restrictive Gaussian assumptions, which was later considered by other authors under a Bayesian framework.
The aim of this project is to address statistical and computational issues that remain open in the analysis of data under PS, with a view to ensuring that any model for a data set respects whatever sampling design was used to generate the data.
Specifically, this project considers the extension of the existing methodology: to include explanatory variables; to non-Gaussian settings, which are better suited to positive-valued variables and count data; and to spatio-temporal data, since this allows not only the spatial characterization of the process under study but also forecasting and scenario drawing.
Note that statistical analysis and inference for spatio-temporal data sampled under PS may carry over to inference in longitudinal studies in which data are obtained under PS (usually in time), a situation often encountered in medical contexts. The computational issues to be addressed include methods, such as the SPDE (stochastic partial differential equation) approach and marked point processes, to perform the estimation efficiently.
The statistical methodology will be applied to data provided by the Portuguese Institute of Sea and Atmosphere (IPMA) to estimate the spatial abundance of black scabbardfish. Finally, survey designs oriented towards abundance and biomass estimation along the Portuguese continental slope, taking PS into account, will be defined and evaluated, thus establishing benchmarks in fisheries biology. Applications to other relevant data sets are envisaged.
The project team has experience in spatial and temporal modelling and, in particular, in fish abundance studies. The project is endorsed by the Producers Fishing Association (AP), which is willing to cooperate closely with the team.
Keywords: Preferential Sampling | Geostatistics | Marked Point Processes | Fishery, Biological and Environmental data