University of Trás-os-Montes e Alto Douro

Deep learning techniques for grapevine variety classification using natural images

Doctoral Thesis in Informatics

by

Carlos Manuel Silva Pereira

Supervisor: Manuel José Cabral dos Santos Reis, PhD

Cosupervisor: Raul Manuel Pereira Morais dos Santos, PhD

A Thesis submitted to the UNIVERSITY OF TRÁS-OS-MONTES E ALTO DOURO for the degree of Doctor of Philosophy in Informatics, according to the Regulamento Geral dos Ciclos de Estudo Conducentes ao Grau de Doutor na UTAD, DR, 2.ª série – N.º 133 – Regulamento n.º 656/2016 de 13 de julho de 2016

Scientific Coordination

Manuel José Cabral dos Santos Reis, PhD
Associate Professor with Agregação of the Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro

Raul Manuel Pereira Morais dos Santos, PhD
Associate Professor with Agregação of the Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro

“If the data is petroleum, artificial intelligence is the engine” Young Sohn

(in Web Summit 2018, Lisbon)

Deep learning techniques for grapevine variety classification using natural images

Carlos Manuel Silva Pereira

Submitted to the University of Trás-os-Montes e Alto Douro in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Informatics

Abstract: This thesis proposes a computer vision system for the automatic classification, from natural images, of six endogenous grapevine varieties of the Portuguese Douro Demarcated Region. To give a full understanding of the methodologies applied and the experiments developed in this research work, the document is structured into six chapters. The first chapters review the literature on image processing in agriculture, covering image processing techniques (enhancement and color model conversion) and the image segmentation methods that inspired the proposed leaf segmentation algorithm. The theoretical background on machine learning, namely deep learning and convolutional neural networks, is then presented to ease the understanding of the methodologies applied in our research proposal. The remaining chapters present the materials and methods, the major conclusions and possible future developments.

The proposed system is hard to develop because it faces many constraints. First, natural vineyard images contain wild foliage, weeds, multiple overlapping leaves, occlusion, and obstruction by objects, as well as shadows, dust, insects and other adverse climatic conditions that occur in the natural environment at the moment of image capture; second, images of different grapevine varieties are highly similar; third, leaf senescence and significant changes in the images of grapevine leaves and bunches occur across harvest seasons, namely due to adverse climatic conditions, diseases and the presence of pesticides; fourth, the volume of available images is low. In addition, the vineyards of the Douro Region are characterized by having more than one grapevine variety per parcel and even per row. Knowing the susceptibility of a particular variety to a specific disease, its identification by this automatic system will help, for example, in applying a more specific and targeted treatment. Besides that, many wine producers rely on this large number of grapevine varieties to produce their most expensive wines.

As the title of this thesis highlights, deep learning techniques were used to address the presented constraints. With these advanced neural technologies, the performance of transfer learning schemes based on the AlexNet architecture was evaluated for the classification of grapevine varieties using diverse pre-processed datasets. Two natural vineyard image datasets were built, from which different pre-processed datasets were generated with the application of some image processing methods, including a proposed four-corners-in-one image warping algorithm for deep training purposes. After detailing some network schemes, we present and discuss some of the experimental results obtained by the proposed approach, which we judge promising and encouraging for helping Douro wine growers in the automatic classification of grapevine varieties, with a view to the future implementation of a robotic grape harvest.

Keywords: AlexNet deep model; transfer learning techniques; natural vineyard images; image analysis and processing; leaf segmentation; grapevine variety classification; precision viticulture.

Técnicas de aprendizagem profunda para classificação de variedades de videiras usando imagens naturais

Carlos Manuel Silva Pereira

Submetido à Universidade de Trás-os-Montes e Alto Douro para preenchimento parcial dos requisitos para obtenção do grau de Doutor em Informática

Resumo: Esta tese propõe um sistema de visão computacional para classificação automática de seis variedades de videira endógenas da Região Demarcada do Douro a partir de imagens naturais. Para um completo entendimento sobre as metodologias aplicadas e as experiências desenvolvidas neste trabalho de investigação, estruturamos este documento em seis capítulos. Os primeiros são reservados à revisão da literatura sobre processamento de imagens na agricultura, como técnicas de processamento de imagens (realce da imagem e conversão de modelos de cores) e métodos de segmentação de imagens que nos inspiraram a desenvolver um algoritmo de segmentação de folhas. Os fundamentos teóricos sobre o processo de aprendizagem de máquina, a saber, aprendizagem profunda e redes neuronais convolucionais, são apresentados para facilitar a compreensão das metodologias aplicadas na nossa proposta de trabalho. Os demais capítulos ficam reservados para a apresentação dos materiais e métodos, as principais conclusões e possíveis desenvolvimentos futuros do trabalho.

Torna-se difícil desenvolver o sistema que se propõe porque apresenta muitos constrangimentos. Primeiro, a presença em imagens naturais de vinhas de folhagem selvagem, erva daninha, várias folhas com sobreposição, oclusão e obstrução por objetos devido às sombras, poeira, insetos e outras condições climáticas adversas que ocorrem no ambiente natural no momento da captação de imagem; segundo, a alta similaridade das imagens entre diferentes variedades de videira; terceiro, senescência foliar e mudanças significativas nas imagens de folhas e cachos de videira nas safras, devido a condições climáticas adversas, doenças e presença de pesticidas; quarto, o baixo volume de imagens disponíveis. Além disso, as vinhas da região do Douro também se caracterizam por possuir mais de uma variedade de videira por parcela e até por bardo. Conhecendo a suscetibilidade de uma variedade específica a uma doença específica, usando este sistema automático, a sua identificação ajudará, por exemplo, num tratamento mais específico e direcionado. Além disso, muitos produtores de vinho têm utilizado um grande número de variedades de videira para produzir os seus vinhos de referência e, portanto, mais caros.

Como o título desta tese destaca, as técnicas de aprendizagem profunda foram usadas para resolver os constrangimentos apresentados. Com estas tecnologias neuronais avançadas, o desempenho dos esquemas de aprendizagem de transferência baseados na arquitetura AlexNet foi avaliado na classificação de variedades de videira usando diversos conjuntos de dados pré-processados.

Assim, foram construídos dois conjuntos de dados de imagem de vinhas naturais a partir dos quais foram gerados diferentes conjuntos de dados pré-processados com a aplicação de alguns métodos de processamento de imagem, incluindo um algoritmo de distorção de imagem chamado four-corners-in-one para fins de treino. Depois de detalharmos alguns esquemas de rede, apresentamos e discutimos alguns dos resultados experimentais obtidos pela abordagem proposta, que julgamos promissores e encorajadores para ajudar os viticultores do Douro na classificação automática das variedades de videira para futura implementação de um robot para colheita de uvas.

Palavras-chave: modelo AlexNet; técnicas de aprendizagem de transferência; imagens de vinhas naturais; análise e processamento de imagem; segmentação de folhas; classificação de variedades de videira; vitivinicultura de precisão.

Acknowledgments

I would like to thank the following people and institutions for their valuable advice and help, without which the development and completion of this research project would not have been possible.

• The first words of gratitude are directed to my supervisor, Prof. Dr. Manuel Cabral Reis, for his scientific support and, especially, for encouraging me to develop the work on this thesis during all these years. Thanks again and again;

• The next words go to my co-supervisor, Prof. Dr. Raul Morais dos Santos, for the technical support and valuable tips;

• A very special word to Prof. Dr. Dorabela Regina Gamboa, President of the School of Technology and Management of the Polytechnic of Porto, for the support and, above all, the encouragement given over these long years;

• Institutionally, my thanks go to the Magnificent Dean of the University of Trás-os-Montes e Alto Douro, Prof. Dr. António Fontainhas Fernandes, to the President of the School of Science and Technology, Prof. Dr. José Boaventura da Cunha, and to the Director of the Informatics Doctoral Program, Prof. Dr. João Pereira Barroso, for helping the cause of this thesis in every possible way;

• The Institute of Electronics and Informatics Engineering of Aveiro (IEETA) and the Foundation for Science and Technology (FCT), for the support and partial funding of this research;

• Eng. Maria Serpa Pimentel, owner and agricultural technician of Quinta da Pacheca and Quinta do Vale de Abraão, and Mr. Paulo Pinto Rodrigues, chief of the vineyard, and Mr. Ricardo Meira, assistant director, both of Quinta do Vallado, for authorizing and supporting the vineyard image acquisition process;

• Especially my wife, Lucília Pereira, for the support and encouragement to carry out this long thesis and for making me believe that I would be able to complete it;

• My son, Miguel Pereira, for the patience to hear me talk about the progress of the project and for his help in updating my portable computer with the software packages needed for the development of my thesis;

• Prof. Manuela Mendes Miranda, director of the Agrupamento de Escolas do Sudeste

Table of contents

Abstract
Resumo
Acknowledgments
Figures
Tables
Acronyms and abbreviations

1 Introduction
1.1 Theme and context
1.2 Motivation
1.3 Goals
1.4 Constraints
1.5 Problem statement
1.6 Publication of research results
1.7 Thesis outline

2 Literature review and related work
2.1 Introduction
2.2 Precision agriculture
2.3 Plant species identification
2.3.1 Challenges in automatic identification
2.3.2 Plant organs for automated identification
2.4 Robotic Technology in Wine Production
2.5 Image processing in agriculture
2.5.1 Image pre-processing techniques
2.5.1.1 Enhancement
2.5.2 Image segmentation methods
2.5.2.1 Plant leaf detection
2.5.2.2 Fruit detection
2.5.3 Image analysis
2.5.3.1 Fruit identification
2.5.3.2 Plant disease classification
2.5.3.3 Appearance index measurements of fruit
2.6 Deep learning in agricultural image identification
2.6.1 Plant species and diseases identification
2.6.2 Grapevine variety classification
2.7 Conclusions

3 Background
3.1 Introduction
3.2 Machine Learning
3.3 Neural Networks
3.4 Learning process
3.5 Deep learning
3.6 Convolutional Neural Networks
3.6.1 Motivation
3.6.2 Structure of layers
3.6.2.1 Spatial convolutional
3.6.2.2 Non linear activation functions
3.6.2.3 Pooling and stride
3.6.2.4 Fully-connected
3.6.2.5 Softmax and classification
3.6.3 Architectures
3.6.3.1 LeNet
3.6.3.2 Deep
3.6.3.3 Very deep
3.6.3.4 Residual
3.6.3.5 Dense
3.6.4 Training methods
3.6.4.1 Transfer learning
3.6.4.2 Loss functions
3.6.4.3 Optimization algorithms
3.6.4.4 Regularization techniques
3.7 Software Libraries
3.8 Hardware requirements
3.9 Conclusions

4 Materials and methods
4.2 Materials
4.2.1 The programming platform
4.2.2 Deep learning and other toolboxes
4.2.3 Computer resources
4.2.4 Image databases
4.2.4.1 The Douro Demarcated Region
4.2.4.2 Douro vineyard image collection
4.2.4.2.1 DRGV image database
4.2.4.2.2 DRGV_2018 image database
4.2.4.3 Flavia leaf image database
4.3 Methods
4.3.1 Independent component analysis
4.3.2 Canny edge detector
4.3.3 Gray morphology procedure
4.3.4 Grapevine leaf segmentation algorithm
4.3.5 The four-corners-in-one method
4.3.6 Deep learning model
4.3.6.1 AlexNet network architecture
4.3.6.2 Pre-processed datasets
4.3.6.2.1 From the DRGV image database
4.3.6.2.2 From the DRGV+2018 image database

5 Grapevine variety classification system
5.2 Experimental results
5.2.1 Performance on the DRGV test set
5.2.2 Performance on the DRGV+2018 test set
5.2.3 Network feature visualization
5.2.4 Performance on the Flavia leaf test set

6 Conclusions and future developments
6.2 Future developments

List of Figures

1. Block diagram of supervised machine learning for image-based species identification
2. Block diagram representation of the nervous system
3. Block diagram of a nonlinear model of a neuron k
4. Feedforward network with a single layer of neurons
5. Fully-connected feedforward network with one hidden layer and one output layer
6. Recurrent network with neither self-feedback loops nor hidden neurons
7. Block diagram of supervised learning
8. Block diagram of reinforcement learning
9. Block diagram of unsupervised learning
10. An example of a CNN
11. An exemplar of a Feitoria delimitation mark owned by Quinta da Pacheca, 1758
12. Panoramic view of the Quintas of Douro at the Baixo Corgo region, Mesão Frio, autumn 2018
13. A pedagogical show of each one of the six predominant Douro castes, in Quinta da Pacheca
14. A sample of each of the six red grape varieties of the DRGV database
15. A sample of each of the six red grape varieties of the DRGV_2018 database
16. Example images of the Flavia leaf dataset
17. Independent component analysis
18. Canny edge detection operator
19. The procedure of the leaf vein extraction
20. Gray morphology procedure
21. Pre-processing phase
22. Shadow region detection
23. Non-shadow region processing
24. Shadow region processing
25. Detection of the coloring reddish-brown of blurs inside the segmented leaves
27. The proposed grapevine leaf segmentation method
28. The steps of the proposed four-corners-in-one method
29. MatLab™ function code to produce the North-West corner image
30. The concept of DL and its relationship to ML
31. ConvNet's feature extractor and classifier on the training phase
32. Data splitting from the DRGV pre-processed datasets #1 to #7
33. Confusion matrix over the test set
34. The first 30 features learned by the five convolutional layers of the proposed classifier
35. The 6 features learned by the last fully-connected layer (fc8) of the proposed network when identifying an image of Touriga Franca

List of Tables

1. An overview of literature on segmentation of plant leaf images
2. An overview of literature on segmentation of fruit images
3. An overview of literature review on classification of fruit images
4. An overview of literature review on estimation of fruit appearance features
5. Comparison of the most popular software libraries for deep learning
6. An overview of the MatLab™ toolboxes used in this work in the different areas
7. Distribution of the images per caste captured during the 2016 harvest season
8. Distribution of the images per caste captured during the 2018 harvest season
9. Summary of leaf segmentation results
10. AlexNet deep network in DL MatLab™ toolbox
11. Statistics of the used pre-processed datasets from the DRGV image dataset
12. Data splitting from the DRGV+2018 pre-processed datasets #8 to #10
13. Training, validation and test accuracy and loss for different TL schemes on the AlexNet-based network (DRGV pre-processed datasets)
14. Training, validation and test accuracy and loss for different TL schemes on the AlexNet-based network (DRGV+2018 pre-processed datasets)
15. Identification accuracies over DRGV+2018 test set for different data augmentation methods

Acronyms and abbreviations

Table of acronyms

Acronym    Description
AI         Artificial Intelligence
AlexNet    Alex Krizhevsky's ConvNet
(A)NN      (Artificial) Neural Network
BP         Back-Propagation
BSD        Berkeley Software Distribution
CCD        Charge-Coupled Device (sensors)
CIE        Commission Internationale de l'Éclairage (for its French name)
ConvNet    Convolutional Neural Network
CPU        Central Processing Unit
CUDA       Compute Unified Device Architecture (Nvidia Corporation)
DBN        Deep Belief Network
DDR        Douro Demarcated Region
DDR4       Double Data Rate fourth-generation (memory)
DenseNet   Densely Connected Convolutional Network
DL         Deep Learning
DRGV       Douro Red Grape Varieties
FE         Features extraction
GPU        Graphics Processing Unit
HOG        Histogram of Oriented Gradients
HSI        Hue, Saturation and Intensity components (color model)
HSL        Hue, Saturation and Luminance components (color model)
HSV        Hue, Saturation and Value components (color model)
ICs        Independent Components
ICA        Independent Component Analysis
KNN        K-Nearest-Neighbor
LSTM       Long Short-Term Memory
L*a*b      Luminance, from green to red and from blue to yellow components (color model)
MatLab™    Matrix Laboratory (trademark of The Mathworks, Inc.)
ML         Machine Learning
MSE        Mean Square Error
RBF        Radial basis function (neural network)
RBM        Restricted Boltzmann Machine
RGB        Red, Green and Blue components (color model)
ReLU       Rectified Linear Unit
ResNet     Residual Network
RNN        Recurrent Neural Network
ROI        Region Of Interest
SGD        Stochastic Gradient Descent
SGDM       Stochastic Gradient Descent with Momentum
SIFT       Scale-Invariant Feature Transform
SURF       Speeded-up robust features
SVM        Support Vector Machine
TL         Transfer Learning
VGGNet     Visual Geometry Group Network
YCbCr      Luminance, blue-difference and red-difference components (color model)

Table of abbreviations

Abbreviation   Meaning
e.g.           For example
et al.         And other authors
etc.           And so on

1 Introduction

1.1 Theme and context

This thesis addresses the computer vision analysis of digital images of vineyards captured in a natural environment. The proposed work focuses on the development of an algorithm for the automatic classification of the six endogenous red grape varieties originating from the Portuguese Douro Demarcated Region (DDR), namely Tinta Amarela, Tinta Barroca, Tinto Cão, Touriga Franca, Touriga Nacional and Tinta Roriz.

Most of the proposed automatic detection and classification systems for agricultural products are primarily developed for robotic or automated harvesting purposes. An automated harvesting process is crucial to improve efficiency and productivity and to reduce costs in agricultural labor. In addition, systems for a range of daily farming tasks, such as fruit grading, sorting and counting, early plant disease detection, variety identification, crop yield prediction, pest damage assessment and spraying system optimization, have been developed by researchers with the main purpose of reducing the farmer's manual monitoring and visual decision-making procedures. In the context of the Douro Region, the vineyards frequently present multiple varieties per parcel and even per row (Reis et al., 2012). Knowing the susceptibility of a particular variety to a specific disease, its identification by this automatic system will help, for example, in applying a more specific and targeted treatment. In addition, many wine producers rely on this large number of grape varieties to produce their most expensive wines (e.g., the “Blend D. Antónia” wine, from the Quinta do Vallado, has more than 30 varieties). Consequently, an automatic algorithm for grapevine variety classification that provides an automatic separation of the different grape varieties will be of paramount importance in the DDR, and is crucial for the development of an accurate robotic grape harvesting system.

For decades, the application of image processing techniques in agriculture has been a research area of great relevance and interest, with promising results. In early research works, pre-processing methods were applied to the original images for enhancement purposes. Classical solutions for contrast enhancement and noise removal are histogram equalization and filtering techniques, respectively, applied to the original raw images. Background removal is an enhancement technique frequently used in agricultural applications for optimized display of images and, mainly, for subsequent image analysis operations (Berenstein et al., 2010; May and Amaran, 2011; Omid et al., 2010).
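As an illustration of the enhancement step mentioned above, the following minimal Python sketch applies histogram equalization to a small grayscale image. It is not the code used in this thesis, which relies on MatLab™ toolboxes; the function name `equalize_histogram` and the list-of-rows image representation are assumptions made only for this example.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels - 1].

    Illustrative sketch only: names and data layout are assumptions.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function (CDF) of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, used to stretch the output to the full range.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to equalize.
        return [row[:] for row in image]

    # Look-up table mapping each input level to its equalized level.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A low-contrast 2x3 image whose gray levels lie between 50 and 80.
low_contrast = [[50, 50, 60], [60, 70, 80]]
equalized = equalize_histogram(low_contrast)
```

After equalization the darkest input level maps to 0 and the brightest to 255, so the narrow input range is stretched over the whole gray scale while the ordering of the levels is preserved.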

A wide variety of pre-processing procedures have been proposed, such as thresholding techniques (Bama et al., 2011; Sengupta and Lee, 2012; Al-Ohali, 2011; Zhang and Wu, 2012), noise filtering (Badnakhe and Deshmukh, 2011; Bakhshipour et al., 2012; Ji et al., 2012) and miscellaneous enhancement techniques (Badnakhe and Deshmukh, 2011; Lurstwut and Pornpanomchai, 2011).

In agricultural applications, color model conversion is applied by investigators in two different image processing phases: first, as an enhancement step that makes objects in an image easier to segment (Wang et al., 2012; Patil and Bodhe, 2011; Stajnko et al., 2009; Chamelat et al., 2006); second, and more frequently, to extract color features that are subsequently used to detect a ROI (Region Of Interest) in an image (Berenstein et al., 2010; Bama et al., 2011; Ji et al., 2012; Lurstwut and Pornpanomchai, 2011; Stajnko et al., 2009; Chamelat et al., 2006; Reis et al., 2012; Patil et al., 2011; Nuske et al., 2011; Savakar, 2012; Jaware et al., 2012; Anami et al., 2011; Bashish et al., 2011; Zhou et al., 2012; Patel et al., 2011).
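The second use of color model conversion can be sketched as follows. This is a hypothetical Python example, not the method of any of the cited works: it converts RGB pixels to the HSV color model with the standard-library `colorsys` module and marks as ROI the pixels whose hue falls in a greenish band. The function name `vegetation_mask` and the threshold values are illustrative assumptions.

```python
import colorsys


def vegetation_mask(image, hue_range=(0.17, 0.45), min_saturation=0.25):
    """Return a boolean mask of greenish pixels.

    `image` is a list of rows of (r, g, b) tuples with values in 0..255.
    The hue band and saturation threshold are assumed values chosen for
    illustration, not tuned parameters from the thesis.
    """
    low, high = hue_range
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1].
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(low <= h <= high and s >= min_saturation)
        mask.append(mask_row)
    return mask


# Leaf green, brown soil, neutral gray and pure green pixels.
sample = [
    [(60, 140, 50), (120, 80, 40)],
    [(128, 128, 128), (0, 255, 0)],
]
roi = vegetation_mask(sample)
```

Working on hue and saturation rather than on raw RGB values makes the test less sensitive to brightness changes, which is one reason such conversions are common before segmentation.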

Few research works related to the theme of this thesis have been developed. Yanne et al. (2011) proposed a grapevine variety classification method via leaf image processing. The algorithm was tested on 354 leaf images belonging to 20 grape varieties. A correct classification rate of 87% was obtained by running the classifier on all the images and counting the correctly classified ones. Chamelat et al. (2006) implemented a new method for grape detection in outdoor images. For two-class classification, a support vector machine (SVM) based classifier was applied. To test the algorithm, the authors used an image database composed of only 18 outdoor images. Experimental results show that the SVM performs well, with less than 0.5% error in the recognition of grapes in an image. Correa et al. (2011) evaluated various fuzzy clustering algorithms applied to feature extraction on vineyard images. From a set of 200 vineyard color images, a representative subset of 20 images was manually segmented. The experimental results indicate which fuzzy clustering algorithms offer the best compromise between speed and classification performance.
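The correct classification rate quoted in such studies is the number of correctly classified images divided by the total number of images. The short Python sketch below shows this computation together with the counts behind a confusion matrix. The function names and the example labels are hypothetical and are not data from any of the cited works.

```python
from collections import Counter


def classification_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs, i.e. the confusion matrix cells."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labels for four images (illustration only).
truth = ["Touriga Nacional", "Tinta Roriz", "Tinta Roriz", "Tinto Cão"]
guess = ["Touriga Nacional", "Tinta Roriz", "Touriga Nacional", "Tinto Cão"]

rate = classification_rate(truth, guess)
cells = confusion_counts(truth, guess)
```

In this made-up example three of the four images are classified correctly, and the confusion counts show that the single error is a Tinta Roriz image predicted as Touriga Nacional.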

In the last few years, with the advances in computer resources, namely processor speed, graphical processing and memory availability, several researchers have worked on automatic plant identification and plant disease detection from leaf images using Deep Learning (DL) models, reaching high classification accuracy.

The benefits of using DL approaches instead of traditional image processing and classification techniques are clearly shown by comparing the image-based plant identification proposed by Grinblat et al. (2016) with the one reported in their previous research (Larese et al., 2014). In both papers, three legume species, namely soybean, white bean and red bean, were studied based only on the analysis of their vein morphological patterns. In the earlier work, Penalized Discriminant Analysis achieved the best average accuracy, with 89.9%, followed by Support Vector Machines (SVM) with a linear kernel, with 89.7%, whereas with a ConvNet with 5 layers they obtained a maximum accuracy of 96.9%. Also considering that leaf veins contain important information for plant species recognition, Fu and Chi (2006) proposed a new approach that combines a thresholding method and an Artificial Neural Network (ANN) classifier to extract leaf veins. A preliminary segmentation based on the intensity histogram of the leaf images was followed by a fine segmentation using a trained ANN classifier. Experimental results showed that this combined approach is capable of extracting a more accurate leaf venation for vein pattern classification.

Concerning grape detection in outdoor images, Zernike moments and color information have been exploited to describe grape shapes. For the learning and recognition steps, an SVM was used, allowing the recognition of grapes in 99% of cases with very few samples used in the learning step (Chamelat et al., 2006). From a dataset of 190 images containing bunches of white grapes and 35 images of red grapes, the visual inspection system proposed by Reis et al. (2012), which uses color mapping and morphological dilation techniques, was able to automatically distinguish between red and white grapes, achieving 97% and 91% correct classification, respectively.

Concerning the problem of plant disease detection, Amara et al. (2017) stated that “Plant diseases cause major production and economic losses in agriculture and forestry”. They proposed a DL approach for automatically classifying banana leaf diseases, using the LeNet architecture as a ConvNet to classify image datasets. Experimental results demonstrated that the developed model was able to distinguish two different types of diseased leaves from healthy ones. Liu et al. (2018) proposed an approach for the identification of apple leaf diseases. From a dataset of 13,689 images of diseased apple leaves, the proposed deep ConvNet, based on the AlexNet model, was trained to identify four common apple leaf diseases, achieving an overall accuracy of 97.62%. They also studied the impact of the dataset size on the identification accuracy of apple leaf diseases. In two sets of experiments, the network was trained separately without and with an expanded image dataset, reaching recognition rates of 86.79% and 97.62%, respectively. Kho et al. (2017) proposed an automated system to identify three species of Ficus (one of the largest genera in the plant kingdom, with about 1000 species worldwide) that have similar leaf morphology. Fifty-four leaf images from three different Ficus species were used. Image pre-processing, feature extraction and recognition models, namely ANN and SVM, were used to develop the proposed system. Evaluation results showed that the proposed system recognizes leaf images with an accuracy of 83.3%.

Too et al. (2018) evaluated state-of-the-art deep ConvNets for image-based plant disease classification. The architectures studied were VGG 16, Inception V4, ResNet with 50, 101 and 152 layers, and DenseNet with 121 layers. A test accuracy of 99.75% was obtained at the 30th epoch with DenseNet on the PlantVillage dataset. They concluded that DenseNet is a good architecture for this task. Lastly, they also contribute an excellent literature review on the application of digital images captured in-field, on the vine, to the management of vineyards, focusing on yield estimation, quality evaluation, disease detection and grape phenology applications.

1.2 Motivation

In the current commercial market, two different testing methods are available for determining the grape varieties planted in vineyards, namely ampelography and deoxyribonucleic acid (DNA) testing. As a field of botany, ampelography uses the shape and color of vine leaves and grape berries to identify and characterize grapevines. More recently, an accurate method, DNA testing, uses leaf samples to identify the variety. Because only small specific regions of the DNA profile, or “DNA fingerprint”, are useful for variety identification, the method consists of matching the DNA profile of the leaf sample against a reference database that is specific to each fruit (e.g., grape). Both methods are expensive and time-consuming.

In contrast, efficient detection of fruits and plant leaves in their natural environment using computer vision and image processing techniques offers great benefits to farmers. The automatic identification and classification of different classes or varieties of agricultural products enables fruit grading, sorting and counting systems, fruit and plant leaf disease detection, stem location systems and the selective picking of fruit varieties for harvesting purposes. These are some examples of automated systems that help farmers in their daily work.

As the speed, capability and cost-effectiveness of modern signal processing devices increase, so does the need for automatic real-time systems able to emulate the three-dimensional (3D) behaviour of human vision. The three-dimensional world in which we live is converted into two-dimensional (2D) entities, called images, which are viewed through the visual system or electronic sensors. The research area of digital image processing works on these images to help us reach better conclusions or make more efficient decisions.

In almost all cases, digital color images of vineyards are acquired in a natural environment, subject to all adverse outdoor conditions, using an image acquisition scheme that normally integrates a conventional digital camera. For this reason, the captured images lack contrast and brightness, mainly due to illumination conditions. Moreover, since the images are captured in the field, it is unavoidable that dew drops, insect excreta and dust appear in them, resulting in noise.

Throughout the world, the harvest within a vineyard can vary from year to year, with significant changes in the grapevines, mainly due to soil conditions, diseases, pests, adverse climate and the presence of pesticides. In addition, the vineyards of the Douro Region are characterized by having more than one grape variety (caste) per parcel and even per vine row (bardo) (Reis et al., 2012). As presented above, when the susceptibility of a particular variety to a specific disease is known, identifying that variety with an automatic system will help, for example, to apply a more specific and targeted treatment. Moreover, many wine producers rely on this wide range of grape varieties to produce their most expensive wines (e.g., the “Blend D. Antónia” wine, from Quinta do Vallado, has more than 30 varieties). Consequently, implementing an automatic algorithm for grapevine variety classification, able to separate the different grape varieties automatically, will be of paramount importance in the DDR.

For these reasons, one of the most useful software components of a robotic harvesting system will be an algorithm for grapevine variety classification that separates the different grape varieties automatically.


1.3 Goals

The main objective of this thesis is to develop a computer algorithm able to accurately identify and characterize grape varieties. This algorithm is intended to be integrated into a future robotic system for harvesting grape bunches.

For the reasons given in section 1.2, analysing a color image of a vineyard is a hard challenge. To reach the objectives proposed in this thesis, the following general steps were taken:

1. Two datasets of natural vineyard images (named DRGV and DRGV_2018) were captured with digital cameras in three of the most emblematic Quintas of the Douro Region;

2. During the capture phase, each image was assigned to one of the six grape varieties under study by the expert wine grower responsible for the corresponding Quinta. The image datasets were then split and labelled with the true variety according to that classification;

3. Because leaves play an important role in plant identification, mainly because they are easy to find and capture in the field in all seasons of the year, a grapevine Leaf Segmentation Algorithm (LSA) was developed and applied to the two image datasets;

4. A set of image processing methods, namely a proposed four-corners-in-one image warping method, independent component analysis (ICA), the Canny Edge Detector (CED) and Gray-Scale Morphology Processing (GMP), was applied to the vineyard image datasets captured in step 1;

5. Applying these image processing methods generated a first collection of seven pre-processed datasets extracted from the DRGV dataset (datasets #1 to #7) and a second collection of three pre-processed datasets extracted from the DRGV+2018 dataset (datasets #8 to #10). The DRGV+2018 dataset is obtained by merging the DRGV and DRGV_2018 datasets;

6. For image classification, the AlexNet architecture was used. In this thesis, the AlexNet deep model is retrained through transfer learning on each pre-processed dataset after data augmentation. Three different fine-tuning strategies were also applied;

7. A set of experiments supported a study of the behaviour of fine-tuned learning on the AlexNet architecture trained on the diverse pre-processed datasets, with the goal of obtaining the best classifier for grapevine variety classification.


During our research, natural-environment images were captured in several Quintas of the Douro to overcome the lack of an available vineyard image dataset covering the most predominant grape varieties of this demarcated region. To facilitate and promote further research in grape variety identification, two high-quality image datasets were acquired and made available, as described below:

1. The DRGV (Douro Red Grape Varieties) dataset, captured in the 2016 harvest season, consists of 140 vineyard images subdivided into the six most significant grape varieties (Tinta Amarela, Tinta Barroca, Tinto Cão, Touriga Franca, Touriga Nacional and Tinta Roriz). The images were acquired in two vineyards of the Douro Region (Quinta da Pacheca and Quinta do Vale Abraão), with an average of 23 images per variety.

2. The DRGV_2018 dataset, captured in the 2018 harvest season, consists of 84 vineyard images subdivided into the same six grape varieties. The images were acquired in two vineyards of the Douro Region (Quinta do Vallado and Quinta da Pacheca), with an average of 14 images per variety.
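The per-variety averages quoted for the two datasets follow directly from the image totals. The short Python sketch below reproduces that arithmetic; the merged total assumes that DRGV+2018 is a plain union of the two datasets, and the exact number of images in each individual variety is not given here, so only averages are computed.

```python
# Image totals and variety names as stated in the dataset descriptions.
VARIETIES = [
    "Tinta Amarela", "Tinta Barroca", "Tinto Cão",
    "Touriga Franca", "Touriga Nacional", "Tinta Roriz",
]
DATASETS = {"DRGV": 140, "DRGV_2018": 84}


def images_per_variety(total_images, n_varieties=len(VARIETIES)):
    """Average number of images per variety (not the exact per-class split)."""
    return total_images / n_varieties


averages = {name: images_per_variety(n) for name, n in DATASETS.items()}

# Assumption: the merged DRGV+2018 dataset is the plain union of both sets.
merged_total = sum(DATASETS.values())

for name, avg in averages.items():
    print(f"{name}: {avg:.1f} images per variety on average")
print(f"DRGV+2018: {merged_total} images in total")
```

With so few images per class, every variety is represented by only a couple of dozen samples, which is why data augmentation and transfer learning play such a central role in the rest of the thesis.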

1.4 Constraints

Grapevine variety classification from in-field images is challenging, both because the images reflect the adverse effects of nature on the grapevines and because leaf images of different grape varieties are visually very similar.

To date, most work has focused on images acquired under laboratory conditions, in which the objects under study are typically isolated at the center of the image against a white background. Any approach that works on images from the natural environment must cope with lighting conditions and with occlusion of grapes and leaves, in addition to all the previously mentioned factors.

Under these assumptions, Deep Learning (DL), as defined by Goodfellow et al. (2016), was used. It is recognized as a remarkably active research area in machine learning (ML) and artificial intelligence (AI), with successful applications in numerous fields, including precision agriculture. One of the strongest advantages of DL in image classification is its powerful image feature extraction, from both raw and pre-processed data (both are used for training in this thesis). This avoids the traditional, time-consuming process of hand-crafted feature extraction, which often requires the intervention of an agricultural expert and has to be redone whenever the problem or the dataset changes. DL automatically learns features from low-level ones, such as edges and curves, up to the higher levels of its hierarchical model structure (Kamilaris and Prenafeta-Boldú, 2018).

This systematic approach to the hierarchical processing of knowledge, together with the complex non-linear model, accounts for the relevance of DL networks. A convolutional neural network (ConvNet) is a DL network with many hidden layers that imitates how the visual cortex of the brain processes and recognizes images. ConvNets have achieved state-of-the-art results in image classification when fed with a huge number of natural images (Hertel et al., 2015; Sugata and Yang, 2017; Sun et al., 2017). Extending this idea, the richness and diversity of the augmented pre-processed image datasets used in the experiments were crucial for improving the classification performance of the ConvNet. For these reasons, the AlexNet ConvNet, trained on diverse pre-processed image datasets, is used. Developed by Krizhevsky et al. (2012), AlexNet is a model pre-trained on a subset of the ImageNet database comprising more than a million images; it can classify images into 1000 object categories and won the ImageNet Large-Scale Visual Recognition Challenge.

Because training such very deep ConvNets takes a long time, the problem is tackled by reusing the feature extraction part of a popular network pre-trained on a very large dataset and retraining the classification part on multiple different datasets with fine-tuning techniques (Hertel et al., 2015). This method, called transfer learning (TL), is frequently used in computer vision and allows accurate models to be built in a time-saving way (Rawat and Wang, 2017). It consists of starting the learning process from patterns that were learned when solving a different problem, instead of starting from scratch. This thesis follows the DL approach, evaluating several TL schemes on the AlexNet ConvNet trained on diverse pre-processed image datasets, for grapevine variety classification in natural vineyard images.
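The idea of reusing a frozen feature extractor and retraining only the classification part can be illustrated with a deliberately tiny, pure-Python sketch. It is not AlexNet and not the training setup of this thesis: `FROZEN_W` is a hypothetical stand-in for pre-trained layers, the two-class toy data are invented for illustration, and the new classification head is a plain logistic regression trained by gradient descent.

```python
import math

# Hypothetical stand-in for a pre-trained feature extractor. These weights
# are fixed and are never updated during training below.
FROZEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 3 features from 2 inputs


def extract_features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new classification head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]
    w, b = [0.0] * len(feats[0]), 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # Full-batch gradient descent step on the head parameters only.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(x, w, b):
    f = extract_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# Invented toy data for the "new" task: two well-separated classes.
samples = [(2.0, 2.0), (2.5, 1.5), (1.5, 2.5), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_head(samples, labels)
print([predict(x, w, b) for x in samples])
```

Only `w` and `b` change during training, which is what makes this scheme cheap. Fine-tuning schemes differ in that some or all of the pre-trained layers are also updated, usually with a small learning rate.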

Kamilaris and Prenafeta-Boldú (2018) drew attention to the importance of verifying whether authors tested their implementations on the same dataset used for training or on a different one. They concluded that, of the 40 research papers analyzed in their study, only 20% used a different dataset for testing than for training, and that the accuracy obtained in these cases was generally below 70%. Examples are the works of Potena et al. (2016) and Dyrmann et al. (2017), which reached an accuracy of 59.4% and an IoU (Intersection over Union) segmentation metric of 0.64, respectively.
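For readers unfamiliar with the IoU metric mentioned above, it is the ratio between the number of pixels that two binary masks have in common and the number of pixels covered by at least one of them. The following minimal Python sketch computes it for two small masks; the masks are invented for illustration and are unrelated to the cited works.

```python
def intersection_over_union(pred, truth):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so IoU is defined as 1.0 in that case.
    return inter / union if union else 1.0


# Illustrative masks: the prediction misses one column of the true region.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = intersection_over_union(pred, truth)
print(f"IoU = {iou:.2f}")  # 4 shared pixels out of 6 covered pixels
```

An IoU of 1 means the two masks coincide exactly, while values near 0 mean that they barely overlap.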

The proposed system for grapevine variety classification highlights some issues and constraints concerning the training phase of a DL network: first, a very low volume of images; second, images captured in a natural environment; third, significant changes in the images of grapevine leaves or grape bunches between harvest seasons, mainly due to adverse climatic conditions, pests, diseases and pesticides on the grapevines; fourth, high similarity between images of the different grape varieties in the DDR; and fifth, difficulties for robotic harvesting in the DDR vineyards due to the existence of more than one variety per parcel and even per vine row (bardo).

Although the experimental results obtained by Šulc et al. (2017) suggest that the “recognition of segmented leaves is practically a solved problem, when high volumes of training data are available”, they also concluded that, when only a small number of samples is available, identification remains an open problem for uncommon plant species and rare phenotypes, among others. This is the case studied in this thesis.

With regard to the high similarity of leaf images across the species studied, Kho et al. (2017) identified only three species of Ficus (among 1000 species worldwide) that have similar leaf morphology. Their proposed leaf image recognition system reached an accuracy of 83.3%. By comparison, the system proposed in this thesis reached an accuracy of 78.86% in identifying six grape varieties, i.e., twice the number of classes considered in that study.

1.5 Problem statement

Automatic grapevine variety classification is a matter of great interest in precision viticulture. Traditionally, vineyards in the Douro Region contain different grape varieties in the same parcel, and these varieties show high visual similarity, which makes their identification a challenging task even for viticulture experts.

This thesis presents an approach, based on the AlexNet architecture with a transfer learning scheme, to automatically identify and classify the six grape varieties that predominate in the DDR, as the software component of a robotic grape harvesting system. The system classifies the six grape varieties automatically from segmented vine leaf images. Image pre-processing and data augmentation were adopted to reduce overfitting of the model. Experimental results indicate that the proposed classifier is feasible, with an accuracy of 78.86%. Applying the same classifier model to the popular Flavia leaf dataset yielded an accuracy of 89.75%.


1.6 Publication of research results

Over the last years, all results obtained by the author in the image processing (IP) research area were communicated to the international scientific community through peer-reviewed publications in book sections, international conference proceedings and journals. The published works are presented below in ascending order of publication date; full copies are available in the attachments of this thesis.

The survey published in Pereira et al. (2017) categorizes and briefly reviews the literature on computer analysis of fruit images in agricultural operations, covering more than 60 papers published in the last 10 years. With the aim of supporting applied research in agricultural imaging, the paper focuses on advanced image processing and analysis techniques used in applications for the detection and classification of fruits developed in the last decade. For the reviewed techniques, the performance evaluation metrics achieved in various experiments are emphasized to help researchers make choices and develop new computer vision applications for fruit images.

In Pereira et al. (2018), we propose a segmentation algorithm based on region growing, using a color model and threshold techniques, to classify the pixels belonging to vine leaves in vineyard color images captured in a real field environment. To assess the accuracy of the proposed vine leaf segmentation algorithm, a supervised evaluation method was employed, in which a segmented image is compared against a manually segmented one. On boundary-based quality measures, an average accuracy of 94.8% over a dataset of 140 images was achieved. This indicates that the proposed method gives suitable results for ongoing research on the automatic identification and characterization of the different endogenous grape varieties of the Portuguese DDR.
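As a generic illustration of threshold-based region growing, the Python sketch below grows a region from a seed pixel by repeatedly adding 4-connected neighbours whose intensity is close to that of the seed. It is only a minimal sketch of the general technique, not the published vine leaf segmentation algorithm: the single-channel image, the seed position and the threshold value are assumptions chosen for illustration.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Pixels 4-connected to `seed` whose value is within `threshold` of the seed's."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical single-channel image: low values stand for leaf-like pixels.
image = [
    [10, 12, 90, 95],
    [11, 13, 92, 14],
    [12, 88, 91, 12],
]

leaf = region_grow(image, seed=(0, 0), threshold=10)
print(sorted(leaf))
```

The two low-valued pixels in the right-hand column are not included, because they are not connected to the seed through similar pixels. This connectivity constraint is what distinguishes region growing from plain thresholding.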

The paper published in Pereira et al. (2019) evaluates the performance of transfer learning and fine-tuning techniques based on the AlexNet architecture when applied to the classification of grape varieties. Two natural vineyard image datasets were captured in different geographical locations and harvest seasons. To generate different datasets for training and classification, several image processing methods, including a proposed four-corners-in-one image warping algorithm, were used. The AlexNet-based transfer learning scheme trained on the image dataset pre-processed with the four-corners-in-one method reached the best image classification accuracy, 77.30% over a test set. Applying this classification model to the popular Flavia leaf dataset yielded an accuracy of 89.75%. The results obtained by the proposed approach are promising and encouraging for helping Douro wine growers in the task of automatically identifying grape varieties.

1.7 Thesis outline

This thesis is organized in six sections, as follows. The current section gives a brief introduction to the theme of the thesis, including the motivation, the goals and the author's research publications.

Section 2 deals with the literature review and related work. It introduces the concept of precision agriculture and presents the challenges that usually occur in automatic plant species identification. It reviews the relevant publications on image pre-processing techniques, image segmentation methods and image analysis applied to agriculture. Some of the most relevant research works in DL related to the development of our grapevine variety classification system are also reported. The section closes with an overview of other application areas, specifically medical diagnosis of lung and breast nodules.

Section 3 presents the theoretical foundations of the IP and analysis methods used in this thesis. It begins with machine learning, neural networks and the learning process, followed by the structure of, and learning in, more recent deep networks. ConvNets are detailed with respect to their layer structure, available architectures and training methods. The most widely used software libraries for DL implementation are also presented.

Section 4 describes the materials used in this thesis: the programming platform, DL libraries, computing resources and the captured natural vineyard image databases. It also gives a brief characterization of the DDR in terms of its history and geography, weather, geomorphology and the predominant grape varieties for Port wine production. Next, the ICA, CED and GMP methods, as well as the proposed LSA and four-corners-in-one warping methods, are detailed. At the end of the section, the AlexNet model architecture, the pre-processed datasets, the TL schemes and the fine-tuning parameterization are described.

Section 5 presents the experimental results obtained by applying AlexNet-based transfer learning schemes, trained on the image datasets pre-processed through the diverse IP methods detailed in section 4, to classify six red grape varieties from the DDR.


Section 6 discusses the performance of TL in our application, using fine-tuning and fixed feature extractor schemes based on the AlexNet architecture, the results achieved with the pre-processed dataset generated by the proposed four-corners-in-one warping algorithm, and the classification accuracies reached by the proposed approach. It also presents the conclusions and ongoing work.


2

Literature review and related work

2.1 Introduction

This section presents the literature review on image processing and analysis techniques applied to agriculture. It begins with the concept of precision agriculture and the robotic technologies applied to the various phases of wine production, namely pruning, crop monitoring, yield estimation and harvesting. Next, related work on deep learning models for fruit, plant leaf and disease classification of in-field images is reported. Finally, other relevant applications, specifically in the medical field, are shown.

2.2 Precision agriculture

Precision Agriculture (PA) is generally defined as the use of new information technologies in farm management systems to identify, analyse and manage the natural variability found within fields for optimum profitability, sustainability and protection of land resources (Banu, 2015). It applies a set of technologies, comprising sensors, information systems, improved machinery and informed management, to optimize production in the face of the natural variability and uncertainties of agricultural systems. Through computer vision and image processing, these technologies can aid farmers in many tasks: determining soil nutrient composition; applying farm inputs such as fertilizers, herbicides and water in the right amount, at the right time and in the right place; detecting weeds; and detecting pests and diseases early, among others (Abdullahi and Zubair, 2017; Seng et al., 2018).

In the agriculture industry, sensor-based technological solutions focus on automating agricultural tasks in order to increase production and benefits while reducing time and costs for the farmer (Font et al., 2014). A comprehensive description of research on solving agriculture and forestry tasks with sensors is given in Pajares et al. (2013). For example, building a high-quality fresh fruit harvesting system requires the automatic detection of the fruits, the estimation of their size, relative location and orientation, and the definition and control of a non-stressing pickup procedure (Font et al., 2014). Other examples of tasks that can be implemented with computer vision techniques are presented by Palacios et al. (2019) and Andrushia and Patricia (2020).

2.3 Plant species identification

Plant identification is difficult, time-consuming and, because of the use of specific botanical terms, discouraging for novices. In recent years, image processing and pattern recognition techniques from computer science have been introduced into plant taxonomy to support people in identification. Today, image acquisition technologies such as digital cameras and mobile devices, together with remote access to databases and significant advances in image processing and pattern recognition, make automated plant species identification a reality.

As stated by Cope et al. (2012), plant identification is currently particularly important because of concerns about climate change and the resulting changes in the geographic distribution and abundance of species. Automated identification of plant species, for example from leaf images, is a worthwhile goal given the current combination of rapidly dwindling biodiversity and the dearth of suitably qualified taxonomists, particularly in regions of the world that currently have the greatest numbers of species and those with the largest number of “endemics”, i.e., species restricted to that geographic area.

Wäldchen et al. (2018) divide the identification features extracted from a field image into qualitative and quantitative characters. Quantitative characters can be counted or measured, such as plant height, flower width or the number of petals per flower, while qualitative characters are features such as leaf shape, flower color or ovary position. Samples of the same species share a combination of relevant identification features. Since no two plants in nature look exactly the same, a certain degree of generalization is required to assign new samples to a species. From a machine learning perspective, plant species identification is a supervised classification problem, as outlined in figure 1.

For the above reasons (the intrinsic appearance and behavior of nature), a training phase in which the classifier learns to distinguish species is required. The training phase (top of figure 1) comprises the analysis of training images that have been independently and accurately identified as “taxa” (a biological term for formal classes of living things, each consisting of the taxon's name and its description) and are now used to determine the classifier's parameters so as to provide maximum discrimination between the trained taxa. In the application phase (bottom of figure 1), the trained classifier is exposed to new images depicting unidentified specimens and is expected to assign them to one of the trained taxa.

Figure 1. Block diagram of supervised machine learning for image-based species identification (extracted from Wäldchen et al., 2018).
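The two phases of supervised classification can be sketched with a minimal nearest-centroid classifier in Python. This is only an illustrative stand-in for the classifier block of figure 1: the feature vectors (a leaf length-to-width ratio and a lobe count), the taxon names and the choice of a nearest-centroid rule are all assumptions made for the example.

```python
import math


def train(features, labels):
    """Training phase: estimate one centroid per labelled taxon."""
    sums, counts = {}, {}
    for vec, taxon in zip(features, labels):
        acc = sums.setdefault(taxon, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[taxon] = counts.get(taxon, 0) + 1
    return {t: [s / counts[t] for s in acc] for t, acc in sums.items()}


def classify(model, vec):
    """Application phase: assign a new specimen to the nearest trained taxon."""
    return min(model, key=lambda t: math.dist(model[t], vec))


# Hypothetical training set: (length/width ratio, number of lobes) per leaf.
train_features = [[1.0, 5.0], [1.1, 5.0], [2.0, 3.0], [2.2, 3.0]]
train_labels = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]

model = train(train_features, train_labels)
print(classify(model, [1.05, 5.0]))  # specimen close to the taxon_A samples
print(classify(model, [2.0, 3.2]))   # specimen close to the taxon_B samples
```

The classifier parameters (here, the centroids) are fixed once training ends. New specimens can only ever be assigned to one of the trained taxa, which is why the coverage and quality of the training images matter so much.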

2.3.1 Challenges in automatic identification

On average, half of the studies on automated plant identification evaluated the proposed method on public datasets (allowing for continuity of studies and comparison of methods), while the other half used proprietary leaf image datasets not available to the public (Wäldchen et al., 2018).

The images contained in the training datasets fall into three categories: scanning, pseudo-scanning and photo. The scanning and pseudo-scanning categories correspond to leaf images captured, respectively, through a digital scanning operation and through photography in front of a simple background, while the photo category corresponds to leaves photographed with a digital camera against a natural background. The majority of the leaf images used, i.e., simple, healthy and non-degraded leaves, are collected and imaged in the laboratory and belong to the scanning and pseudo-scanning categories (Wäldchen and Mäder, 2018). This considerably simplifies the classification task: since the object of interest is imaged against a plain, uniform background, the segmentation often needed to distinguish foreground from background can be performed fully automatically and with high accuracy (Wäldchen et al., 2018).


Leaf images in the photo category (with conditions such as deformed, partial or overlapped leaves, pests, pesticides and sunburn, among others) are largely avoided in current studies, because segmenting a leaf from a natural background is particularly difficult. Interference around the target leaves, such as small stones and ruderals, may create confusion between the boundaries of adjacent leaves. Compound leaves (leaves consisting of two or more leaflets borne on the same leafstalk) are particularly difficult to recognize. Existing studies designed for the recognition of simple leaves cannot easily be applied directly to compound leaves, because the variation of a compound leaf is caused not only by morphological differences between leaflets but also by changes in leaflet number and arrangement (Zhao et al., 2015).

Wäldchen et al. (2018) characterize the ways in which the relevant image information is reduced to feature vectors that a machine learning algorithm can use directly: (i) the model-based approach, in which the feature detection, extraction and encoding methods for computing characteristic feature vectors are problem-specific, resulting in a model customized to the specific application, e.g., the studied plant parts such as leaves or flowers; (ii) the model-free approach, which does not employ application-specific knowledge (so a higher degree of generalization across different species is expected) and detects and describes characteristic interest points using generic algorithms, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF) and the histogram of oriented gradients (HOG); and (iii) the deep learning ConvNet approach, which makes use of efficient, massively parallel computing on graphics processing units (GPUs) and of the large-scale image data necessary for training deep ConvNets with millions of parameters. In contrast to model-based and model-free techniques, ConvNets do not require explicit, hand-crafted feature detection and extraction steps. Instead, for a given problem, they automatically discover a hierarchical image representation (similar to a feature vector) composed of building blocks of increasing complexity per layer, reaching classification performances that were mostly unachievable using shallow learning methods with or without hand-crafted features.

According to Wäldchen et al. (2018), to provide a reliable and applicable automated species identification process, researchers need to consider the following main challenges: (a) A vast number of taxa (classes or species) to be discriminated from one another; (b) Samples of the same species that vary hugely in their morphology; (c) Different species that are extremely similar to one another; and (d) Large variation induced by the image acquisition process in the field.


(a) Large number of taxa to be discriminated

The world hosts a very large number of plant species. Distinguishing between a large number of species is inherently more complex than distinguishing between just a few, and requires substantially more training data to achieve good classification performance.

(b) Large intraspecific visual variation

Plants belonging to the same species may show considerable differences in their morphological characteristics depending on their geographical location and on abiotic factors (e.g., light conditions), the season (e.g., from the early fruiting stage to a withered fruit) and the time of day (e.g., flowers opening and closing during the day). These changes in morphological characteristics can occur in leaves (e.g., area, width, length, shape, orientation and thickness), flowers (e.g., size, shape and color), fruits and the entire plant.

(c) Small interspecific visual variation

Even experienced botanists find it challenging to reliably distinguish species that are extremely similar to one another and can be identified only by almost invisible characteristics. Detailed patterns of particular morphological structures may be crucial and may not always be readily captured in images of a species. For example, the presence of flowers and fruits is often required for accurate discrimination between species with high interspecific similarity, but these important characteristics are not present during the whole flowering season and are therefore missing in many images.

(d) Variation induced by the acquisition process

Images are subject to variations introduced by the acquisition process. Plants in the field are 3D objects, while images capture 2D projections that depend on the perspective from which the image is taken, resulting in potentially large differences in appearance and shape. Furthermore, image capture typically occurs in the field under varying external conditions, such as illumination, focus, zoom, resolution and the type of sensor (Remagnino et al., 2016).
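The dependence of the 2D projection on the viewing perspective can be made concrete with a small Python sketch. It assumes an idealized pinhole camera and a flat leaf rotated about the vertical axis. Both are simplifying assumptions introduced for this example, and the leaf width, distance and focal length are illustrative values only.

```python
import math


def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def projected_width(leaf_width, distance, angle_deg):
    """Apparent width of a flat leaf rotated by `angle_deg` about the vertical axis."""
    a = math.radians(angle_deg)
    half = leaf_width / 2
    # Leaf edges after rotation; the leaf centre sits on the optical axis.
    left = (-half * math.cos(a), 0.0, distance - half * math.sin(a))
    right = (half * math.cos(a), 0.0, distance + half * math.sin(a))
    return project(right)[0] - project(left)[0]


# Illustrative values: a 0.1 m wide leaf seen from 1 m away.
for angle in (0, 30, 60):
    print(angle, round(projected_width(0.1, 1.0, angle), 4))
```

The same leaf appears roughly half as wide when it is turned 60 degrees away from the camera, so shape descriptors computed from the image change even though the leaf itself does not.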

2.3.2 Plant organs for automated identification

The visible part of a plant may be composed of four organ types: stem, leaf, flower and fruit. In a traditional plant identification process, people typically consider the plant as a whole. In automated plant identification, the characteristics of these organs are currently analyzed separately. As reported by Wäldchen et al. (2018), one image alone is typically not sufficient, for the following reasons: (a) plant organs may differ in scale and cannot be viewed in detail along with the whole plant or other organs; and (b) different plant organs require different image perspectives (e.g., leaves are most descriptive from the top, while the stem is better described from the side).

Most previous research used only the leaf for discrimination (Wäldchen and Mäder, 2017). From a botanical perspective, manual identification of plants in the vegetative state is more challenging than in the flowering state. From a computer vision perspective, automated identification of plants using leaves presents several advantages over other plant organs, such as flowers, stems or fruits. The main reasons are that leaves are available throughout most of the year and can easily be collected, preserved and imaged thanks to their planar geometry.

The flower is often the most visually prominent and perceivable part of a plant. Nevertheless, previous studies on automated identification rarely used flowers for discrimination (Wäldchen and Mäder, 2017), because flowers are available only during the blooming season and therefore for a short period of the year. Flower-based identification is challenging for the following reasons: since flowers are complex 3D objects, flower images show many more variations in viewpoint, occlusion and scale than leaf images. In addition, images of flowers captured in a natural environment vary with lighting conditions, time, date and weather.

The most recent research in automated identification explores multi-organ-based plant identification. Joly et al. (2014) proposed a multiview approach that analyzes up to five images of a plant in order to identify the species. This multiview approach allows classification at any time of the year, as opposed to purely leaf-based or flower-based approaches, which rely on the supported organ being visible. Initial experiments demonstrate that classification accuracy benefits from the complementarity of the different views. To support this research, the necessary training data, comprising multiple images of the same plant observed at the same time, must be acquired by the same person and with the same device. Each image can be labeled with the displayed organ (e.g., plant, branch, leaf, fruit, flower or stem), the time and date, the geolocation and the observer.

The training images must cover a large variety of scenarios, capturing different organs from multiple perspectives and at varying scales. Furthermore, images of the same organ acquired from different perspectives often contain complementary visual information, improving accuracy in observation-based identification using multiple images (Joly et al., 2014).

2.4 Robotic Technology in Wine Production

Traditionally, viticulturists have relied primarily on their own skills for grape growing and harvesting. However, cultivating, pruning and harvesting grapes by hand is tedious, painstaking and costly work.

Since the 1950s, mechanical harvesters have been designed specifically for harvesting grapes, reducing costs and significantly increasing output. On the other hand, mechanized harvesters cannot discriminate between ripe, healthy grapes and those that are unsatisfactory, rotten or unripe, or simply clusters of leaves. Harvesters can also damage the vines and grapes, rendering them unusable or undesirable.

Potentially, robotic machines can be applied to many aspects of grape farming, such as pruning, crop monitoring, yield estimation and harvesting using advanced technology such as precise manipulation, computer vision and machine learning.

The timing of the harvest, occurring at the end of the growing season, is the most important decision a grape grower makes. The harvest time is determined by the ripeness of the grapes, and the flavors that the winemaker is striving to achieve.

2.5 Image processing in agriculture

Image processing techniques have proved to be a non-invasive, non-destructive and effective analysis tool in various areas of human activity, including agricultural applications. Interpreting a digital color image of a fruit orchard captured in the field is extremely challenging due to adverse weather conditions, luminance variability and the presence of dust, insects and other unavoidable sources of image noise.

Advanced computational methods have been used in agricultural applications, such as locating fruits in color images, to help the agriculturist carry out common farm tasks with more precision, efficiency and productivity and at lower cost, using robotics for automated harvesting, spraying and counting. The automatic identification and characterization of different classes or varieties of agricultural products enables fruit grading and sorting systems, fruit and plant leaf disease detection and stem location systems for robotic or automated harvesting. These are some examples of the automated systems that are emerging to help farmers in their daily work.