
CHAPTER 3. PROPOSAL OF CONCEPTUALISATION

D) Loanwords

3.3.4 Intended Dictionary’s Basis

As mentioned above, plant designations in the Diccionario del español de México (DEM) are the basis of the planned work. DEM is a general monolingual dictionary based, in turn, on the Corpus del español mexicano contemporáneo39 (hereinafter CEMC), which comprises 996 different texts produced by Mexican people. These texts were first grouped into three registers: 1) formal language, 2) semiformal language, and 3) informal language. Within those registers, they were then distributed across the following 14 categories:

I. Formal language: 1) Literature, 2) Journalism, 3) Science, 4) Technical, 5) Political discourse, 6) Religion, 7) Formal speech.

II. Semiformal language: 8) Popular literature, 9) Semiformal speech, 10) Folk works.

III. Informal language: 11) Dialectological texts, 12) Anthropological records, 13) Jargon, 14) Informal speech.

CEMC comprises 1,891,045 tokens, of which 64,183 are distinct types40. We can consider it a reliable sample of Mexican Spanish41; consequently, the plant-related lemmata registered in DEM (and the plants themselves) are also important for Mexican society and are common in the so-called general Spanish spoken in Mexico.
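To give a sense of scale for these corpus figures, the following minimal sketch computes the type-token ratio from the two numbers reported above. The ratio itself is not a figure reported by DEM, and the function name is ours; it is only an illustration, bearing in mind that this measure depends strongly on corpus size.

```python
# Figures reported for CEMC in the text above.
TOKENS = 1_891_045
TYPES = 64_183


def type_token_ratio(types: int, tokens: int) -> float:
    """Return the share of distinct word forms among all running words."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return types / tokens


ttr = type_token_ratio(TYPES, TOKENS)
print(f"Type-token ratio: {ttr:.4f}")
print(f"Average occurrences per type: {TOKENS / TYPES:.1f}")
```

On these figures, the ratio is roughly 0.034, i.e., each type occurs about 29 times on average.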

39 Corpus of Mexican Contemporary Spanish.

40 See Introducción al diccionario [Dictionary introduction], available on https://dem.colmex.mx/Contenido/8

41 Nevertheless, we bear in mind that any result of a given corpus is also, perhaps mainly, a sample of its composition. See §3.3.4.1.

3.3.4.1 Criteria for Sources and Lemma Selection

We are aware that the plant designations in DEM, although important as a starting point, could be insufficient. In this regard, DEM’s editors state: “We cannot, as lexicographers, unravel all differences between biological species and their designations; thus, [DEM] just reflects what we have found in [CEMC] and in reference sources.”42 Therefore, we consider that specialized sources can complement our basis and should also be used as a reference and verification point for the initial data. In the following section, we set the criteria for source selection, and since many sources contain designations in other Mexican languages, we also establish criteria for lemma selection.

Sources

The main requirement for considering any work as a source for our intended dictionary is that it deals with plants in Mexico. Ideally, it should record both common designations in Spanish and binomial nomenclature(s), so that there is sound evidence that a certain designation refers to a specific species.

However, the work does not have to be exclusively about plants, since there is much research on a region’s ecosystems. Nor does the potential source have to be restricted to plants endemic to Mexico, as many foreign plants were introduced and are already considered naturalised; therefore, they may have their own designations in our country43. Another desirable feature of a possible source is that it contains cultural information and traditional knowledge regarding plants, i.e., medicinal properties and uses, ritual purposes, and use as food.

The following is a summary of the features of potential sources and a preliminary classification based on what they provide. At the end of each classification, we offer some examples of the corresponding kind of work. We will consider a source as ideal if it meets three requirements: a, b, and any from c to f:

Ideal source44

a. It provides both common designations in Spanish and binomial nomenclatures.

42 No nos es posible, como lexicógrafos, desentrañar todas las diferencias de nombre y de especie biológica, por lo cual el [DEM] sólo refleja lo que hemos encontrado en el [CEMC] y en las obras de consulta… Available on https://dem.colmex.mx/Contenido/22

43 For instance, arnica (Arnica montana) is a well-known European plant that has spread in America. Its most common designation in Spanish is árnica (DEM, DLE), but in Mexico it is also called, among other names, estornudera and veneno de leopardo (León Jiménez, 2005, p. 24).

44 We use ideal and useful source to keep the terminological distinction between primary, secondary, and tertiary sources, namely 1) corpora of written or oral texts, 2) lexicographical works, and 3) linguistic materials such as lexicological research or grammars (Gouws & Prinsloo, 2005, p.16). In this conceptualisation, ideal and useful sources would belong to tertiary sources.


b. It supplies a morphological description of the species.

c. It contains both cultural information and traditional knowledge regarding plants.

d. It includes dialectal data of the designation.

e. It provides information regarding the region where the species grows.

f. It supplies images or pictures of the plant in question.

Examples: Catálogo de nombres vulgares y científicos de plantas mexicanas (Catalogue of Common and Scientific Designations of Mexican Plants); Malezas de México (Mexican Weeds), and Biblioteca digital de la medicina tradicional mexicana45 (Digital Library of Traditional Mexican Medicine).
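The rule for an ideal source (requirements a and b, plus at least one of c to f) can be sketched as a simple check. This is only an illustration under our own assumptions: the class, field names, and the two sample records are hypothetical and do not describe any real work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Features a to f of a candidate source (field names are hypothetical)."""
    title: str
    spanish_names_and_binomials: bool = False         # a
    morphological_description: bool = False           # b
    cultural_and_traditional_knowledge: bool = False  # c
    dialectal_data: bool = False                      # d
    growing_region: bool = False                      # e
    images: bool = False                              # f


def is_ideal(source: Source) -> bool:
    """Return True if the source meets a, b, and at least one of c to f."""
    optional = (
        source.cultural_and_traditional_knowledge,
        source.dialectal_data,
        source.growing_region,
        source.images,
    )
    return (
        source.spanish_names_and_binomials
        and source.morphological_description
        and any(optional)
    )


# Hypothetical records, invented for illustration only.
flora = Source("Hypothetical regional flora", True, True, growing_region=True)
glossary = Source("Hypothetical glossary", spanish_names_and_binomials=True)

print(flora.title, "->", is_ideal(flora))
print(glossary.title, "->", is_ideal(glossary))
```

A work that fails this check is not discarded; it may still qualify as a useful source under the features listed next.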

Useful source

g. It contains either Spanish common designations or binomial nomenclatures.

h. It supplies either common designations in any indigenous Mexican language or binomial nomenclatures.

i. It cross-refers a given designation to another, more widespread designation, which, however, is not part of the macrostructure46.

j. It lacks any feature from c to e.

Examples: Ticus: Diccionario de colimotismos (Ticus: Dictionary of words used in Colima).

Lemma

In this section, we deal with the features of the lemma in the intended work. To be clear, a given form must be a noun used in any dialect of Mexican Spanish that designates one or more plants in order to be part of the word-list of the planned dictionary. To some extent, the following is a rough compilation of what we have found in DEM and what we expect to find in other sources in the future:

• Simple lexeme

o Regular simple lexeme, such as henequén ‘sisal’, mariguana ‘marijuana’, quelite ‘potherb’…

o Lexicalized diminutive or augmentative forms, for instance, calabac-ita < calabaza47, gallard-ete < gallardo, and canel-ón < canela, respectively.

45 This platform contains four works, namely Diccionario enciclopédico de la medicina tradicional Mexicana (Encyclopaedic Dictionary of Mexican Traditional Medicine), La medicina tradicional de los pueblos indígenas de México (Traditional Medicine of the Native Peoples of Mexico), Atlas de las plantas de la medicina tradicional mexicana (Atlas of Plants of Mexican Traditional Medicine), and Flora medicinal indígena de México (Medicinal Native Flora of Mexico).

46 As we have already illustrated how a given form can designate different species, we also consider works, lexicographic or not, that provide neither the definition of the main designation nor the scientific name of the specific plant to which this more widespread form refers. See also footnote 50 in this chapter.


o Lexicalized plural forms, for example, chisme-s < chisme

o Lexicalized diminutive and plural forms, e.g., jarr-ito-s < jarro

o Lexicalized forms from proper names, such as dondiego < don Diego (Mr James), ramón < Ramón (Raymond)

• Multi-word lexeme

o Regular multi-word lexeme, such as palo loco, oreja de sapo, nanche de la costa...

o Lexicalized diminutive or augmentative48 forms, for instance, romer-ito cimarrón < *romero cimarrón

o Lexicalized plural forms, e.g., tripa-s de ratón < *tripa de ratón

o Lexicalized diminutive and plural forms49

o Lexicalized forms with a proper noun, for example, flor de San José (Saint Joseph’s flower) or gobernadora de Puebla (governor of Puebla50)
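The typology above can be represented as a small data structure, which may help when recording lemmata from further sources. The following is a minimal sketch under our own assumptions: the class and category names are hypothetical, the formation type is assigned by hand (lexicalization is a lexicographic judgement, not something computed), and a lemma written with whitespace is assumed to be a multi-word lexeme.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Structure(Enum):
    SIMPLE = "simple lexeme"
    MULTI_WORD = "multi-word lexeme"


class Formation(Enum):
    REGULAR = "regular"
    DIMINUTIVE_OR_AUGMENTATIVE = "lexicalized diminutive or augmentative"
    PLURAL = "lexicalized plural"
    DIMINUTIVE_AND_PLURAL = "lexicalized diminutive and plural"
    PROPER_NAME = "lexicalized form with a proper name"


@dataclass(frozen=True)
class Lemma:
    form: str
    formation: Formation  # assigned by the lexicographer, not computed

    @property
    def structure(self) -> Structure:
        # Assumption: whitespace in the written form signals a multi-word lexeme.
        if len(self.form.split()) > 1:
            return Structure.MULTI_WORD
        return Structure.SIMPLE


# Examples taken from the list above.
EXAMPLES = [
    Lemma("henequén", Formation.REGULAR),
    Lemma("calabacita", Formation.DIMINUTIVE_OR_AUGMENTATIVE),
    Lemma("chismes", Formation.PLURAL),
    Lemma("jarritos", Formation.DIMINUTIVE_AND_PLURAL),
    Lemma("dondiego", Formation.PROPER_NAME),
    Lemma("palo loco", Formation.REGULAR),
    Lemma("romerito cimarrón", Formation.DIMINUTIVE_OR_AUGMENTATIVE),
    Lemma("tripas de ratón", Formation.PLURAL),
    Lemma("flor de San José", Formation.PROPER_NAME),
]

counts = Counter(lemma.structure for lemma in EXAMPLES)
for structure, n in counts.items():
    print(f"{structure.value}: {n}")
```

Keeping the formation type as an explicit field leaves room for categories that are still unattested, such as multi-word lexemes that are both diminutive and plural.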

3.3.4.2 Method of Lemma Abstraction

This section describes the method for abstracting lemmata from DEM. We describe the procedure not only for obtaining lemmata but also for identifying the kind of living being to which they refer, based on the data provided by DEM itself.

Abstraction Method for Building the Current Basis

As mentioned, we describe the process for building the basis of the intended dictionary51. The starting point was a database (DB1) from DEM52, whose data consisted of living beings’ designations and their corresponding binomial nomenclature(s), but neither a definition text nor an indication of the species that they refer to. Designations registered in DB1 correspond to the lemma (constituted by a simple lexeme) of the respective dictionary article where one or more scientific names are included. In DB1, the lemma is repeated as many times as the corresponding article contains binomial nomenclatures (Figure 4 shows this lemma repetition). From this, at least two superficial conclusions can be drawn: 1) only simple lexemes designate plants, and 2) in DB1, it is not possible to know which binomial nomenclatures correspond to different species and which are synonyms.

47 We consider that this case shows why it is important to treat these forms as independent lemmata: calabacita designates two different species, namely Cucurbita pepo and Cucurbita moschata, whereas calabaza is the designation of Cucurbita maxima.

These and the following examples were retrieved from DEM, and we see them as a sample of the phenomenon of plant designation. Based on it, we expect to find similar forms in other sources. However, we do not see this sample as representative of this kind of designation, but as useful for foreseeing what kind of lexemes we can find and for making decisions in this regard.

48 In contrast to simple lexemes, we have not found any case of lexicalized augmentative multi-word lexemes, but we think it is likely that some will be found in other sources.

49 So far, we have not found any form with these features, but if one appears, we will consider treating it as a lemma.

50 The name of a state in central Mexico.

51 We built this database mainly during an internship at the Institute for Galician Language (Instituto da Língua Galega).

52 We are very grateful to Erik Franco, one of DEM’s editors, for providing us with this database.

Figure 4 DB1

Note. From Franco (2022).

Specific note. The arrows in the figure are ours.

Thus, we have, for example, the lemma aguja and its corresponding binomial nomenclatures (see Figure 4 above): Erodium cicutarium, Samyda yucatanensis, and Opuntia molesta. These data are insufficient by themselves to achieve the genuine purpose of the intended dictionary. Therefore, we complemented DB1 through a manual identification (MI) of the referent (either animal or plant) of both the lemma and the binomial nomenclature(s), based on DEM’s definition texts. For instance, aguja refers to three different plants, namely a herbaceous one (Erodium cicutarium), a tree endemic to Yucatan (Samyda yucatanensis), and a kind of cactus (Opuntia molesta).
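The structure of DB1 and the effect of the MI can be sketched with the aguja example. This is a minimal illustration under our own assumptions, not the actual database: the row layout, the function names, and the referent labels are hypothetical, and the identification itself was done manually from DEM's definition texts.

```python
from collections import defaultdict

# Rows mimic DB1: the lemma is repeated once per binomial nomenclature.
DB1_ROWS = [
    ("aguja", "Erodium cicutarium"),
    ("aguja", "Samyda yucatanensis"),
    ("aguja", "Opuntia molesta"),
]

# Referents recorded by hand during the manual identification (MI).
MI_REFERENTS = {
    "Erodium cicutarium": "herbaceous plant",
    "Samyda yucatanensis": "tree endemic to Yucatan",
    "Opuntia molesta": "cactus",
}


def group_by_lemma(rows):
    """Collapse repeated lemma rows into one entry per lemma."""
    grouped = defaultdict(list)
    for lemma, binomial in rows:
        grouped[lemma].append(binomial)
    return dict(grouped)


def annotate(grouped, referents):
    """Attach the manually identified referent to each binomial."""
    return {
        lemma: [(b, referents.get(b, "unidentified")) for b in binomials]
        for lemma, binomials in grouped.items()
    }


entries = annotate(group_by_lemma(DB1_ROWS), MI_REFERENTS)
for lemma, referents in entries.items():
    for binomial, referent in referents:
        print(f"{lemma}: {binomial} ({referent})")
```

Binomials without a recorded referent are marked as unidentified rather than dropped, so that gaps in the manual work remain visible.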

In addition, the MI enabled us to identify synonyms among binomial nomenclatures, orthographic variants of common designations, and complex lexemes as designations of different species. The following are the MI results53:

• 1271 binomial nomenclatures

o 28 synonymic scientific names

• 725 lemmata

o 436 simple lexemes

o 45 orthographic variants

o 244 multi-word lexemes54.

53 The figures above correspond just to plants. We recall that the very first database (DB1) includes just lemmata and the corresponding binomial nomenclature, without data regarding the type of species in question. It follows that we also identified designations of other living beings, such as animals or fungi. Although our data set contains this kind of information, we do not bring it up here since our project focuses on plant designations.

Nevertheless, we think that this database could be helpful for projects related to other living beings.

Through this procedure, we also found some lemmata that are not included in DB1 but are in DEM, e.g., junco and madroño, with their corresponding binomial nomenclatures, Koeberlinia spinosa and Arbutus arizonica, respectively55.
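The MI counts reported above can be cross-checked with a few lines of code. The sketch below assumes, as the nesting of the list suggests, that simple lexemes, orthographic variants, and multi-word lexemes together make up the total of lemmata; the dictionary keys and the function name are ours.

```python
# Counts reported in the MI results (plants only).
MI_RESULTS = {
    "binomial_nomenclatures": 1271,
    "synonymic_scientific_names": 28,
    "lemmata": 725,
    "simple_lexemes": 436,
    "orthographic_variants": 45,
    "multi_word_lexemes": 244,
}


def lemma_breakdown_is_consistent(results: dict) -> bool:
    """Check that the three lemma subtypes add up to the lemma total."""
    parts = (
        results["simple_lexemes"]
        + results["orthographic_variants"]
        + results["multi_word_lexemes"]
    )
    return parts == results["lemmata"]


print("Lemma breakdown consistent:", lemma_breakdown_is_consistent(MI_RESULTS))
```

A check of this kind is worth rerunning whenever new sources add lemmata to the basis.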

Figure 5 Basis for PITAYA

Note. From Huerta Meza (2022b).
