This is an old revision of the document!


Search and Corpora

General Description

The ITSERR – uBIQUITY platform allows users to perform textual searches on three main linguistic corpora:

  • Arabic
  • Greek
  • Latin

Each corpus consists of a set of digitized and indexed texts, selected for their philological and scientific relevance. All three corpora are fully operational and accessible through a single centralized search interface.

The main objective of the search module is to allow the researcher to:

  • perform targeted textual analyses on specific terms or phrases;
  • compare results across multiple languages or authors;
  • filter and organize results by author, source, or period;
  • save searches for later reuse (through flows).

Linguistic Corpus Selection

Within the search section, the user can select the reference language:

  • Arabic
  • Greek
  • Latin

The language selection automatically determines the corpus and the indexing parameters used for the search. For example:

  • in the Latin corpus, the search is performed on classical Latin texts;
  • in the Greek corpus, the search is performed on ancient Greek texts;
  • in the Arabic corpus, the search is performed on religious, philosophical, or literary texts in Arabic.

Each corpus maintains its own internal structure, but the search engine exposes the same query interface for all of them, so queries are formulated in the same way regardless of the corpus.


Search Term Input

The main search field (Search Box) accepts:

  • single words;
  • phrases or extended portions of text;
  • combinations of terms separated by spaces or logical operators.

Example:

terra autem erat

Once the text is entered, the system displays a preview of the search parameters and allows the user to:

  • select the desired corpus;
  • set any additional filters;
  • start the search using the “Search” button.

The search engine uses a semantic indexing system that allows matches to be identified even in the presence of orthographic variants or inflections. (Figure 2)

Figure 2, Search Results


Filters and Comparison Criteria

The search module provides a set of advanced filters, which differ depending on the selected corpus (Arabic or Greek/Latin). These filters allow the search to be refined and more targeted results to be obtained.


Available Filters for the Arabic Corpus

Filter | Description
Comparison Criteria | Defines the comparison mode between texts (Exact, Inflections, Roots, Synonyms, Structures).
Sources and Authors | Allows the selection of the Qurʾān and its chapters, as well as the works of Ḥadīth, Sīrah, Tafsīr, with further filtering by Author name and Books.
Geography | Filters the search based on the reference geographical area.
Chronology | Filters the search based on the historical or chronological period.

Available Filters for the Greek and Latin Corpora

Filter | Description
Comparison Criteria | Defines the comparison mode between texts (Exact, Inflections, Roots, Synonyms, Structures).
Compare | Allows the selection of the comparison type: To text or To text and apparatus.
Scriptures and Authors | Allows filtering by Scriptures and their related Books, as well as by Ancient Authors, with filtering by Author name and Work title.
Geography | Filters texts based on the associated geographical location.
Chronology | Filters texts based on the historical or chronological period.
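
The two filter tables above can be summarised as a small lookup structure. The following is a minimal sketch only: the identifiers (COMMON_FILTERS, FILTERS_BY_CORPUS, unsupported_filters) and the lowercase filter keys are hypothetical names chosen for illustration, not part of the platform.

```python
from __future__ import annotations

# Hypothetical model of the filter sets listed in the tables above.
# These names are illustrative; they are not the platform's actual API.
COMMON_FILTERS = {"comparison_criteria", "geography", "chronology"}

FILTERS_BY_CORPUS = {
    "arabic": COMMON_FILTERS | {"sources_and_authors"},
    "greek": COMMON_FILTERS | {"compare", "scriptures_and_authors"},
    "latin": COMMON_FILTERS | {"compare", "scriptures_and_authors"},
}


def unsupported_filters(corpus: str, requested: set[str]) -> set[str]:
    """Return the requested filters that the given corpus does not offer."""
    return set(requested) - FILTERS_BY_CORPUS[corpus]
```

For instance, asking for the Compare filter on the Arabic corpus would be reported as unsupported, while the same request on the Latin corpus would not.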

Comparison Criteria – Legend

The available comparison criteria, common to all corpora, are:

Exact: returns results that exactly match the searched word, expression, or passage (if present in the corpus).

Inflections: returns results that include all inflected forms of the searched word, such as different verb tenses or nominal cases.

Roots: returns results containing terms that share the same root as the searched word (e.g. “dream / dreamer / dreaming”).

Synonyms: returns results that include synonyms of the searched word or words.

Structures: returns results that convey meanings similar to the searched text in terms of topics, ideas, or symbolism, even in the absence of literal matches. (Note: this is the only criterion that generates the Similarity Score.)
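
As an illustration of the legend above, the five criteria can be modelled as an enumeration, with the rule that only Structures produces a Similarity Score. This is a sketch under stated assumptions: the class and function names are hypothetical, and only the criterion labels come from the platform.

```python
from enum import Enum


class Criterion(Enum):
    """The five comparison criteria, using the labels shown in the interface."""
    EXACT = "Exact"
    INFLECTIONS = "Inflections"
    ROOTS = "Roots"
    SYNONYMS = "Synonyms"
    STRUCTURES = "Structures"


def produces_similarity_score(criterion: Criterion) -> bool:
    """Only structure-based searches generate a Similarity Score."""
    return criterion is Criterion.STRUCTURES
```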


Search Execution

Once the parameters have been defined, the user starts the search by clicking the “Search” button. During processing, a status indicator (loading) and a progress message are displayed. (Figure 2)

The system then returns:

  • a list of textual results, each with reference to the source and position in the text;
  • the Similarity Score, displayed only when “Structures” is selected among the Comparison Criteria;
  • direct access to comparison and annotation functions.

Results are displayed in a compact tabular format, with the possibility of expanding rows to show larger excerpts or contextual details. (Figure 2)
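
The shape of a result row described above can be sketched as a record with an optional score. The field names, the SearchResult class, and the format_row helper are assumptions made for illustration; the platform's internal representation is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    """Hypothetical result row: source, position in the text, and an excerpt."""
    source: str
    position: str
    excerpt: str
    # Assumed to be set only for searches run with the "Structures" criterion.
    similarity_score: Optional[float] = None


def format_row(result: SearchResult) -> str:
    """Render one compact row, appending the score only when it is present."""
    row = f"{result.source} {result.position}: {result.excerpt}"
    if result.similarity_score is not None:
        row += f" (similarity {result.similarity_score:.2f})"
    return row
```

Keeping the score optional mirrors the rule that it appears only in structure-based searches; rows from the other criteria simply omit it.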


Search Persistence

Each search can be saved as a workflow (Flow) for later reuse. (Figures 3–4)

The save operation stores:

  • the original query;
  • the selected language and filters;
  • the date and the user who performed the search.

This allows previous searches to be resumed at any time without the need to reconfigure parameters.

Flows are stored in the platform database and are uniquely associated with the user account.
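
The information stored with a Flow, as listed above, can be pictured as a small serialisable record. This is a minimal sketch, assuming a JSON representation and hypothetical field names (query, language, filters, user, saved_at); the actual database schema is not described in this document.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Flow:
    """Hypothetical saved search: query, language, filters, user, and date."""
    query: str
    language: str
    filters: dict
    user: str
    # Timestamp of the save operation, stored as an ISO 8601 string (assumption).
    saved_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def dump_flow(flow: Flow) -> str:
    """Serialise a flow to JSON, keeping non-ASCII query text readable."""
    return json.dumps(asdict(flow), ensure_ascii=False)


def load_flow(payload: str) -> Flow:
    """Rebuild a flow from its JSON form so the search can be resumed."""
    return Flow(**json.loads(payload))
```

A flow restored with load_flow carries the same query, language, and filters as the one that was saved, which is what allows a search to be resumed without reconfiguring its parameters.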

Figure 3, Save Flow

Figure 4, History of Flows

ubiquity/search_and_corpora.1768464203.txt.gz · Last modified: by fincons