Semantic Search Engine

General Description

The semantic search module represents the core functionality of the DaMSym tool.

It allows users to query multilingual textual corpora (Arabic, Church Slavonic, Greek, Latin, and Sanskrit, plus a combined Latin & Greek domain) through a system based on conceptual similarity between the query terms and the content of the texts.

Unlike traditional search engines, which operate on literal matches, DaMSym evaluates the semantic proximity between the concepts expressed by the entered terms and those found in the texts.

The result is a list of texts correlated not only by keyword, but also by meaning and linguistic context, thereby supporting comparative analysis and philological study.

Semantic search is available to all users, including unauthenticated users (Guest), while some advanced functions, such as adding resources, reviewing, or approving, require authenticated access.

[Figure: Search]


Interface and Structure

The search engine interface is organized in a simple yet functional way.

All input tools are placed on the left side of the screen and arranged vertically. They include:

  • the main search bar,
  • the “Advanced Search” section,
  • the filters panel.

The right side of the page is dedicated to displaying search results, which are updated in real time according to the selected settings.

The visual separation between input area and output area ensures clarity and immediacy, allowing users to modify parameters without reloading the page.

All search parameters are interdependent: any modification to filters, query terms, or concept weights dynamically influences the semantic context and the list of displayed results.


Supported Languages and Search Domains

The system supports six main linguistic domains, each characterized by its own internal rules and metadata structure.

Despite internal differences, Arabic, Greek, Latin, and Latin & Greek share the same filter structure, while Church Slavonic and Sanskrit present different filter configurations.

| Language / Domain | Main Characteristics |
| Arabic, Greek, Latin, Latin & Greek | Common filter structure: Authors, Works, and Period. In the case of the Latin & Greek combination, the underlying search model simultaneously queries the Greek and Latin datasets. Consequently, the available metadata and filter structure are identical for both languages, and the returned results include texts from both corpora. |
| Church Slavonic | Includes a dedicated font selection dropdown to ensure correct character rendering. Provides filters for Language and Historical/Regional Variant. Does not include management of authors or works. |
| Sanskrit | Includes only the Works filter. No additional parameters are provided. Automatic transliteration is available to improve text readability. |
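
The per-domain filter layout described in the table can be summarized as a small lookup structure. The following is only an illustrative sketch: the names `FILTERS_BY_DOMAIN`, `available_filters`, and `datasets_for` are hypothetical and do not come from DaMSym itself.

```python
# Hypothetical lookup table: the entries mirror the table above,
# not DaMSym's actual implementation.
COMMON_FILTERS = ("Authors", "Works", "Period")

FILTERS_BY_DOMAIN = {
    "Arabic": COMMON_FILTERS,
    "Greek": COMMON_FILTERS,
    "Latin": COMMON_FILTERS,
    "Latin & Greek": COMMON_FILTERS,
    "Church Slavonic": ("Font", "Language", "Historical/Regional Variant"),
    "Sanskrit": ("Works",),
}


def available_filters(domain):
    """Return the filter names shown for a search domain."""
    if domain not in FILTERS_BY_DOMAIN:
        raise ValueError(f"unsupported domain: {domain!r}")
    return FILTERS_BY_DOMAIN[domain]


def datasets_for(domain):
    """Return the corpora queried for a domain.

    Latin & Greek queries both corpora; every other domain queries its own.
    """
    if domain not in FILTERS_BY_DOMAIN:
        raise ValueError(f"unsupported domain: {domain!r}")
    return ("Greek", "Latin") if domain == "Latin & Greek" else (domain,)


print(available_filters("Church Slavonic"))
print(datasets_for("Latin & Greek"))
```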

Advanced Search

The Advanced Search functionality in DaMSym allows users to refine the precision of semantic search by combining the mandatory main query with one or two optional additional search phrases.

The main search always represents the core of the search process.

Additional phrases serve exclusively to refine, contextualize, or modulate the results, without ever replacing or outweighing the main search.

Each search block contributes to the final result through a numerical weight ranging from 0 to 1, representing its relative importance compared to the other active searches.

When the user activates the Advanced Search section:

  • The main search remains mandatory and can be assigned a weight between 0.5 and 0.9.
  • A first additional search phrase becomes available, with a default weight of 0.5.

These default values are only an initial suggestion and can be modified by the user.

[Figure: Advanced Search]

The user may choose to add a second additional phrase by clicking the “+” button.

In this case, a new section becomes available, and the user can assign three distinct weights to the three search terms or phrases, starting from the default values proposed by the system.

[Figure: Added section in Advanced Search]

If both additional phrases are active, clicking the “–” button allows the user to remove the most recently added one.

The assigned weights must always sum to 1. Whenever a weight is changed, the system automatically recalculates the values so that this constraint holds and the overall search configuration stays balanced.
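
The weight constraints can be illustrated with a short sketch. The manual does not say how DaMSym redistributes the values, so the rule used here is an assumption: the main weight is clamped to the documented 0.5 to 0.9 range, and the remainder is shared among the additional phrases in proportion to their current weights. The function name `rebalance` is hypothetical.

```python
def rebalance(main, extras):
    """Return [main, *extras] adjusted so that all weights sum to 1.

    Assumed rule (not taken from DaMSym): clamp the main weight to
    [0.5, 0.9], then split the remainder among the additional phrases
    in proportion to their current weights.
    """
    if not 1 <= len(extras) <= 2:
        raise ValueError("Advanced Search uses one or two additional phrases")
    if any(w < 0 for w in extras):
        raise ValueError("weights cannot be negative")

    main = min(max(main, 0.5), 0.9)   # documented range for the main search
    remaining = 1.0 - main
    total = sum(extras)
    if total == 0:
        # No preference expressed: split the remainder evenly.
        shares = [remaining / len(extras)] * len(extras)
    else:
        shares = [remaining * w / total for w in extras]
    return [main, *shares]


print(rebalance(0.6, [0.5]))        # one additional phrase
print(rebalance(0.7, [0.2, 0.4]))   # two additional phrases
```

Under this assumed rule the main search always keeps at least half of the total weight, which matches the statement that additional phrases never outweigh it.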

The semantic engine balances the search according to these proportions, returning results consistent with the indicated conceptual combination.
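
The manual does not describe how the engine combines the weighted phrases internally. One plausible reading, shown here purely as an assumption, is a weighted sum of the cosine similarity between each phrase and a passage. The vectors below are toy values, and `cosine` and `combined_score` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(passage_vec, query_vecs, weights):
    """Weighted sum of per-phrase similarities (assumed combination rule)."""
    if len(query_vecs) != len(weights):
        raise ValueError("one weight is needed per search phrase")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(w * cosine(q, passage_vec) for q, w in zip(query_vecs, weights))


# Toy example: a main phrase (weight 0.7) and one additional phrase (0.3).
main_phrase = [1.0, 0.0]
extra_phrase = [0.0, 1.0]
passage = [1.0, 0.0]
print(combined_score(passage, [main_phrase, extra_phrase], [0.7, 0.3]))
```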

This weighting logic makes the search more flexible and suitable for comparative or multidisciplinary studies.

The weight assigned to each search phrase can be adjusted using a slider or through an input field, which provides arrow controls for changing the value.


Filters and Search Parameters

In addition to semantic terms, users can narrow the scope of the search through a series of contextual filters, which vary depending on the selected language.

Common Filters (Arabic, Greek, Latin, Latin & Greek)

  • Chronological range (Period) → available for all languages except Church Slavonic and Sanskrit. The filter is dynamic and automatically adapts to the queried dataset. The range can be selected through a time slider or by manually entering values in the “From year” and “To year” fields.
  • Authors → includes a search bar for selecting and filtering by one or more authors.
  • Works → includes a search bar for selecting and filtering by one or more works.

Church Slavonic Filters

  • Font → selection of the character type used for text rendering.
  • Language → filter allowing selection of one or more of the available languages.
  • Historical/Regional Variants → geographic or cultural contextual reference.

Sanskrit Filter

  • Works → multiple selection of specific works or collections.

The Authors and Works filters are synchronized: selecting or changing an author automatically updates the Works filter so that it lists only the works associated with that author.
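
The synchronization between the Authors and Works filters can be sketched as follows. This is a minimal illustration: `works_for` and the catalogue entries are placeholders, and listing every work when no author is selected is an assumption, not documented behaviour.

```python
def works_for(selected_authors, works_by_author):
    """Return the works to list in the Works filter.

    Assumption: with no author selected, every work is listed.
    """
    authors = selected_authors or list(works_by_author)
    works = {w for a in authors for w in works_by_author.get(a, [])}
    return sorted(works)


# Placeholder catalogue, not real DaMSym data.
catalogue = {
    "Author A": ["Work 1", "Work 2"],
    "Author B": ["Work 3"],
}

print(works_for(["Author A"], catalogue))   # works of the selected author only
print(works_for([], catalogue))             # no author selected
```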

A Reset button is also available, allowing users to completely clear the search, including selected filters and text entered in the main search bar.


Search Results

Search results are displayed as a list on the right side of the screen.

Each entry includes:

  • the title of the work or fragment;
  • an excerpt of the corresponding text;
  • the Similarity Score, a numerical value expressing the degree of semantic similarity between the result and the user’s query;
  • the “More Details” button, which opens the detailed metadata view.

Results can be sorted according to three criteria:

  • Similarity (default)
  • Date
  • Author
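
The three sorting criteria can be sketched as below. The `Result` fields and the sort directions (highest similarity first, earliest date first, authors in alphabetical order) are assumptions made for illustration; the manual names only the criteria.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical result record; field names are illustrative.
    title: str
    author: str
    year: int
    similarity: float


# Assumed directions: best match first, oldest first, A to Z.
SORT_KEYS = {
    "similarity": lambda r: -r.similarity,
    "date": lambda r: r.year,
    "author": lambda r: r.author.casefold(),
}


def sort_results(results, criterion="similarity"):
    """Return a new list of results ordered by the chosen criterion."""
    if criterion not in SORT_KEYS:
        raise ValueError(f"unknown sort criterion: {criterion!r}")
    return sorted(results, key=SORT_KEYS[criterion])


# Placeholder entries, not real search output.
results = [
    Result("Fragment 1", "Author B", 450, 0.62),
    Result("Fragment 2", "Author A", 1100, 0.91),
    Result("Fragment 3", "Author C", 200, 0.75),
]

for r in sort_results(results):
    print(r.title, r.similarity)
```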

In the results page, no direct text highlighting is applied for any of the supported languages.

Selecting the “More Details” button opens the detailed metadata view, which displays a contextualized portion of text: the segment semantically closest to the executed search query.


Detailed View (“More Details”)

By clicking “More Details” next to a result, the user accesses a detailed view containing all information associated with the selected text:

  • title;
  • full or extended text of the fragment;
  • list of metadata (work, period, place, source, etc.), varying according to the selected language;
  • semantic highlighting of the contextualized portion of text, available for all languages.

From this same view, authenticated users (Researcher, Reviewer, and WP Lead) may propose corrections directly on the highlighted text, modify metadata using the Edit button (which appears next to each metadata field when hovering over it), and add new metadata using the Add Metadata button located on the right, immediately below the metadata list.


Text Editing and Corrections

Within the More Details view, users with the roles of Researcher, Reviewer, and WP Lead can interact with the displayed text and metadata according to controlled procedures.

In particular:

  • text modification is allowed exclusively on contextualized portions automatically highlighted by the system according to the search query. These portions can be selected to propose semantic or philological corrections using the Edit Text (Corrections) function;
  • metadata are structured as name–value pairs and can be modified using the Edit buttons displayed next to each field;
  • through the Add Metadata button, users may propose the addition of new informational fields (for example, sources, original titles, notes, or bibliographic references).

No modification is applied directly: each one is saved as a correction proposal.

Proposals are visible in the Corrections section of the Dashboard, where:

  • Researcher and Reviewer users can modify, delete, submit, or view the correction proposal;
  • the WP Lead can evaluate each proposal and approve, reject, or delete it using the dedicated management buttons.
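
The role-based handling of correction proposals listed above can be expressed as a simple permission table. This is a sketch only: the action names are taken from the list, while `ALLOWED_ACTIONS` and `can` are hypothetical names, and treating any unlisted role (such as Guest) as having no permissions is an assumption.

```python
# Actions on correction proposals per role, as listed above.
ALLOWED_ACTIONS = {
    "Researcher": {"modify", "delete", "submit", "view"},
    "Reviewer": {"modify", "delete", "submit", "view"},
    "WP Lead": {"approve", "reject", "delete"},
}


def can(role, action):
    """Return True if the role may perform the action on a proposal.

    Assumption: roles not in the table (for example Guest) get no actions.
    """
    return action in ALLOWED_ACTIONS.get(role, set())


print(can("WP Lead", "approve"))
print(can("Researcher", "approve"))
```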

Interdependence Between Search and Filters

Search and filters in DaMSym do not operate independently but in a relationship of dependency: any modification to parameters influences the semantic processing of the query.

For example, selecting a specific author automatically restricts the semantic context to the subset of texts associated with that author.
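
The author example can be pictured as a two-step process: first restrict the candidate passages, then rank what remains. The sketch below assumes that ordering and uses precomputed similarity values; `search` and the passage records are hypothetical.

```python
def search(passages, score, author=None):
    """Rank passages by score, optionally restricted to one author first."""
    pool = [p for p in passages if author is None or p["author"] == author]
    return sorted(pool, key=score, reverse=True)


# Placeholder passages with precomputed similarity values.
passages = [
    {"title": "Fragment 1", "author": "Author A", "similarity": 0.40},
    {"title": "Fragment 2", "author": "Author B", "similarity": 0.90},
    {"title": "Fragment 3", "author": "Author A", "similarity": 0.70},
]


def by_similarity(p):
    return p["similarity"]


print([p["title"] for p in search(passages, by_similarity)])
print([p["title"] for p in search(passages, by_similarity, author="Author A")])
```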

This dynamic architecture keeps the search experience fluid and coherent, and makes it suitable for comparative studies and advanced linguistic analysis.


damsym/semantic_search_engine.1770895306.txt.gz · Last modified: by fincons