ISO-25964-1 Guidelines for thesaurus management software and how PoolParty maps to them
The ISO/DIS standard 25964-1 “Information and documentation — Thesauri and interoperability with other vocabularies. Part 1: Thesauri for information retrieval” defines “Guidelines for thesaurus management software”. In this chapter we map them to the capabilities of the PoolParty Thesaurus Server (PPT) to give an overview of how well PPT meets them.
In the general notes ISO-25964 states that these guidelines do not include general evaluation criteria for software like good documentation, training and support, the general user-friendliness of the interface and an acceptable price. The following links provide input on those general criteria:
The documentation is what you are currently reading; you can also start from the beginning
Find information on training and a pointer to our planned e-learning project on our homepage
Find our product matrix and prices on our homepage
Below we evaluate the detailed guidelines of ISO-25964 and their coverage by PoolParty's functionality.
Provide size and character limitations of labels/terms/concepts
The software should not impose limitations as follows.
1. There should be no limitations on the number of terms in the vocabulary, which prevent it expanding to the size needed. The same applies to other elements such as node labels, scope notes, etc.
PoolParty applies no limitations on the number of concepts, and therefore none on the terms (labels) linked to those concepts or on additional descriptive data such as scope notes.
See: Working With Concept Schemes and Concepts
2. There should preferably be no limitations on the length of terms or of node labels, notes, etc. While relatively few terms exceed 40 characters, sometimes they need 100 characters or more.
PoolParty applies no limitations on the length of terms (labels), node labels or descriptive elements like notes.
See: Editing Details
3. There should be no limitation on the number of hierarchical levels admissible, or the number of hierarchical, associative or equivalence relationships possessed by any preferred term.
PoolParty applies no limitation on the number of hierarchical levels, on hierarchical or associative relationships between concepts, or on equivalence relationships between terms/labels.
See:
4. The software should be capable of handling all the characters in the Universal Character Set as defined in ISO/IEC 10646, when used in any of the text elements (including terms, node labels and notes).
PoolParty uses UTF-8 encoding throughout the system. All literals created (including terms, node labels and notes) can include any character available in the Universal Character Set.
5. The editor should be able to choose upper or lower case characters as appropriate.
PoolParty allows the use of upper and lower case characters as appropriate.
6. For handling multilingual thesauri, there should be no limit on the number of languages that can be included, and it should be possible to apply all the desired languages to any or all of the text elements, such as terms, node labels and notes.
PPT applies no limit on the number of languages defined per project (thesaurus/taxonomy). All selected languages are available for all language tagged literals such as terms, node labels and notes.
Allow to create relationships between terms and between concepts
Note
The abbreviations BT, NT, RT, USE, UF and SN used below stand for broader term, narrower term, related term, use, use for and scope note. They were already used in the previous standard as symbols, especially for thesaurus record examples in textual form.
1. It should not permit the coexistence of duplicate terms for the same language. Any duplicates entered should be rejected upon entry or at least submitted to the editor for correction, amalgamation, the addition of a qualifier, or other remedial action. The matching algorithm for duplicate detection should be capable of customization so that, for example, typographical differences such as italics or capitalization may be ignored for the purposes of duplicate detection.
PoolParty is based on SKOS and takes a concept-based approach: concepts are related hierarchically (BT/NT) and associatively (RT), and are labeled with preferred, alternative and hidden labels, which replace the equivalence relationship (USE/UF) of a term-based approach. In addition, it allows quality criteria to be defined that take into account rules like: “It should not permit the coexistence of duplicate terms for the same language.” If the optional SKOS-XL module is used, labels attached to concepts can be turned into individuals (SKOS-XL labels) related to concepts. This allows more flexibility (custom relations to concepts, relations between labels/terms, metadata for labels/terms) but introduces additional complexity in maintaining and integrating thesauri.
See: Quality Management in PoolParty
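The customizable matching rule asked for in the guideline can be sketched as follows. This is a minimal illustration and not PoolParty's implementation: the function names, the tuple layout and the normalization rule (ignoring case, surrounding whitespace and Unicode compatibility forms) are all assumptions made for the example.

```python
import unicodedata
from collections import defaultdict


def normalize(label: str) -> str:
    """Hypothetical matching rule: ignore case, surrounding whitespace and Unicode form."""
    return unicodedata.normalize("NFKC", label).strip().casefold()


def find_duplicate_labels(labels, key=normalize):
    """Group (concept, label) pairs whose labels collide within one language.

    `labels` is an iterable of (concept_id, language, label) tuples;
    `key` is the matching rule and can be swapped for a stricter or looser one.
    """
    seen = defaultdict(list)
    for concept_id, lang, label in labels:
        seen[(lang, key(label))].append((concept_id, label))
    # Keep only the groups that contain more than one entry.
    return {k: v for k, v in seen.items() if len(v) > 1}


labels = [
    ("c1", "en", "Jaguar"),
    ("c2", "en", "jaguar "),
    ("c3", "de", "Jaguar"),  # same spelling, different language: not a duplicate
]
duplicates = find_duplicate_labels(labels)
```

Passing a different `key` function changes what counts as a duplicate, which is the kind of customization the guideline describes.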
2. It should support the basic relationships BT/NT, RT/RT, USE/UF
PoolParty supports all basic relationships as they can be directly mapped to SKOS attributes and relations. So BT/NT is equivalent to skos:broader/skos:narrower, RT/RT to skos:related and USE/UF is replaced by skos:prefLabel, skos:altLabel and skos:hiddenLabel as outlined above.
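The mapping described above can be illustrated with a small sketch that turns a term-based record into SKOS-style triples. The record layout, the `record_to_triples` function and the `example.org` URIs are hypothetical; only the SKOS property names come from the SKOS recommendation. Non-preferred terms are emitted as `skos:altLabel` here; `skos:hiddenLabel` could be used in the same way.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Concept-to-concept relationships map one-to-one onto SKOS properties.
RELATION_TO_SKOS = {
    "BT": SKOS + "broader",
    "NT": SKOS + "narrower",
    "RT": SKOS + "related",
}


def record_to_triples(concept_uri, record, uri_for, lang="en"):
    """Turn a term-based thesaurus record into (subject, predicate, object) triples.

    `record` holds a preferred term plus optional BT/NT/RT/UF lists;
    `uri_for` maps a preferred term to its concept URI (both are illustrative).
    """
    triples = [(concept_uri, SKOS + "prefLabel", (record["term"], lang))]
    # UF entries become alternative labels of the concept, not separate terms.
    for non_preferred in record.get("UF", []):
        triples.append((concept_uri, SKOS + "altLabel", (non_preferred, lang)))
    for relation, predicate in RELATION_TO_SKOS.items():
        for target in record.get(relation, []):
            triples.append((concept_uri, predicate, uri_for[target]))
    return triples


uri_for = {
    "Cats": "http://example.org/cats",
    "Mammals": "http://example.org/mammals",
}
record = {"term": "Cats", "BT": ["Mammals"], "UF": ["Felines"]}
triples = record_to_triples(uri_for["Cats"], record, uri_for)
```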
3. It should support reciprocity requirements i.e. if Concept A has a BT relationship with Concept B, then Concept B should have an NT relationship with Concept A and vice versa; if Concept C has an RT relationship with Concept D, then Concept D should have an RT relationship with Concept C; if Term E has a USE relationship with Term F, then Term F should have a UF relationship with Term E and vice versa. Preferably, the software should make the reciprocal relationships available automatically in response to the editor's insertion one way round, but as a minimum the software should issue a warning if any non-reciprocal relationship is present.
By default, PoolParty follows the described reciprocity requirements and creates the respective relationships automatically. Again, USE/UF is handled differently by default, following the concept-based approach.
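Automatic reciprocity can be pictured with the following sketch, in which adding a relationship in one direction also stores its inverse. The `ConceptGraph` class and its storage layout are assumptions for illustration and say nothing about how PoolParty stores relations internally.

```python
from collections import defaultdict

# Each relation and the relation that must exist in the opposite direction.
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


class ConceptGraph:
    def __init__(self):
        # relations[concept][relation] -> set of target concepts
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Store the relation and its reciprocal in one step."""
        if source == target:
            raise ValueError("a concept cannot be related to itself")
        self.relations[source][relation].add(target)
        self.relations[target][INVERSE[relation]].add(source)


graph = ConceptGraph()
graph.add("cats", "broader", "mammals")  # also records mammals narrower cats
graph.add("cats", "related", "dogs")     # also records dogs related cats
```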
4. When a term or concept is amended or deleted, the change should be propagated automatically in all places in which that term or concept appears related to another (whether as BT, NT, RT, USE or UF). In the event of term or concept deletion, all relationships to and from that term or concept should be deleted. However, if deletion of a term or concept or relationship would leave any concept without at least one broader term/concept, or any non-preferred term without a preferred term, the editor should be warned.
When a concept is deleted in PoolParty, all changes are propagated automatically to all related concepts. The system prevents a deletion from leaving a concept without a broader concept.
5. There should be no limitation on the number of relationships a given preferred term or concept may have with others. Thus one concept may have any number of BTs, NTs and RTs, and one preferred term may have any number of non-preferred terms.
PoolParty applies no limits on the number of relationships generated. Via its quality management mechanism, issues such as circularities or applying both an associative and a hierarchical relationship between the same concepts can be prevented.
6. It should be possible to set up user-defined reciprocal relationships, for example to distinguish between different types of BT/NT or different types of associative relationship.
The PoolParty custom scheme and ontology feature allows creation of user-defined relationships.
See: Ontology Management
7. Validation checks should prevent the entry of inadmissible relationship combinations. If two terms or concepts already have one of the basic relationships, no other basic relationship between the same terms or concepts is admissible. If Concept A has BT Concept B, none of the concepts in the BT hierarchy above Concept B should be admissible as BT, NT or RT of Concept A. Non-preferred terms (i.e. any term with a USE or USE+ relationship to another term) may not have any BT, NT, RT or UF relationships. In the case of the USE+ relationship occurring, the software should check that the relationship is at least ternary.
PoolParty's quality management mechanism provides different validation checks for quality issues such as circularities or applying both an associative and a hierarchical relationship between the same concepts. Validation can be enforced on data entry, reported via quality reports, or ignored, depending on settings that can be defined per project.
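The two checks named above (circular hierarchies, and an associative relationship between concepts that are already hierarchically related) can be sketched as a simple graph traversal. The data layout and function names are assumptions for this example; PoolParty's actual checks are configured through its quality management settings.

```python
def ancestors(concept, broader):
    """All concepts reachable by following broader links upward from `concept`."""
    found, stack = set(), list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(broader.get(current, ()))
    return found


def validate(broader, related):
    """Return a list of (issue, concept, other) tuples; empty means no issue found."""
    issues = []
    for concept in broader:
        # A concept that is its own ancestor sits on a hierarchical cycle.
        if concept in ancestors(concept, broader):
            issues.append(("circular hierarchy", concept, concept))
    for concept, others in related.items():
        for other in others:
            if other in ancestors(concept, broader) or concept in ancestors(other, broader):
                issues.append(("hierarchical and associative clash", concept, other))
    return issues


broader = {"cats": {"mammals"}, "mammals": {"animals"}}
related = {"cats": {"animals"}}  # clashes: animals is already an ancestor of cats
issues = validate(broader, related)
```

Whether such a list is used to reject the entry, to feed a report, or is ignored corresponds to the per-project settings described above.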
8. No relationship from a term or a concept to itself allowed.
PoolParty does not allow relations from a concept or SKOS-XL label to itself.
9. Only one preferred term should be admissible for each concept, in each language of the thesaurus.
Following SKOS, PoolParty allows only one preferred label/term per language for a concept.
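One way to picture this constraint is a data structure in which preferred labels are keyed by language, so a second preferred label in the same language cannot coexist with the first. The `Concept` class below is a hypothetical sketch; replacing the earlier label is only one possible policy (rejecting the second one would be another), and it is not meant to describe PoolParty's behavior.

```python
class Concept:
    def __init__(self, uri):
        self.uri = uri
        self.pref_labels = {}  # language tag -> single preferred label
        self.alt_labels = {}   # language tag -> set of alternative labels

    def set_pref_label(self, lang, label):
        # Assigning again replaces the previous value for that language.
        self.pref_labels[lang] = label

    def add_alt_label(self, lang, label):
        # Any number of alternative labels per language is allowed.
        self.alt_labels.setdefault(lang, set()).add(label)


concept = Concept("http://example.org/cats")
concept.set_pref_label("en", "Cats")
concept.set_pref_label("en", "Cat")  # replaces "Cats"
concept.set_pref_label("de", "Katze")
concept.add_alt_label("de", "Hauskatze")
concept.add_alt_label("de", "Stubentiger")
```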
Allow applying notes to terms or concepts
1. It should support entry of a scope note, associated with any concept.
PoolParty supports the creation of scope notes (skos:scopeNote).
2. If a note (of any type) makes reference to another term or concept in the vocabulary, the software should preferably support addition of a marker or hyperlink to the record for that term or concept. The software should check the validity of the link target.
PoolParty allows HTML tags to be included in notes, so links to other concepts can be created.
Note
Automated linking to other concepts or checking the validity of links is not supported yet.
3. It should support the setting up of user-defined notes associated with any term or concept, for example history notes, editorial notes etc.
PoolParty is based on the SKOS data model and therefore provides skos:editorialNote, skos:changeNote and skos:historyNote out of the box. In addition, custom notes can be defined via the custom schema/ontology feature.
Codes and notation
1. It should be possible to associate at least one code, number or other type of notation with any term or concept, concept group or array. Preferably more than one coding/numbering type should be supported.
PoolParty is based on the SKOS data model and therefore provides skos:notation to assign one or many codes or notations to concepts. In addition, custom code and notation properties can be defined via the custom schema/ontology feature and associated with concept schemes (concept groups).
Note
Currently definition of custom properties for collections (concept arrays) is not supported.
2. It should be possible to associate one or more subject category/ies with any term or concept, concept group or array.
PoolParty allows custom classes and custom relations to be defined. That way, concepts can be associated with subject categories defined as custom classes or concepts.
See: Ontology Management
3. It should be possible to assign a unique identifier to each term and to each concept. Preferably, the assignment of the identifier(s) should be automatic whenever a new term and/or concept is entered, in such a way as never to duplicate any of the existing identifiers or identifiers of terms or concepts previously deleted. The identifier should not change when term or concept attributes or relationships are modified in any way.
Following semantic web and linked data principles, every concept and concept group (basically any resource) created in PoolParty has a unique identifier, which is an HTTP URI. The URIs are generated automatically when the resource is created. The URI pattern can be defined very flexibly, and identifier generation can be based on different mechanisms (incremental, UUID, from preferred label). Identifiers do not change when a concept is edited or modified, and the application ensures that identifiers are not duplicated.
By default the plain SKOS data model is used, where “terms” are labels (language-tagged literals) defined as attributes of concepts and therefore do not have their own identifiers. With the PoolParty SKOS-XL module, terms/labels become resources that have their own identifiers and can be related to concepts.
See: SKOS-XL with PoolParty - Overview
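The three identifier mechanisms mentioned above (incremental, UUID, derived from the preferred label) can be sketched as interchangeable strategies behind a minter that never hands out the same URI twice. The base URI, the function names and the collision handling are assumptions for illustration; PoolParty's actual URI patterns are configured per project.

```python
import itertools
import re
import uuid

BASE = "http://example.org/thesaurus/"  # hypothetical URI pattern


def incremental_ids():
    """Return a strategy that numbers resources 1, 2, 3, ..."""
    counter = itertools.count(1)
    return lambda label: f"{BASE}{next(counter)}"


def uuid_id(label):
    """Strategy that ignores the label and uses a random UUID."""
    return f"{BASE}{uuid.uuid4()}"


def label_id(label):
    """Strategy that derives a slug from the label (ASCII letters and digits only)."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{BASE}{slug}"


class UriMinter:
    """Hands out URIs and remembers every one issued, including deleted resources."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.issued = set()

    def mint(self, label):
        uri = self.strategy(label)
        candidate, suffix = uri, 2
        # On a collision, append a numeric suffix until the URI is unused.
        while candidate in self.issued:
            candidate = f"{uri}-{suffix}"
            suffix += 1
        self.issued.add(candidate)
        return candidate


minter = UriMinter(label_id)
first = minter.mint("Scope Notes")
second = minter.mint("scope notes")  # same slug, so a suffix is added
```

Because the minter only records URIs and never derives them again from the current label, later edits to a label do not affect an identifier that has already been issued.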
4. It should be possible to output the vocabulary using the sequence of each type of notation, coding or identifier.
Via PoolParty's custom reports functionality any output based on sequence of notation, coding or identifier can be created.
Node labels
1. Node labels are not regarded as thesaurus terms or concepts and, therefore, are not subject to the relationship requirements defined for concept. Furthermore, they do not need to be unique and so should not be subject to the duplicate control.
Since SKOS does not include node labels as part of the recommendation, there is no basic representation for them in PoolParty. However, two features are available that provide similar functionality.
You can Blacklist concepts. That will exclude them from Entity Extraction and also provide that information in the data so those concepts can be treated accordingly in integrations (as node labels). In addition you can create a custom class "Node Label" with PoolParty's Custom Scheme & Ontology functionality and assign it to the respective concepts to mark them accordingly.
See: Blacklist Concepts and Terms
2. The software should have means for locating a node label in displays in the correct position relative to any parent term and to the highest level terms that come within the corresponding facet or array.
The hierarchical tree view allows concepts used as node labels (see above) to be located.
Status of languages
The software should allow each language of a multilingual thesaurus to have equal status, avoiding predominance of one of the languages over the others.
Basically languages in PoolParty are treated equally with one exception.
Note
A default language is defined for each project on project setup and the preferred term has to be specified for this language when creating a concept or concept group.
For example:
1. The number of non-preferred terms applying to any one concept in each language should not be determined by the number present in another language, and the non-preferred terms in different languages should not be required to correspond to each other.
This condition is fulfilled in PoolParty.
2. The existence of a scope note for a given concept in one language should not require a corresponding scope note in any of the other languages.
This condition is fulfilled in PoolParty.
Data import/export
1. Bulk import of datasets from existing vocabularies, comprising terms, scope notes, node labels, standard relationships between the terms and concepts, and all other attributes of the terms, concepts and node labels (All of the mentioned features should be retained after import, as well as all characters from the Universal Character Set wherever used)
PoolParty's RDF import/export functionality allows bulk import/export of whole thesauri, preserving all mentioned features and the correct character set. A "Project Export" function is also available for duplicating a whole thesaurus project or moving it to a different server.
See: PoolParty RDF Import & Export
2. Producing reports or exporting the vocabulary, including all terms, scope notes, notation and standard relationships between the terms and concepts. (It should also be possible to export editor-defined subsets, such as non-preferred terms only, or preferred terms with their scope notes and NTs only, or preferred terms of concepts that have no hierarchical or associative relationships, etc.)
PoolParty's custom reports functionality allows exports of the whole vocabulary or of subsets to be produced in any format needed.
See: Generate Reports with PoolParty
3. Batch edit/delete facility. (It should be possible to edit batches of records in the same way, preferably using a facility native to the software, but if this is not possible an option may be to use functions of the underlying database management system, or to export the selected records, edit them externally and re-import. This technique may be used, for example, to add the same history note or relationship to a batch of terms, or perhaps a mapping to terms in another vocabulary. Caution is needed, however, to apply the customary validations on import)
PoolParty allows concepts and whole subtrees of the hierarchy to be deleted. In addition, subtrees/subsets can be exported and imported via the built-in export/import functionality in RDF and Excel. The APIs can also be used to add data to concepts in batch mode.
See: Thesaurus & Ontology Manager API
4. Exporting all terms that have been amended after a given date, with or without their full details. (The option should be available of selecting only certain types of amendment, for example only new terms, or only those in a particular language, or of including all terms that have undergone any change of attributes or relationships. It should also be possible to report all terms that have been deleted from a certain date.)
Custom exports based on criteria such as only new terms by creation date, terms in a particular language, or concepts with changes to specific attributes can be produced via the custom reports feature.
5. Outputting thesaurus displays, either in hard copy or on the screen. (It should be possible to choose between a variety of sequences and layouts for the display, of which the alphabetical display is essential and either a hierarchical or a classified display highly recommended.)
PoolParty offers a hierarchical display and an alphabetical display.
Editorial safeguards
1. Editorial changes should be made in the first instance to a master database from which outputs are derived periodically for downstream processes such as indexing or resource discovery applications
PoolParty uses a built-in graph database. Integration with indexing and resource discovery applications can be done via its APIs or its import/export mechanism. Staging mechanisms can also be supported, depending on the required usage scenario.
See: Example for a Thesaurus Curation Workflow
2. If more than one person is editing the master database simultaneously, an inbuilt mechanism should prevent simultaneous write access to the same records
PoolParty handles simultaneous write access to the same records based on a "last in wins" mechanism. All changes can be tracked via the history available for each record.
See: PoolParty History
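A "last in wins" policy combined with a per-record history can be pictured as follows. The `Record` class and the shape of the history entries are assumptions made for this sketch and do not reflect PoolParty's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    value: str = ""
    # Each entry: (sequence number, editor, old value, new value)
    history: list = field(default_factory=list)

    def write(self, editor, new_value):
        """Accept every write; the most recent one determines the current value."""
        self.history.append((len(self.history) + 1, editor, self.value, new_value))
        self.value = new_value


note = Record("Initial scope note")
note.write("alice", "Scope note edited by Alice")
note.write("bob", "Scope note edited by Bob")  # arrives last, so it wins
```

No write is rejected, but because each history entry keeps the overwritten value, an earlier state can be looked up and re-entered manually.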
3. Security/password controls should prevent editorial changes from unauthorized persons
PoolParty is a Java-based application using Spring Security for authentication. Spring Security allows integration with the common authentication mechanisms used in enterprises (e.g. AD, LDAP, SiteMinder). By default, a built-in authentication mechanism is used.
See: PoolParty User Administration Setup Using Keycloak
4. Preferably, the software should allow for different levels of access, so that provisional changes can be made, which are not finally admitted until approved
PoolParty's Workflow functionality provides a simple workflow mechanism that allows an approval process to be set up at concept and term level.
5. For each editor, a roll-back (undo) function should allow progressive reversal of the most recent editorial changes he/she has entered
A roll-back of changes can be done manually based on the history.
Note
Currently no automated roll-back mechanism is provided.
6. A log should be kept so that the database can be restored from an earlier version
PoolParty's History provides a log of all changes made. In addition, a snapshot mechanism allows backups of versions to be created and restored at any time.
Housekeeping tools
1. It should be possible to obtain reports of the numbers of terms with particular characteristics, particularly the total number of preferred and non-preferred terms.
PoolParty's default statistics show the main characteristics of a project (e.g. number of preferred and non-preferred terms, number of relations). In addition, further reports can easily be created via the custom reports functionality.
See: Generate Reports with PoolParty
2. The frequency of use of thesaurus terms in indexing and searching should be monitored, and the thesaurus management software should be capable of importing and storing the usage data, if another tool does not already perform this function.
Via PoolParty's Custom Scheme and Ontology management, additional fields can be created to store usage information for concepts/terms. Integration with indexing and search tools can be done via PoolParty's APIs.
See: Developer Guide