Call “Bridging Gaps – A Call for Expressions of Interest to Connect CLARIN to External Language Technology Tools” 2022
Project duration: 1.6.2022-30.9.2022.
Project leader: Olja Perišić, Università degli Studi di Torino, Dipartimento di Lingue e Letterature Straniere e Culture Moderne, Italy
The project team also includes members of JeRTeh – Society for Language Resources and Tools, Serbia: Duško Vitas, Ranka Stanković and Milica Ikonić Nešić. Additional contributors: Cvetana Krstev, Saša Moderc, Mihailo Škorić.
Main goal: development of a CLARIN-compatible NER web service for parallel texts, dubbed It-Sr-NER, with a case study on Italian and Serbian. The service can be used for recognizing and classifying named entities in bilingual natural language texts. The input is expected to be a parallel text in a TMX (Translation Memory eXchange) file, e.g. Sr-It. It-Sr-NER recognizes six NER classes: demonyms (DEMO), works of art (WORK), person names (PERS), places (LOC), events (EVENT) and organisations (ORG). Although developed primarily for aligned, parallel texts in TMX, the service can also be used for NER annotation of monolingual texts with the available spaCy NER models. It-Sr-NER uses a convolutional neural network architecture within the spaCy tool.
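To illustrate the expected input, here is a minimal sketch of how aligned Italian and Serbian segments could be read from a TMX file using only the Python standard library. It is not the project's own code: the function name `read_tmx_pairs`, the sample sentences and the header values are assumptions made for illustration. The only format facts it relies on are that TMX stores each translation unit in a `tu` element, with one `tuv` element per language (marked by `xml:lang`) containing a `seg` element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample document, for illustration only (not taken from the project corpus).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="it" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1.0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Dante nacque a Firenze.</seg></tuv>
      <tuv xml:lang="sr"><seg>Dante je rođen u Firenci.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_text, lang_a="it", lang_b="sr"):
    """Return a list of (lang_a segment, lang_b segment) tuples from TMX text."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions used a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "it-IT" -> "it".
                segments[lang[:2]] = "".join(seg.itertext()).strip()
        # Skip translation units that lack one of the two languages.
        if lang_a in segments and lang_b in segments:
            pairs.append((segments[lang_a], segments[lang_b]))
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX)
for italian, serbian in pairs:
    print(italian, "|", serbian)
```

Each side of a pair could then be passed to a language-specific NER model, keeping the alignment so that entities recognized in one language can be compared with those in the other.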
Github page: https://github.com/rankastankovic/It-Sr-NER/
Project resources: https://github.com/rankastankovic/It-Sr-NER/tree/main/corpus
It-Sr-NER Web application and service: http://ners.jerteh.rs/
Corpora in the ILC4CLARIN repository
It-Sr-NER app in the ILC4CLARIN repository
Project presentation at the JeRTeh seminar – 20.10.2022 “It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian” – Olja Perišić, Università degli Studi di Torino and JeRTeh team [pdf]