Multilingual Named Entity Recognition and Classification – traineeship position
We are pleased to forward you the following call:
The Text and Data Mining Unit of the European Commission’s Joint Research Centre (JRC) is looking to fill one traineeship position in the field of:
Multilingual Named Entity Recognition and Classification.
If you are interested, please follow the instructions provided at http://recruitment.jrc.ec.europa.eu/?type=TR&site=IPR (Code: 2017-IPR-I-000-8664).
Job description: http://recruitment.jrc.ec.europa.eu/showprj.php?type=T&id=5087
Traineeship rules: https://ec.europa.eu/jrc/sites/jrcsh/files/jrc_trainee_rules_en.pdf
Starting date: As soon as possible
Duration: 5 months
Allowance: Up to approximately 1000 Euro per month.
Text and Data Mining Unit: https://ec.europa.eu/jrc/en/text-mining-and-analysis
JRC-EMM products: http://emm.newsbrief.eu/overview.html
JRC-EMM Publications: http://optima.jrc.it/Resources/JRC-EMM_Publications.pdf
DESCRIPTION OF THE FORESEEN ACTIVITY:
The JRC’s Europe Media Monitor (EMM) team carries out research and development in the field of highly multilingual text mining (Language Technology; Computational Linguistics) for the purposes of media monitoring. EMM gathers an average of 300,000 online news articles per day in over 70 languages and analyses them to help its large international user community understand and use this enormous amount of media information. EMM is publicly accessible and widely used. The EMM team has produced over 200 international peer-reviewed publications and also produces and distributes a number of highly multilingual Language Technology resources.
The Text and Data Mining Unit (I3) of the European Commission’s Joint Research Centre (JRC) in Ispra, Italy, is looking for a trainee to support the JRC’s Europe Media Monitor (EMM) team in its effort to improve its Named Entity Recognition and Classification (NERC) tools, especially for multi-word entities such as organisation and event names. EMM gathers and analyses reports from traditional and social media in dozens of languages by clustering related news items; categorising them; extracting information such as entities (persons, organisations, locations), events (who did what to whom, where and when) and quotations by and about people; identifying sentiment; and linking related news clusters over time and across languages. Methods used are mostly hybrid: machine learning tools are used to gather evidence and to learn vocabulary and rules, but the results are usually controlled and optimised through human intervention. EMM is used by European Institutions, by national authorities in EU Member States, by international organisations and by the public. The EMM applications NewsBrief, NewsExplorer and MedISys are freely accessible to the general public. EMM is part of the JRC’s Competence Centre on Text Mining and Analysis.
To date, the EMM team has accumulated several very large independent sets of multi-word entities and their monolingual and multilingual name variants. Some of the entities are classified according to an entity type hierarchy, while others are not. The successful trainee will help to improve the current tools to recognise multi-word entities, classify entities, merge the various lists of entities and their variants into one single repository, and integrate the NERC tools with the EMM processing chain. The trainee is also expected to contribute to writing a scientific publication on the work carried out.
· a degree (or an almost completed degree) in computational linguistics, computer science or related areas (applications from students currently preparing a thesis for a university degree are eligible; the thesis should match the subject of the project call);
· Java programming skills;
· good working knowledge of English (B2 level).
· knowledge of further foreign languages;
· proven advanced programming skills, especially in Java;
· good knowledge of Language Technology related tools and methods;
· proven ability to work independently and as part of a team.
Text and Data Mining Unit
Joint Research Centre