VarDial Evaluation Campaign on Similar Languages, Varieties and Dialects, Call for Participation
- Invitation:
Within the scope of the VarDial workshop, co-located with EACL 2017, we are organising an evaluation campaign on similar languages, varieties and dialects with multiple tasks.
URL: http://ttg.uni-saarland.de/vardial2017/sharedtask2017.html
We are offering four tasks this year:
– (DSL) Discriminating between Similar Languages
Fourth iteration of the DSL task, featuring a multilingual dataset containing excerpts of journalistic texts. The languages included this year, grouped by similarity, are: Bosnian, Croatian, and Serbian; Malay and Indonesian; Persian and Dari; Canadian and Hexagonal French; Brazilian and European Portuguese; and Argentine, Peninsular, and Peruvian Spanish.
– (ADI) Arabic Dialect Identification
Second iteration of the task included in the DSL 2016. This year we will be releasing acoustic data along with speech transcripts for the following Arabic dialects: Egyptian, Gulf, Levantine, and North-African, as well as for Modern Standard Arabic (MSA).
– (GDI) German Dialect Identification
In addition to Arabic dialects, we propose an analogous task on the identification of four Swiss German dialect areas: Basel, Bern, Lucerne, and Zurich. We will provide manually annotated speech transcripts for all dialect areas.
– (CLP) Cross-lingual Dependency Parsing
The task is to develop models for parsing selected target languages without annotated training data in the target language, using instead annotated data in one or two closely related source languages. We will include the following language pairs:
Target language = Croatian, Source language = Slovenian
Target language = Slovak, Source language = Czech
Target language = Norwegian, Source languages = Danish and Swedish
We will be releasing the training data on Tuesday, December 27, and the test sets on January 25, 2017.
To participate, please fill in the registration form available at the workshop website.