Program

The time zone for NoDaLiDa 2021 is Central European Summer Time (CEST). Information on how to access individual workshops and sessions will be posted in Trello.

Monday May 31, 2021

Time Workshop Access
09:00-17:00 Workshop on Modelling Translation: Translatology in the Digital Age Trello
09:00-17:00 NLP for Computer-Assisted Language Learning (NLP4CALL 2021) Trello
10:00-17:00 Sustainable language representations for a changing world Trello

Tuesday June 1, 2021

Time Session Access
09:00-09:15 Opening Trello
09:15-10:05 Keynote: Lucia Specia
Chair: Jörg Tiedemann
Trello
10:05-10:35 Coffee break  
10:35-12:15 Parallel sessions  
10:35-12:15 Session 1: Large-scale Language Models
Chair: Barbara Plank
Trello
10:35-11:00 WikiBERT Models: Deep Transfer Learning for Many Languages.
Sampo Pyysalo, Jenna Kanerva, Antti Virtanen and Filip Ginter.
 
11:00-11:25 EstBERT: A Pretrained Language-Specific BERT for Estonian.
Hasan Tanvir, Claudia Kittask, Sandra Eiche and Kairit Sirts.
 
11:25-11:50 Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model.
Per E Kummervold, Javier De la Rosa, Freddy Wetjen and Svein Arne Brygfjeld.
 
11:50-12:15 Large-Scale Contextualised Language Modelling for Norwegian.
Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid and Stephan Oepen.
 
10:35-12:15 Session 2: MT & Multilinguality
Chair: Yves Scherrer
Trello
10:35-11:00 Extremely low-resource machine translation for closely related languages.
Maali Tars, Andre Tättar and Mark Fišel.
 
11:00-11:25 Measuring Translationese across Levels of Expertise: Are Professionals more Surprising than Students?
Yuri Bizzoni and Ekaterina Lapshinova-Koltunski.
 
11:25-11:50 CombAlign: a Tool for Obtaining High-Quality Word Alignments.
Steinþór Steingrímsson, Hrafn Loftsson and Andy Way.
 
11:50-12:15 Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks.
Prajit Dhar and Arianna Bisazza.
 
12:15-13:15 Lunch break  
13:15-14:55 Parallel sessions  
13:15-14:55 Session 3: Speech & Generation
Chair: Mika Hämäläinen
Trello
13:15-13:40 Speaker Verification Experiments for Adults and Children using shared embedding spaces.
Tuomas Kaseva, Hemant Kumar Kathania, Aku Rouhe and Mikko Kurimo.
 
13:40-14:05 Spectral modification for recognition of children’s speech under mismatched conditions.
Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Paavo Alku and Mikko Kurimo.
 
14:05-14:30 A Baseline Document Planning Method for Automated Journalism.
Leo Leppänen and Hannu Toivonen.
 
14:30-14:55 Assessing the Quality of Human-Generated Summaries with Weakly Supervised Learning.
Joakim Olsen, Arild Brandrud Næss and Pierre Lison.
 
13:15-14:55 Session 4: IE & Text Classification
Chair: Leon Derczynski
Trello
13:15-13:40 Knowledge Distillation for Swedish NER models: A Search for Performance and Efficiency.
Lovisa Hagström and Richard Johansson.
 
13:40-14:05 Fine-grained Named Entity Annotation for Finnish.
Jouni Luoma, Li-Hsin Chang, Filip Ginter and Sampo Pyysalo.
 
14:05-14:30 Survey and reproduction of computational approaches to dating of historical texts.
Sidsel Boldsen and Fredrik Wahlberg.
 
14:30-14:55 Multilingual and Zero-Shot is Closing in on Monolingual Web Register Classification.
Samuel Rönnqvist, Valtteri Skantsi, Miika Oinonen and Veronika Laippala.
 
14:55-15:15 Coffee break  
15:15-16:30 Session 5: Posters Trello
  What Taggers Fail to Learn, Parsers Need the Most.
Mark Anderson and Carlos Gómez-Rodríguez.
 
  Investigation of Transfer Languages for Parsing Latin: Italic Branch vs. Hellenic Branch.
Antonia Karamolegkou and Sara Stymne.
 
  Towards cross-lingual application of language-specific PoS tagging schemes.
Hinrik Hafsteinsson and Anton Karl Ingason.
 
  Exploring the Importance of Source Text in Automatic Post-Editing for Context-Aware Machine Translation.
Chaojun Wang, Christian Hardmeier and Rico Sennrich.
 
  Chinese Character Decomposition for Neural MT with Multi-Word Expressions.
Lifeng Han, Gareth Jones, Alan Smeaton and Paolo Bolzoni.
 
  Grapheme-Based Cross-Language Forced Alignment: Results with Uralic Languages.
Juho Leinonen, Sami Virpioja and Mikko Kurimo.
 
  Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation.
Mikko Aulamo, Sami Virpioja, Yves Scherrer and Jörg Tiedemann.
 
  Building a Swedish Open-Domain Conversational Language Model.
Tobias Norlund and Agnes Stenbom.
 
  It’s Basically the Same Language Anyway: the Case for a Nordic Language Model.
Magnus Sahlgren, Fredrik Carlsson, Fredrik Olsson and Love Börjeson.
 
  Decentralized Word2Vec Using Gossip Learning.
Abdul Aziz Alkathiri, Lodovico Giaretta, Sarunas Girdzijauskas and Magnus Sahlgren.
 
  Multilingual ELMo and the Effects of Corpus Sampling.
Vinit Ravishankar, Andrey Kutuzov, Lilja Øvrelid and Erik Velldal.
 
  Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?
Tim Isbister, Fredrik Carlsson and Magnus Sahlgren.
 
16:30-18:00 Social event  

Wednesday June 2, 2021

Time Session Access
09:00-10:40 Parallel sessions  
09:00-10:40 Session 6: Morphology & Syntax
Chair: Miryam de Lhoneux
Trello
09:00-09:25 Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered.
Mika Hämäläinen, Niko Partanen, Jack Rueter and Khalid Alnajjar.
 
09:25-09:50 CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish.
Elena Volodina, Yousuf Ali Mohammed and Therese Lindström Tiedemann.
 
09:50-10:15 Chunking Historical German.
Katrin Ortmann.
 
10:15-10:40 Part-of-speech tagging of Swedish texts in the neural era.
Yvonne Adesam and Aleksandrs Berdicevskis.
 
09:00-10:40 Session 7: NLP applications
Chair: Filip Ginter
Trello
09:00-09:25 De-identification of Privacy-related Entities in Job Postings.
Kristian Nørgaard Jensen, Mike Zhang and Barbara Plank.
 
09:25-09:50 Creating and Evaluating a Synthetic Norwegian Clinical Corpus for De-Identification.
Synnøve Bråthen, Wilhelm Wie and Hercules Dalianis.
 
09:50-10:15 Applying and Sharing pre-trained BERT-models for Named Entity Recognition and Classification in Swedish Electronic Patient Records.
Mila Grancharova and Hercules Dalianis.
 
10:15-10:40 An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish.
Quan Duong, Mika Hämäläinen and Simon Hengchen.
 
10:40-11:00 Coffee break  
11:00-12:15 Parallel sessions  
11:00-12:15 Session 8: Lexical semantics & embeddings
Chair: Magnus Sahlgren
Trello
11:00-11:25 Learning to Lemmatize in the Word Representation Space.
Jarkko Lagus and Arto Klami.
 
11:25-11:50 Synonym Replacement based on a Study of Basic-level Nouns in Swedish Texts of Different Complexity.
Evelina Rennes and Arne Jönsson.
 
11:50-12:15 SuperSim: a test set for word similarity and relatedness in Swedish.
Simon Hengchen and Nina Tahmasebi.
 
11:00-12:15 Session 9: Sentence-level Semantics
Chair: Johanna Björklund
Trello
11:00-11:25 NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance.
Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis and Jörg Tiedemann.
 
11:25-11:50 Finnish Paraphrase Corpus.
Jenna Kanerva, Filip Ginter, Li-Hsin Chang, Iiro Rastas, Valtteri Skantsi, Jemina Kilpeläinen, Hanna-Mari Kupari, Jenna Saarni, Maija Sevón and Otto Tarkka.
 
11:50-12:15 Negation in Norwegian: an annotated dataset.
Petter Mæhlum, Jeremy Barnes, Robin Kurtz, Lilja Øvrelid and Erik Velldal.
 
12:15-13:00 Lunch break  
13:00-14:00 NEALT business meeting Trello
14:00-14:50 Keynote: Adina Williams
Chair: Sara Stymne
Trello
14:50-15:05 Coffee break  
15:05-16:35 Session 10: Posters/demos Trello
  DaNLP: An open-source toolkit for Danish Natural Language Processing.
Amalie Brogaard Pauli, Maria Barrett, Ophélie Lacroix and Rasmus Hvingelby.
 
  HB Deid - HB De-identification tool demonstrator.
Hanna Berg and Hercules Dalianis.
 
  Error Analysis of using BART for Multi-Document Summarization: A Study for English and German Language.
Timo Johner, Abhik Jana and Chris Biemann.
 
  Grammatical Error Generation Based on Translated Fragments.
Eetu Sjöblom, Mathias Creutz and Teemu Vahtola.
 
  Creating Data in Icelandic for Text Normalization.
Helga Svala Sigurðardóttir, Anna Björk Nikulásdóttir and Jón Guðnason.
 
  The Danish Gigaword Corpus.
Leon Strømberg-Derczynski, Manuel Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Jens Madsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm and Daniel Varab.
 
  DanFEVER: claim verification dataset for Danish.
Jeppe Nørregaard and Leon Derczynski.
 
  The Icelandic Word Web: A language technology-focused redesign of a lexicosemantic database.
Hjalti Daníelsson, Jón Hilmar Jónsson, Þórður Arnar Árnason, Alec Shaw, Einar Freyr Sigurðsson and Steinþór Steingrímsson.
 
  Getting Hold of Villains and other Rogues.
Manfred Klenner, Anne Göhring and Sophia Conrad.
 
  Talrómur: A large Icelandic TTS corpus.
Atli Sigurgeirsson, Þorsteinn Gunnarsson, Gunnar Örnólfsson, Eydís Magnúsdóttir, Ragnheiður Þórhallsdóttir, Stefán Jónsson and Jón Guðnason.
 
  NorDial: A Preliminary Corpus of Written Norwegian Dialect Use.
Jeremy Barnes, Petter Mæhlum and Samia Touileb.
 
  The Swedish Winogender Dataset.
Saga Hansson, Konstantinos Mavromatakis, Yvonne Adesam, Gerlof Bouma and Dana Dannélls.
 
16:35-16:45 Closing and announcement of next NoDaLiDa