Publications
All Publications
- MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs
Findings of the Association for Computational Linguistics: EMNLP 2025
- Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora
Findings of the Association for Computational Linguistics: EMNLP 2025
- RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Crossing Domains without Labels: Distant Supervision for Term Extraction
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
- What Media Frames Reveal About Stance: A Dataset and Study about Memes in Climate Change Discourse
Findings of the Association for Computational Linguistics: EMNLP 2025
- BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
- Relevant for the Right Reasons? Investigating Lexical Biases in Zero-Shot and Instruction-Tuned Rerankers
Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
- Revisiting Active Learning under (Human) Label Variation
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
- Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
- LeWiDi-2025 at NLPerspectives: The Third Edition of the Learning with Disagreements Shared Task
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
- BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
- Tracing Multilingual Factual Knowledge Acquisition in Pretraining
Findings of the Association for Computational Linguistics: EMNLP 2025
- Reason to Rote: Rethinking Memorization in Reasoning
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- DistaLs: a Comprehensive Collection of Language Distance Measures
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Human-centered LLMs for Inclusive Language Technology: The Need to Embrace Variation Holistically in NLP
Proceedings of the 20th Conference on Computer Science and Intelligence Systems (FedCSIS)
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
Findings of the Association for Computational Linguistics: ACL 2025
- Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- What’s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
- Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
Findings of the Association for Computational Linguistics: ACL 2025
- Methods and Resources in Germanic Variationist Linguistics
Oxford Research Encyclopedia of Linguistics
- Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
Findings of the Association for Computational Linguistics: NAACL 2025
- Evaluating Pixel Language Models on Non-Standardized Languages
Proceedings of the 31st International Conference on Computational Linguistics
- Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages
Proceedings of the 31st International Conference on Computational Linguistics
- KARRIEREWEGE: A large scale Career Path Prediction Dataset
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
- Neural Text Normalization for Luxembourgish Using Real-Life Variation Data
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
- Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
- Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection
Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
- Fine-grained Sexism Detection in Italian Newspapers
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
- The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Findings of the Association for Computational Linguistics: EMNLP 2024
- “Seeing the Big through the Small”: Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?
Findings of the Association for Computational Linguistics: EMNLP 2024
- To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity
Findings of the Association for Computational Linguistics: EMNLP 2024
- Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey
First Conference on Language Modeling
- Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
First Conference on Language Modeling
- Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- “My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Findings of the Association for Computational Linguistics: ACL 2024
- Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- VariErr NLI: Separating Annotation Error from Human Label Variation
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- CLIMATELI: Evaluating Entity Linking on Climate Change Data
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)
- Position: Insights from Survey Methodology can Improve Training Data
Forty-first International Conference on Machine Learning
- Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
- MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- EEVEE: An Easy Annotation Tool for Natural Language Processing
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
- Donkii: Characterizing and Detecting Errors in Instruction-Tuning Datasets
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
- Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
- NNOSE: Nearest Neighbor Occupational Skill Extraction
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings
Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
- Entity Linking in the Job Market Domain
Findings of the Association for Computational Linguistics: EACL 2024
- Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
- More Labels or Cases? Assessing Label Variation in Natural Language Inference
Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
- Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Findings of the Association for Computational Linguistics: EMNLP 2023
- What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Establishing Trustworthiness: Rethinking Tasks and Model Evaluation
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis
Findings of the Association for Computational Linguistics: EMNLP 2023
- Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- ActiveAED: A Human in the Loop Improves Annotation Error Detection
Findings of the Association for Computational Linguistics: ACL 2023
- Silver Syntax Pre-training for Cross-Domain Relation Extraction
Findings of the Association for Computational Linguistics: ACL 2023
- Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data
Findings of the Association for Computational Linguistics: ACL 2023
- SemEval-2023 Task 11: Learning with Disagreements (LeWiDi)
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
- How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
- ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Low-resource Bilingual Dialect Lexicon Induction with Large Language Models
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Findings of the VarDial Evaluation Campaign 2023
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
- Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
- CrossRE: A Cross-Domain Dataset for Relation Extraction
Findings of the Association for Computational Linguistics: EMNLP 2022
- Experimental Standards for Deep Learning in Natural Language Processing Research
Findings of the Association for Computational Linguistics: EMNLP 2022
- On Language Spaces, Scales and Cross-Lingual Transfer of UD Parsers
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)
- The “Problem” of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Spectral Probing
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Evidence > Intuition: Transferability Estimation for Encoder Selection
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Stop Measuring Calibration When Humans Disagree
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- An Interdisciplinary Perspective on Evaluation and Experimental Design for Visual Text Analytics: Position Paper
2022 IEEE Evaluation and Beyond - Methodological Approaches for Visualization (BELIV)
- Skill Extraction from Job Postings using Weak Supervision
Proceedings of the 2nd Workshop on Recommender Systems for Human Resources (RecSys-in-HR 2022)
- SkillSpan: Hard and Soft Skill Extraction from English Job Postings
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Sort by Structure: Language Model Ranking as Dependency Probing
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Sliced at SemEval-2022 Task 11: Bigger, Better? Massively Multilingual LMs for Multilingual Complex NER on an Academic GPU Budget
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Fine-tuning vs From Scratch: Do Vision & Language Models Have Similar Capabilities on Out-of-Distribution Visual Question Answering?
Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Frustratingly Easy Performance Improvements for Low-resource Setups: A Tale on BERT and Segment Embeddings
Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning
Proceedings of the Thirteenth Language Resources and Evaluation Conference
- What Do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
- Probing for Labeled Dependency Trees
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)