EACL 2026 Abjad NLP Shared Task: Medical text classification in Arabic

A collaborative research challenge advancing Arabic natural language processing in the medical domain

Overview

Welcome to the medical text classification shared task at the Abjad NLP workshop in EACL 2026! This shared task brings together researchers and practitioners to develop and evaluate state-of-the-art models for processing Arabic medical texts.

Each row in the dataset contains a question-answer pair in Arabic under the column named "text", along with a category under the "category" column (the class to be predicted). There are 82 categories in total, and the dataset is considerably imbalanced across them, which makes the problem interesting.

Note that the prediction target is the integer in the "label" column, which encodes the class named in the "category" column. The category names were originally in Arabic; we translated them into English using an LLM to aid modeling.

These question-answer pairs are from the healthcare domain, and given the importance of NLP applications for healthcare in Abjad script languages such as Arabic, we are confident that this shared task will attract significant interest and have a positive impact on the community.

Let's have fun!

Registration

Sign up for the competition on Kaggle after filling out the Google form!

Getting Started

To help you get started, we have shared a sample Colab notebook in which we fine-tune CAMeLBERT.

Important Dates

All deadlines are 11:59 PM UTC-12 (Anywhere on Earth)

Release of training data: December 8, 2025
Release of test data: December 25, 2025
End of evaluation cycle (test submissions close): December 31, 2025
Final results released: January 2, 2026
Shared task papers due date: January 13, 2026
Notification of acceptance: January 20, 2026
Camera-ready versions due: February 3, 2026
Workshop dates: March 24–29, 2026 [TBD]

Task Description

Task Format

Participants will develop systems to perform multi-class classification of Arabic medical text into 82 predefined categories. Each text instance must be assigned to exactly one category represented by an integer label between 0 and 81.

Dataset Information

The dataset consists of authentic medical-domain text in Arabic. Each row in the dataset contains:

  • text: A medical-domain text segment written in Arabic
  • category: The English name of the corresponding medical category
  • label: The integer class label (0–81) that participants must predict

There are 82 categories in total, and the dataset exhibits notable class imbalance, making the task both challenging and practically important for real-world healthcare NLP applications.
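One common way to account for the class imbalance described above is to weight each class inversely to its frequency. The following is a minimal sketch, not part of the official task materials: the helper name `balanced_class_weights` and the toy labels are hypothetical, and the weighting formula (number of samples divided by the number of classes times the class count) is one widely used heuristic rather than a recommendation from the organizers.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_samples / (n_classes * count).

    Rare classes receive weights above 1, frequent classes below 1.
    Only classes that actually occur in `labels` are included.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Toy labels standing in for the "label" column (hypothetical data).
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = balanced_class_weights(toy_labels)
# 9 samples, 3 classes:
#   class 0 -> 9 / (3 * 6) = 0.5
#   class 1 -> 9 / (3 * 2) = 1.5
#   class 2 -> 9 / (3 * 1) = 3.0
```

Weights like these can typically be passed to a weighted loss function or to a classifier's class-weight option; whether that helps on this dataset is something to verify on a held-out split.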

Dataset Links

Evaluation Metric

Submissions will be evaluated using the macro-averaged F1 score across all 82 classes. This metric assigns equal weight to each category, encouraging solutions that perform well even on minority classes.

For more details about the macro F1 score, refer to the scikit-learn documentation.
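To make the "equal weight to each category" property concrete, here is a small pure-Python sketch of macro-averaged F1. It is an illustration only: the function name `macro_f1` is hypothetical, and the sketch assumes that a class with no true and no predicted instances scores 0 and that the average runs over every label passed in. The official scorer's exact configuration is not stated on this page; in practice, scikit-learn's `f1_score` with `average="macro"` is the usual tool.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over `labels`."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no support and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three classes (hypothetical data).
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
# Per-class F1: class 0 -> 8/9, class 1 -> 0, class 2 -> 1.
# accuracy = 5/6 (about 0.83), macro F1 = 17/27 (about 0.63).
```

Missing the single instance of class 1 costs a third of the achievable macro F1 even though accuracy stays high, which is why minority classes matter under this metric.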

Contact

For questions or clarifications, please contact the organising team.

We look forward to your participation in the AbjadNLP Medical Text Classification shared task and to advancing medical NLP for Arabic and other Abjad-script languages.

Submission Guidelines

Prediction File Format

You will submit your predictions as a CSV file with 2 columns: Id and Predicted.

Id,Predicted
0,34
1,76
2,43

Each row should contain:

  • Id: Row identifier from the evaluation dataset
  • Predicted: Your predicted integer label (0-81) corresponding to one of the 82 categories
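The format above can be produced with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the helper name `write_submission` is hypothetical, and the ids and predictions are simply the three example rows shown above, not real model output.

```python
import csv
import io

NUM_CLASSES = 82  # labels are integers in the range 0..81


def write_submission(out_file, ids, predictions):
    """Write an Id,Predicted CSV to an open text file object."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["Id", "Predicted"])
    for row_id, label in zip(ids, predictions):
        label = int(label)
        if not 0 <= label < NUM_CLASSES:
            raise ValueError(f"label {label} is outside 0..{NUM_CLASSES - 1}")
        writer.writerow([row_id, label])


# Reproduce the three example rows in memory.
buffer = io.StringIO()
write_submission(buffer, ids=[0, 1, 2], predictions=[34, 76, 43])
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.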

Submission Instructions

  1. Download Data: Download the training and evaluation datasets from the provided Google Drive links
  2. Develop System: Build and train your classification model using the training dataset
  3. Generate Predictions: Run your system on the evaluation dataset to generate predictions
  4. Format Output: Create a CSV file with Id and Predicted columns as shown above
  5. Submit: Submit your CSV file through the submission portal before the deadline (December 31, 2025)
  6. Await Results: Final results will be released on January 2, 2026
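Before submitting (step 5), it may be worth running a quick local sanity check on the file from step 4. The sketch below is an assumption-laden illustration, not an official validator: `check_submission` is a hypothetical helper, and the checks it performs (exact header, integer labels in 0–81, ids matching the evaluation set) are inferred from the format described above.

```python
import csv
import io


def check_submission(csv_text, expected_ids, num_classes=82):
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["Id", "Predicted"]:
        return [f"unexpected header: {reader.fieldnames}"]

    problems = []
    seen_ids = []
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        seen_ids.append(row["Id"])
        try:
            label = int(row["Predicted"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: label is not an integer")
            continue
        if not 0 <= label < num_classes:
            problems.append(f"line {line_no}: label {label} out of range")

    # Ids are compared as strings, ignoring row order.
    if sorted(seen_ids) != sorted(str(i) for i in expected_ids):
        problems.append("Id column does not match the evaluation ids")
    return problems


good = check_submission("Id,Predicted\n0,34\n1,76\n2,43\n", expected_ids=[0, 1, 2])
bad = check_submission("Id,Predicted\n0,82\n", expected_ids=[0])
```

Here `good` is an empty list, while `bad` contains one message about the out-of-range label 82.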

System Description Papers

Participants are highly encouraged to submit system description papers detailing their approaches. Submitting a paper to a shared task is a low-stress way to join the AI research community, and accepted system description papers will be part of the ACL Anthology. As long as you adhere to the system paper submission guidelines, your paper will most likely be accepted. It DOES NOT matter whether you topped the leaderboard: we are excited to see the progress you have made, and all of us will have something to learn from your explorations.

Writing a system description paper might sound challenging at first, but we are here to help! Here is a tutorial to get you started, and our contact information can be found at the bottom of this page.

Papers should cover:

  • Model architecture and approach
  • Preprocessing and feature engineering techniques
  • Training procedure and hyperparameters
  • External resources used (if any)
  • Results and analysis

Paper submission deadline: January 13, 2026
Notification of acceptance: January 20, 2026
Camera-ready versions due: February 3, 2026

Contact for Submissions

For questions about submissions or technical issues, please contact the organizers.

Organizers

The EACL 2026 Abjad NLP Medical Text Classification Shared Task is organized by a team of researchers and practitioners specializing in natural language processing.

Pranav Gupta

PhD, Senior Machine Learning Engineer at Cisco

Lead Organizer

LinkedIn Profile

Niranjan Kumar M

Specialization in Data Science, Sr Data Scientist at Lowe's

Co-Organizer

LinkedIn Profile

Balaji Nagarajan

Master in Data Science, Senior Manager of Data Science at Lowe's

Co-Organizer

LinkedIn Profile

Imed Zitouni

PhD, Senior Director of Engineering at Meta
Editor-in-Chief at ACM TALLIP

Co-Organizer

LinkedIn Profile

For questions or clarifications, please contact the organising team at abjadnlpmedicaltextclassificat@gmail.com.