EACL 2026 Abjad NLP Shared Task: Medical Text Classification in Arabic

Overview

Welcome to the medical text classification shared task at the Abjad NLP workshop in EACL 2026! This shared task brings together researchers and practitioners to develop and evaluate state-of-the-art models for processing Arabic medical texts.

Each row in the dataset contains a question-answer pair in Arabic in the "text" column and its category in the "category" column (the class to be predicted). There are 82 categories in total, and the dataset is considerably imbalanced across them, which is part of what makes the problem interesting.

Note that your system must predict the integer in the "label" column, which corresponds to the category name in the "category" column. The category names were originally in Arabic; we translated them into English using an LLM to aid modeling.

These question-answer pairs are from the healthcare domain, and given the importance of NLP applications for healthcare in Abjad script languages such as Arabic, we are confident that this shared task will attract significant interest and have a positive impact on the community.

هيا نستمتع! (Let's have fun!)

Registration

Sign up for the competition on Kaggle after filling out the Google form!

Getting Started

To help you get started, we have shared a sample Colab notebook in which we fine-tune CamelBERT.

Important Dates

All deadlines are 11:59 PM UTC-12 (Anywhere on Earth)

Release of training data: December 8, 2025
Release of test data: December 25, 2025
End of evaluation cycle (test submissions close): December 31, 2025
Final results released: January 2, 2026
Shared task papers due date: January 13, 2026
Notification of acceptance: January 20, 2026
Camera-ready versions due: February 3, 2026
Workshop Dates: March 24–29, 2026 [TBD]

Task Description

Task Format

Participants will develop systems to perform multi-class classification of Arabic medical text into 82 predefined categories. Each text instance must be assigned to exactly one category represented by an integer label between 0 and 81.

Dataset Information

The dataset consists of authentic medical-domain text in Arabic. Each row in the dataset contains:

text: A medical-domain text segment written in Arabic
category: The English name of the corresponding medical category
label: The integer class label (0–81) that participants must predict

There are 82 categories in total, and the dataset exhibits notable class imbalance, making the task both challenging and practically important for real-world healthcare NLP applications.
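
The imbalance described above can be measured before any modeling. The following is a minimal sketch, assuming each row is available as a dict with the "text", "category", and "label" columns listed above (for example, as produced by `csv.DictReader`). The helper names, the toy rows, and the inverse-frequency weighting are illustrative assumptions and not part of the task; the weighting uses the same formula as scikit-learn's "balanced" class-weight heuristic.

```python
from collections import Counter


def count_labels(rows):
    """Count how many examples carry each integer label."""
    return Counter(int(row["label"]) for row in rows)


def inverse_frequency_weights(counts, num_classes):
    """Weight each class by total / (num_classes * count).

    Rarer classes receive larger weights; classes with no examples are omitted.
    """
    total = sum(counts.values())
    return {label: total / (num_classes * n) for label, n in counts.items() if n > 0}


# Toy rows for illustration only (placeholder text, hypothetical categories).
toy_rows = [
    {"text": "q1", "category": "A", "label": "0"},
    {"text": "q2", "category": "A", "label": "0"},
    {"text": "q3", "category": "A", "label": "0"},
    {"text": "q4", "category": "B", "label": "1"},
]

toy_counts = count_labels(toy_rows)
# With the real data, num_classes would be 82 (labels 0-81).
toy_weights = inverse_frequency_weights(toy_counts, num_classes=2)

for label, n in toy_counts.most_common():
    print(f"label {label}: {n} examples, weight {toy_weights[label]:.3f}")
```

Weights of this kind could be passed to a weighted loss function, but whether that helps on this dataset is an empirical question.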

Dataset Links

Training Dataset: Download here
Evaluation Dataset (no labels): Download here

Evaluation Metric

Submissions will be evaluated using the macro-averaged F1 score across all 82 classes. This metric assigns equal weight to each category, encouraging solutions that perform well even on minority classes.

For more details about the macro F1 score, refer to the scikit-learn documentation.
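
To make the metric concrete, here is a small pure-Python sketch of macro-averaged F1. It is meant only as an illustration of the definition, not as the official scorer. Two details are assumptions: a class with no true and no predicted examples is scored as 0.0, and the average runs over every label passed in `labels`. The leaderboard implementation may handle these cases differently.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores over `labels`."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); assumed to be 0.0 when undefined.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: always predicting the majority class.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels=[0, 1])
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this toy case the accuracy is 0.75, but the macro F1 is only 3/7 (about 0.429) because the minority class contributes an F1 of zero. This is why a majority-class strategy scores poorly under this metric.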

Contact

For questions or clarifications, please contact the organizing team.

We look forward to your participation in the AbjadNLP Medical Text Classification shared task and to advancing medical NLP for Arabic and other Abjad-script languages.

Submission Guidelines

📢 Sign Up on Kaggle

To participate in this shared task, please register through the Google form first (see link at the top of the page) and then go to Kaggle to join the competition:

Join the Kaggle Competition

Prediction File Format

You will submit your predictions as a CSV file with 2 columns: Id and Predicted.

Id,Predicted
0,34
1,76
2,43

Each row should contain:

Id: Row identifier from the evaluation dataset
Predicted: Your predicted integer label (0–81) corresponding to one of the 82 categories
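
The format above can be produced and sanity-checked with the Python standard library. This is a minimal sketch; the function name `write_submission` is hypothetical, and the example writes to an in-memory buffer rather than a real file. In practice, `out` would be a file opened with `open(path, "w", newline="", encoding="utf-8")`.

```python
import csv
import io

NUM_CLASSES = 82  # labels 0-81, per the task description


def write_submission(ids, predictions, out):
    """Write Id,Predicted rows to the text stream `out`, validating each label."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Predicted"])
    for row_id, label in zip(ids, predictions):
        # Convert NumPy or tensor values with int(...) before calling this helper.
        if not isinstance(label, int) or not 0 <= label < NUM_CLASSES:
            raise ValueError(f"label {label!r} for Id {row_id} is not an integer in 0-81")
        writer.writerow([row_id, label])


# Reproduce the three example rows shown above.
buffer = io.StringIO()
write_submission([0, 1, 2], [34, 76, 43], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Validating the label range at write time catches out-of-range or non-integer predictions before the file is uploaded.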

Submission Instructions

Download Data: Download the training and evaluation datasets from the provided Google Drive links
Develop System: Build and train your classification model using the training dataset
Generate Predictions: Run your system on the evaluation dataset to generate predictions
Format Output: Create a CSV file with Id and Predicted columns as shown above
Submit: Submit your CSV file through the submission portal before the deadline (December 31, 2025)
Await Results: Final results will be released on January 2, 2026

System Description Papers

Participants are highly encouraged to submit system description papers detailing their approaches. Submitting a paper to a shared task is a low-stress way to join the AI research community. Accepted system description papers will be part of the ACL Anthology. As long as you adhere to the system paper submission guidelines, your paper will most likely be accepted. It does not matter whether you topped the leaderboard: we are excited to see the progress you have made, and all of us will have something to learn from your explorations.

Writing a system description paper might sound challenging at first, but we are here to help! Here is a tutorial to get you started, and our contact information can be found at the bottom of this page.

Papers should cover:

Model architecture and approach
Preprocessing and feature engineering techniques
Training procedure and hyperparameters
External resources used (if any)
Results and analysis

Paper submission deadline: January 13, 2026
Notification of acceptance: January 20, 2026
Camera-ready versions due: February 3, 2026

Contact for Submissions

For questions about submissions or technical issues, please contact the organizers.

Organizers

The EACL 2026 Abjad NLP Medical Text Classification Shared Task is organized by a team of researchers and practitioners specialized in natural language processing.

Pranav Gupta

PhD, Senior Machine Learning Engineer at Cisco

Lead Organizer

LinkedIn Profile

Niranjan Kumar M

Specialization in Data Science, Sr Data Scientist at Lowe's

Co-Organizer

LinkedIn Profile

Balaji Nagarajan

Master in Data Science, Senior Manager of Data Science at Lowe's

Co-Organizer

LinkedIn Profile

Imed Zitouni

PhD, Senior Director of Engineering at Meta
Editor-in-Chief at ACM TALLIP

Co-Organizer

LinkedIn Profile

For questions or clarifications, please contact the organizing team at abjadnlpmedicaltextclassificat@gmail.com.