CLARIN Tool Portal

Debiasing Algorithm through Model Adaptation

2 resources

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding against stereotypical gender signals and on model editing. DAMA is applied to the specific modules that causal tracing shows are prone to conveying gender bias. The method reduces gender bias in LLaMA models on three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). It does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and on a diverse set of downstream tasks is almost unaffected. This package contains the source code together with English, English-to-Czech, and English-to-German datasets.
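The description does not spell out the editing procedure itself. As a rough illustration only, the sketch below assumes the edit takes the form of a linear projection folded into an existing weight matrix, which is one way an edit can leave the architecture, parameter count, and inference cost unchanged. The function names, the toy weights, and the "signal" direction are all hypothetical and are not taken from the package.

```python
# Hypothetical illustration: names and numbers are assumptions, not DAMA's code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def projection_removing(direction):
    """Return P = I - d d^T / (d^T d), which zeroes the component along d."""
    norm_sq = sum(v * v for v in direction)
    n = len(direction)
    return [[(1.0 if i == j else 0.0) - direction[i] * direction[j] / norm_sq
             for j in range(n)]
            for i in range(n)]

def edit_weights(weights, direction):
    """Fold the projection into the weights: W' = P W (same shape as W)."""
    return matmul(projection_removing(direction), weights)

def apply_layer(weights, x):
    """Compute the layer output W x for an input vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Toy 3x2 weight matrix mapping a 2-d input to a 3-d output (made-up values).
W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
# Made-up output-space direction standing in for a stereotypical signal.
d = [1.0, 0.0, 1.0]

W_edited = edit_weights(W, d)

# The edited layer's output has no component along d for this input.
y = apply_layer(W_edited, [2.0, 1.0])
leak = sum(a * b for a, b in zip(y, d))
```

Because the projection is multiplied into the weights once, ahead of time, the edited matrix has the same shape as the original and the forward pass costs the same. How the direction is chosen, and which modules are edited, is up to the actual method and is not modeled here.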
