Attribution Visualization Tool

Published in Open Source Project, 2024

Project Overview

The Attribution Visualization Tool is an interactive web application that helps researchers and practitioners understand how different attribution methods highlight important tokens in language model predictions.

Key Features

Multiple Attribution Methods: Gradient-based, integrated gradients, attention-based, and LIME
Interactive Interface: Real-time visualization as you type
Model Support: Compatible with BERT, GPT-2, T5, and custom models
Export Options: Save visualizations as PNG or interactive HTML
Batch Processing: Analyze multiple examples at once

Demo

Try the live demo: Attribution Visualization Tool

Attribution Demo

Example showing gradient-based attribution for sentiment classification

Technical Implementation

Architecture

Frontend (React + D3.js)
    ↓
API Gateway (FastAPI)
    ↓
Attribution Engine (PyTorch + Transformers)
    ↓
Model Hub (Cached pretrained models)

Supported Attribution Methods

Gradient-based Attribution

def gradient_attribution(model, input_ids, target_class):
    embeddings = model.get_input_embeddings()(input_ids)
    embeddings.requires_grad_(True)
       
    logits = model(inputs_embeds=embeddings).logits
    target_logit = logits[0, target_class]
       
    target_logit.backward()
    return embeddings.grad.norm(dim=-1)

Integrated Gradients

def integrated_gradients(model, input_ids, baseline_ids, target_class, steps=50):
    attributions = []
    for alpha in np.linspace(0, 1, steps):
        interpolated = baseline_ids + alpha * (input_ids - baseline_ids)
        grad = compute_gradient(model, interpolated, target_class)
        attributions.append(grad)
       
    return np.mean(attributions, axis=0) * (input_ids - baseline_ids)

Attention-based Attribution

def attention_attribution(model, input_ids, layer=-1, head=None):
    outputs = model(input_ids, output_attentions=True)
    attentions = outputs.attentions[layer]
       
    if head is None:
        # Average across all heads
        return attentions.mean(dim=1)
    else:
        return attentions[:, head]

Installation and Usage

Quick Start

# Clone the repository
git clone https://github.com/Yupei-Du/attribution-viz.git
cd attribution-viz

# Install dependencies
pip install -r requirements.txt
npm install

# Run the development server
python app.py &
npm start

Using the API

import requests

# Submit text for attribution analysis
response = requests.post('http://localhost:8000/api/attribute', json={
    'text': 'This movie is absolutely fantastic!',
    'model': 'bert-base-uncased',
    'method': 'gradient',
    'task': 'sentiment'
})

attributions = response.json()['attributions']

Research Applications

This tool has been used in several research projects:

1. Bias Detection Study

Analyzed attribution patterns for gender-biased predictions
Identified problematic tokens that trigger biased responses
Published findings in Understanding Gender Bias in KB Embeddings

2. Model Comparison Analysis

Compared attribution patterns across different model architectures
Found that BERT and RoBERTa focus on different linguistic features
Results presented at ACL 2023

3. Educational Tool

Used in graduate-level NLP courses
Helps students understand model behavior intuitively
Adopted by 5+ universities

Performance Metrics

Speed Benchmarks

Small models (BERT-base): ~50ms per example
Large models (GPT-3.5): ~200ms per example
Batch processing: 10x speedup for 100+ examples

Accuracy Validation

Compared attribution rankings with human annotations:

Gradient method: 0.72 Kendall’s τ
Integrated Gradients: 0.78 Kendall’s τ
Attention: 0.65 Kendall’s τ

Future Enhancements

Planned Features

Support for multimodal models (vision-language)
Real-time collaborative analysis
Integration with popular ML frameworks (MLflow, Weights & Biases)
Advanced visualization options (heatmaps, network graphs)

Long-term Goals

Attribution method comparison framework
Automated attribution quality assessment
Integration with model debugging workflows
Support for non-English languages

Contributing

We welcome contributions! Areas where help is needed:

New Attribution Methods: Implement additional attribution algorithms
Model Support: Add support for more model architectures
Visualization: Create new ways to display attribution results
Performance: Optimize for larger models and datasets

See CONTRIBUTING.md for detailed guidelines.

Citation

If you use this tool in your research, please cite:

@software{du2024attribution,
  author = {Du, Yupei},
  title = {Attribution Visualization Tool},
  url = {https://github.com/Yupei-Du/attribution-viz},
  year = {2024}
}