Blog posts

2026

SafeSteerDataset: A Contrastive Dataset for T2I Safety Steering

3 minute read

Published: March 15, 2026

We release SafeSteerDataset, a contrastive dataset of 2,300 safe/unsafe prompt pairs designed for activation steering in Text-to-Image (T2I) models, on Hugging Face. Existing T2I safety benchmarks (I2P, CoPro, T2ISafety) focus on broad evaluation or unsafe prompt detection, but none of them curates pairs of safe and unsafe prompts that are highly semantically similar. This semantic alignment is critical: without it, steering methods capture spurious artifacts of the prompt pairs rather than isolating the actual direction of toxicity in the activation space.

2025

Image AutoRegressive Models Leak More Training Data Than Diffusion Models

less than 1 minute read

Published: July 26, 2025

Image AutoRegressive models (IARs) have recently emerged as a powerful alternative to diffusion models (DMs), surpassing them in image generation quality, speed, and scalability. Yet despite these advantages, the privacy risks of IARs remain unexplored. When trained on sensitive or copyrighted data, these models may unintentionally expose training samples, creating major security and ethical concerns.

Jan Dubiński
