Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2 
Published in Advances in Neural Information Processing Systems (NeurIPS), 2023
Machine Learning as a Service APIs expose high-quality encoders that are expensive to train, making them lucrative targets for model stealing attacks. We propose Bucks for Buckets (B4B), the first active defense that prevents stealing while the attack is happening without degrading representation quality for legitimate users. B4B adaptively adjusts the utility of returned representations based on a user’s coverage of the embedding space and individually transforms each user’s representations to prevent sybil-based aggregation.
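The coverage-based idea can be sketched as follows. This is a minimal illustration, not the paper's actual mechanism: `bucket_id` and `CoverageTracker` are hypothetical names, and the LSH-style sign-hash bucketing and linear noise schedule are simplifying assumptions — the point is only that a legitimate user occupies few buckets of the embedding space, so their representations stay nearly noise-free, while an attacker sweeping the whole space accumulates coverage and receives increasingly degraded outputs.

```python
import random

def bucket_id(embedding, num_buckets=1024, seed=0):
    """Hash an embedding's sign pattern into one of num_buckets (LSH-style)."""
    rng = random.Random(seed)
    # Ten fixed random hyperplanes -> a 10-bit signature -> 1024 buckets.
    planes = [[rng.gauss(0, 1) for _ in embedding] for _ in range(10)]
    bits = 0
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, embedding))
        bits = (bits << 1) | (1 if dot > 0 else 0)
    return bits % num_buckets

class CoverageTracker:
    """Per-user set of occupied buckets; coverage drives the noise scale."""
    def __init__(self, num_buckets=1024):
        self.num_buckets = num_buckets
        self.occupied = set()

    def update_and_noise(self, embedding, max_sigma=1.0):
        self.occupied.add(bucket_id(embedding, self.num_buckets))
        coverage = len(self.occupied) / self.num_buckets
        # Noise grows with coverage: legitimate users stay in few buckets,
        # attackers sweeping the space see increasingly degraded outputs.
        sigma = max_sigma * coverage
        return [x + random.gauss(0, sigma) for x in embedding]
```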
Published in Winter Conference on Computer Vision (WACV), 2024
We propose a methodology to establish a fair evaluation setup for membership inference attacks on large diffusion models such as Stable Diffusion. Our research reveals that previously proposed evaluation setups significantly overestimate the effectiveness of these attacks. We conclude that membership inference remains a significant challenge for large diffusion models deployed as black-box systems, indicating that related privacy and copyright issues will persist.
Published in European Conference on Artificial Intelligence (ECAI), 2024
We introduce a new method for unsupervised model-stealing attacks against inductive graph neural networks, leveraging graph contrastive learning and spectral graph augmentations. Our approach outperforms the state-of-the-art across all six evaluated datasets, achieving superior fidelity and downstream accuracy of the stolen model. Crucially, it requires fewer queries directed toward the target model, making the attack practical even under restricted API access.
Published in ICLR Workshop on Building Trust in Language Models and Applications, 2025
We show that instruction-finetuned LLMs already encode safety-relevant information internally, with safe and unsafe prompts being distinctly separable in the model’s latent space. Building on this, we introduce the Latent Prototype Moderator (LPM), a training-free moderation method that uses Mahalanobis distance in latent space to assess input safety. LPM matches or exceeds state-of-the-art guard models across multiple benchmarks while being a lightweight, customizable add-on that generalizes across model families and sizes.
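The core of such a prototype-based moderator can be sketched in a few lines. This is an illustrative simplification, not LPM's implementation: a diagonal covariance stands in for the full covariance matrix, and `fit_prototype`/`moderate` are hypothetical names — the idea shown is only that, given latent vectors for known safe and unsafe prompts, a new input is labeled by its Mahalanobis-nearest prototype with no training at all.

```python
import math

def fit_prototype(latents):
    """Per-class prototype: mean and (diagonal) variance of latent vectors."""
    d = len(latents[0])
    mean = [sum(v[i] for v in latents) / len(latents) for i in range(d)]
    var = [sum((v[i] - mean[i]) ** 2 for v in latents) / len(latents) + 1e-6
           for i in range(d)]
    return mean, var

def mahalanobis(x, proto):
    """Mahalanobis distance under a diagonal covariance (a simplification)."""
    mean, var = proto
    return math.sqrt(sum((xi - m) ** 2 / s for xi, m, s in zip(x, mean, var)))

def moderate(latent, safe_proto, unsafe_proto):
    """Label an input by its nearest latent prototype; no training needed."""
    if mahalanobis(latent, unsafe_proto) < mahalanobis(latent, safe_proto):
        return "unsafe"
    return "safe"
```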
Published in International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025
We introduce LGR-AD, a multi-agent system that models the text-to-image generation process as a distributed system of interacting agents, each representing an expert diffusion sub-model. These agents dynamically adapt to varying conditions and collaborate through a graph neural network that encodes their relationships and performance metrics. A coordination mechanism based on top-k maximum spanning trees optimizes the generation process, outperforming traditional diffusion models across various benchmarks.
Published in Conference on Computer Vision and Pattern Recognition (CVPR), 2025
We demonstrate that existing membership inference attacks are not strong enough to reliably detect individual images in large, state-of-the-art diffusion models. To overcome this, we propose CDI, a dataset inference framework that aggregates signals from multiple data points belonging to a single owner. CDI allows data owners with as few as 70 samples to identify with over 99% confidence whether their data was used to train a given diffusion model.
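The aggregation step can be illustrated with a simple one-sided test. This is a hedged sketch, not CDI's actual statistical machinery: `dataset_inference` and the z-test are illustrative stand-ins, showing only how weak per-sample membership signals, compared against a null distribution from known non-members, compound into a high-confidence dataset-level decision.

```python
import math

def dataset_inference(owner_scores, null_scores):
    """Aggregate per-sample membership signals into one dataset-level test.

    owner_scores: membership scores for the owner's samples.
    null_scores:  scores for known non-members, estimating the null.
    Returns (z, is_member) for a one-sided z-test on the mean score.
    """
    n = len(owner_scores)
    mu0 = sum(null_scores) / len(null_scores)
    var0 = sum((s - mu0) ** 2 for s in null_scores) / (len(null_scores) - 1)
    mean = sum(owner_scores) / n
    z = (mean - mu0) / math.sqrt(var0 / n)
    # z > 2.33 ~ one-sided p < 0.01: individually weak signals add up.
    return z, z > 2.33
```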
Published in International Conference on Machine Learning (ICML), 2025
We conduct the first comprehensive privacy analysis of image autoregressive models (IARs), showing they exhibit significantly higher privacy risks than diffusion models. We develop a novel membership inference attack achieving a true positive rate of 86% at 1% false positive rate, compared to just 6% for diffusion models. We further demonstrate successful dataset inference with as few as 6 samples and extract hundreds of training data points from deployed models.
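The headline metric, true positive rate at a fixed low false positive rate, is easy to compute from attack scores. A minimal sketch (with `tpr_at_fpr` as an illustrative name; the strict-`>` thresholding is a simplifying choice):

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """TPR at a fixed low FPR: the standard strong-MIA success metric.

    The threshold is set so that at most target_fpr of non-members
    score above it; TPR is then the fraction of members above it.
    """
    ranked = sorted(nonmember_scores, reverse=True)
    k = max(int(target_fpr * len(ranked)) - 1, 0)
    threshold = ranked[k]
    tp = sum(s > threshold for s in member_scores)
    return tp / len(member_scores)
```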
Published in arXiv preprint, 2025
Model merging is highly susceptible to backdoor attacks that allow adversaries to control the merged model’s output at inference time. We propose treating the attack itself as a task vector: the Backdoor Vector is the weight difference between a backdoored and clean model, revealing new insights into attack similarity and transferability. We introduce Sparse Backdoor Vectors for stronger attacks and Injection BV Subtraction, an assumption-free defense that remains effective even when the threat is unknown.
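The task-vector view admits a compact sketch. This is illustrative only: real models hold tensors, not the scalar weights used here, and `backdoor_vector`/`subtract_vector` are hypothetical names — the point is that the attack direction is just a weight difference, so it can be compared across attacks (e.g. by cosine similarity) and subtracted out as a defense.

```python
import math

def backdoor_vector(backdoored, clean):
    """Backdoor Vector: per-parameter weight difference (a task vector)."""
    return {name: backdoored[name] - clean[name] for name in clean}

def subtract_vector(model, vector, scale=1.0):
    """Defense sketch: subtract a (known or proxy) backdoor direction
    from a merged model's weights."""
    return {name: w - scale * vector.get(name, 0.0)
            for name, w in model.items()}

def cosine(u, v):
    """Similarity between two backdoor vectors (as flat scalar dicts)."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)
```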
Published in European Conference on Artificial Intelligence (ECAI), 2025
Traditional Monte Carlo simulations of particle detector responses at CERN are computationally expensive and strain the computational grid. We present ExpertSim, a Mixture-of-Generative-Experts architecture tailored for the Zero Degree Calorimeter in the ALICE experiment, where each expert specializes in a different subset of the data. ExpertSim improves accuracy over standard methods while providing a significant speedup, offering a practical solution for high-efficiency detector simulations.
Published in AAAI Conference on Artificial Intelligence (AAAI), 2026
Current GNN model-stealing methods rely heavily on queries to the victim model, assuming no hard query limits, but in practice the number of allowed queries can be severely limited. We demonstrate how an adversary can extract a GNN with very limited interactions by first obtaining the model backbone without direct queries, then strategically utilizing a fixed query budget to extract the most informative data. Experiments on eight real-world datasets show the attack is effective even under severe query restrictions and active defenses.
Published in arXiv preprint, 2026
Current text-to-image models remain prone to generating unsafe content, and linear activation steering frequently degrades image quality on benign prompts. We propose Conditioned Activation Transport (CAT), a framework that employs geometry-based conditioning and nonlinear transport maps that activate only within unsafe activation regions. Validated on Z-Image and Infinity architectures, CAT significantly reduces Attack Success Rate while maintaining image fidelity compared to unsteered generations.
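The gating idea can be sketched minimally. This is not CAT's geometry-based conditioning or its learned transport maps — `gated_steer` and the fixed-radius ball around an unsafe centroid are illustrative simplifications showing only why conditioning preserves benign quality: activations outside the unsafe region pass through untouched.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gated_steer(activation, unsafe_center, radius, transport):
    """Apply a transport map only inside the unsafe activation region.

    Benign activations (outside the radius) pass through unchanged,
    which is what preserves image quality on safe prompts.
    """
    if _dist(activation, unsafe_center) > radius:
        return list(activation)      # benign: identity, no steering
    return transport(activation)     # unsafe: move toward a safe region
```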
Published in International Conference on Learning Representations (ICLR), 2026
Activation-sparsity methods that rely on exact zero activations do not apply to modern LLMs using smooth activations such as SiLU or GELU, leading to fragmented, model-specific strategies and a gap in general understanding. We introduce a general framework for evaluating sparsity robustness and conduct a systematic investigation across diverse model families and scales. Our results uncover universal properties of activation sparsity, notably that the potential for effective sparsity grows with model size, and present the first study of activation sparsity in diffusion-based LLMs.
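The basic measurement behind this line of work can be sketched in one function (an illustrative definition; `effective_sparsity` and the threshold value are assumptions, not the paper's framework): because SiLU and GELU never output exact zeros, sparsity must be defined and exploited via a magnitude threshold rather than an equality test.

```python
def effective_sparsity(activations, epsilon=1e-2):
    """Fraction of activations with magnitude below epsilon.

    SiLU/GELU never produce exact zeros, so sparsity is measured with a
    threshold: entries below epsilon are treated as prunable.
    """
    small = sum(1 for row in activations for x in row if abs(x) < epsilon)
    total = sum(len(row) for row in activations)
    return small / total
```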
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.