Fine-Tuning a Vision Transformer (Swin-Tiny) for Detection and Classification of AI-Generated Images

The notebooks in this repository focus on fine-tuning a pre-trained vision transformer (Swin-Tiny), starting from a binary classification baseline: identifying whether an image was created by generative AI. The work here expands that baseline into a multiclass classification problem: identifying whether an image is authentic (human-created) or generated by one of three text-to-image models (Stable Diffusion, Midjourney, or DALL-E).

Illustration: a robot detective verifying the authenticity of artwork generated by artificial intelligence.

The goal was to tackle the multiclass classification problem using three separate approaches to transfer learning:

  1. The first experiment used the model as a feature extractor: the extracted outputs were passed to a logistic regression classifier from scikit-learn (LogisticRegressionCV) to classify the images (see the first sketch after this list).
  2. The second experiment was fine-tuning with frozen layers: all of the pre-trained parameters up to the final linear layer were frozen, and a new trainable linear layer was added to map the outputs to the class dimensions, handing off to a softmax for classification (see the second sketch after this list).
  3. The third experiment was selective fine-tuning, a natural extension of experiment 2: every layer was frozen except the last transformer block (specifically Stage 3, Block 1), which remained trainable. As in the previous experiment, a trainable linear layer with a softmax handled the classification.
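A minimal sketch of the feature-extraction approach (experiment 1), assuming timm's swin_tiny_patch4_window7_224 checkpoint and a PyTorch DataLoader named train_loader (both placeholders; the actual notebooks may load the model and data differently):

```python
import numpy as np
import timm
import torch
from sklearn.linear_model import LogisticRegressionCV

# num_classes=0 drops the classification head, so the model
# returns pooled backbone features instead of class logits.
extractor = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=0
)
extractor.eval()

@torch.no_grad()
def extract_features(loader):
    """Run the frozen backbone over a DataLoader of (image, label) batches."""
    feats, labels = [], []
    for images, targets in loader:
        feats.append(extractor(images).numpy())
        labels.append(targets.numpy())
    return np.concatenate(feats), np.concatenate(labels)

# train_loader is a stand-in for the project's DataLoader.
X_train, y_train = extract_features(train_loader)
clf = LogisticRegressionCV(max_iter=1000).fit(X_train, y_train)
```

And a sketch of the two fine-tuning setups (experiments 2 and 3); the stage and block indexing follows timm's module layout, which may not match the report's exact naming:

```python
import timm
import torch
import torch.nn as nn

NUM_CLASSES = 4  # inferred from the text: authentic + 3 generators

model = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True)

# Experiment 2: freeze every pre-trained parameter...
for param in model.parameters():
    param.requires_grad = False

# ...then swap in a fresh, trainable linear head sized for the four classes.
# CrossEntropyLoss applies log-softmax internally, standing in for the
# explicit softmax described above.
model.reset_classifier(NUM_CLASSES)

# Experiment 3: additionally unfreeze the final transformer block
# (Stage 3, Block 1 in the report's numbering).
for param in model.layers[-1].blocks[-1].parameters():
    param.requires_grad = True

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
```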

Read the full report here.
