Research Paper Researchia:202602.10019 [Biomedical Engineering > Engineering]

Fine-Grained Cat Breed Recognition with Global Context Vision Transformer

Mowmita Parvin Hera

Abstract

Accurately identifying cat breeds from images is challenging because breeds differ only subtly in fur pattern, facial structure, and color. In this paper, we present a deep learning approach to cat breed classification using a subset of the Oxford-IIIT Pet Dataset, which contains high-resolution images of various domestic breeds. We employ the tiny variant of the Global Context Vision Transformer (GCViT-Tiny) for cat breed recognition. To improve generalization, we apply extensive data augmentation, including rotation, horizontal flipping, and brightness adjustment. In our experiments, the GCViT-Tiny model achieved a test accuracy of 92.00% and a validation accuracy of 94.54%. These findings highlight the effectiveness of transformer-based architectures for fine-grained image classification. Potential applications include veterinary diagnostics, animal shelter management, and mobile breed recognition systems. We also provide a Hugging Face demo at https://huggingface.co/spaces/bfarhad/cat-breed-classifier.
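
The three augmentations named in the abstract (rotation, horizontal flipping, brightness adjustment) can be illustrated with a minimal, dependency-free sketch. This is not the paper's pipeline: the function names, the grayscale nested-list image representation, the nearest-neighbour rotation, and the parameter ranges (`max_degrees`, `brightness_range`, `flip_prob`) are all assumptions chosen for illustration, since the abstract does not state the actual settings or library used.

```python
import math
import random


def hflip(img):
    """Mirror each row left-to-right (horizontal flip)."""
    return [row[::-1] for row in img]


def adjust_brightness(img, factor):
    """Scale pixel intensities by `factor`, clamped to the range [0, 255]."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def rotate(img, degrees, fill=0):
    """Rotate about the image centre using nearest-neighbour sampling.

    Output pixels whose source falls outside the image get `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img, rng, max_degrees=15, brightness_range=(0.8, 1.2), flip_prob=0.5):
    """Apply a random flip, rotation, and brightness change (assumed ranges)."""
    if rng.random() < flip_prob:
        img = hflip(img)
    img = rotate(img, rng.uniform(-max_degrees, max_degrees))
    return adjust_brightness(img, rng.uniform(*brightness_range))
```

Calling `augment(image, random.Random(0))` on each training image at load time would produce a differently perturbed copy per epoch while leaving the image size unchanged. In practice an image library's built-in transforms would replace these hand-written functions.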


Source: arXiv:2602.07534v1 (http://arxiv.org/abs/2602.07534v1). PDF: https://arxiv.org/pdf/2602.07534v1

Submission: 2/10/2026
Subjects: Engineering; Biomedical Engineering
