Research Paper | Researchia:202602.24023 [Mathematics]

A Computationally Efficient Multidimensional Vision Transformer

Alaa El Ichi

Abstract

Vision Transformers have achieved state-of-the-art performance in a wide range of computer vision tasks, but their practical deployment is limited by high computational and memory costs. In this paper, we introduce a novel tensor-based framework for Vision Transformers built upon the Tensor Cosine Product (Cproduct). By exploiting multilinear structures inherent in image data and the orthogonality of cosine transforms, the proposed approach enables efficient attention mechanisms and structured feature representations. We develop the theoretical foundations of the tensor cosine product, analyze its algebraic properties, and integrate it into a new Cproduct-based Vision Transformer architecture (TCP-ViT). Numerical experiments on standard classification and segmentation benchmarks demonstrate that the proposed method achieves a uniform 1/C parameter reduction (where C is the number of channels) while maintaining competitive accuracy.
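The abstract does not spell out the Tensor Cosine Product's definition. In the related tensor-algebra literature, cosine-transform-based tensor-tensor products are typically computed by applying a discrete cosine transform along the tube (third) mode, multiplying the matching frontal slices in the transform domain, and inverting the transform; the orthogonality of the DCT is what makes this well behaved. A minimal sketch under that assumption (the function name `c_product` and the choice of an orthonormal DCT-II are illustrative, not taken from the paper):

```python
import numpy as np
from scipy.fft import dct, idct

def c_product(A, B):
    """Sketch of a DCT-based tensor-tensor product (an assumed form
    of the Cproduct, not the paper's exact definition).

    A: (m, p, n) tensor, B: (p, q, n) tensor -> (m, q, n) tensor.
    """
    # Orthonormal DCT-II along the third (tube) mode.
    A_hat = dct(A, type=2, axis=2, norm="ortho")
    B_hat = dct(B, type=2, axis=2, norm="ortho")
    # Multiply matching frontal slices in the transform domain.
    C_hat = np.einsum("mpk,pqk->mqk", A_hat, B_hat)
    # Return to the spatial domain via the inverse DCT.
    return idct(C_hat, type=2, axis=2, norm="ortho")

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
B = rng.standard_normal((3, 2, 5))
C = c_product(A, B)
print(C.shape)  # (4, 2, 5)
```

Because the transform is applied once along the channel-like mode and the slice products are ordinary matrix multiplies, such a product can replace dense mixing across all channels, which is consistent with the 1/C parameter-reduction claim in the abstract.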


Source: arXiv:2602.19982v1 (http://arxiv.org/abs/2602.19982v1)
PDF: https://arxiv.org/pdf/2602.19982v1

Submission: 2/24/2026
Subjects: Mathematics
arXiv: This paper is hosted on arXiv, an open-access repository
