ITRT (IT Research Trends)

Object Detection using Vision Transformer and Deep Learning for Computer Vision Applications

Research field: Artificial Intelligence

Paper keywords: #neural #improved #vision #transformer #scalability

Conference: 2025 7th International Conference on Intelligent Sustainable Systems (ICISS)

Abstract

Vision Transformer (ViT) is an image recognition model built on the transformer architecture, which has numerous advantages over Convolutional Neural Networks (CNNs). It offers improved accuracy, scalability, flexibility, global context, and transferability. ViT can handle images of different sizes and aspect ratios, making it more versatile than a CNN. It can process an entire image at once, allowing it to capture global context information and long-range dependencies. Additionally, what a ViT learns from pre-training on huge amounts of image data can be transferred to other image recognition tasks, making it a useful tool for transfer learning. This paper describes the differences between ViT and CNN and how ViT splits images into patches for classification. ViT applies positional encoding to the patch features, which removes the need for convolutional filters. The proposed implementation achieved a final top-1 prediction accuracy of 93%.
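
The patch-splitting and positional-encoding steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the image size (224), patch size (16), embedding width (64), and the function names `patchify` and `embed_patches` are all assumptions chosen for illustration, and the random weights stand in for parameters that a real model would learn.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Separate the patch grid axes from the within-patch axes.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    # One row per patch, in row-major grid order.
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, weight, pos_embed):
    """Linearly project each patch, then add one position vector per patch."""
    return patches @ weight + pos_embed


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))   # assumed input size
patches = patchify(image, 16)       # assumed patch size

embed_dim = 64                      # assumed embedding width
# Random stand-ins for learned parameters (hypothetical values).
weight = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
pos_embed = rng.normal(scale=0.02, size=(patches.shape[0], embed_dim))

tokens = embed_patches(patches, weight, pos_embed)
print(patches.shape, tokens.shape)  # (196, 768) (196, 64)
```

Under these assumptions a 224x224 RGB image yields 14 x 14 = 196 patches of 16 x 16 x 3 = 768 values each. The position vectors are what tell the transformer where each patch came from, since the patch sequence itself carries no spatial order.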

📄 Paper information

Publication year	2025
Citations	239
Country of publication	Andorra
Site	IEEE

Sudarshan S

Mohana

Ambika G

Kavitha A

Nataraj K

Sudhangowda B S

Related papers (180)
