Yugi: An Adaptive Speech-Based Virtual Tutor for Kids
Tech Stack: C, C++, Visual Studio 2010
- Extracted implicit and explicit features from images and text using a pre-trained YOLOv3 model and BERT, respectively.
- Applied multi-head cross-attention to fuse the image and text encodings, then passed the fused representation to a fully connected layer for classification, achieving an 82% F1 score.
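
The fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the weights are random rather than trained, the feature matrices are random stand-ins for BERT token embeddings and YOLOv3 region features, and the choice of text tokens as queries with image regions as keys and values is an assumption. All names and sizes (`cross_attention`, `classify`, `d_model = 8`, two heads, two classes) are hypothetical.

```python
import math
import random


def matmul(a, b):
    # (n x k) times (k x m) gives (n x m), using plain nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Numerically stable softmax over one list of scores.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    # Random weights or features; a trained model would supply real values.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(text_tokens, image_regions, num_heads, rng):
    """Text tokens act as queries over image regions (assumed direction)."""
    d_model = len(text_tokens[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Hypothetical projection matrices for queries, keys, values, and output.
    w_q, w_k, w_v, w_o = (rand_matrix(d_model, d_model, rng) for _ in range(4))
    q = matmul(text_tokens, w_q)
    k = matmul(image_regions, w_k)
    v = matmul(image_regions, w_v)

    head_outputs, head_weights = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        k_h_t = [list(col) for col in zip(*k_h)]
        # Scaled dot-product attention for this head.
        scores = matmul(q_h, k_h_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))
        head_weights.append(weights)

    # Concatenate the heads per token, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(text_tokens))]
    return matmul(concat, w_o), head_weights


def classify(fused, w_cls):
    # Mean-pool the fused tokens, then apply one fully connected layer.
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    return softmax(matmul([pooled], w_cls)[0])


rng = random.Random(0)
text_tokens = rand_matrix(5, 8, rng)    # stand-in for 5 BERT token embeddings
image_regions = rand_matrix(3, 8, rng)  # stand-in for 3 YOLOv3 region features
fused, weights = cross_attention(text_tokens, image_regions, num_heads=2, rng=rng)
probs = classify(fused, rand_matrix(8, 2, rng))
print(len(fused), len(fused[0]), len(probs))
```

Each text token ends up with one attention distribution per head over the image regions, so the fused output keeps the text sequence length while mixing in visual information. Mean pooling before the classifier is one simple option; pooling on a dedicated classification token would be an equally reasonable choice.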