Yugi: An Adaptive Speech Based Virtual Tutor for Kids

Tech Stack: C, C++, Visual Studio 2010

GitHub URL: Project Link

  • Extracted implicit and explicit features from both text and images, using BERT for text and a pre-trained YOLOv3 model for images.
  • Fused the image and text encodings with multi-head cross-attention and passed the result to a fully connected layer for classification, achieving an F1 score of 82%.
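
The fusion step described above can be illustrated with a minimal sketch in plain Python. It is not the project's actual code. The details below are assumptions made for illustration: text tokens act as queries over image regions, the fused tokens are mean-pooled before the fully connected layer, and random vectors stand in for the BERT and YOLOv3 encodings. All function names, dimensions, and weights are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random values; a trained model would learn its weights instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(text, image, wq, wk, wv, wo, num_heads):
    """Multi-head cross-attention: text tokens (queries) attend over
    image regions (keys and values). The direction is an assumption."""
    q, k, v = matmul(text, wq), matmul(image, wk), matmul(image, wv)
    d_head = len(q[0]) // num_heads
    fused = [[] for _ in text]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        for i, q_row in enumerate(q):
            # Scaled dot-product score of one text token against every region.
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi])) / math.sqrt(d_head)
                for k_row in k
            ]
            weights = softmax(scores)
            # Weighted sum of this head's slice of the value vectors.
            head_out = [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(lo, hi)]
            fused[i].extend(head_out)  # concatenate heads in order
    return matmul(fused, wo)


def classify(text, image, attn_weights, fc_w, fc_b, num_heads):
    """Fuse the two encodings, mean-pool over tokens (an assumption),
    then apply one fully connected layer followed by softmax."""
    fused = cross_attention(text, image, *attn_weights, num_heads)
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    logits = [sum(p * w for p, w in zip(pooled, col)) + b for col, b in zip(zip(*fc_w), fc_b)]
    return softmax(logits)


# Hypothetical sizes: 5 text tokens, 3 image regions, model width 8, 2 classes.
rng = random.Random(0)
d_model, num_heads, num_classes = 8, 2, 2
text = rand_matrix(5, d_model, rng)    # stand-in for text encodings
image = rand_matrix(3, d_model, rng)   # stand-in for image-region encodings
attn_weights = [rand_matrix(d_model, d_model, rng) for _ in range(4)]  # wq, wk, wv, wo
fc_w = rand_matrix(d_model, num_classes, rng)
fc_b = [0.0] * num_classes
probs = classify(text, image, attn_weights, fc_w, fc_b, num_heads)
print(probs)
```

With untrained random weights the printed class probabilities carry no meaning; the sketch only shows how the shapes flow from the two encoders through attention to the classifier.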