Meme Classification Project

Meme analysis has become an important research topic: a meme combines a graphic representation with an embedded text description, often carries more than one message, and can spread emotional influence to everyone who reads it. In this project, I propose the Memotion Multimodal Model (M4 Model) for humor detection on the Memotion dataset. [Code]

Figure: project overview.

Details

  • The main task of this project is multimodal binary classification on the Memotion dataset [1].
  • Given the visual and textual information of each meme, the goal is to predict whether the meme is humorous or non-humorous.
  • The Memotion dataset consists of 6,992 training memes and 2,000 test memes (see the loading sketch after this list).
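As a rough illustration of how such a sample might be loaded, here is a minimal PyTorch `Dataset` sketch; the column names (`image_path`, `ocr_text`, `humour`) and the preprocessing choices are hypothetical, not the project's actual schema.

```python
# Minimal sketch of a meme dataset wrapper. Assumes PyTorch and a CSV with
# hypothetical columns "image_path", "ocr_text", and a binary "humour" label.
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

class MemotionDataset(Dataset):
    def __init__(self, csv_path, tokenizer, transform):
        self.df = pd.read_csv(csv_path)
        self.tokenizer = tokenizer    # e.g., an ALBERT tokenizer
        self.transform = transform    # e.g., torchvision transforms for VGG16

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # Image branch: load and preprocess the meme image.
        image = self.transform(Image.open(row["image_path"]).convert("RGB"))
        # Text branch: tokenize the OCR'd text embedded in the meme.
        enc = self.tokenizer(row["ocr_text"], truncation=True,
                             padding="max_length", max_length=64,
                             return_tensors="pt")
        label = torch.tensor(int(row["humour"]))  # 0 = non-humorous, 1 = humorous
        return (image, enc["input_ids"].squeeze(0),
                enc["attention_mask"].squeeze(0), label)
```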


  • The M4 model makes three main contributions:
  • Visuo-Lingual Extractor: uses VGG16 to learn visual features and an ALBERT transformer to extract textual features from the text embedded in each image (see the first sketch after this list).
  • Multimodal Feature Fusion: applies a Gated Multimodal Layer (GML) [3] to fuse the two sets of feature vectors (see the second sketch after this list).
  • Pretrained Transformer: incorporates transfer learning by first pretraining an ALBERT model on a text dataset [2], then migrating it into M4.
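Below is a minimal sketch of the visuo-lingual extractor, assuming torchvision's VGG16 and Hugging Face's `albert-base-v2`; the exact layer cuts and feature dimensions used in M4 may differ.

```python
# Sketch of the visuo-lingual extractor: VGG16 for visual features,
# ALBERT for textual features. Layer choices here are assumptions.
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights
from transformers import AlbertModel

class VisuoLingualExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
        # Keep VGG16 up to its first fully connected layer -> 4096-d visual feature.
        self.visual = nn.Sequential(backbone.features, backbone.avgpool,
                                    nn.Flatten(), *list(backbone.classifier[:2]))
        self.textual = AlbertModel.from_pretrained("albert-base-v2")  # 768-d hidden size

    def forward(self, image, input_ids, attention_mask):
        x_v = self.visual(image)                       # (B, 4096) visual feature
        out = self.textual(input_ids=input_ids, attention_mask=attention_mask)
        x_t = out.last_hidden_state[:, 0]              # (B, 768) [CLS] textual feature
        return x_v, x_t
```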
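The Gated Multimodal Layer follows the gated multimodal unit of [3]: each modality is projected through a `tanh`, and a sigmoid gate computed from the concatenated inputs weighs how much of each modality to keep. A sketch under the feature sizes assumed above:

```python
# Gated multimodal unit as in [3]: h = z * h_v + (1 - z) * h_t,
# where z is a sigmoid gate over the concatenated input features.
import torch
import torch.nn as nn

class GatedMultimodalLayer(nn.Module):
    def __init__(self, dim_v, dim_t, dim_out):
        super().__init__()
        self.proj_v = nn.Linear(dim_v, dim_out)
        self.proj_t = nn.Linear(dim_t, dim_out)
        self.gate = nn.Linear(dim_v + dim_t, dim_out)

    def forward(self, x_v, x_t):
        h_v = torch.tanh(self.proj_v(x_v))   # projected visual feature
        h_t = torch.tanh(self.proj_t(x_t))   # projected textual feature
        z = torch.sigmoid(self.gate(torch.cat([x_v, x_t], dim=-1)))  # fusion gate
        return z * h_v + (1 - z) * h_t       # gated fusion of the two modalities

# Usage with the extractor above (assumed dimensions):
# fused = GatedMultimodalLayer(4096, 768, 512)(x_v, x_t)
```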



  • Uses the Hugging Face library [4] to develop most of the components, e.g., the dataset, the model network, and the training and test pipeline.
  • Implements several alternative methods for multimodal fusion.
  • Provides code for both ALBERT pretraining and M4 transfer learning (sketched below).
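A hedged sketch of that two-stage recipe using the Hugging Face `Trainer`: continue masked-language-model pretraining of ALBERT on a text corpus, then load the saved encoder into M4. The corpus file name and hyperparameters here are illustrative, not the project's actual settings.

```python
# Stage 1: masked-language-model pretraining of ALBERT on a text corpus.
# Stage 2: reuse the saved encoder as M4's textual branch.
from transformers import (AlbertForMaskedLM, AlbertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AlbertTokenizerFast.from_pretrained("albert-base-v2")
mlm_model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

# "meme_text_corpus.txt" is a hypothetical one-sentence-per-line corpus file.
corpus = load_dataset("text", data_files={"train": "meme_text_corpus.txt"})
tokenized = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=64),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="albert-pretrained",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    # Randomly masks 15% of tokens for the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
mlm_model.albert.save_pretrained("albert-pretrained")  # keep only the encoder

# Transfer into M4 later by pointing the textual branch at the saved encoder:
# self.textual = AlbertModel.from_pretrained("albert-pretrained")
```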