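Preface: Cross-media retrieval covers many tasks and involves data in multiple modalities, such as images, text, speech, and video. In practice, depending on project requirements, developers can integrate whatever recognition, segmentation, generation, or encoding methods they need into a retrieval or recommendation project.
This article collects, for reference, the retrieval systems and big-data architecture implementations from Baidu, YouTube, Google, and Facebook that could be found online.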

Challenges and Research Directions in Multimodal Information Retrieval

In fact, researchers and algorithm engineers in information retrieval focus more on tasks such as data mining, feature representation, and user behavior analysis. Research directions seen at recent conferences such as SIGIR and ACMMM include the following:
2021 SIGIR
Bias and Counterfactual Learning
Recommendation
Searching and Ranking
Social Aspects
Knowledge Structures
Question Answering
Sequences and Sessions
Adversarial Information Retrieval
Multi-modal Information Retrieval
MultiMedia Information Retrieval
Multi-modal Fusion and Embedding
2020 SIGIR

As the list shows, the main modalities are visual, textual, and acoustic. The challenges lie in multimodal fusion. Many engineering problems come down to: information loss, hierarchical structure

transductive learning

optimal latent space that can maintain the original intrinsic characteristics of the micro-video from the original space
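
To make the fusion challenge a little more concrete, here is a minimal sketch of one of the simplest fusion strategies: normalize each modality's feature vector, concatenate them, and rank items by cosine similarity. This only illustrates the idea and is not a method taken from any of the systems or conferences mentioned above. The function names (`l2_normalize`, `fuse_concat`, `cosine`) and the toy feature values are hypothetical. Plain concatenation treats all modalities as flat and equally important, which is one place where the information loss and missing hierarchical structure noted above can show up.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_concat(visual, textual, acoustic):
    """Feature-level fusion (hypothetical helper): normalize each
    modality separately, then concatenate into one vector."""
    fused = []
    for feats in (visual, textual, acoustic):
        fused.extend(l2_normalize(feats))
    return fused


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy, made-up features: (visual, textual, acoustic) per item.
query = fuse_concat([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
items = {
    "item_a": fuse_concat([2.0, 0.0], [0.0, 1.0], [1.0, 0.0]),
    "item_b": fuse_concat([0.0, 1.0], [1.0, 0.0], [0.0, 1.0]),
}

# Rank items by similarity to the query in the fused space.
ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
print(ranked)
```

Per-modality normalization keeps a modality with large raw values (here the acoustic feature) from dominating the fused vector. A learned projection into a shared latent space would replace the plain concatenation step in a more realistic setup.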