TY - GEN
T1 - Shopee Price Match Guarantee Algorithm based on multimodal learning
AU - Fang, Yaxuan
AU - Wang, Junhan
AU - Jia, Lei
AU - Kin, Fung Wai
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/9/24
Y1 - 2021/9/24
AB - Shopee is a popular online shopping platform in Southeast Asia. Customers appreciate its easy, secure, and fast online shopping experience tailored to their region. It also lets customers choose the lowest-priced listing of the same product. This relies on product matching, that is, listings of the same product with the same description image must be removed. The underlying technology for this function is multimodal learning, in which we focus on images and text. In this article, we propose a new multimodal learning model based mainly on Transformer and BERT. For image matching, we use NFNet, Swin Transformer, and EfficientNet to obtain image embeddings. For text matching, we use DistilBERT, ALBERT, Multilingual BERT, and TF-IDF to obtain text embeddings. Once we have the embedding vectors, we use KNN to classify them. We use cosine similarity and distance to measure similarity for the different models. Notably, the loss function is ArcFace rather than the traditional Softmax, which increases the difficulty of training in order to secure the final performance at test time. In addition, seven models vote on the final results to make the predictions more reliable. To avoid bad matching results, we add several post-processing steps.
KW - Arcface
KW - multimodal learning
KW - post-processing
KW - Shopee
KW - voting
UR - http://www.scopus.com/inward/record.url?scp=85118957324&partnerID=8YFLogxK
U2 - 10.1109/CEI52496.2021.9574565
DO - 10.1109/CEI52496.2021.9574565
M3 - Conference Proceeding
AN - SCOPUS:85118957324
T3 - 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology, CEI 2021
SP - 59
EP - 62
BT - 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology, CEI 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology, CEI 2021
Y2 - 24 September 2021 through 26 September 2021
ER -