Attention-only model greatly improves translation tasks, Transformer

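Attention Is All You Need https://arxiv.org/abs/1706.03762 2017/06, NeurIPS 2017 Uses neither RNNs nor CNNs; with attention alone it achieves 28.4 BLEU on WMT 2014 English-to-German translation, about 2 points better than the previous SOTA. Because RNNs are autoregressive, they cannot be parallelized within a single sample (and sequence lengths diff…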
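
To illustrate the parallelization point in the excerpt above, here is a minimal sketch of scaled dot-product attention in plain Python. The function names `softmax` and `attention` are illustrative choices, not taken from the post, and this omits multi-head projections, masking, and batching. Each query's output depends only on the inputs, never on another position's output, which is why the outer loop could run in parallel.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, values are lists of equal-length float vectors.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:  # iterations are independent of each other
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention over three toy token vectors (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_attended = attention(tokens, tokens, tokens)
```

Each output row is a weighted average of the value vectors, so with the toy inputs every component stays between 0 and 1.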

2021-12-29

A model with relative position embeddings that handles long text well, RoFormer

DeepLearning Transformer Paper-reading NLP

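RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 2021/04 A transformer that expresses relative position embeddings as rotation matrices. They are applied as a product with each token, which in effect rotates each token vector; between tokens…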
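
The rotation described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the names `rotate` and `dot` are hypothetical, and the adjacent-pair layout with frequency base 10000 is assumed from the paper's formulation. Each pair of coordinates is rotated by an angle proportional to the token position, so the dot product of a rotated query and a rotated key depends only on the offset between their positions.

```python
import math


def rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by position * base**(-i/d)."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy query/key vectors (hypothetical values).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Different absolute positions, same offset of 2 between query and key.
score_a = dot(rotate(q, 3), rotate(k, 1))
score_b = dot(rotate(q, 10), rotate(k, 8))
```

Under these assumptions `score_a` and `score_b` agree up to floating-point error, which is the relative-position property: the attention score sees only the distance between the two tokens. The rotation also leaves each vector's norm unchanged.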

2021-12-27

A VQA model that needs no cross-modal pre-training, Multimodal Bitransformer

DeepLearning Vision-Language Transformer

Supervised Multimodal Bitransformers for Classifying Images and Text https://arxiv.org/abs/1909.02950 2019 Architecture For VQA, combines separately pre-trained image and text encoders and applies self-attention in a BERT-based model, so that, like ViLBERT…

2021-12-24

Jigsaw: improving the accuracy of large language model code generation by adding pre- and post-processing

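Jigsaw: Large Language Models meet Program Synthesis https://arxiv.org/abs/2112.02969 ICSE'22, 2021/12/06 Large pre-trained language models (GPT-3, Codex; referred to as PTLMs) can generate code from natural language, but with variable-name transformation and AST-to-AST transformation, a post-processing modu…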

2021-12-24

Florence, a foundation model applicable to diverse downstream tasks in the vision domain

DeepLearning Pre-Training Vision-Language Transformer

Florence: A New Foundation Model for Computer Vision 2021/11/22 https://arxiv.org/abs/2111.11432 Fig.2 Overview of building Florence In the image domain, diverse downstream tasks (classification, retrieval, object detection, VQA, image captioning, video retrieval, action…

2021-02-24

Caption generation from news articles and images, Transform and Tell

Transform and Tell: Entity-Aware News Image Captioning paper https://arxiv.org/abs/2012.00364 Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao github https://github.com/alasd…

2021-02-08

Using tags from object-detection results to set a new SoTA on six vision-language tasks, OSCAR

DeepLearning ImageCaptioning Transformer NLP Pre-Training

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks paper https://arxiv.org/abs/2004.06165 github https://github.com/microsoft/Oscar Dataset COCO etc. project Summary What is it? Language embeddings, the image's object-detection features…

2021-02-08

CLIP, a zero-shot transfer model that can generate classifiers for arbitrary classes

Learning Transferable Visual Models From Natural Language Supervision paper https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf github https://github.com/openai/CLIP Dataset WebImageText (WIT) pr…

Notes on Things I've Learned

Articles for the year starting 2021-01-01