Tree-to-tree Neural Networks for Program Translation

paper
- https://arxiv.org/abs/1802.03691
- Xinyun Chen, Chang Liu, Dawn Song
- NeulIPS 2018
github
データセット
project

f:id:e4exp:20200917191449p:plain

まとめ

どんなもの？
- プログラミング言語間の翻訳にdeepを使用した初の研究．encoder-decoder構造で，tree-LSTMでencodeした隠れ状態を元にLSTMベースのdecoder+attentionでnodeを生成．ソースとターゲットの対応する部分のsub treeが，modularな関係(対応関係にあるsub tree以外の部分は対応しない)になっていることを利用してattentionを拡張している
先行研究と比べてどこがすごい？
- 複数の実世界project code(java -> C#)でsota(非NN)を20.2-41.9%上回るprogram accuracyを達成
技術や手法のキモはどこ？
- 入出力となるparse treeは事前に2分木(Left-child Right-sibling)に変換しておく．
- encoderはTree-LSTMで葉ノードからボトムアップに隠れ状態を計算(2つの子ノードの状態を結合して親の状態を計算する)
- encoderの最後の隠れ状態をdecoderのrootで使用する．decoderはトップダウンに生成
  - 各nodeの値は非終端，終端，EOSのいずれか一つを推定する．(left-child right-siblingなので，非終端nodeであってもright-siblingが存在する可能性があるためEOSも必要)
  - left, rightの子ノードはそれぞれ別のLSTMを使う
  - decoderは展開するべきnode(EOSではないnode)をqueueに保持しておき，再帰的にそれらを展開する
  - decoderがnodeを展開するとき，source treeのどの非終端ノードと関係があるか考慮するためにattentionを用いる．sourceのnodeに属する隠れ状態と，展開中のノードに属する隠れ状態からattention重みを計算し，source treeの全nodeに関して期待値(重み付き和)を求め，埋め込みベクトルを得る
    - 展開中ノードが関係があるsource tree側のnodeとは，展開中ノードの子ノードも関係がある可能性が高いので，埋め込みベクトルは後続の子ノードの隠れ状態計算でも入力に使用する(parent attention feeding機構)
- 入出力データは，プログラム言語をparseしたparse treeをシリアライズしたtokenの系列．シリアライズはdepth first order．
どうやって有効だと検証した？
- 命令形言語 <-> 関数型言語，coffeescript <-> javascript，java <-> C#の3種類のタスク
- モデルはベースライン(seq2seq, seq2tree, tree2seq)およびsotaのプログラミング言語間翻訳手法複数との比較
- 指標はprogram accuracy(データセット内のプログラミング言語対データの内，正解となった割合．同じ内容を表現できる別のコードは不正解扱い)．(およびtoken accuracy)
議論はある？
- 特に，長いプログラムに対して性能が落ちないのが利点
- attentionをなくすとaccuracyがほとんど0%まで落ちる
- プログラムが長くなるほど，一次変数が導入されやすくなるが，訓練データに一次変数が殆ど入っていないので推定が難しい
- javascriptはcoffeescriptより表現が長くなる(js300 tokenに対してcofee 20 tokenなど)ので，js -> coffeeは簡単(モデル間の差が出にくい)だが逆は難しい(差が出やすい)
次に読むべき論文は？
- 1. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078, 2015.
  2. RNNは長さが増えるとsyntactically correctなプログラム生成が難しい
- 1. Dong and M. Lapata. Language to logical form with neural attention. In ACL, 2016.
  2. seq2tree．文法の学習とsequenceのalignを分離した手法．parse treeの同じ深さのnodeを生成するためにLSTM decoderを使い，非終端ノードを展開してparse tree中で子を生成する
- 1. 1. Kusner, B. Paige, and J. M. Hern ́ andez-Lobato. Grammar variational autoencoder. arXiv preprint arXiv:1703.01925, 2017.
  2. autoencoder baseのtree-to-tree translationモデル
- 1. Bradbury and R. Socher. Towards neural machine translation with latent tree attention. arXiv preprint arXiv:1709.01915, 2017.
  2. tree baseのattentionつきencoder-decoder．attentinon重みの計算はnodeごとに独立
- 1. He, Y. Xia, T. Qin, L. Wang, N. Yu, T. Liu, and W.-Y. Ma. Dual learning for machine translation. In Advances in Neural Information Processing Systems, pages 820–828, 2016.
  2. NMTで，ペアになっていない2ドメイン間の変換を扱う手法
- 1. Aharoni and Y. Goldberg. Towards string-to-tree neural machine translation. In ACL, 2017.
  2. parsingを扱ったNN手法
- 1. Chen, C. Liu, and D. Song. Towards synthesizing complex programs from input-output examples. In ICLR, 2018.
  2. parsingを扱ったNN手法
その他

学んだことメモ

プログラミング言語間の翻訳，Tree-to-tree Neural Networks for Program Translation

Tree-to-tree Neural Networks for Program Translation

まとめ