Automatically Generating Commit Messages from Diffs using Neural Machine Translation

Siyuan Jiang, Ameer Armaly, and Collin McMillan
University of Notre Dame, USA

Commit Messages
Many commit messages are similar [1][2]:
  Remove unused images
  Add test back to index
  Update mock images
2M commit messages -> Neural Machine Translation (NMT)
[1] A. Mockus and L. G. Votta. Identifying reasons for software changes using historic databases. In Proceedings 2000 International Conference on Software Maintenance, pages 120-130, 2000.
[2] S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.

Neural Machine Translation (NMT)
Neural networks for translating natural languages, e.g. Chinese -> English *
Parallel corpus:
  News articles
  Biomedical articles
  git-diff
* https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
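Treating git-diff output as one side of a parallel corpus can be sketched as below. This is only an illustration: the function name, the whitespace normalization, and the one-example-per-line layout are assumptions, not details taken from the slides.

```python
def to_parallel_corpus(pairs):
    """Turn (diff, message) pairs into aligned source/target lines, one pair per line."""
    source_lines, target_lines = [], []
    for diff, message in pairs:
        # Collapse all whitespace so that each example occupies exactly one line.
        source_lines.append(" ".join(diff.split()))
        target_lines.append(" ".join(message.split()))
    return source_lines, target_lines


# Hypothetical example pair: the diff is the "source language",
# the commit message is the "target language".
sources, targets = to_parallel_corpus([("- old\n+ new", "Update value")])
```

Line i of the source side then corresponds to line i of the target side, which is the alignment a translation model trains on.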

Overview of Our Work
diffs -> commit messages
  Filter
  Neural Machine Translation (NMT)
  Evaluation
  Results
  Quality Assurance Filter
  Updated results

Preprocessing the Data Set
2M commit messages and diffs from the 1K most popular Java projects on GitHub [2]
-> 75K commit messages and diffs
[2] S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.

Verb-Direct Object Filter
Verb-Direct Object is a phrase type, e.g.:
  Remove unused images
  Add test back to index
  Update mock images
47% of commit messages begin with this type of phrase *
NLP tool (grammatical relations, part-of-speech tags) -> 32K commit messages and diffs
  Training: 26K
  Validation: 3K
  Testing: 3K
NMT model: Nematus
* S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.
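The idea of keeping only messages that start with a verb and its direct object can be illustrated with a deliberately crude stand-in. The slides use an NLP tool with grammatical relations and part-of-speech tags; the sketch below swaps that for a small hypothetical verb lexicon, so the verb list and both function names are assumptions.

```python
# Hypothetical lexicon of imperative verbs. The filter on the slides instead
# relies on an NLP tool's grammatical relations and part-of-speech tags.
COMMON_VERBS = {"add", "remove", "update", "fix", "use", "rename", "move", "delete"}


def looks_like_verb_direct_object(message):
    """Crude check: the first word is a known verb and at least one word follows."""
    words = message.strip().lower().split()
    return len(words) >= 2 and words[0] in COMMON_VERBS


def filter_pairs(pairs):
    """Keep only (diff, message) pairs whose message passes the crude check."""
    return [(diff, msg) for diff, msg in pairs if looks_like_verb_direct_object(msg)]
```

A lexicon check like this misses verbs outside the list and cannot confirm that the following words really form a direct object, which is why a parser-based filter is the better fit for real data.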

Evaluation
Test set: diffs paired with reference commit messages
diff -> Trained NMT model -> Generated commit message
Similarity between the generated commit message and the reference commit message:
  1. An automatic metric
  2. A human study

BLEU: the Automatic Metric
Bilingual Evaluation Understudy *
A popular metric for measuring the similarity between two sentences
$$\mathrm{BLEU} = BP \cdot \exp\left( \sum_{n=1}^{N} \frac{1}{N} \log(p_n) \right)$$
  BLEU: value in [0, 1]
  BP: brevity penalty
  p_n: modified n-gram precision
  N: 4 (considers only 1 to 4-gram precisions)
* K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
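The BLEU formula can be made concrete with a minimal sketch. It scores a single candidate against a single reference with no smoothing, whereas BLEU is normally computed over a whole test set; the function names and the whitespace tokenization in the usage lines are assumptions for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision p_n of a candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate, reference, max_n=4):
    """BLEU = BP * exp(sum_{n=1..N} (1/N) * log p_n) for one sentence pair."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; without smoothing the score is 0
    c, r = len(candidate), len(reference)
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "remove unused images from the test folder".split()
candidate = "remove unused images from the test".split()
score = sentence_bleu(candidate, reference)
```

In this usage example every n-gram of the candidate also occurs in the reference, so all four precisions are 1 and the score is determined by the brevity penalty alone.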

BLEU Results
Baseline: MOSES [1], a statistical machine translation system
  Model   BLEU (%)   p_1    p_2    p_3    p_4
  MOSES   3.63       8.3    3.6    2.7    2.1
  NMT     31.92      38.1   31.1   29.5   29.7
Most diffs: 75 words
Most messages: < 30 words
[1] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, et al. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177-180. Association for Computational Linguistics, 2007.

Human Study
BLEU: two sets of sentences, textual similarity
Human study: individual sentences, semantic similarity

Human Study
Survey: 20 programmers

Human Study
983 pairs of generated/reference messages were rated:
  226 pairs by three programmers
  522 pairs by two programmers
  235 pairs by one programmer

Human Study
[Figure: ratings of semantic similarity (0: no similarity, 7: identical); labeled values: 248 and 234]

Quality Assurance Filter
Data: 983 commits that were evaluated in the human study
Training: diff -> tf/idf, with scores 0 or 1 -> Linear SVM (with SGD training) -> trained model
Use: diff -> tf/idf -> trained model -> Quality Assurance Filter
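A tf/idf representation feeding a linear SVM trained with SGD can be sketched in plain Python as follows. The slides name only the ingredients, so the whitespace tokenization, the label encoding (+1 for a diff whose generated message was judged bad, -1 otherwise), the hyperparameters, and all function names are assumptions.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Compute inverse document frequency for every token seen in training."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalized tf-idf vector; tokens unseen in training are dropped."""
    counts = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: tf * idf[tok] for tok, tf in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, reg=0.001, seed=0):
    """SGD on the L2-regularized hinge loss; labels are +1 (bad) or -1 (ok)."""
    rng = random.Random(seed)
    weights, bias = Counter(), 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            margin = y * (sum(weights[t] * v for t, v in x.items()) + bias)
            for t in list(weights):
                weights[t] *= 1.0 - lr * reg  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss is active: step towards the example
                for t, v in x.items():
                    weights[t] += lr * y * v
                bias += lr * y
    return weights, bias


def predict_bad(doc, idf, weights, bias):
    """True if the model flags the message generated for this diff as likely bad."""
    x = tfidf(doc, idf)
    return sum(weights[t] * v for t, v in x.items()) + bias > 0.0
```

At prediction time a diff flagged by `predict_bad` would be held back rather than given a generated message, which is the role the filter plays in the pipeline.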

Quality Assurance Filter
Detected 44% of the bad cases

Summary
diffs -> commit messages
  Filter
  Neural Machine Translation (NMT)
  Evaluation
  Results
  Quality Assurance Filter
  Updated results
Generate short commit messages that are high-level overviews of software changes

On the Job Market
Software Engineering, Program Comprehension
Data Science, Machine Learning
sjiang1@nd.edu