Automatically Generating Commit Messages from Diffs using Neural Machine Translation

Siyuan Jiang, Ameer Armaly, and Collin McMillan
University of Notre Dame, USA

Commit Messages
Many commit messages are similar [1][2], e.g., "Remove unused images", "Add test back to index", "Update mock images".
2M commit messages -> Neural Machine Translation (NMT)
[1] A. Mockus and L. G. Votta. Identifying reasons for software changes using historic databases. In Proceedings of the 2000 International Conference on Software Maintenance, pages 120-130, 2000.
[2] S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.

Neural Machine Translation (NMT)
Neural networks for translating natural languages, e.g., Chinese -> English.*
NMT models are trained on a parallel corpus, e.g., news articles or biomedical articles.
Here, the source language is git-diff output and the target language is commit messages.
* https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
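Treating diffs as the source language means building a parallel corpus of (diff, commit message) pairs. NMT toolkits such as Nematus are typically trained on line-aligned text, where line i of the source side and line i of the target side form one training pair. The Python sketch below shows one way to lay out such pairs; the `to_parallel_corpus` helper and its `<nl>` line-break token are hypothetical illustrations, not the authors' preprocessing.

```python
from typing import List, Tuple


def to_parallel_corpus(pairs: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (diff, message) pairs into line-aligned source and target lines.

    Line i of the source side is a diff; line i of the target side is its
    commit message. Newlines inside a diff are replaced by a hypothetical
    "<nl>" token so that every example fits on a single line.
    """
    sources, targets = [], []
    for diff, message in pairs:
        # Collapse the multi-line diff into one whitespace-tokenized line.
        sources.append(" ".join(diff.replace("\n", " <nl> ").split()))
        # Normalize whitespace in the message (the "target sentence").
        targets.append(" ".join(message.split()))
    return sources, targets


if __name__ == "__main__":
    pairs = [("--- a/img/logo.png\n+++ /dev/null", "Remove unused images")]
    src, tgt = to_parallel_corpus(pairs)
    print(src[0])
    print(tgt[0])
```

Each returned list would then be written to its own file, one example per line, so the two files stay aligned.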

Overview of Our Work
diffs -> commit messages
Filter -> Neural Machine Translation (NMT) -> Evaluation -> Results -> Quality Assurance Filter -> Updated results

Preprocessing the Data Set
2M commit messages and diffs from the 1K most popular Java projects on GitHub* -> 75K commit messages and diffs after preprocessing
* S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.

Verb-Direct Object Filter
Verb-direct object is a phrase type, e.g., "Remove unused images", "Add test back to index", "Update mock images". 47% of commit messages begin with this type of phrase.*
An NLP tool (grammatical relations and part-of-speech tags) selects these messages, leaving 32K commit messages and diffs.
Training: 26K; validation: 3K; testing: 3K
NMT model: Nematus
* S. Jiang and C. McMillan. Towards automatic generation of short summaries of commits. In 2017 IEEE 25th International Conference on Program Comprehension (ICPC), 2017.
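The filter relies on part-of-speech tags and grammatical relations produced by an NLP tool. Assuming those annotations are already available (Penn Treebank tags, where verb tags begin with `VB`, and a direct-object relation labelled `dobj` in Stanford Dependencies or `obj` in Universal Dependencies), the check itself could look like the minimal sketch below. The function name and input layout are hypothetical, and the authors' filter may apply additional rules.

```python
from typing import Iterable, List, Tuple

# A dependency edge: (relation, head_index, dependent_index), 0-based token indices.
Edge = Tuple[str, int, int]


def starts_with_verb_direct_object(
    pos_tags: List[str],
    edges: Iterable[Edge],
    object_labels: Tuple[str, ...] = ("dobj", "obj"),
) -> bool:
    """True if the first token is a verb that governs a direct object.

    Assumes Penn Treebank POS tags (verb tags begin with "VB") and a
    dependency parse computed beforehand by an NLP tool.
    """
    if not pos_tags or not pos_tags[0].startswith("VB"):
        return False
    # The first token (index 0) must be the head of a direct-object edge.
    return any(rel in object_labels and head == 0 for rel, head, _ in edges)


if __name__ == "__main__":
    # "Remove unused images": Remove/VB unused/JJ images/NNS, dobj(Remove, images)
    print(starts_with_verb_direct_object(["VB", "JJ", "NNS"], [("dobj", 0, 2), ("amod", 2, 1)]))
    # A noun phrase such as "Minor fixes" (JJ NNS) is rejected.
    print(starts_with_verb_direct_object(["JJ", "NNS"], [("amod", 1, 0)]))
```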

Evaluation
Each diff in the test set is given to the trained NMT model, and the generated commit message is compared with the reference commit message. Similarity is measured in two ways:
1. An automatic metric
2. A human study

BLEU: the Automatic Metric
Bilingual Evaluation Understudy (BLEU)*: a popular metric for measuring the similarity between two sentences.
$$\mathrm{BLEU} = BP \cdot \exp\left(\sum_{n=1}^{N} \frac{1}{N} \log p_n\right) \in [0, 1]$$
where BP is the brevity penalty, p_n is the modified n-gram precision, and N = 4 (only 1- to 4-gram precisions are considered).
* K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02), pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
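To make the formula concrete, here is a minimal sentence-level sketch in Python for a single reference. Following Papineni et al., p_n clips each candidate n-gram count at its count in the reference, and the brevity penalty is BP = 1 if the candidate length c exceeds the reference length r, else exp(1 - r/c). The sketch applies no smoothing, so any p_n = 0 yields a score of 0, which is common for short messages. Corpus-level BLEU, as usually reported for a whole test set, instead pools n-gram counts and lengths over all sentences before taking the ratios. This is an illustration, not the evaluation script used in the study.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision p_n against a single reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU in [0, 1]."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

For example, `bleu("add test to index".split(), "add test back to index".split())` is 0.0 because no candidate trigram appears in the reference, while an exact match such as "add test back to index" scores 1.0.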

BLEU Results
Baseline: MOSES*, a statistical machine translation system

Model   BLEU (%)   p1 (%)   p2 (%)   p3 (%)   p4 (%)
MOSES       3.63      8.3      3.6      2.7      2.1
NMT        31.92     38.1     31.1     29.5     29.7

Most diffs: 75 words; most messages: < 30 words
* P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, et al. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 177-180. Association for Computational Linguistics, 2007.
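As a consistency check on the table, BLEU can be recomputed from the reported modified n-gram precisions with the formula above, assuming a brevity penalty of 1. This is arithmetic on the published numbers only; the helper name is hypothetical.

```python
import math
from typing import List


def bleu_from_precisions(precisions_percent: List[float], brevity_penalty: float = 1.0) -> float:
    """BLEU (%) as BP times the geometric mean of the precisions (given in %)."""
    n = len(precisions_percent)
    log_mean = sum(math.log(p / 100) for p in precisions_percent) / n
    return 100 * brevity_penalty * math.exp(log_mean)


nmt = bleu_from_precisions([38.1, 31.1, 29.5, 29.7])
moses = bleu_from_precisions([8.3, 3.6, 2.7, 2.1])
print(f"NMT: {nmt:.2f}  MOSES: {moses:.2f}")
```

This gives about 31.92 for NMT, matching the table, and about 3.61 for MOSES against the reported 3.63; the small MOSES gap is plausibly due to the precisions being rounded to one decimal place.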

Human Study
BLEU: compares two sets of sentences by textual similarity.
Human study: compares individual sentences by semantic similarity.

Survey: 20 programmers

983 pairs of generated/reference messages were rated: 226 pairs by three programmers, 522 pairs by two programmers, and 235 pairs by one programmer.

Semantic similarity scale: 0 = no similarity, 7 = identical
[Figure: distribution of the human ratings; highlighted counts: 248 and 234]

Quality Assurance Filter
Data: the 983 commits that were evaluated in the human study.
Training: each diff is converted to a tf/idf vector and paired with its human score, converted to 0 or 1; a linear SVM (trained with SGD) learns from these pairs.
Use: the trained model serves as the Quality Assurance Filter; a new diff is converted to a tf/idf vector and classified into one of the two classes.
The filter detected 44% of the bad cases.
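The filter's training step can be sketched as follows: tf/idf vectors for diffs, 0/1 labels, and a linear SVM fitted by stochastic gradient descent on the hinge loss. In scikit-learn this combination is commonly written as `TfidfVectorizer` followed by `SGDClassifier(loss="hinge")`; the dependency-free version below spells out the same steps. The learning rate, regularization strength, epoch count, whitespace tokenization, and toy data are all hypothetical, and the slide does not state which score threshold maps human ratings to 0 or 1.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency for each term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> Vector:
    """L2-normalized tf-idf vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear decision function w . x + b."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_linear_svm(X: List[Vector], y: List[int], epochs: int = 50,
                     eta: float = 0.5, lam: float = 1e-4,
                     seed: int = 0) -> Tuple[Vector, float]:
    """Linear SVM fitted by SGD on the L2-regularized hinge loss.

    Labels are 0/1 and are mapped to -1/+1 internally.
    """
    rng = random.Random(seed)
    w: Vector = {}
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            yi = 1.0 if y[i] == 1 else -1.0
            margin = yi * score(w, b, X[i])
            for t in w:  # gradient of the L2 penalty shrinks every weight
                w[t] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: step towards the example
                for t, v in X[i].items():
                    w[t] = w.get(t, 0.0) + eta * yi * v
                b += eta * yi
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return class 1 if the decision function is positive, else 0."""
    return 1 if score(w, b, x) > 0 else 0


if __name__ == "__main__":
    # Hypothetical toy "diffs" (already tokenized) with made-up 0/1 labels.
    docs = [
        "remove unused image file png".split(),
        "delete unused image png".split(),
        "refactor parser logic loop".split(),
        "change parser loop condition".split(),
    ]
    labels = [1, 1, 0, 0]
    idf = fit_idf(docs)
    w, b = train_linear_svm([tfidf(d, idf) for d in docs], labels)
    print(predict(w, b, tfidf("unused png image".split(), idf)))
    print(predict(w, b, tfidf("parser loop".split(), idf)))
```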

Summary
We use Neural Machine Translation to generate short commit messages that are high-level overviews of software changes (diffs -> commit messages).
Filter -> Neural Machine Translation (NMT) -> Evaluation -> Results -> Quality Assurance Filter -> Updated results

On the Job Market
Software Engineering, Program Comprehension, Data Science, Machine Learning
sjiang1@nd.edu