COMP 762: ML & NLP Methods for Software Engineering

 

Course Information:

Instructor:                 Jin Guo

Class Time:                MW 10:05 AM - 11:25 AM

Location:                   McConnell Engineering Building 103

Online Discussion:   Piazza

 

Description:

Modern software engineering projects produce large amounts of data, including use cases, specifications, source code, and test cases. With effective Machine Learning (ML) and Natural Language Processing (NLP) techniques, these data can be used to support a variety of software engineering (SE) activities. This course introduces students to cutting-edge research topics that use ML and NLP techniques to provide automated or semi-automated support for SE tasks. The course focuses on discussing seminal and state-of-the-art papers published in SE conferences and journals, covering topics such as code completion, feature location, and trace link evaluation. These works use a variety of ML and NLP techniques, such as association rule mining, topic modeling, natural language semantic parsing, language models, and deep neural networks. Students will read assigned papers ahead of time and write paper reports. During class, students will present the papers and participate in in-depth discussion. Students will also carry out a final project that either extends previous literature or explores new directions in using the relevant ML and NLP techniques and tools for SE needs.

 

Outcome:

Students who successfully complete this course should be able to:

·      Demonstrate knowledge of the ML and NLP techniques that are frequently adopted in SE research, and understand the strengths and constraints of different techniques for solving specific SE problems.

·      Critically read scientific literature. Identify and articulate its context, techniques, findings, contributions, and limitations.

·      Find and formulate good SE problems. Identify and implement solutions that potentially solve or mitigate those problems effectively.

·      Clearly summarize and communicate research findings through written reports and presentations.

 

Paper Reading Report

Before the in-class discussion, students will read the assigned papers, annotate them while reading, and write reports that summarize their content. Each reading report should cover the following points:

·      Motivation of this work.

·      What are the assumptions this paper makes?

·      What is the proposed solution?

·      How is the solution evaluated?

·      What elements might threaten the validity of this work?

·      Limitations or extensions for this work

·      Major takeaway message

 

Paper Presentation

Starting in Week 3, one or two students per class will each give a 20-minute presentation on the assigned paper or related papers. Presenters for each class are assigned on a first-come, first-served basis. Please sign up using the link that will be sent to your McGill email address.

 

Course Project

Students will work alone or in a team on the final project. For team projects, students need to clearly state which person did which parts of the work, and each person should contribute equally.

The project has several phases over the semester:

1.    Ideation phase: post your project ideas on Piazza, discuss them with classmates there, and refine them.

2.    Write the proposal report and present it. The proposal report should be less than 2 pages and cover the basic ideas: what SE problem you intend to solve, why you think it is important, and how you plan to evaluate your solution. The proposal presentation will be 10 minutes, followed by discussion, in Week 7.

3.    Write the final project report and submit the artifacts. The final report should be 4-8 pages and follow the structure of papers in the SE literature, including sections on project goals, concrete methods, evaluation strategy (what the measures and baselines are, and why), and conclusions. The final project presentation will be 15 minutes, in Weeks 13 and 14. Artifact submission is optional and earns bonus credit.

 

Grading

·      Paper reading report [15%]

·      Participation (survey, discussion, and feedback in-class and on Piazza) [25%]

·      Presentation (related paper presentation, proposal presentation, final project presentation) [20%]

·      Project Report (proposal, peer review, final report) [40%]

·      Bonus (recognized contribution on Piazza, artifact submission with course project) [5%]

 

Schedule

Below is a tentative schedule of the papers to read and discuss. Dates are subject to minor changes.

 

 

Week | Date | Required Reading | Optional Reading
W1 | 4-Sep |  |
W2-a | 9-Sep | How to Read an Engineering Research Paper | Writing Good Software Engineering Research Papers
W2-b | 11-Sep | Who should fix this bug | Assisting bug Triage in Large Open Source Projects Using Approximate String Matching
W3-a | 16-Sep | Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code | Feature location in source code: a taxonomy and survey
W3-b | 18-Sep | An Evaluation of Constituency-based Hyponymy Extraction from Privacy Policies | The role of natural language in requirements engineering
W4-a | 23-Sep | Automated Extraction of Conceptual Models from User Stories via NLP | The Crowd in Requirements Engineering
W4-b | 25-Sep | Towards an intelligent domain-specific traceability solution | Advancing candidate link generation for requirements tracing: The study of methods
W5-a | 30-Sep | Bug Report, Feature Request, or Simply Praise? On Automatically Classifying App Review |
W5-b | 02-Oct | Automating Intention Mining | Detecting speech act types in developer question/answer conversations during bug repair
W6-a | 07-Oct | Mining Metrics to Predict Component Failures |
W6-b | 09-Oct | Towards Building a Universal Defect Prediction Model |
W7-a | 14-Oct | Thanksgiving - no class |
W7-b | 16-Oct | Project Progress Report |
W8-a | 21-Oct | Project Progress Report |
W8-b | 23-Oct | Discovering Information Explaining API Types Using Text Classification | Augmenting API documentation with insights from stack overflow
W9-a | 28-Oct | Automatically Assessing Code Understandability: How Far Are We? | Improving code readability models with textual features
W9-b | 30-Oct | On the naturalness of software | Natural software revisited
W10-a | 04-Nov | A Convolutional Attention Network for Extreme Summarization of Source Code | Deep Code Comment Generation
W10-b | 06-Nov | Automatically generating commit messages from diffs using neural machine translation | A survey of machine learning for big code and naturalness
W11-a | 11-Nov | Guest Lecture (TBD) |
W11-b | 13-Nov | Guest Lecture (TBD) |
W12-a | 18-Nov | Are Deep Neural Networks the Best Choice for Modeling Source Code | Deep Learning Similarities from Different Representations of Source Code
W12-b | 20-Nov | Easy over Hard: A Case Study on Deep Learning | Predicting semantically linkable knowledge in developer online forums via convolutional neural network
W13-a | 25-Nov | Is "better data" better than "better data miners"?: on the benefits of tuning SMOTE for defect prediction |
W13-b | 27-Nov | Project Presentation and Discussion |
W14-a | 2-Dec | Project Presentation and Discussion |
W14-b | 3-Dec | Wrap up |