It Will Never Work in Theory

Software development research that is relevant in practice

Browsing Posts published by gvwilson

Sihan Li, Hucheng Zhou, Haoxiang Lin, Tian Xiao, Haibo Lin, Wei Lin, and Tao Xie: “A Characteristic Study on Failures of Production Distributed Data-Parallel Programs”. Proc. ICSE 2013.

SCOPE is adopted by thousands of developers from tens of different product teams in Microsoft Bing for daily web-scale data processing, including index building, search ranking, and advertisement display. A SCOPE job is composed of declarative SQL-like queries and imperative C# user-defined functions (UDFs), which are executed in pipeline by thousands of machines. There are tens of thousands of SCOPE jobs executed on Microsoft clusters per day, while some of them fail after a long execution time and thus waste tremendous resources. Reducing SCOPE failures would save significant resources.

This paper presents a comprehensive characteristic study on 200 SCOPE failures/fixes and 50 SCOPE failures with debugging statistics from Microsoft Bing, investigating not only major failure types, failure sources, and fixes, but also current debugging practice. Our major findings include (1) most of the failures (84.5%) are caused by defects in data processing rather than defects in code logic; (2) table-level failures (22.5%) are mainly caused by programmers’ mistakes and frequent data-schema changes while row-level failures (62%) are mainly caused by exceptional data; (3) 93% of fixes do not change data processing logic; (4) there are 8% failures with root cause not at the failure-exposing stage, making current debugging practice insufficient in this case. Our study results provide valuable guidelines for future development of data-parallel programs. We believe that these guidelines are not limited to SCOPE, but can also be generalized to other similar data-parallel platforms.

This insightful little paper is a great example of how a small investment in studying how things actually work can beneficially impact where developers focus their effort. A similar fault analysis of (for example) the way novices do joins in SQL, or of how they get list comprehensions in Python wrong, would be very welcome. We look forward to seeing how the team at Microsoft incorporates these findings into the next version of SCOPE.

A team of Mozilla developers ran a Reddit “Ask Me Anything” on Firefox two weeks ago. Several thousand comments were submitted, and Blake Winton has now sorted and classified them. It seems like it would be a useful data set for someone doing empirical software engineering research; if you’d like to have a look, please get in touch.

Leo Porter, Cynthia Bailey-Lee, and Beth Simon: “Halving Fail Rates using Peer Instruction: A Study of Four Computer Science Courses”. Proc. SIGCSE 2013.

Peer Instruction (PI) is a teaching method that supports student-centric classrooms, where students construct their own understanding through a structured approach featuring questions with peer discussions. PI has been shown to increase learning in STEM disciplines such as physics and biology. In this report we look at another indicator of student success – the rate at which students pass the course or, conversely, the rate at which they fail. Evaluating 10 years of instruction of 4 different courses spanning 16 PI course instances, we find that adoption of the PI methodology in the classroom reduces fail rates by a per-course average of 61% (20% reduced to 7%) compared to Standard Instruction (SI). Moreover, we also find statistically significant improvements within-instructor. For the same instructor teaching the same course, we find PI decreases the fail rate, on average, by 67% (from 23% to 8%) compared to SI. As an in-situ study, we discuss the various threats to the validity of this work and consider implications of wide-spread adoption of PI in computing programs.

This paper, which has just been presented at SIGCSE 2013 in Denver, may be one of the most significant empirical results we’ve ever reported. As the abstract says, a specific teaching technique can cut the failure rate in introductory classes by more than half; it also increases self-reported learner satisfaction. To find out more, check out http://www.peerinstruction4cs.org/.

10th Working Conference on Mining Software Repositories
May 18-19, 2013. San Francisco, CA, USA
http://2013.msrconf.org

Co-located with the 35th ACM/IEEE International Conference on Software Engineering (ICSE 2013)
Sponsored by IEEE TCSE and ACM SIGSOFT

NEW IN 2013!

  • Data papers for describing data sets curated by their authors and making them available to the research community
  • Practice papers for experiences of applying mining repository algorithms in an industry/open source organization context
  • Microsoft Surface tablet with Windows RT as prize for the best Mining Challenge, sponsored by Microsoft Research.

IMPORTANT DATES

  • Research/Practice abstracts: Feb 8, 2013
  • Research/Practice papers: Feb 15, 2013
  • Data papers: Feb 15, 2013
  • Challenge papers: Mar 04, 2013
  • Author notification: Mar 15, 2013
  • Camera-ready copy: Mar 29, 2013
  • Conference: May 18-19, 2013

All submission deadlines are 11:59 PM (Pago Pago, American Samoa) on the dates indicated.

CALL FOR PAPERS

Software repositories such as source control systems, archived communications between project personnel, and defect tracking systems are used to help manage the progress of software projects. Software practitioners and researchers are recognizing the benefits of mining this information to support the maintenance of software systems, improve software design/reuse, and empirically validate novel ideas and techniques. Research is now proceeding to uncover the ways in which mining these repositories can help to understand software development and software evolution, to support predictions about software development, and to exploit this knowledge concretely in planning future development. The goal of this two-day working conference is to advance the science and practice of software engineering via the analysis of data stored in software repositories.

This year, we will solicit three tracks of papers: Research, Practice, and Data. As in previous MSR editions, there will be a Mining Challenge on Stack Overflow data and a special issue of best MSR papers in the Empirical Software Engineering journal.

Research papers: Research papers can be short papers (4 pages) and full papers (10 pages). Short research papers should discuss controversial issues in the field, or describe interesting or thought provoking ideas that are not yet fully developed. Accepted short papers will present their ideas in a short lightning talk. Full research papers are expected to describe new research results, and have a higher degree of technical rigor than short papers.

Practice papers: (New!) Practice papers should report experiences of applying mining repository algorithms in an industry/open source organization context. Practice papers aim at reporting positive or negative experiences of applying known algorithms, but adapting existing algorithms or proposing new algorithms for practical use would be a plus. Practice papers also can be short papers (4 pages) and full papers (10 pages).

Data papers: (New!) We want to encourage researchers to share their data. Data papers should describe data sets curated by their authors and made available to others. They are expected to be at most 4 pages long and should address the following: description of the data, including its source; methodology used to gather it; description of the schema used to store it, and any limitations and/or challenges of this data set. The data should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper. Further details about data papers are available on the conference website.

Mining challenge: In the Mining Challenge, we invite researchers to demonstrate the usefulness of their mining tools on preselected software repositories and summarize their findings in a challenge report (4 pages). Please visit our Challenge Web Site for details about the Mining Challenge. This year, the challenge is on the Stack Overflow data. We provide the dump for the Stack Overflow web service and you should use your brain, tools, computational power, and magic to uncover interesting findings related to it.

EMSE SPECIAL ISSUE

A selection of the best research papers will be invited for consideration in a special issue of the journal, Empirical Software Engineering (EMSE), edited by Springer.

TOPICS

Papers may address issues along the general themes, including but not limited to the following:

  • Analysis of software ecosystems and mining of repositories across multiple projects
  • Models for social and development processes that occur in large software projects
  • Prediction of future software qualities via analysis of software repositories
  • Models of software project evolution based on historical repository data
  • Characterization, classification, and prediction of software defects based on analysis of software repositories
  • Techniques to model reliability and defect occurrences
  • Search-driven software development, including search techniques to assist developers in finding suitable components and code fragments for reuse, and software search engines
  • Analysis of change patterns and trends to assist in future development
  • Visualization techniques and models of mined data
  • Techniques and tools for capturing new forms of data for storage in software repositories, such as effort data, fine-grained changes, and refactoring
  • Characterization of bias in mining and guidelines to ensure quality results
  • Privacy and ethics in mining software repositories
  • Meta-models, exchange formats, and infrastructure tools to facilitate the sharing of extracted data and to encourage reuse and repeatability
  • Empirical studies on extracting data from repositories of large long-lived and/or industrial projects
  • Methods of integrating mined data from various historical sources
  • Approaches, applications, and tools for software repository mining
  • Mining software licensing and copyrights
  • Mining execution traces and logs
  • Analysis of natural language artifacts in software repositories

SUBMISSION

All papers must conform at time of submission to the ICSE/MSR 2013 Formatting Instructions and must not exceed the page limits (research/practice papers: 10 pages; short papers: 4 pages; data papers: 4 pages; challenge reports: 4 pages), including all text, references, appendices and figures. All submissions must be in English and in PDF format.

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission.

Papers must be submitted electronically through EasyChair using the following URL: http://easychair.org/conferences/?conf=msr2013

Upon notification of acceptance, all authors of accepted papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to present the results at the MSR 2013 conference. All accepted contributions will be published in the conference electronic proceedings.

Our previous post, “Empirical Evidence for the Value of Version Control”, generated a lot of comments. Many sought to explain why version control is helpful, but that’s not what we were looking for: we were looking for empirical evidence that it is. To see why we need it, take a look at this response from Jordi Cabot [1]. In it, he says:

Quite regularly, I get questions about what empirical evidence supports my “belief” that models are good… Until now, I used to point to the (true, few) scientific empirical studies on the effectiveness of software modeling…but now I have an even better answer to give you: “Empirical Evidence of the Value of Version Control”.
No, I haven’t lost my mind. The point of this link is to show you that there’s no proof that version control is better for software development, and yet, I don’t think any of you would argue against it.
Same for modeling and model-driven engineering. It would be great to have more proof but the absence of proof alone should not be used against it unless you want to start also abandoning other unproven things like version control.

He’s right: if we’re willing to accept that version control is valuable, without proof, then we can hardly require advocates of modeling to prove their case. Or advocates of functional programming, or literate programming, or Hungarian notation. Heck, if we don’t require proof for our claims, then we’re honor-bound to accept that Perl is “intuitive” because its grammar has as many special cases and contradictions as the grammars of natural languages, aren’t we? Or that learning Befunge makes you a better programmer (seriously, I’ve heard that claim too).

At some point, the statement, “If we don’t need to prove the value of version control, we don’t need to prove the value of X” becomes absurd. However, everyone’s threshold of absurdity is different. I personally don’t think that modeling adds value for most developers in most situations—I think that if it did, or if its benefits really were as significant as its advocates claim, more developers would have adopted it by now—but I don’t know. What I do know is, if we can’t demonstrate the value of something that most of us believe in, like version control, what chance do we have of telling whether other practices, like modeling and test-driven development, are worth adopting (or rather, when they’re worth adopting and by whom, since I doubt there’s a one-size-fits-all answer)?

So here are my requests:

  1. Tell us what kind of study would convince you that using Befunge didn’t make programmers more productive.
  2. Then tell us what kind of study would convince you that version control didn’t either.

If your answer to the second question is, “Nothing ever could,” then version control is an article of faith for you, and there’s no point arguing further [2]. If your answer to the second is different from your answer to the first, please tell us why.

[1] Full disclosure: Jordi and I co-authored a study of web-based software project portals. And either way, we hope you have a happy and productive 2013.

[2] This request is inspired by Karl Popper’s notion of falsifiability: a claim is only scientific if there is some way to prove it wrong.

We received this by email:

I use version control for my software, and I encourage others to do so, but I have no experimental evidence on which to base that decision. I pulled out my old copy of Code Complete (it’s a first edition), and the only reference it makes is to “Moore 1992”, which is a private communication that says that Microsoft considers their internal use of version control to be a competitive advantage.

The common practices I know of are:

  1. no version control
  2. every once in a while make a backup, either as a tar/zip file or copy everything into a new directory
  3. use filesystem versioning, like what was on a VAX, or Time Machine on a Mac, or Dropbox for a distributed multi-version file system
  4. use a version control system; though this in turn can vary from SCCS and RCS to Fossil and Veracity

In addition, there’s a difference between the needs of a single developer vs. a small team, vs. a large, distributed team.

Is there published experimental evidence showing that a version control system is more useful than, say, developing using Dropbox? I tried looking for the relevant papers but I don’t know how to search that field and I couldn’t find anything.

It’s a good question—does anyone have an answer?

Jorge Aranda and I submitted a short opinion piece to Communications of the ACM in February 2012 that discussed some of the reasons people in industry and academia don’t talk to each other as much as they should. Ten months later, it has ironically turned into an illustration of one of the reasons: it was six months before we received any feedback at all, and we’ve now waited four months for any further word. In that time, Jorge has left academia and I’ve taken a job with Mozilla, so we have decided to withdraw the manuscript and publish it on my personal blog. We hope you find it interesting, and we would welcome comments.

Many people have noted the wide gulf between the people who study software development and the people who do it. One person trying to close that gap is Michael Feathers, who is running a one-day workshop in London on Wednesday, January 16 titled “Developing Project Guidance Through Code History Mining”. Feathers is the author of the landmark book Working Effectively With Legacy Code, and is actively seeking to build ties with people who have similar interests.

Our recommendation: two thumbs up.

As we reported a few days ago, one of our contributors, Greg Wilson, gave a keynote at the MSR Vision 2020 workshop in Kingston on August 20. In that, he explored why there’s still a gulf between software engineering researchers and the people who actually build software for a living (see the slides or the discussion on Reddit for details). He also said that:

  1. there’s no easy way to close that gap, because most of the people in industry that researchers want to collaborate with have never encountered empirical software engineering studies, and therefore don’t understand their scope or value; so
  2. researchers—many of whom are professors—should pivot the software engineering classes they teach to focus on how to analyze real-world data, and what past analyses have told us, so that the next generation of developers will understand (and listen, and want to collaborate).

To make this more concrete, Greg asked the workshop participants to make up some assignments and exam questions for such a course. Some of the suggestions are listed below; we would welcome other ideas as well (please post them as comments). We’d also like to know who’d be interested in trying to teach such a course at their institution, and what you think the prerequisites would have to be: statistics, obviously, but would a database course that introduced students to SQL be necessary? What about a natural language processing course? Or something else we haven’t thought of?

Group 1

Give two examples of success stories in studies of the social aspects of software engineering.
  1. Reorganization based on social structures
  2. Identifying the “big players” in a software project
What are three sources of social interaction in software projects?
  1. Email
  2. IRC
  3. bug comments
  4. source code comments
Name three challenges in preprocessing emails.
  1. signatures
  2. code snippets
  3. stack traces
  4. fake/multiple email addresses
  5. identifying email headers and inline replies
  6. typos
  7. chat acronyms
  8. non-native speakers
  9. use of multiple languages

Group 2

  1. You are given a dataset A of OSS projects and a subset of it B. Evaluate whether a hypothesis H can be rejected on A and B. Design the question in such a way that H is significant (at 0.05 level) at A and not B. Discuss the discrepancy.
  2. Given a dataset and a specific question, perhaps from existing MSR papers, discuss which data mining approach is best suited for that question.
  3. Given a specific question (e.g., bug finding) what repositories should you use to solve it? Illustrate it with Bugzilla. How do you adapt this to Jira?
  4. Given that two variables A and B correlate, can you say “A causes B”? Why or why not?
  5. Repeat an existing analysis from an MSR paper. Do you get the same results? Vary a number of variables. How different are the results?
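
One way to make question 1 concrete is a small permutation test. The sketch below is only an illustration under stated assumptions: the data are invented toy numbers (not from any real OSS project), the names `commits_a`, `defects_a`, `pearson_r`, and `permutation_p_value` are hypothetical, and a Monte Carlo permutation test on the Pearson correlation is just one of several tests a student could reasonably choose. It shows the same perfect trend being significant at the 0.05 level on the full dataset A but not on a four-project subset B, simply because B is too small.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the correlation.

    Shuffles ys repeatedly and counts how often the shuffled |r| is at
    least as large as the observed |r| (small tolerance for rounding).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (1 + extreme) / (1 + n_permutations)


# Hypothetical toy data: dataset A has 30 projects, subset B only 4.
commits_a = list(range(1, 31))
defects_a = [2 * c + 1 for c in commits_a]
commits_b, defects_b = commits_a[:4], defects_a[:4]

p_a = permutation_p_value(commits_a, defects_a)
p_b = permutation_p_value(commits_b, defects_b)
print(f"A (n=30): p = {p_a:.4f}")
print(f"B (n=4):  p = {p_b:.4f}")
```

With only four points there are 24 possible orderings, and two of them (the original and its exact reverse) give a correlation this extreme, so the exact two-sided p-value is 2/24, or about 0.083. No subset that small can reach the 0.05 level with this test, which is one discrepancy a good answer should discuss.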

Group 3

  • Statistics
    1. What is wrong with this claim: “Files with a large number of committers/authors have more defects/bugs, so we conclude that more authors cause more bugs, and we recommend that the number of committers be reduced.”
    2. A tool is 99% accurate in detecting defective lines of code. Should developers use the tool? Why or why not?
    3. What are the internal validity issues and external validity issues with this method? “Researcher X finds that a lack of modularity leads to more defects in Windows, and Y is going to apply that to predict defects in Eclipse.”
    4. Design a study to see whether people who go to lunch together have fewer build defects in their software.
    5. Which would produce fewer false positives: 90% recall and 10% precision, or 10% precision and 90% recall?
  • Data
    1. Given a table of bug reports with severity, etc. and another table of users with qualifications, etc., determine whether experience and bug report frequency are correlated, and if so, how strongly.
    2. Define: evolutionary coupling, tokenizing, word nets, stemming, n-gram, entropy.
    3. List 10 sources of data that could be mined to estimate the risks to a software project, and describe the limitations of each.
  • Interpretation and Actionability
    1. Your boss has asked you to generate documentation for a legacy system that doesn’t have any. What approach(es) would you use to automatically generate some useful documentation for each class and method?
    2. Given a set of version control logs, how would you tell which commits were bug fixes (vs. adding new features)?
    3. What technique(s) would you use to correlate email messages from a mailing list archive with related version control commits?
  • Ethics
    1. Given a data set (mailing list archive, bug reports, and version control log), anonymize it so that it can be shared without risk.
    2. Is it ethical to do an experiment to find out whether one race or gender produces more bugs than another? Justify your answer. How about graduates of one university vs. another?
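
Statistics question 2 above (the tool that is “99% accurate”) turns on base rates, and a short calculation may help students see why. The sketch below rests on assumptions that are ours, not the workshop's: we read “99% accurate” as 99% sensitivity and 99% specificity, we suppose that 1 line in 1,000 is actually defective, and the function name `precision_from_base_rate` is hypothetical.

```python
def precision_from_base_rate(sensitivity, specificity, base_rate):
    """Fraction of flagged lines that are truly defective (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# Assumed numbers for illustration only.
precision = precision_from_base_rate(
    sensitivity=0.99,   # defective lines that get flagged
    specificity=0.99,   # clean lines that are correctly left alone
    base_rate=0.001,    # 1 defective line per 1,000
)
print(f"Share of warnings that are real defects: {precision:.1%}")
```

Under these assumptions only about 9% of the tool's warnings point at real defects, so developers would wade through roughly ten false alarms for every true one. A different base rate or a different reading of “accurate” changes the answer, which is the point of the question.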

I gave the opening talk at MSR Vision 2020 in Kingston on Monday (slides), and in the wake of that, an experienced developer at Mozilla sent me a list of ten questions he’d really like empirical software engineering researchers to answer. They’re interesting in their own right, but I think they also reveal a lot about what practitioners want from researchers in general; comments would be very welcome.

  1. Vi vs. Emacs vs. graphical editors/IDEs: which makes me more productive?
  2. Should language developers spend their time on tools, syntax, library, or something else (like speed)? What makes the most difference to their users?
  3. Do unit tests save more time in debugging than they take to write/run/keep updated?
  4. Do distributed version control systems offer any advantages over centralized version control systems? (As a sub-question, Git or Mercurial: which helps me make fewer mistakes/shows me the info I need faster?)
  5. What are the best debugging techniques?
  6. Is it really twice as hard to debug as it is to write the code in the first place?
  7. What are the differences (bug count, code complexity, size, etc.), if any, between community-driven open source projects and corporate-controlled open source projects?
  8. If 10,000-line projects don’t benefit from architecture, but 100,000-line projects do, what do you do when your project slowly grows from the first size to the second?
  9. When does it make sense to reinvent the wheel vs. use an existing library?
  10. Are conferences worth the money? How much do they help junior/intermediate/senior programmers?