It Will Never Work in Theory

Software development research that is relevant in practice

Sihan Li, Hucheng Zhou, Haoxiang Lin, Tian Xiao, Haibo Lin, Wei Lin, and Tao Xie: “A Characteristic Study on Failures of Production Distributed Data-Parallel Programs”. Proc. ICSE 2013.

SCOPE is adopted by thousands of developers from tens of different product teams in Microsoft Bing for daily web-scale data processing, including index building, search ranking, and advertisement display. A SCOPE job is composed of declarative SQL-like queries and imperative C# user-defined functions (UDFs), which are executed in pipeline by thousands of machines. There are tens of thousands of SCOPE jobs executed on Microsoft clusters per day, while some of them fail after a long execution time and thus waste tremendous resources. Reducing SCOPE failures would save significant resources.

This paper presents a comprehensive characteristic study on 200 SCOPE failures/fixes and 50 SCOPE failures with debugging statistics from Microsoft Bing, investigating not only major failure types, failure sources, and fixes, but also current debugging practice. Our major findings include (1) most of the failures (84.5%) are caused by defects in data processing rather than defects in code logic; (2) table-level failures (22.5%) are mainly caused by programmers’ mistakes and frequent data-schema changes while row-level failures (62%) are mainly caused by exceptional data; (3) 93% of fixes do not change data processing logic; (4) there are 8% failures with root cause not at the failure-exposing stage, making current debugging practice insufficient in this case. Our study results provide valuable guidelines for future development of data-parallel programs. We believe that these guidelines are not limited to SCOPE, but can also be generalized to other similar data-parallel platforms.

This insightful little paper is a great example of how a small investment in studying how things actually work can beneficially impact where developers focus their effort. A similar fault analysis of (for example) the way novices do joins in SQL, or of how they get list comprehensions in Python wrong, would be very welcome. We look forward to seeing how the team at Microsoft incorporates these findings into the next version of SCOPE.

A team of Mozilla developers ran a Reddit “Ask Me Anything” on Firefox two weeks ago. Several thousand comments were submitted, and Blake Winton has now sorted and classified them. It seems like it would be a useful data set for someone doing empirical software engineering research; if you’d like to have a look, please get in touch.

Leo Porter, Cynthia Bailey-Lee, and Beth Simon: “Halving Fail Rates using Peer Instruction: A Study of Four Computer Science Courses”. Proc. SIGCSE 2013.

Peer Instruction (PI) is a teaching method that supports student-centric classrooms, where students construct their own understanding through a structured approach featuring questions with peer discussions. PI has been shown to increase learning in STEM disciplines such as physics and biology. In this report we look at another indicator of student success – the rate at which students pass the course or, conversely, the rate at which they fail. Evaluating 10 years of instruction of 4 different courses spanning 16 PI course instances, we find that adoption of the PI methodology in the classroom reduces fail rates by a per-course average of 61% (20% reduced to 7%) compared to Standard Instruction (SI). Moreover, we also find statistically significant improvements within-instructor. For the same instructor teaching the same course, we find PI decreases the fail rate, on average, by 67% (from 23% to 8%) compared to SI. As an in-situ study, we discuss the various threats to the validity of this work and consider implications of wide-spread adoption of PI in computing programs.

This paper, which has just been presented at SIGCSE 2013 in Denver, may be one of the most significant empirical results we’ve ever reported. As the abstract says, a specific teaching technique can cut the failure rate in introductory classes by more than half; it also increases self-reported learner satisfaction. To find out more, check out http://www.peerinstruction4cs.org/.
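A note on the arithmetic: going from 20% to 7% is a relative reduction of about 65%, while the abstract reports a per-course average of 61%. One plausible reading is that the 61% is the mean of each course's own reduction, which need not equal the reduction between the averaged rates. The sketch below illustrates that difference. The per-course figures in it are hypothetical, invented only to show the effect; they are not the paper's data.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.65 means a 65% reduction)."""
    return (before - after) / before


# Aggregate fail rates quoted in the abstract.
overall = relative_reduction(0.20, 0.07)
within_instructor = relative_reduction(0.23, 0.08)

# HYPOTHETICAL per-course (SI, PI) fail rates, made up for illustration only.
courses = [(0.30, 0.09), (0.10, 0.05)]

# Mean of the per-course reductions...
per_course_avg = sum(relative_reduction(si, pi) for si, pi in courses) / len(courses)

# ...versus the reduction between the mean SI rate and the mean PI rate.
mean_si = sum(si for si, _ in courses) / len(courses)
mean_pi = sum(pi for _, pi in courses) / len(courses)
reduction_of_means = relative_reduction(mean_si, mean_pi)

print(f"abstract figures: {overall:.2f} overall, {within_instructor:.2f} within-instructor")
print(f"hypothetical courses: {per_course_avg:.2f} mean of reductions, "
      f"{reduction_of_means:.2f} reduction of means")
```

With these made-up courses the mean rates are 20% and 7%, yet the mean of the per-course reductions is 60% rather than 65%, so the two ways of summarizing the same data can legitimately disagree.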

Mel Ó Cinnéide, Laurence Tratt, Mark Harman, Steve Counsell, and Iman Hemati Moghadam, Experimental Assessment of Software Metrics Using Automated Refactoring, ESEM ’12, Lund, Sweden.

The impact and applicability of software metrics continue to be a subject of debate, especially since many metrics measure similar properties, such as cohesion. This raises the question of how far these metrics agree with one another.

The interesting idea that this paper proposes is to not only analyze the agreement and disagreement of metrics, but also to investigate how the metrics change on refactored versions of the same code. The authors do so by randomly applying automated refactorings to a code base and observing how these refactorings impact the metrics. By running these automated refactoring analyses, the authors want to distinguish between what they call volatile metrics, which are easily impacted, and inert metrics, which hardly change under refactoring. Furthermore, they want to know which metrics change in relation to one another: are there refactorings that cause one metric to increase while another (supposedly measuring a similar property) decreases?

They applied their method to 300KLOC of Java code of 8 open source systems and investigated the following five metrics:

  • Tight Class Cohesion (TCC)
  • Lack of Cohesion between Methods (LCOM5)
  • Class Cohesion (CC)
  • Sensitive Class Cohesion (SCOM)
  • Low-level Similarity-Based Class Cohesion (LSCC)

Their evaluation shows that LSCC, CC and LCOM5 are all highly volatile metrics: in 99% of the refactorings, these were either increased or decreased. The results, however, were different for the 8 systems under consideration. In one case, for example, all metrics turned out to be volatile. Even when normalizing for relative volatility, the variance remained high.
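To make "volatile" concrete, here is a minimal sketch of how one might measure it: the fraction of applied refactorings after which a metric's value differs from its value before. This is an illustration of the idea, not the authors' tooling; the function name, the data layout, and all the numbers are assumptions made up for the example.

```python
from collections import Counter

# HYPOTHETICAL measurements: one dict per applied refactoring, mapping a
# metric name to its (before, after) value. All numbers are invented.
refactorings = [
    {"LSCC": (0.40, 0.45), "TCC": (0.30, 0.30)},
    {"LSCC": (0.45, 0.41), "TCC": (0.30, 0.30)},
    {"LSCC": (0.41, 0.41), "TCC": (0.30, 0.35)},
    {"LSCC": (0.41, 0.50), "TCC": (0.35, 0.35)},
]


def volatility(samples, tolerance=1e-9):
    """Return, per metric, the fraction of refactorings that changed its value.

    Assumes every sample reports every metric, so the number of samples is
    the denominator for each metric.
    """
    changed = Counter()
    for sample in samples:
        for metric, (before, after) in sample.items():
            if abs(after - before) > tolerance:
                changed[metric] += 1
    metrics = {metric for sample in samples for metric in sample}
    return {metric: changed[metric] / len(samples) for metric in sorted(metrics)}


print(volatility(refactorings))
```

In this toy data LSCC changes in three of four refactorings and TCC in one of four, so LSCC would be the more volatile of the two.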

In a second evaluation, the relationship between two of the cohesion metrics, LSCC and TCC, is explored in more detail, focusing on refactorings where one of the two metrics decreases while the other increases.
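A conflict of this kind can be detected by comparing the direction in which each metric moved under the same refactoring. The sketch below is only an illustration under stated assumptions: both metrics are taken to be oriented the same way (higher means more cohesive), the function name and labels are hypothetical, and the change values are invented.

```python
from collections import Counter


def classify(delta_a: float, delta_b: float, tolerance: float = 1e-9) -> str:
    """Label how two metrics moved under a single refactoring."""

    def sign(x: float) -> int:
        if x > tolerance:
            return 1
        if x < -tolerance:
            return -1
        return 0

    a, b = sign(delta_a), sign(delta_b)
    if a == 0 and b == 0:
        return "both unchanged"
    if a == 0 or b == 0:
        return "one changed"
    return "agree" if a == b else "conflict"


# HYPOTHETICAL (LSCC change, TCC change) pairs, one per refactoring.
deltas = [(0.05, 0.02), (0.03, -0.04), (-0.01, 0.00), (-0.02, 0.06), (0.00, 0.00)]

counts = Counter(classify(a, b) for a, b in deltas)
print(dict(counts))
```

The share of "conflict" outcomes among refactorings that changed both metrics is one simple way to summarize how often two supposedly similar metrics pull in opposite directions.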

What makes this work so interesting, apart from the cool originality of applying automated refactoring in the context of metrics, is the fact that it changes our perception of metrics. Where we previously assumed that different metrics for cohesion were mainly a matter of taste (and hence debate), this paper finds that metrics can not only differ, but also conflict in many cases.

10th Working Conference on Mining Software Repositories
May 18-19, 2013. San Francisco, CA, USA
http://2013.msrconf.org

Co-located with the 35th ACM/IEEE International Conference on Software Engineering (ICSE 2013)
Sponsored by IEEE TCSE and ACM SIGSOFT

NEW IN 2013!

  • Data papers for describing data sets curated by their authors and making them available to the research community
  • Practice papers for experiences of applying mining repository algorithms in an industry/open source organization context
  • Microsoft Surface tablet with Windows RT as prize for the best Mining Challenge, sponsored by Microsoft Research.

IMPORTANT DATES

  • Research/Practice abstracts: Feb 8, 2013
  • Research/Practice papers: Feb 15, 2013
  • Data papers: Feb 15, 2013
  • Challenge papers: Mar 04, 2013
  • Author notification: Mar 15, 2013
  • Camera-ready copy: Mar 29, 2013
  • Conference: May 18-19, 2013

All submission deadlines are 11:59 PM (Pago Pago, American Samoa) on the dates indicated.

CALL FOR PAPERS

Software repositories such as source control systems, archived communications between project personnel, and defect tracking systems are used to help manage the progress of software projects. Software practitioners and researchers are recognizing the benefits of mining this information to support the maintenance of software systems, improve software design/reuse, and empirically validate novel ideas and techniques. Research is now proceeding to uncover the ways in which mining these repositories can help to understand software development and software evolution, to support predictions about software development, and to exploit this knowledge concretely in planning future development. The goal of this two-day working conference is to advance the science and practice of software engineering via the analysis of data stored in software repositories.

This year, we will solicit three tracks of papers: Research, Practice, and Data. As in previous MSR editions, there will be a Mining Challenge on Stack Overflow data and a special issue of best MSR papers in the Empirical Software Engineering journal.

Research papers: Research papers can be short papers (4 pages) and full papers (10 pages). Short research papers should discuss controversial issues in the field, or describe interesting or thought provoking ideas that are not yet fully developed. Accepted short papers will present their ideas in a short lightning talk. Full research papers are expected to describe new research results, and have a higher degree of technical rigor than short papers.

Practice papers: (New!) Practice papers should report experiences of applying mining repository algorithms in an industry/open source organization context. Practice papers aim at reporting positive or negative experiences of applying known algorithms, but adapting existing algorithms or proposing new algorithms for practical use would be a plus. Practice papers can also be short papers (4 pages) or full papers (10 pages).

Data papers: (New!) We want to encourage researchers to share their data. Data papers should describe data sets curated by their authors and made available to others. They are expected to be at most 4 pages long and should address the following: description of the data, including its source; methodology used to gather it; description of the schema used to store it, and any limitations and/or challenges of this data set. The data should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper. Further details about data papers are available on the conference website.

Mining challenge: In the Mining Challenge, we invite researchers to demonstrate the usefulness of their mining tools on preselected software repositories and summarize their findings in a challenge report (4 pages). Please visit our Challenge Web Site for details about the Mining Challenge. This year, the challenge is on the Stack Overflow data. We provide the dump for the Stack Overflow web service and you should use your brain, tools, computational power, and magic to uncover interesting findings related to it.

EMSE SPECIAL ISSUE

A selection of the best research papers will be invited for consideration in a special issue of the journal, Empirical Software Engineering (EMSE), edited by Springer.

TOPICS

Papers may address issues along the general themes, including but not limited to the following:

  • Analysis of software ecosystems and mining of repositories across multiple projects
  • Models for social and development processes that occur in large software projects
  • Prediction of future software qualities via analysis of software repositories
  • Models of software project evolution based on historical repository data
  • Characterization, classification, and prediction of software defects based on analysis of software repositories
  • Techniques to model reliability and defect occurrences
  • Search-driven software development, including search techniques to assist developers in finding suitable components and code fragments for reuse, and software search engines
  • Analysis of change patterns and trends to assist in future development
  • Visualization techniques and models of mined data
  • Techniques and tools for capturing new forms of data for storage in software repositories, such as effort data, fine-grained changes, and refactoring
  • Characterization of bias in mining and guidelines to ensure quality results
  • Privacy and ethics in mining software repositories
  • Meta-models, exchange formats, and infrastructure tools to facilitate the sharing of extracted data and to encourage reuse and repeatability
  • Empirical studies on extracting data from repositories of large long-lived and/or industrial projects
  • Methods of integrating mined data from various historical sources
  • Approaches, applications, and tools for software repository mining
  • Mining software licensing and copyrights
  • Mining execution traces and logs
  • Analysis of natural language artifacts in software repositories

SUBMISSION

All papers must conform at time of submission to the ICSE/MSR 2013 Formatting Instructions and must not exceed the page limits (research/practice papers: 10 pages; short papers: 4 pages; data papers: 4 pages; challenge reports: 4 pages), including all text, references, appendices and figures. All submissions must be in English and in PDF format.

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission.

Papers must be submitted electronically through EasyChair using the following URL: http://easychair.org/conferences/?conf=msr2013

Upon notification of acceptance, all authors of accepted papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to present the results at the MSR 2013 conference. All accepted contributions will be published in the conference electronic proceedings.

Our previous post, “Empirical Evidence for the Value of Version Control”, generated a lot of comments. Many sought to explain why version control is helpful, but that’s not what we were looking for: we were looking for empirical evidence that it is. To see why we need it, take a look at this response from Jordi Cabot [1]. In it, he says:

Quite regularly, I get questions about what empirical evidence supports my “belief” that models are good… Until now, I used to point to the (true, few) scientific empirical studies on the effectiveness of software modeling…but now I have an even [better] answer to give you: “Empirical Evidence of the Value of Version Control”.
No, I haven’t lost my mind. The point of this link is to show you that there’s no proof that version control is better for software development, and yet, I don’t think any of you would argue against it.
Same for modeling and model-driven engineering. It would be great to have more proof but the absence of proof alone should not be used against it unless you want to start also abandoning other unproven things like version control.

He’s right: if we’re willing to accept that version control is valuable, without proof, then we can hardly require advocates of modeling to prove their case. Or advocates of functional programming, or literate programming, or Hungarian notation. Heck, if we don’t require proof for our claims, then we’re honor-bound to accept that Perl is “intuitive” because its grammar has as many special cases and contradictions as the grammars of natural languages, aren’t we? Or that learning Befunge makes you a better programmer (seriously, I’ve heard that claim too).

At some point, the statement, “If we don’t need to prove the value of version control, we don’t need to prove the value of X” becomes absurd. However, everyone’s threshold of absurdity is different. I personally don’t think that modeling adds value for most developers in most situations—I think that if it did, or if its benefits really were as significant as its advocates claim, more developers would have adopted it by now—but I don’t know. What I do know is, if we can’t demonstrate the value of something that most of us believe in, like version control, what chance do we have of telling whether other practices, like modeling and test-driven development, are worth adopting (or rather, when they’re worth adopting and by whom, since I doubt there’s a one-size-fits-all answer)?

So here are my requests:

  1. Tell us what kind of study would convince you that using Befunge didn’t make programmers more productive.
  2. Then tell us what kind of study would convince you that version control didn’t either.

If your answer to the second question is, “Nothing ever could,” then version control is an article of faith for you, and there’s no point arguing further [2]. If your answer to the second is different from your answer to the first, please tell us why.

[1] Full disclosure: Jordi and I co-authored a study of web-based software project portals. And either way, we hope you have a happy and productive 2013.

[2] This request is inspired by Karl Popper’s notion of falsifiability: a claim is only scientific if there is some way to prove it wrong.

We received this by email:

I use version control for my software, and I encourage others to do so, but I have no experimental evidence to base that decision. I pulled out my old copy of Code Complete (it’s a first edition), and the only reference it makes is to “Moore 1992”, which is a private communication that says that Microsoft considers their internal use of version control to be a competitive advantage.

The common practices I know of are:

  1. no version control
  2. every once in a while make a backup, either as a tar/zip file or copy everything into a new directory
  3. use filesystem versioning, like what was on a VAX, or Time Machine on a Mac, or Dropbox for a distributed multi-version file system
  4. use a version control system; though this in turn can vary from SCCS and RCS to Fossil and Veracity

In addition, there’s a difference between the needs of a single developer vs. a small team, vs. a large, distributed team.

Is there published experimental evidence showing that a version control system is more useful than, say, developing using Dropbox? I tried looking for the relevant papers but I don’t know how to search that field and I couldn’t find anything.

It’s a good question—does anyone have an answer?

Jorge Aranda and I submitted a short opinion piece to Communications of the ACM in February 2012 that discussed some of the reasons people in industry and academia don’t talk to each other as much as they should. Ten months later, it has ironically turned into an illustration of one of the reasons: it was six months before we received any feedback at all, and we’ve now waited four months for any further word. In that time, Jorge has left academia and I’ve taken a job with Mozilla, so we have decided to withdraw the manuscript and publish it on my personal blog. We hope you find it interesting, and we would welcome comments.

Many people have noted the wide gulf between the people who study software development and the people who do it. One person trying to close that gap is Michael Feathers, who is running a one-day workshop in London on Wednesday, January 16 titled “Developing Project Guidance Through Code History Mining”. Feathers is the author of the landmark book Working Effectively With Legacy Code, and is actively seeking to build ties with people who have similar interests.

Our recommendation: two thumbs up.

David Ameller, Claudia Ayala, Jordi Cabot, and Xavier Franch, How do Software Architects Consider Non-functional Requirements: An Exploratory Study, RE 2012, Chicago.

Dealing with non-functional requirements (NFRs) has posed a challenge onto software engineers for many years. Over the years, many methods and techniques have been proposed to improve their elicitation, documentation, and validation. Knowing more about the state of the practice on these topics may benefit both practitioners’ and researchers’ daily work. A few empirical studies have been conducted in the past, but none under the perspective of software architects, in spite of the great influence that NFRs have on daily architects’ practices. This paper presents some of the findings of an empirical study based on 13 interviews with software architects. It addresses questions such as: who decides the NFRs, what types of NFRs matter to architects, how are NFRs documented, and how are NFRs validated. The results are contextualized with existing previous work.

In this work, Ameller et al. consider the contention that NFRs ought to be driving concerns for software architects. They conducted a study with Spanish software architects in a variety of domains to understand how they thought of NFRs. Their first finding was that no one held a formal “architect” role, although that was what their work entailed. The job position was based on skills and knowledge rather than training. Their second finding was that NFRs were not of primary importance, which contradicts other research findings. Instead, the architects considered project-wide constraints such as licensing and overall cost to be more important. This suggests some interesting directions for new research into the role architecture plays in the software development process.

A related blog post with more detail can be found here.