Tuesday, January 5, 2016

Secret Data Encore

My post "Secret data" on replication provoked a lot of comment and emails,  more reflection, and some additional links.

This isn't about rules

Many of my correspondents missed my main point -- I am not advocating more and tighter rules by journals! This is not about what you are "allowed to do," how to "get published" and so forth.

In fact, this extra rumination points me even more strongly to the view that rules and censorship by themselves will not work. How to make research transparent, replicable, extendable, and so forth varies with the kind of work and the kind of data, and is subject, like everything else, to creativity and technical improvement. Most of all, it will not work if nobody cares; if nobody takes the kind of actions in the bullet points of my last post, and it remains just an issue about rules at journals. Already (more below), rules are not that well followed.

This isn't just about "replication." 

"Replication" is much too narrow a word. Yes, many papers have not documented transparently what they actually did, so that even armed with the data it's hard to produce the same numbers. Other papers are based on secret data, the problem with which I started.

But in the end, most important results are not simply due to outright errors in data or coding. (I hope!)

The important issue is whether small changes in instruments, controls, data sample, measurement error handling, and so forth produce different results, whether results hold out of sample, or whether collecting or recoding data produces the same conclusions. "Robustness" is a better overall descriptor for the problem that many of us suspect pervades empirical economic research.

You need replicability in order to evaluate robustness -- if you get a different result than the original authors', it's essential to be able to track down how the original authors got their result. But the real issue is the much larger one.

The excellent replication wiki (many good links) quotes Daniel Hamermesh on this difference between "narrow" and "wide" replication:
Narrow, or pure, replication means first checking the submitted data against the primary sources (when applicable) for consistency and accuracy. Second the tables and charts are replicated using the procedures described in the empirical article. The aim is to confirm the accuracy of published results given the data and analytical procedures that the authors write to have used. 
Replication in a wide sense is to consider the empirical finding of the original paper by using either new data from other time periods or regions, or by using new methods, e.g., other specifications. Studies with major extensions, new data or new empirical methods are often called reproductions.
But the more important robustness question is more controversial. The original authors can complain that they don't like the replicator's choice of instruments or procedures. So "replication," which sounds straightforward, quickly turns into controversy.

Michael Clemens writes about the issue in a blog post here, noting
...Again and again, the original authors have protested that the critique of their work got different results by construction, not because anything was objectively incorrect about the original work. (See Berkeley’s Ted Miguel et al. here; Oxford’s Stefan Dercon et al. here and Princeton’s Angus Deaton here among many others. Chris Blattman at Columbia and Berk Özler at the World Bank have weighed in on some of these controversies.)
In a good paper, published as "The Meaning of Failed Replications" in the Journal of Economic Surveys, he argues for an expanded vocabulary, including "verification," "robustness," "reanalysis," and "extension."

"Failed replication" is a damning criticism. It implies error, malfeasance, deliberately hiding data, and so forth.  What most "replication" studies really mean is "robustness," either to method or natural fishing biases, which is a more common problem (in my view). But as Michael points out, you really can't use the emotionally charged language of failed or "discrepant" replication for that situation.

This isn't about people or past work

I did not anticipate, but should have, that the secret data post would be read as criticism of people who do large-data work, proprietary-data work, or work with government agencies that cannot currently be shared.  The internet is pretty snarky, so it's worth stating explicitly that is not my intent or my view.

Quite the opposite. I am a huge fan of the pioneering work exploiting new data sets. If these pioneers had not found dramatic results and possibilities with new data, it would not matter whether we can replicate, check or extend those results.

It is only now, when the pioneers have shown the way and we know how important the work can be, that it becomes vital to rethink how we do this kind of work going forward.

The special problems of confidential government data

The government has a lot of great data -- IRS and Census for microeconomics; SEC, CFTC, Fed, and the financial product safety commission for finance. And there are obvious reasons why so far it has not been easily shared.

Journal policies allow exceptions for such data. So only a fundamental demand from the rest of us for transparency can bring about changes. And it has begun to do so.

In addition to the suggestions in the last post, more and more people are going through the vetting to use the data. That leaves open the possibility that a full replication machine could be stored on site, ready for a replicator with proper access to push a button. Commercial data vendors could allow similar "free" replication, controlling directly how replicators use the data.

Technological solutions are on the way too. "Differential privacy" is an example of a technology that allows results to be replicated without compromising the privacy of the data. Leapyear.io is one example of a company selling this kind of technology. We are not alone, as there is strong commercial demand for this kind of data (medical data, for example).
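To give a flavor of how this works -- a minimal illustrative sketch of the Laplace mechanism, my own toy example and not a description of Leapyear's or anyone else's product -- a data custodian can release a noisy summary statistic instead of the microdata, with the noise calibrated to how much any single record could move the answer:

```python
import numpy as np

def private_mean(x, lower, upper, epsilon, rng=None):
    """Release the mean of x with epsilon-differential privacy (Laplace mechanism)."""
    rng = rng or np.random.default_rng()
    x = np.clip(x, lower, upper)               # bound any single record's influence
    sensitivity = (upper - lower) / len(x)     # max change in the mean from one record
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

# A "replicator" queries average income without ever seeing the underlying microdata.
incomes = np.random.default_rng(0).lognormal(mean=10.5, sigma=0.8, size=5000)
print(private_mean(incomes, lower=0, upper=500_000, epsilon=0.5))
```

Smaller epsilon means more noise and stronger privacy; the replicator trades some precision for access without ever touching the confidential records.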

Other institutions: journals, replication journals, websites

There is some debate whether checking "replication" should count as new research, and I argued that if we want replication we need to value it. The larger robustness question certainly is "new" research. That X's result does not hold out of sample, or is sensitive to the precise choice of instruments and controls, and so forth, is genuine, publishable, follow-on research.

I originally opined that replications should be published by the original journal to give the best incentives. That means an AER replication "counts" as an AER publication.

But with the idea that robustness is the wider issue, I am less inclined to this view. This broader robustness or reexamination is genuine new research, and there is a continuum between replication and the normal business of examining the basic idea of a model with new data and also some new methods. Each paper on the permanent income hypothesis is not a "replication" of Friedman! We don't want to only value as "new" research that which uses novel methods -- then we become dry methodologists, not fact-oriented economists. And once a paper goes beyond pointing out simple mistakes, to questioning specification, a question which itself can be rebutted, it's beyond the responsibility of the original journal.

Ivo Welch argues that a third of each journal should be devoted to replication and critique. The Critical Finance Review, which he edits, asks for replication papers. The Journal of Applied Econometrics has a replication section, and now invites replications of papers in many other journals. Where journals fear to tread, other institutions step in. The replication network is one interesting new resource.

Faculties

A correspondent suggests an important additional bullet point for the "what can we do" list:

  • Encourage your faculty to adopt a replicability policy as part of its standards of conduct, and as part of its standards for internal and outside promotions. 

The precise wording of such standards should be fairly loose. The important thing is to send a message. Faculty are expected to make their research transparent and replicable, to provide data and programs, even when journals do not require it.  Faculty up for promotion should expect that the committee reviewing them will look to see if they are behaving reasonably. Failure will likely lead to a little chat from your department chair or dean. And the policy should state that replication and robustness work is valued.

Another correspondent wrote that he/she advises junior faculty not to post programs and data, so that they do not become a "target" for replicators. To say we disagree on this is an understatement. A clear voice on this issue is an excellent outcome of crafting a written policy.

From Michael Kiley's excellent comment below:

  • Assign replication exercises to your students. Assign robustness checks to your more advanced students. Advanced undergraduate and PhD students are a natural reservoir of replicators. Seeing the nuts and bolts of how good, transparent, replicable work is done will benefit them. Seeing that not everything published is replicable or right might benefit them even more.   

Two good surveys of replications (as well as journals) 

Maren Duvendack, Richard Palmer-Jones, and Bob Reed have an excellent survey article, "Replications in Economics: A Progress Report":
...a survey of replication policies at all 333 economics journals listed in Web of Science. Further, we analyse a collection of 162 replication studies published in peer-reviewed economics journals. 
The latter is especially good, starting at p. 175. You can see here that "replication" goes beyond just can-we-get-the-author's-numbers, and maddeningly often does not even ask that question:
 a little less than two-thirds of all published replication studies attempt to exactly reproduce the original findings....A frequent reason for not attempting to exactly reproduce an original study’s findings is that a replicator attempts to confirm an original study’s findings by using a different data set
"Robustness" not "replication "
Original Results?, tells whether the replication study re-reports the original results in a way that facilitates comparison with the original study. A large portion of replication studies do not offer easy comparisons, perhaps because of limited journal space. Sometimes the lack of direct comparison is more than a minor inconvenience, as when a replication study refers to results from an original study without identifying the table or regression number from which the results come.
Replicators need to be replicable and transparent too!
Across all categories of journals and studies, 127 of 162 (78%) replication studies disconfirm a major finding from the original study. 
But rather than just the usual alarmist headline, they have a good insight. Replication studies can suffer the same significance bias as original work:
Interpretation of this number is difficult. One cannot assume that the studies treated to replication are a random sample. Also, researchers who confirm the results of original studies may face difficulty in getting their results published since they have nothing ‘new’ to report. On the other hand, journal editors are loath to offend influential researchers or editors at other journals. The Journal of Economic & Social Measurement and Econ Journal Watch have sometimes allowed replicating authors to report on their (prior) difficulties in getting disconfirming results published. Such firsthand accounts detail the reticence of some journal editors to publish disconfirming replication studies (see, e.g., Davis 2007; Jong-A-Pin and de Haan 2008, 57).
Summarizing:
... nearly 80 percent of replication studies have found major flaws in the original research
Sven Vlaeminck and Lisa-Kristin Hermmann surveyed journals and report that many journals with data policies are not enforcing them. 
The results we obtained suggest that data availability and replicable research are not among the top priorities of many of the journals surveyed. For instance, we found 10 journals (i.e. 20.4% of all journals with such policies) where not a single article was equipped with the underlying research data. But even beyond these journals, many editorial offices do not really enforce data availability: There was only a single journal (American Economic Journal: Applied Economics) which has data and code available for every article in the four issues. 
Again, this observation reinforces my point that rules will not substitute for people caring about it. (They also discuss technological aspects of replication, and the impermanence and obscurity of zip files posted on journal websites.) 

Numerical Analysis

Ken Judd wrote to me,
"Your advocacy of authors giving away their code is not the rule in numerical analysis. I point to the “market test”: the numerical analysis community has done an excellent job in advancing computational methods despite the lack of any requirement to share the code....
Would you require Tom Doan to give out the code for RATS? If not, then why do you advocate journals forcing me to freely distribute my code?...
The issue is not replication, which just means that my code gives the same answer on your computer as it does on mine. The issue is verification, which is the use of tests to verify the accuracy of the answers. That I am willing to provide."
Ken is, I think, reading more "rules and censorship" than "social norms" into my views. And I think his reaction reinforces my preference for the latter over the former. Among other things, rules designed for one purpose (extensive statistical analysis of large data sets) are poorly adapted to other situations (extensive numerical analysis).

Rules can be taken to extremes.  Nobody is talking about "requiring" package customers to distribute the (proprietary) package source code. We all understand that step is not needed.

For heavy numerical analysis papers, using author-designed software that the author wants to market, the verification suggestion seems a sensible social norm to me.  If I'm refereeing a paper with a heavy numerical component, I would be happy to see the extensive verification, and happier still if I could use the program on a few test cases of my own. Seeing the source code would not be necessary or even that useful. Perhaps in extremis, if a verification failed, I would want the right to contact the author and understand why his/her code produces a different result.
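To make the verification suggestion concrete, here is a minimal sketch -- my own toy example, not Ken's code or anyone's actual package -- in which the author's routine is treated as a black box and run on test cases with known closed-form answers:

```python
import math

def authors_solver(f, fprime, x0, tol=1e-12, max_iter=100):
    """Stand-in for an author's proprietary numerical routine: a Newton solver."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")

# Verification case with a known analytic answer: the root of x^3 - 2 is 2^(1/3).
root = authors_solver(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=1.0)
assert math.isclose(root, 2 ** (1 / 3), rel_tol=1e-10), "fails its verification case"
print(f"numerical root {root:.12f}, analytic root {2 ** (1 / 3):.12f}")
```

The referee needs only the ability to run the routine on a few such cases; the source code itself never has to change hands.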

Some other examples of "replication" (really robustness) controversies:

Andrew Gelman covers a replication controversy, in which Douglas Campbell and Ju Hyun Pun dissect Enrico Spolaore and Romain Wacziarg's "The Diffusion of Development" in the QJE. There is no charge that the computer programs were wrong, or that one cannot produce the published numbers. The controversy is entirely over specification -- whether the result is sensitive to the choice of specification and controls.

Yakov Amihud and Stoyan Stoyanov, in "Do Staggered Boards Harm Shareholders?", reexamine Alma Cohen and Charles Wang's Journal of Financial Economics paper. They come to the opposite conclusion, but could only reexamine the issue because Cohen and Wang shared their data. Again, the issues, as far as I can tell, are not a charge that programs or data are wrong.

Update: Yakov corrects me:

  1. We do not come to "the opposite conclusion". We just cannot reject the null that a staggered board is harmless to firm value, using Cohen-Wang's experiment. 
  2. Our result is also obtained using the publicly-available ISS database (formerly RiskMetrics). 
  3. Why the difference between the results? We used CRSP data and did not include a few delisted (penny) stocks that are in Cohen-Wang's sample. Our paper states which stocks were omitted and why. We are rewriting the paper now with more detailed analysis.

I think the point remains clear: replication slides into robustness, which is more important and more contentious.

Asset pricing is especially vulnerable to results that do not hold out of sample, in particular the ability to forecast returns. Campbell Harvey has a number of good papers on this topic.  Here, the issue is again not that the numbers are wrong, but that many good in-sample return-forecasting tricks stop working out of sample. To know, you have to have the data.
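A minimal simulation -- not tied to any particular paper -- shows the mechanism: with enough candidate predictors and returns that are pure noise, the best in-sample fit looks impressive and then evaporates out of sample, measured against the usual historical-mean benchmark.

```python
import numpy as np

# Simulate 20 years of monthly returns with NO true predictability, plus 50
# candidate predictors of pure noise, then "mine" the best in-sample fit.
rng = np.random.default_rng(1)
T, K = 240, 50
r = rng.normal(scale=0.04, size=T)          # returns
X = rng.normal(size=(T, K))                 # candidate predictors
split = T // 2

def insample_r2(k):
    """R^2 of a one-predictor regression of returns on predictor k, first half of sample."""
    b = np.polyfit(X[:split, k], r[:split], 1)
    resid = r[:split] - np.polyval(b, X[:split, k])
    return 1 - resid.var() / r[:split].var()

best = max(range(K), key=insample_r2)
b = np.polyfit(X[:split, best], r[:split], 1)

# Out-of-sample R^2 of the mined predictor, against the historical-mean benchmark.
forecast = np.polyval(b, X[split:, best])
mse_model = np.mean((r[split:] - forecast) ** 2)
mse_bench = np.mean((r[split:] - r[:split].mean()) ** 2)
print(f"in-sample R^2 of the mined predictor: {insample_r2(best):.3f}")
print(f"out-of-sample R^2:                    {1 - mse_model / mse_bench:.3f}")
```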

Monday, December 28, 2015

Secret Data

On replication in economics. Just in time for bar-room discussions at the annual meetings.
"I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain." -Fermat
"I have a truly marvelous regression result, but I can't show you the data and won't even show you the computer program that produced the result" - Typical paper in economics and finance.
The problem 

Science demands transparency. Yet much research in economics and finance uses secret data. The journals publish results and conclusions, but the data and sometimes even the programs are not available for review or inspection.  Replication, even just checking what the author(s) did given their data, is getting harder.

Quite often, when one digs in, empirical results are nowhere near as strong as the papers make them out to be.

  • Simple coding errors are not unknown. Reinhart and Rogoff are a famous example -- which only came to light because they were honest and ethical and posted their data. 
  • There are data errors. 
  • Many results are driven by one or two observations, which at least tempers the interpretation of the results. Often a simple plot of the data, not provided in the paper, reveals that fact. 
  • Standard error computation is a dark art, producing 2.11 t statistics and the requisite two or three stars suspiciously often. 
  • Small changes in sample period or specification destroy many "facts."  
  • Many regressions involve a large set of extra right-hand variables, with no strong reason for inclusion or exclusion, and the "fact" is often quite sensitive to those choices. Just which instruments you use, and how you transform variables, changes the results (a sketch of this kind of specification sweep follows this list). 
  • Many large-data papers take differences, difference the differences, add dozens of controls and fixed effects, and so forth, throwing out most of the variation in the data in the admirable quest for cause-and-effect interpretability. Alas, that procedure can load the results up on measurement error, and slightly different, equally plausible variations can produce very different results. 
  • There is often a lot of ambiguity in how to define variables, which proxies to use, which data series to use, and so forth, and equally plausible variations change the results.
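As a concrete illustration of the last few bullets -- a minimal sketch on simulated data, with hypothetical variable names, varying only the set of controls, though instruments, samples, and variable definitions could be swept the same way -- re-estimating the headline coefficient under every plausible specification shows at a glance how fragile a "fact" is:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for a hypothetical study of y on a treatment x.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "c1": rng.normal(size=n),
    "c2": rng.normal(size=n),
    "c3": rng.normal(size=n),
})
df["y"] = 0.2 * df["x"] + 0.5 * df["c1"] + rng.normal(size=n)

controls = ["c1", "c2", "c3"]
rows = []
# Re-estimate the headline coefficient on x under every subset of controls.
for k in range(len(controls) + 1):
    for subset in itertools.combinations(controls, k):
        formula = "y ~ x" + "".join(f" + {c}" for c in subset)
        fit = smf.ols(formula, data=df).fit()
        rows.append({"controls": subset,
                     "beta_x": fit.params["x"],
                     "se_x": fit.bse["x"]})

print(pd.DataFrame(rows).sort_values("beta_x"))
```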

I have seen many examples of these problems, in papers published in top journals. Many facts that you think are facts are not facts. Yet as more and more papers use secret data, it's getting harder and harder to know.

The solution is pretty obvious: to be considered peer-reviewed "scientific" research, authors should post their programs and data. If the world cannot see your lab methods, you have an anecdote, an undocumented claim, you don't have research. An empirical paper without data and programs is like a theoretical paper without proofs.

Rules

Faced with this problem, most economists jump to rules and censorship. They want journals to impose replicability rules, and refuse to publish papers that don't meet those rules. The American Economic Review has followed this suggestion, and other journals, such as the Journal of Political Economy, are following.

On reflection, that instinct is a bit of a paradox. Economists, when studying everyone else, by and large value free markets, demand as well as supply, emergent order, the marketplace of ideas, competition, entry, and so on, not tight rules and censorship. Yet in running our own affairs, the inner dirigiste quickly wins out. In my time at faculty meetings, there were few problems that many colleagues did not want to address by writing more rules.

And with another moment's reflection (much more below), you can see that the rule-and-censorship approach simply won't work.  There isn't a set of rules we can write that assures replicability and transparency, without the rest of us having to do any work. And rule-based censorship invites its own type I errors.

Replicability is a squishy concept -- just like every other aspect of evaluating scholarly work. Why do we think we need referees, editors, recommendation letters, subcommittees, and so forth to evaluate method, novelty, statistical procedure, and importance, but replicability and transparency can be relegated to a set of mechanical rules?

Demand

So, rather than try to restrict supply and impose censorship, let's work on demand.  If you think that replicability matters, what can you do about it? A lot:
  • When a journal with a data policy asks you to referee a paper, check the data and program file. Part of your job is to see that this works correctly. 
  • When you are asked to referee a paper, and data and programs are not provided, see if data and programs are on authors' websites. If not, ask for the data and programs. If refused, refuse to referee the paper. You cannot properly peer-review empirical work without seeing the data and methods. 
  • I don't think it's necessary for referees to actually do the replication for most papers, any more than we have to verify arithmetic. Nor, in my view, do we have to dot the i's and cross the t's on the journal's policy, any more than we pay attention to their current list of referee instructions. Our job is to evaluate whether we think the authors have done an adequate and reasonable job, as standards are evolving, of making the data and programs available and documented. Run a regression or two to let them know you're looking, and to verify that their posted data actually works (a minimal sketch of such a check follows this list). Unless of course you smell a rat, in which case, dig in and find the rat. 
  • Do not cite unreplicable articles. If editors and referees ask you to cite such papers, write back "these papers are based on secret data, so should not be cited." If editors insist, cite the paper as "On request of the editor, I note that Smith and Jones (2016) claim x. However, since they do not make programs / data available, that claim is not replicable."  
  • When asked to write a promotion or tenure letter, check the author's website or journal websites of the important papers for programs and data. Point out secret data, and say such papers cannot be considered peer-reviewed for the purposes of promotion. (Do this the day you get the request for the letter. You might prompt some fast disclosures!)  
  • If asked to discuss a paper at a conference, look for programs and data on authors' websites. If not available, ask for the data and programs. If they are not provided, refuse. If they are, make at least one slide in which you replicate a result, and offer one opinion about its robustness. By example, let's make replication routinely accepted. 
  • A general point: Authors often do not want to post data and programs for unpublished papers, which can be reasonable. However, such programs and data can be made available to referees, discussants, letter writers, and so forth, in confidence. 
  • If organizing a conference, do not include papers that do not post data and programs. If you feel that's too harsh, at least require that authors post data and programs for published papers and make programs and data available to discussants at your conference. 
  • When discussing candidates for your institution to hire, insist that such candidates disclose their data and programs. Don't hire secret data artists. Or at least make a fuss about it. 
  • If asked to serve on a committee that awards best paper prizes, association presidencies, directorships, fellowships or other positions and honors, or when asked to vote on those, check the authors' websites or journal websites. No data, no vote. The same goes for annual AEA and AFA elections. Do the candidates disclose their data and programs? 
  • Obviously, lead by example. Put your data and programs on your website. 
  • Value replication. One reason we have so little replication is that there is so little reward for doing it. So, if you think replication is important, value it. If you edit a journal, publish replication studies, positive and negative. (Especially if your journal has a replication policy!) When you evaluate candidates, write tenure letters, and so forth, value replication studies, positive and negative. If you run conferences, include a replication session. 
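To show what the "run a regression or two" bullet might look like in practice, here is a minimal sketch of a referee's smoke test on a posted replication package. Every file name, formula, and number is hypothetical; the point is only that the check is a few lines of work, not a research project:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical: the paper's Table 1 reports beta = 0.214 (s.e. 0.083), and the
# authors' archive contains table1.csv with the variables used in that regression.
PUBLISHED_BETA, PUBLISHED_SE = 0.214, 0.083

df = pd.read_csv("table1.csv")
fit = smf.ols("y ~ x + c1 + c2", data=df).fit(cov_type="HC1")

beta, se = fit.params["x"], fit.bse["x"]
print(f"replicated beta = {beta:.3f} (s.e. {se:.3f}); published {PUBLISHED_BETA} ({PUBLISHED_SE})")
assert abs(beta - PUBLISHED_BETA) < 0.001, "headline coefficient does not match the paper"
```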
In all this, you're not just looking for some mess on some website, put together to satisfy the letter of a journal's policy. You're evaluating whether the job the authors have done of documenting their procedures and data rises to the standards of what you'd call replicable science, within reason, just like every other part of your evaluation.

Though this issue has bothered me a long time, I have not started doing all the above. I will start now.

Here, some economists I have talked to jump to suggesting a call for coordinated action. That is not my view.

I think this sort of thing can and should emerge gradually, as a social norm. If a few of us start doing this sort of thing, others might notice. They think "that's a good idea," feel empowered, and start doing it too. The first person to do it will seem like a bit of a jerk. But after you read three or four tenure letters that say "this seems like fine research, but without programs and data we won't really know," you'll feel better about writing that yourself. Like "would you mind putting out that cigarette."

Also, the issues are hard, and I'm not sure exactly what is the right policy.  Good social norms will evolve over time to reflect the costs and benefits of transparency in all the different kinds of work we do.

If we all start doing this, journals won't need to enforce long rules. Data disclosure will become as natural and self-enforced a part of writing a paper as proving your theorems.

Conversely, if nobody feels like doing the above, then maybe replication isn't such a problem at all, and journals are mistaken in adding policies.

Rules won't work without demand

Journals are treading lightly, and rightly so.

Journals are competitive too. If the JPE refuses a paper because the author won't disclose data, the QJE publishes it, and the paper goes on to great acclaim and wins its author the Clark Medal and the Nobel Prize, then the JPE falls in stature and the QJE rises. New journals will spring up with more lax policies. Journals themselves are a curious relic of the print age. If readers value empirical work based on secret data, academics will just post their papers on websites, working paper series, SSRN, RePEc, blogs, and so forth.

So if there is no demand, why restrict supply? If people are not taking the above steps on their own -- and by and large they are not -- why should journals try to shove it down authors' throats?

Replication is not an issue about which we really can write rules. It is an issue -- like all the others involving evaluation of scientific work -- for which norms have to evolve over time and users must apply some judgement.

Perfect, permanent replicability is impossible. If replication is done with programs that access someone else's database, those databases change and access routines change. Within a year, if the programs run at all, they give different numbers. New versions of software give different results. The best you can do is to freeze the data you actually use, hosted on a virtual machine that uses the same operating system, software version, and so on. Even that does not last forever. And no journal asks for it.
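Short of a frozen virtual machine, one cheap partial step -- a sketch under my own assumptions, not anything a journal currently asks for -- is to ship a manifest recording the software versions and a hash of every data file used, so a future replicator can at least tell whether they are looking at the same inputs:

```python
import hashlib
import json
import platform
import sys
from importlib import metadata
from pathlib import Path

def sha256(path):
    """Hash a data file so future replicators can confirm they have the same bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_files, out="replication_manifest.json"):
    """Record the interpreter, platform, installed packages, and data-file hashes."""
    manifest = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
        "data": {str(p): sha256(p) for p in map(Path, data_files)},
    }
    Path(out).write_text(json.dumps(manifest, indent=2))

# Usage (hypothetical file names):
# write_manifest(["panel.csv", "controls.csv"])
```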

Replication is a small part of a larger problem, data collection itself.  Much data these days is collected by hand, or scraped by computer. We cannot and should not ask for a webcam or keystroke log of how data was collected, or hand-categorized. Documenting this step so it can be redone is vital, but it will always be a fuzzy process.

In response to "post your data," authors respond that they aren't allowed to do so, and journal rules allow that response. You have only to post your programs, and then a would-be replicator must arrange for access to the underlying data.  No surprise, very little replication that requires such extensive effort is occurring.

And rules will never be enough.

Regulation invites just-within-the-boundaries games. Provide the programs, but with poor documentation. Provide the data with no headers. Don't write down what the procedures are. You can follow the letter and not the spirit of rules.

Demand invites serious effort towards transparency. I post programs and data. Judging by emails when I make a mistake, these get looked at maybe once every 5 years. The incentive to do a really good job is not very strong right now.

Poor documentation is already a big problem. My modal referee comment these days is "the authors did not write down what they did, so I can't evaluate it." Even without posting programs and data, the authors simply don't write down the steps they took to produce the numbers. The demand for such documentation has to come from readers, referees, citers, and admirers, and posting the code is only a small part of that transparency.

A hopeful thought: Currently, one way we address these problems is by endless referee requests for alternative procedures and robustness checks.  Perhaps these can be answered in the future by "the data and code are online, run them yourself if you're worried!"

I'm not arguing against rules, such as the AER has put in. I just think that they will not make a dent in the issue until we economists show by our actions some interest in the issue.

Proprietary data, commercial data, government data. 

Many data sources explicitly prohibit public disclosure of the data. Disclosing such secret data remains beyond the current journal policies, or policies that anyone imagines asking journals to impose. Journals can require that you post code, but then a replicator has to arrange for access to the data. That can be very expensive, or require a coauthor who works at the government agency. No surprise, such replication doesn't happen very often.

However, this is mostly not an insoluble problem, as there is almost never a fundamental reason why the data needed for verification and robustness analysis cannot be disclosed. Rules and censorship are not strong enough to change things. Widespread demand for transparency might well be.

To substantiate much research, and check its robustness to small variations in statistical method,  you do not need full access to the underlying data. An extract is enough, and usually the nature of that extract makes it useless for other purposes.

The extract needed to verify one paper is usually useless for writing other papers. The terms for using posted data could be: you cannot use these data to publish new original work, only to verify and comment on the posted paper. Such a restriction is a lot easier to police than the current replication policies.
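Producing such an extract can be mechanical. A minimal sketch, with hypothetical file and variable names, for a proprietary panel: keep only the sample period and the columns the paper's regressions actually use, and replace vendor identifiers with one-way hashes so records cannot be matched back to the commercial database:

```python
import hashlib
import pandas as pd

# Hypothetical: build a minimal, redacted extract sufficient to rerun the paper's
# Table 2, from a proprietary panel the vendor will not allow to be posted in full.
COLUMNS_USED = ["firm_id", "year", "y", "x", "c1", "c2"]   # only what the regressions need
SAMPLE = slice("2001", "2004")                             # only the paper's sample period

full = pd.read_csv("proprietary_panel.csv", parse_dates=["date"]).set_index("date")
extract = full.loc[SAMPLE, COLUMNS_USED].copy()

# Replace vendor identifiers with truncated one-way hashes.
extract["firm_id"] = extract["firm_id"].astype(str).map(
    lambda s: hashlib.sha256(s.encode()).hexdigest()[:12])

extract.to_csv("table2_extract.csv", index=True)
```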

Even if the slice of data needed to check a paper's results cannot be public, it can be provided to referees or discussants, after signing a stack of non-use and non-disclosure agreements. (That is a less-than-optimal outcome of course, since in the end real verification won't happen unless people can publish verification papers.)

Academic papers take 3 to 5 years or more for publication. A 3 to 5 year old slice of data is useless for most purposes, especially the commercial ones that worry data providers.

Commercial and proprietary (banks) data sets are designed for paying customers who want up-to-the-minute data. Even CRSP data, a month old, is not much used commercially, because traders need up to the minute data useful for trading.  Hedge fund and mutual fund data is used and paid for by people researching the histories of potential investments. Two-year old data is useless to them -- so much so that getting the providers to keep old slices of data to overcome survivor bias is a headache.

In sum, the 3-5 year old, redacted, minimalist small slice of data needed to substantiate the empirical work in an academic paper is in fact seldom a substantial threat to the commercial, proprietary, or genuine privacy interest of the data collectors.

The problem is fundamentally about contracting costs. We are in most cases secondary or incidental users of data, not primary customers. Data providers' legal departments don't want to deal with the effort of writing contracts that allow disclosure of data that is 99% useless but might conceivably be of value or cause them trouble.  Both private and government agency lawyers naturally adopt a CYA attitude by just saying no. 

But that can change.  If academics can't get a paper conferenced, refereed, read and cited with secret data,  if they can't get tenure, citations, or a job on that basis, the academics will push harder.  Our funding centers and agencies (NSF)  will allocate resources to hire some lawyers. Government agencies respond to political pressure.  If their data collection cannot be used in peer-reviewed research, that's one less justification for their budget. If Congress hears loudly from angry researchers who want their data, there is a force for change. But so long as you can write famous research without pushing, the apparently immovable rock does not move. 

The contrary argument is that if we impose these costs on researchers, then less research will be done, and valuable insights will not benefit society. But here you have to decide whether research based on secret data is really research at all. My premise is that, really, it is not, so the social value of even apparently novel and important claims based on secret data is not that large. 

Clearly, nothing of this sort will happen if journals try to write rules, in a profession in which nobody is taking the above steps to demand replicability. Only if there is a strong, pervasive, professional demand for transparency and replicability will things change.

Author's interest 

Authors often want to preserve their use of data until they've fully mined it. If they put in all the effort to produce the data, they want first crack at the results.

This valid concern does not mean that they cannot create redacted slices of data needed to substantiate a given paper. They can also let referees and discussants access such slices, with the above strict non-disclosure and agreement not to use the data.

In fact, it is usually in authors' interest to make data available sooner rather than later. Everyone who uses your data is a citation. There are far more cases of authors who gained notoriety and long citation counts from making data public early than there are of authors who jealously guarded data so they would get credit for the magic regression that would appear 5 or more years after data collection.

Yet this property right is up to the data collector to decide. Our job is to say "that's nice, but we won't really believe you until you make the data public, at least the data I need to see how you ran this regression." If you want to wait 5 years to mine all the data before making it public, then you might not get the glory of "publishing" the preliminary results. That's again why voluntary pressure will work, and rules from above will not work.

Service

One empiricist whom I talked to about these issues does not want to make programs public, because he doesn't want to deal with the consequent wave of emails from people asking him to explain bits of code, or claiming to have found errors in 20-year-old programs.

Fair enough. But this is another reason why a loose code of ethics is better than a set of rules for journals.

You should make a best faith effort to document code and data when the paper is published. You are not required to answer every email from every confused graduate student for eternity after that point. Critiques and replication studies can be refereed in the usual way, and must rise to the usual standards of documentation and plausibility.

Why replication matters for economics 

Economics is unusual. In most experimental sciences, once you collect the data, the fact is there or not. If it's in doubt, collect more data. Economics features large and sophisticated statistical analysis of non-experimental data. Collecting more data is often not an option, and not really the crux of the problem anyway. You have to sort through the given data in a hundred or more different ways to understand that a cause and effect result is really robust. Individual authors can do some of that -- and referees tend to demand exhausting extra checks. But there really is no substitute for the social process by which many different authors, with different priors, play with the data and methods.

Economics is also unusual in that the practice of redoing old experiments over and over, common in science, is rare in economics. When Ben Franklin stored lightning in a condenser, hundreds of other people went out to try it too, some discovering that it wasn't the safest thing in the world. They did not just read about it and take it as truth. A big part of a physics education is to rerun classic experiments in the lab. Yet it is rare for anyone to redo -- and question -- classic empirical work in economics, even as a student.

Of course everything comes down to costs. If a result is important enough, you can go get the data, program everything up again, and see if it's true. Even then, the question arises: if you can't get x's number, why not? It's really hard to answer that question without x's programs and data. But the whole thing is a whole lot less expensive and time consuming, and thus a whole lot more likely to happen, if you can use the author's programs and data.

Where we are 

The American Economic Review has a strong data and programs disclosure policy. The JPE adopted the AER data policy. John Taylor has a good blog post on replication and the history of the AER policy. The QJE has decided not to; I asked an editor about it and heard very sensible reasons. Here is a very good review article on data policies at journals by Sven Vlaeminck.

The AEA is running a survey about its journals, and asks some replication questions. If you're an AEA member, you got it. Answer it. I added to mine, "if you care so much about replication, you should show you value it by routinely publishing replication articles."

How is it working? The Report on the American Economic Review Data Availability Compliance Project:
All authors submitted something to the data archive. Roughly 80 percent of the submissions satisfied the spirit of the AER’s data availability policy, which is to make replication and robustness studies possible independently of the author(s). The replicated results generally agreed with the published results. There remains, however, room for improvement both in terms of compliance with the policy and the quality of the materials that authors submit
However, Andrew Chang and Phillip Li disagree, in the nicely titled "Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say 'Usually Not'":
We attempt to replicate 67 papers published in 13 well-regarded economics journals using author-provided replication files that include both data and code. ... Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files. We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable. 
I read this as confirmation that replicability must come from a widespread social norm, demand, not journal policies.

The quest for rules and censorship reflects a world-view that once we get procedures in place, then everything published in a journal will be correct. Of course, once stated, you know how silly that is. Most of what gets published is wrong. Journals are for communication. They should be invitations to replication, not carved-in-stone truths. Yes, peer review sorts out a lot of complete garbage, but the balance of Type I and Type II errors will remain.

A few touchstones:

Mitch Petersen tallied up all papers in the top finance journals for 2001–2004. Out of 207 panel data papers, 42% made no correction at all for cross-sectional correlation of the errors. This is a fundamental error, one that typically cuts reported standard errors by as much as a factor of 5 or more. If firm i had an unusually good year, it's pretty likely firm j had a good year as well. Clearly, the empirical refereeing process is far from perfect, despite the endless rounds of revisions referees typically ask for. (Nowadays the magic wand "cluster" is waved over the issue. Whether it's being done right is a ripe topic for a similar investigation.)
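For readers who have not seen the mechanics, here is a minimal simulation of Petersen's point (illustrative only; the factor-of-5 figure above is his, not this toy's): when both the regressor and the residual contain a common within-cluster shock, plain OLS standard errors come out far too small relative to clustered ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated firm-year panel in which both the regressor and the error contain a
# common year shock, so residuals are cross-sectionally correlated within a year.
rng = np.random.default_rng(0)
firms, years = 500, 20
panel = pd.DataFrame([(f, t) for f in range(firms) for t in range(years)],
                     columns=["firm", "year"])
x_shock = rng.normal(size=years)
e_shock = rng.normal(size=years)
panel["x"] = x_shock[panel["year"]] + rng.normal(size=len(panel))
panel["y"] = 0.1 * panel["x"] + e_shock[panel["year"]] + rng.normal(size=len(panel))

plain = smf.ols("y ~ x", data=panel).fit()
clustered = smf.ols("y ~ x", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["year"]})
print(f"OLS s.e. on x:       {plain.bse['x']:.4f}")
print(f"Clustered s.e. on x: {clustered.bse['x']:.4f}")
```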

"Why Most Published Research Findings are False"  by John Ioannidis. Medicine, but relevant

A link on the controversy over replicability in psychology.

There will be a workshop on replication and transparency in economic research following the ASSA meetings in San Francisco.

I anticipate an interesting exchange in the comments. I especially welcome more links to, and summaries of, existing writing on the subject.

Update: "On the Need for a Replication Journal" by Christian Zimmermann:
There is very little replication of research in economics, particularly compared with other sciences. This paper argues that there is a dire need for studies that replicate research, that their scarcity is due to poor or negative rewards for replicators, and that this could be improved with a journal that exclusively publishes replication studies. I then discuss how such a journal could be organized, in particular in the face of some negative rewards some replication studies may elicit.
But why is that better than a dedicated "replication" section of the AER, especially if the AEA wants to encourage replication? I didn't see an answer, though it may be a second best proposal given that the AER isn't doing it.

Update 2

A second blog post on this topic, Secret Data Encore

Saturday, January 3, 2015

A Citizen's Guide to Sustainable Urban City Planning For 2030

Mary Vincent - Shenzhen, China
In 2007, I ran for City Council in my city on a Sustainable Economic Development Platform while working a technology job in Silicon Valley, California.

At the time it was a visionary platform but, in my opinion, very doable. The city wanted to build housing and a business on land that will be covered by water by 2050 due to rising sea levels in San Francisco. The city was not looking at a long-term strategy incorporating both environmental and economic needs. Now, many cities have implemented long-term sustainable economic principles while involving businesses and citizens in these planning decisions.

A sustainable city has a long-term plan in place for citizens to live, work, learn, and play. Transit-Oriented Development is an important concept that I campaigned on. Sustainable urban planning includes development in water, waste, energy, food production, and transportation that uses fewer resources, creates thriving communities, promotes a healthy lifestyle, inspires new ideas, and drives economic growth. I spoke about these topics at a Google conference in San Francisco, a technology conference in Silicon Valley, technology groups in the UK, and at a Net Impact Conference on Organics and Business. I created Green Star Solution, a consulting firm to help create and enable these innovations, and I've shared updates on various digital platforms, including my Smart Tech News blog and Twitter at http://www.twitter.com/MaryVincent .

I became a Stakeholder Advisory Group member for the World Resources Institute Greenhouse Gas Protocol, a stakeholder in the Department of Energy Literacy Education Group, an advisor on technology and environment for the Global Health Research Foundation, and a mentor for the Stanford Engineering for Good and Technology Entrepreneurship classes. Also, while living and working near Budapest, Hungary, I established an Earth Week in my city and worked with business owners to add environmental solutions to their business operations.

The City of San Francisco has a great Environmental Plan for reference that can serve as a model for other cities: http://www.sf-planning.org/ftp/general_plan/I6_Environmental_Protection.htm
ICLEI also has a great platform and community for sustainable city models.

The Masdar Engage Blogging Contest is an innovative concept to help folks share their ideas and I hope to travel to Abu Dhabi to participate in the conference.



Sunday, November 9, 2014

Must Read: Rewilding Our Hearts by Marc Bekoff

Marc Bekoff, professor emeritus of ecology and evolutionary biology at the University of Colorado, Boulder, has written a new inspirational and constructive book called Rewilding Our Hearts: Building Pathways of Compassion and Coexistence.

"We live in a wounded world that is in dire need of healing," as he makes an impassioned call to reverse unprecendented global losses of biodiversity and habitat by changing ourselves. Rewilding means "to make wild again" and it is frequently used in wildlife conservation to refer to re-creating wildlife habitat and creating corridors between preserved land for wildlife to travel through, thus allowing declining populations to rebound. Bekoff applies the Rewilding concept to human psychology and attitudes. We need to rewild both ourselves and other nature, Bekoff claims. He details the growing, global compassionate conservation movement and gives action oriented advice to individuals, city planners, governments, and business leaders.
I highly recommend you read this book and share it with friends, business colleagues, and political leaders, and on digital media. Let's all work to help rewild our hearts and make sure all of our decisions take all species into account.
Purchase on Amazon at http://www.amazon.com/Rewilding-Our-Hearts-Compassion-Coexistence/dp/1577319540

Friday, November 7, 2014

REGISTER: November 8 - 9 Chesapeake Climate Data Hackathon


http://www.chesapeakeconservancy.org/chesapeake-hackathon
Chesapeake Conservancy, Intel, Old Dominion University, Esri Host Weekend “Coding for Good” Event

WHO:    Old Dominion University Students
             Experienced Coders
             Representatives from the Chesapeake Conservancy, Intel and Esri

WHAT:   Climate Data Initiative: Chesapeake Bay Hackathon

WHEN:   November 8-9, 2014, 9:00 a.m. to 9:00 p.m. each day

WHERE: Old Dominion University’s Virginia Beach Center 1881 University Drive, Virginia Beach, VA 23453

WHY:   The weekend event will bring together some of the area's brightest minds in the high-tech industry, including students and professionals, to brainstorm and develop data-driven solutions that will help government officials better understand and track flooding problems caused by climate change and sea level rise in the Hampton Roads region. Coders and students will create apps using many different data sets (forecasts, real-time environmental and road closure data, public-facing maps, etc.). These new tools will provide planners and managers with the information they need to make informed decisions and the public with the tools they need to understand when and where flooding will affect their daily lives.

REGISTER: http://www.chesapeakeconservancy.org/chesapeake-hackathon