In an April 12, 2010 blog post, Archivist of the United States David S. Ferriero proclaimed the dream of creating an army of “citizen archivists.” His plan? He wants to tap into the “collaborative power of the internet” to bring more and more American citizens into the work of the National Archives – safeguarding the records of our government through preservation and description. Essentially, he wants to crowdsource the United States Archives.
What is Crowdsourcing?
A portmanteau built from “crowd” and “outsourcing,” crowdsourcing is the use of a community of users (the crowd) to accomplish tasks typically performed by employees (outsourcing). This model has flowered over the past few years as a go-to method for developing new technologies and designs, and for processing large amounts of data. Both nonprofit and for-profit corporations have used it, and even the British government has called on “the crowd” on a number of occasions.
However, contrary to what Ferriero’s post suggests, crowdsourcing is not an entirely new concept for the Archives – at least in the form of volunteerism. Starting in the 1970s, NARA began building what it calls “armies of volunteers,” many of whom are already processing archival collections alongside professional archivists. These “armies” include people of all ages and backgrounds, from retired men and women to students earning school credit for internships. Key to the program is the fact that everyone goes through a background check as well as “comprehensive training” and a commitment of two years’ work – mandates which would have to be modified for online crowdsourcing.
What does Crowdsourcing Mean for the Archive?
There are a number of potential benefits in opening the archive to the online crowd. Chief among them is the fact that, right now, there just aren’t enough archivists and volunteers to get the job done. The output of the United States government is already massive, and is growing with every passing decade. Enlistment of online citizen archivists means that more eyes could touch these documents (even if digitally), potentially helping the National Archives to catch up with the backlog.
Another major benefit is the potential for greater community engagement with the archives. Ferriero is much more engaged with the public than any of his predecessors, utilizing his blog to share his thoughts, current reads, and his own calendar of meetings and events. He encourages comment on the blog as well as on the official National Archives website. By tapping into citizen interests, the National Archives could not just increase access, but also encourage citizen input on a variety of levels.
Yet, for every benefit crowdsourcing may bring, there is also a potential problem. Mathew Ingram summarized this:
…it’s usually cheaper than professionally produced content….And many users care more deeply about the content they generate themselves than they do about the stuff that comes from the pros, which means deeper levels of engagement. The problem…however, is the loss of control it involves.
This statement was intended to describe general user-generated content, but the same words could just as easily apply to crowdsourcing. However, whereas in the commercial realm loss of control can simply lead to embarrassment and, at worst, lawsuits, in the archive it can mean loss of precious artifacts and documents, and in the National Archives it can mean loss of our cultural heritage, not to mention national security.
Perhaps the biggest concern (and most significant barrier) for crowdsourcing the archives will be security. Many crowdsourcing efforts involve anonymous contributors and little formal training, which runs against the National Archives’ approach to volunteerism and even daily function. In order to gain access to National Archives records, everyone – archivists, volunteers, and researchers included – must show identification and sign in. This security could be lost in crowdsourcing.
The fact is, the existing crowdsourcing models do little to help the National Archives’ plight, which may help to explain the lack of actual planning in Ferriero’s post.
Could NARA Follow Existing Crowdsourcing Models?
NASA’s Approach
The main area in which crowdsourcing can be utilized is the digitization and transcription phase of archiving. These are among the most time-consuming projects for the archives that decide to undertake them, yet they require the fewest skills. Data processing of this kind is most common in the scientific realm of crowdsourcing, and Ferriero found his inspiration in NASA’s “Be a Martian” project.
There are a number of similarities between the archive’s potential crowdsourcing interests and NASA’s project beyond the fact that both are government-run. The National Archives and NASA have similar impetuses for crowdsourcing in the first place: too much data and not enough paid employees to go through it. In addition, both have a pretty good incentive: users can look at things they wouldn’t really look at otherwise. Whether it is the face of Mars or government documents, one could say that it’s all pretty “alien” to the average crowd, and thus gains fascination through its mystery.
Yet looking at pictures of Mars is very different from transcribing a government document, especially when those documents may be tens or even hundreds of pages long. Transcription of government documents would require more dedication from users, who would need to do more than just skim and add cursory tags. Also, while government documents may seem interesting to users at first glance, the documents that the crowd would really want to analyze are likely to be the more important ones that would either be entrusted to an archivist or classified (so that foreign governments couldn’t see them along with the crowd).
“Digitize First, Catalog Later”
The big new thing in archives has certainly been digitization of records, and the National Archives has digitized quite a bit considering its overload issues. From a YouTube channel to photos on Flickr, the National Archives is trying to get as much online as possible to increase worldwide access and decrease stress on the records themselves. Unfortunately, digitization is time-consuming and, by extension, often too expensive for the bulk of records, and even now most digitization is done by unpaid volunteers.
Archivist Greg Colati suggests that for born-digital documents, it may actually be better to “digitize first [and] catalog later,” based on crowdsourced tagging. This is because for anything that is born digital, the cost to put it online is minimal.
Unfortunately for the National Archives, this model holds little promise. For the sake of authenticity and security, most government documents continue to be maintained in paper record copies rather than digital form. Even if the National Archives could just dump documents online, there would be major complications. One must be wary of uploading documents willy-nilly, particularly given that once something is online it never goes away. Thus, an archivist would still have to go through the documents to ensure their suitability for the public space before putting them online, which may defeat much of the benefit in Colati’s approach.
Following a Policy of Cautious Optimism
The National Archives does have to open up, but crowdsourcing based on other models may not be the best option. It is important to remember that, for every “citizen archivist” researcher that Ferriero describes, there may also be someone such as Thomas Lowry, who is currently under investigation after admitting to altering the date on a Lincoln pardon document. The forgery was discovered by an archivist. There is something to be said for the trusted eyes of a trained professional or volunteer, and that training and trust are often hard to build up in crowdsourcing operations. Even with background checks and training, volunteers and researchers can sometimes break that trust.
Thus, unlike Ferriero, I am much more cautious about crowdsourcing, and I believe that embracing it with open arms without a comprehensive plan could be destructive to the National Archives in ways that Ferriero may not see. Ultimately, archives of all kinds will have to approach crowdsourcing in the same way they approached digitization a few years ago: with caution, optimism, attention to the archives’ uniqueness and, most of all, with innovation.
This reminds me a lot of our discussion of social tagging in our LIS 530 class, which brings up an argument about the value of crowdsourcing that I find compelling: the nature of that which is being analyzed, and the purpose of analyzing it, determines how applicable crowdsourcing techniques would be.
It would seem to me that the National Archives is a difficult arena for crowdsourcing to be valuable because of the high esteem in which authenticity and curation are held for archiving in particular. Even if the privacy of the ‘crowd’ is not at stake, the value of the collection certainly may be. While Google Image Labeler may be looking for popular and obvious, rather than accurate, qualitative descriptors, this is not always the case.
One might even make the argument that qualitative description is more likely to infringe on a contributor’s privacy – my political beliefs are more likely to reveal themselves in describing a government document than in counting craters. However, if Ferriero can find some way to quantify the tasks done, I think his dream has a much greater chance of becoming reality.
Crowdsourcing definitely seems to be a popular movement: it also looks like DARPA (the Defense Advanced Research Projects Agency) has recently put out a call for citizen recruits. They have recently started a project for “Unconventional Warfighters,” that is, those of us normal people who aren’t military trained but have skills to contribute. As with the other crowdsourcing programs, there’s a definite appeal in providing an outlet for those who want to contribute to a cause, and in harnessing that enthusiasm and creativity (and in this case, patriotism). I can imagine possible issues around classified information, but it’s an interesting idea in any case.
… Though I’m not sure I’d be first in line to volunteer my family dog to secret missions with the military.
I agree that crowdsourcing should be a cautious undertaking, but it is extremely necessary in this digital era. Many citizens visit the National Archives and Records Administration (NARA) locations and website not to learn about historical-era documents, but to find personal genealogy records and to research family history within specific cities and communities. Military records, immigration records, and citizenship records, among other pertinent personal records, can be spread throughout the country in various NARA locations (not to mention state and local archives), which can make tracing family history difficult and expensive. Third parties and private companies such as Ancestry.com and Footnote.com have set the standard for identifying data within records and for usability, so much so that NARA encourages using these websites for family research before attempting research using its microfilm and website.
Crowdsourcing can be a legitimate solution for non-vital handwritten records created prior to 1900, such as ship rolls, immigration and naturalization rolls, and land records. Many records are already digitized but offer no reliable way to search them or even to discern the data within them. Crowdsourcing can supplement records already transcribed by trained volunteers, increasing the validity of the original transcription, and can add important tags to groups of documents that help identify record groups and series from the perspective of the public citizen user and researcher. With more reliable transcriptions and tags or identifiers for the documents, families can trace their genealogical roots from a distance faster, more easily, and more reliably with the online resources already available.
The information in the above post came from my own personal experience as a NARA volunteer and as a volunteer at both the city archive and a regional branch of the state archive.