I’ll be interested to find out whether there’s a distinct difference in data management conventions for LIS professionals as opposed to whoever is creating the data. The blog post about the scarf and the overview on the Penn State library website both seemed to be addressing people who create data and, more generally, assuming an expert understanding of the data’s significance. As a result, they seemed to place a significant fraction of the responsibility for facilitating access to the data on researchers. If researchers are expected to have the skills to create their own metadata and maintain their data, does that mean that the role of librarians in a scientific setting is more advisory than custodial? In the scarf analogy, the person who publishes knitting patterns may not have created the actual pieces or even be able to, and the person who makes the scarf may not know the instruction-manual industry conventions and best practices for representing physical actions in a language any other knitter can understand. At the same time, the author of the manual has to know enough about the process of knitting to be able to foresee what knitters need in a pattern. So how much do scientific librarians need to know about the research that produced a certain dataset? What exactly are they doing with the data that researchers aren’t expected to either do themselves or direct in minute detail?
Heidorn’s article on data curation and E-science was an enjoyable and informative read, but I was left with a distressed feeling that academic libraries are ill-equipped to take on the task of providing long-term management of the mountains of data coming from scientists, scholars and affiliated institutions. I actually shuddered when I read:
Instrumentation and computerization enable scholars and civil servants to collect data with volumes equal to the text content of the entire Library of Congress in a matter of days (Baraniuk, 2011).
How can underfunded and overworked libraries possibly keep up with this massive accumulation of digital material? I was glad to hear that the NIH and NSF are requiring data management plans when they are doling out grants, but I hardly think that is enough oversight, as society is expecting to see not just published results but raw data that will have to be not only stored but checked and migrated constantly. It seems like scholars would need an endowment in place to preserve their work, but the burden of preservation is more likely to fall into the lap of the LIS community. Plus, getting taxpayers to chip in for saving a 1983 clinical study on string cheese consumption is going to be difficult, to say the least.
For perspective, I found another interesting blog post from two years ago that also charted various organizations that create “a Library of Congress” amount of data. [http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/] Not surprising to see NASA and Facebook on that list.
With the advancements in commercial cloud servers, our hopes may lie in the private sector, and academic libraries must strive to work with these third-party vendors or risk distancing themselves from the role of collecting and sharing the intellectual output of society. We are drowning in data, and it may be up to the tech sector to throw the libraries, scholars and general public a lifeline.
I loved Sarah Callaghan’s blog post “How is a scarf like a dataset?” I thought the analogy was quirky, but true, and it boiled the complex idea of describing, organizing, and managing data into some easy-to-understand concepts. For me, the most important piece of insight from her post was the aside:
(As an aside, I didn’t keep all the metadata about how I made the scarf and what yarn I used for it written down somewhere, which meant that when I came to write this post, I needed to work it out all over again. In other words, metadata should be collected from the start and stored somewhere safe, regardless of what it’s describing!)
What a valuable piece of information about data management! “Metadata should be collected from the start and stored somewhere safe, regardless of what it’s describing!” I feel like printing that out and hanging it on the wall of my office. I know I’ve mentioned my project before, but I’m currently trying to reorganize a massive amount of digital files, and the lack of metadata for the majority of this data has made its management a near-impossible task.
I thought that the Penn State guidelines for data management were incredibly helpful in providing the necessary steps to properly organizing a collection of data. I thought all of the elements listed in the data planning example chart were so important to the complete process of data management. I really hope to use these guidelines in my own project while I consider metadata creation, metadata extraction, data registries, long-term file storage, and all the other steps involved!
I found the article The Emerging Role of Libraries in Data Curation and E-Science extremely interesting in its discussion of the role of the library and the changing needs of scientific data collection in the digital age. The article broke down what needs to be done and, just like the Artist Books we talked about last week, made clear that it would need to be a collaborative effort. It also explained that the need for data curation is a result of the shift from “data poor to data rich” in research.
What I thought was funny, and reread several times, were the instances where the author seemed to get very dramatic, especially in sentences like “when academic library administrators first hear that scholarly data now fall within the purview of the library, they may lose many nights’ sleep wondering who has cast the curse upon them…” or “Mornings, over coffee with public and school library friends, academic librarians may lament their fate.” I understand that librarians are able to look at a situation like data curation and recognize, without knowing the details, what a huge undertaking it will be, but isn’t that what we want? It seems like in other professions technology is making jobs obsolete (a movie projectionist comes to mind; I worked in a movie theater for a long time). This is a perfect example of how and why librarians will always be needed. A lot of the articles we read this semester talked about how rigid, conservative, or behind the times certain things are: records, Library of Congress headings, Dewey, cataloging. It also seems that, at the same time, we do not like things that are clearly defined or things that are new. I think that these new areas where our librarian knowledge can be utilized are a good thing. The article mentions that public and school librarians “may be relieved that this task of curating…is not their fate,” but those are the jobs that always seemed to get cut first in economically unstable times.
(When I saw the title “How Is a Scarf Like a Dataset?” I was really hoping it would connect the process of knitting to this topic, which is an accessible analogy for me. I wasn’t disappointed!) In any case, while this article was wholly entertaining, I do not feel that I learned a lot about the creation of datasets, and I am now really curious about what the “bind off” of datasets is. I did find it to be generally applicable to how I USED datasets in other classes. It was a fun article, though.
I also found the Heidorn article very interesting. As someone who works with government documents, and documents produced by government entities (which do not always seem to be the same thing; the judicial branch never seems to be categorized with govdocs), I am very interested in how the proprietary databases and products that make these documents and data accessible actually work.
“Many scholars are unaware of the coming changes in the sociology of science and do not have the required skill sets to address the requirements in their new proposals (Cragin, Palmer, Carlson, & Witt, 2010). Worse, librarians know relatively little about current data management practices of scholars. Institutions have not yet established who will conduct data curation work.”
This is precisely why this kind of data ends up functionally accessible only through something like ProQuest. Equating scholars with government agencies and entities perhaps ignores some nuance, particularly that government data is publicly funded and that whatever is not classified should be free and easily accessible, but training librarians in specialized data management practices can only make information more accessible to the public.
In direct contrast to my last comment, I think it is so interesting that data collection (as in, making data part of the collection) by academic libraries is even happening. It is something that had never occurred to me. The idea that libraries should be involved from the start of data collection is so intriguing, and I see where the idea is coming from, but is that standard feasible for scientists who are not working under an academic umbrella? I am sure that many scientific organizations do have librarians, but my point is this: could a rise in librarian-aided research lead to a preference for that data, so that research from smaller organizations becomes even less represented? Is data collected with the help of a librarian better data, or just easier to integrate into a library?
The web page on data security intrigued me immensely, since many recent news events have resulted from security failures. It encouraged librarians to back up their data as well as verify the safety of the site itself. It reminded me of a quote from the movie Sex Tape where the main male character says “No one knows how the Cloud works!” As a result, his private materials were viewable by several family members and friends. Something similar recently happened to Jennifer Lawrence, an actress in the Hunger Games. This is not to suggest that we as librarians are dealing with those kinds of materials, but we should be aware that such things can happen. For example, odds are a few of us in the program want to become archivists. As archivists, we should become familiar with the ins and outs of the software we are using. Doing so will help us become more familiar with the technologies and avoid releasing our database information before we are ready.
The concept of a ‘data curator’ really stood out in the Bryan Heidorn article as a dynamic expression. As material is now digital, mass information output makes preservation a layered task, and as the Heidorn article points out, “libraries are among the only institutions with the capacity to curate many data types.” Having technical skills and tools helps with only one step; determining which information is worth storing is another major component.
Even so, we are still in the early stages of understanding how digital media can be archived to preserve future readability in an age of rapid technological development. Still, Heidorn believes libraries have the “organizational culture” necessary to approach and harness this project.
This article influenced me to find out more about the UK Digital Curation Centre, “founded to solve problems in digital curation”, and a few other sites mentioned. I recently studied the MIT Laboratory for Social Machines, which considers the impact of online social platforms. Projects of this sort will help with one aspect of data curation, as we begin to understand the impact of new media.
It is quite mind-blowing just how much data humans are now capable of producing and, as access to the technology to create more data grows, the amount will only increase. The case for librarians to take on the role of data curation is elegantly made in P. Bryan Heidorn’s paper The Emerging Role of Libraries in Data Curation and E-Science.
Curation of the data is within libraries’ mission, and libraries are among the only institutions with the capacity to curate many data types. The data are critical to the scientific and economic development of society.
As most of us are already all too aware, we live in an age where the need to upgrade, back up, sort and store our own digital materials is ongoing and seems never-ending. Be it photographs, emails, or readings for courses, we need to curate our personal digital materials: organize, preserve or perish is the new mantra in a bid to archive and still have access to our memory prompts. So too with research data, which possibly holds the key to a scientific breakthrough either on its own (requiring discoverability) or when linked to another piece of data. The benefit of data sharing, as argued for on the What is Data Management? page of the Penn State University Libraries site, is an aspect that particularly appeals to me given my group’s project on linked open data and the many benefits to society of living in an open data world.
Making Connections! Worlds starting to collide!
This week’s reading on Data Management and the organizing of data in research projects was very helpful for me. The reading on Codebooks was harder for me to grasp; I was left more with questions about how and when I would be interfacing with this type of material than with an actual understanding of the process. But it was interesting nonetheless. I found that I could apply both the readings on Data Management and Citing Bytes to my work and world as a designer, and that for me is exciting: to finally be able to start making connections between my two worlds.
What is Data Management:
The article clearly set forth the process of controlling data used in research projects, a daunting and extremely important task to be sure. From Research Data to Data Sharing, the creation of an effective data management plan seems to be key. The tab labeled “data planning” showed easy-to-follow descriptions of the elements needed to create this plan.
Citing Bytes – Adventures in Data Citation: How is a scarf like a dataset? By Sarah Callaghan
“The yarn in a ball doesn’t contain any information or structure, but by the act of putting stitches into it, you’re encoding something… My scarf was created by a process of appending- each new row got added to the previous, like a dataset where each new measurement gets appended on to the previous one to make a time series” (Callaghan)
I appreciated the simple, clear language of this piece. The analogy of knitting and datasets is helpful. I found myself thinking about my work as a designer and the production books I have to keep: all of that is metadata! You don’t need any of this information to see the costumes in the show, but it is important to keep it if you want to study or recreate the costumes, which people do! I too have to keep metadata from the start of a project and store it somewhere safe; going back is nearly impossible or, worse, time-consuming drudgery. I began to think about everything I create and the metadata that goes with it, let alone the research, planning and organizing of every project I take on.
It is now not only libraries’ and librarians’ role to find data and help users navigate it, but also part of their mission to preserve and curate it. Heidorn’s “The Emerging Role of Libraries in Data Curation and E-Science” discusses the role of libraries within (mostly) academic institutions that are now being charged with the extensive task of data curation and management. I thought Heidorn was unclear about what he was trying to express with his article, and his ideas seemed to jump from place to place. I also thought he could have been clearer in explaining the processes and terminology he used. I don’t think data curation was explicitly defined until halfway through the article. I finally learned that data curation is the active process of maintaining, preserving, and adding value to digital research data throughout its lifecycle.
I found Heidorn’s brief mention of “negative findings” and unpublished data reports, as well as his related question, “how will this data be managed and what is the role of libraries in this profound shift in the organization of data?”, extremely interesting. It is essential that unpublished data is preserved and remains accessible so that it can be reused, recreated, and manipulated. Grey literature is an example of these unpublished reports and data, free from the politics and monetary incentives of commercially published works. This is especially useful in health sciences fields, where users can find very recent datasets, clinical trial data, working papers, etc. that have both positive and negative results. Access to original data is extremely important in order to further academic pursuits. Now librarians are taking on the role of providing just that.