Visionary Paul Otlet even forecast today’s search functionality, calling it “consultation.” According to Rayward, Otlet wanted to “liberate” content from just the metadata of bibliographic files, to bring forth “what was of value or use in the content of documents by dissection or decomposition” for analysis and synthesis. This reads like a prediction of text and data mining: searching full texts to discover patterns and trends and to reveal new ideas across documents.

Here’s a visual of the Mundaneum from The New York Times:


Heidorn writes that “scholars are unaware of the coming changes in the sociology of science.” So in addition to “retraining” themselves to curate, preserve, and provide access to data, librarians — as key stakeholders in the information ecology — will need to educate students and researchers about data management. This seems a tall order. But maybe the best approach is to see open data as an opportunity to prove the library’s relevance.

Acknowledging that it knows “relatively little about current data management practices of scholars,” the publisher Wiley this week released an infographic summarizing survey results on researchers’ data-sharing practices, attitudes, and motivations.


The top two reasons researchers don’t share: concerns about intellectual property and no mandate to do so. But the culture is changing. PLOS, publisher of the megajournal PLOS ONE, and other publishers now require authors to deposit the data underlying their published research in a public repository. And funders want to see the impact of their dollars. DataCite, which assigns DOIs to datasets, and Thomson Reuters, with its new Data Citation Index, will help funders track that influence.

In the spirit of copy cataloging’s efficiency, where can we find controlled vocabulary lists by discipline to download and reuse? There are many sites. Here’s one for photographers who use Adobe Lightroom; another with translated terminology compiled by the International Monetary Fund; and a list for astronomy librarians.

Leise, Fast, and Steckel’s “What is a Controlled Vocabulary” is an easy-to-understand overview. Its list of benefits for organizations (knowledge management, website content management systems, and internal communications) is apt. Controlled vocabularies (CVs) act as a semantic layer “between the term entered by the user” and “the underlying database to better represent the original intention of the term.” With artificial intelligence and language algorithms that can make sense of “patterns,” however, might we be able to program computers to translate the “raw, rich, gooey glory” of “natural language” without controlled vocabularies?
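To make the “semantic layer” idea concrete, here is a minimal Python sketch, assuming a tiny hypothetical vocabulary: the variants a user might type (car, automobile, motorcar) are mapped to one preferred term, and only that preferred term is matched against the index. The terms, records, and function names are invented for illustration and are not taken from Leise, Fast, and Steckel.

```python
from typing import Optional

# Hypothetical controlled vocabulary: entry terms (synonyms and variants
# a user might type) mapped to a single preferred term.
VOCABULARY = {
    "automobile": "Automobiles",
    "automobiles": "Automobiles",
    "car": "Automobiles",
    "cars": "Automobiles",
    "motorcar": "Automobiles",
}

# Hypothetical catalog records, indexed only with preferred terms.
RECORDS = [
    {"title": "A History of the Motor Age", "subjects": {"Automobiles"}},
    {"title": "Railways of Europe", "subjects": {"Railroads"}},
]


def preferred_term(user_term: str) -> Optional[str]:
    """Normalize the user's term and return its preferred form, if any."""
    return VOCABULARY.get(user_term.strip().lower())


def search(user_term: str) -> list:
    """Return titles indexed under the preferred term for user_term."""
    term = preferred_term(user_term)
    if term is None:
        return []  # unmapped term: the layer has nothing to translate
    return [r["title"] for r in RECORDS if term in r["subjects"]]


print(search("Cars"))      # ['A History of the Motor Age']
print(search("motorcar"))  # ['A History of the Motor Age']
print(search("bicycle"))   # []
```

Here the vocabulary, not the user, carries the knowledge that “car” and “motorcar” mean the same thing. The AI question above amounts to asking whether a model could learn that mapping from patterns instead of a cataloger maintaining it by hand.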

Thomas Mann makes a compelling point about the importance of Library of Congress (LC) subject headings and the value of the librarian cataloger. He writes that “a system that enables people simply to recognize what they cannot specify beforehand is crucial” because, without this capability, “researchers will routinely settle for whatever comes up…even if their misguided specification of terms causes them to miss the best material.” His explanation of why librarians are susceptible to unwarranted criticism (the prospect of being labeled old-fashioned, and misguided feedback in a new web environment) is valid.

But if subject headings optimize search, why are Google and its keyword search so popular? A qualitative study suggests that students favor simplicity and ease of use and are willing to put up with irrelevant results.

Maybe there is value in both types of search. Subject headings are useful for displaying different contexts of the same topic. Keywords are fundamental to the web environment and work well for quick searches.

Classification and cataloging reflect the worldview and sensibility of a particular time. In “Teaching the Radical Catalog,” Emily Drabinski says classification systems are “socially produced and embedded…products of human labor that carry traces of all the intentional and unintentional racism, sexism, and classism of the workers who create them.” Is there value, then, in retaining a record of updates within the LC catalog to capture these changes in society?

Drabinski proposes that library programs “teach students to engage critically with the classifications as text, encouraging critical thought in relation to the tools.” In this way, users would “understand the limits of and power enacted by classifications,” in order to “use them for their concrete purposes — finding books on library shelves — and to transform our relationships to them via critical engagement.” This seems in line with a newer pedagogy in which students, once treated as passive learners, work in partnership with teachers to create new knowledge.

During last week’s in-class Omeka work, I suggested that we create categories and then identify content to fill each one. My teammates opted instead to find a large selection of relevant items and then classify them. Indexing vs. facets? According to Steckel, “Rather than creating a slot to insert the object into, one starts with the object and then collects and arranges all the relevant pieces on the fly. This allows for greater flexibility and a high degree of specificity.” Are there generational differences in how we order and retrieve information, and subsequently in how we think and learn?
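Steckel’s contrast can be sketched in a few lines of Python. This is a minimal illustration, assuming invented items described by three hypothetical facets (type, decade, place); rather than filing each item into one predefined slot, the items are grouped or filtered by whichever facet is useful at the moment. The item data and the function names `group_by` and `filter_by` are made up for this example.

```python
from collections import defaultdict

# Hypothetical collection items, each described by independent facets.
ITEMS = [
    {"title": "Harbor at dawn", "type": "Photograph", "decade": "1920s", "place": "Boston"},
    {"title": "Letter from a mill worker", "type": "Letter", "decade": "1910s", "place": "Lowell"},
    {"title": "Street parade", "type": "Photograph", "decade": "1910s", "place": "Boston"},
]


def group_by(items, facet):
    """Arrange items on the fly by any facet, with no predefined slots."""
    groups = defaultdict(list)
    for item in items:
        groups[item[facet]].append(item["title"])
    return dict(groups)


def filter_by(items, **facets):
    """Narrow items by combining facet values, e.g. type and place."""
    return [
        item["title"]
        for item in items
        if all(item.get(name) == value for name, value in facets.items())
    ]


print(group_by(ITEMS, "decade"))
# {'1920s': ['Harbor at dawn'], '1910s': ['Letter from a mill worker', 'Street parade']}
print(filter_by(ITEMS, type="Photograph", place="Boston"))
# ['Harbor at dawn', 'Street parade']
```

The same three items can be regrouped by place or type without re-filing anything, which is roughly what my teammates did: gather the objects first, then let the arrangement emerge.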

PBS’s new miniseries How We Got to Now organizes the history of modern life and innovation around six themes: Clean, Time, Glass, Light, Cold, and Sound. It is a good example of a classification scheme that both organizes information and generates new ideas, echoing John Dewey’s maxim “Knowledge is classification.”

I also find Ranganathan’s Five Laws pertinent to many areas of digital scholarship: not only website architecture, as discussed by Steckel, but also content management strategies and academic publishing.

  1. “Information” is for use. Web content, journal articles, e-books available online whenever and wherever.
  2. Every user, his information. Portals and content curated and classified for different audiences; user-centric.
  3. Every information, its user. Content integrated and interoperable across sites and resources.
  4. Save the time of the user. Content catalogued with rich metadata on digital systems, distributed by proxies.
  5. The information collection is a growing organism. An ever-changing universe of content and creators responsive to digital disruptions — from open science initiatives to the internet of things.

I was taken by Thomas Baker’s description of RDF (Resource Description Framework) as supporting the creation of knowledge, and by his claim that RDF data “speaks for itself.”

But this is possible only if metadata is preserved, with assurance that it will remain “reliably accessible over time and that its URIs will not be sold, re-purposed, or simply forgotten.” Long-term preservation, Baker notes, will require distributed approaches and “arrangements of mutual support and cooperation among vocabulary maintainers.” Is FOAF (the Friend of a Friend vocabulary) a practical model? And how would we form a “coalition of memory institutions”?

Preservation concerns aside, calling the web a knowledge repository makes sense. Here’s more information about Google’s “Knowledge Graph,” including an intro video. Google calls itself a “knowledge engine,” not an information engine.

Final Project Proposal

Christina Tse

Non-western cataloging or classification systems

Anthropologists have documented the ways in which culture affects the categorization process. For example, the Tzeltal people in Chiapas, Mexico, differentiate butterfly larvae, but not butterflies, because larvae are an important food source. How is culture reflected in non-western cataloging systems? And what are the limitations of classification systems developed by western standards for non-western libraries?

Within the topic of “non-western cataloging or classification systems,” I will be looking at the development of China’s library system, its relationship to the civil service examinations, and how the cultural structure imposed by these important institutions helped to unify the country.

Preliminary resources:

Kracke, E. A. “Family vs. Merit in Chinese Civil Service Examinations Under the Empire.” Harvard Journal of Asiatic Studies 10, no. 2 (1947).

Kuang, Neng-fu. “Chinese Library Science in the Twelfth Century.” Libraries & Culture 26, no. 2 (1991).

Xie, Zhuo Hua. “Libraries and the Development of Culture in China.” Libraries & Culture 31, no.  (1996).

This week’s readings by Bradford Lee Eden and by Karen Coyle and Diane Hillmann reminded me of another: “Blow Up the Corporate Library” by Thomas Davenport and Laurence Prusak, written more than 25 years ago. The article pressed corporate librarians to demonstrate value, to stop treating the library as a bunker, and to bring the library into the mainstream of the business.

Where I work, I’ve seen both how little it matters to most users that “libraries currently are the only conduits for a wealth of published literature that is not available for open access on the public Internet” and how true it is that “users will engage with services that provide materials quickly with least effort.” Under threat of closure, our library earlier this year reviewed users’ needs, interviewed management, and developed a vision statement and goals. New services, such as tools for information access and research guidance on publishing needs, are planned. We haven’t yet gotten to an “application of business models to workflows,” but projects with different “communities of practice” are underway.

In an aha moment, I realized this week that metadata helps us make sense of the world. By connecting disparate information (interoperability), it can reveal relationships and trends that generate insights.

Funders can use metadata on journal articles — unique identifiers for articles, authors, and data sets, for example — to track the scholarly research generated by their investment and perhaps tie it to societal impacts. Companies can bring together information from spreadsheets, systems, and databases in different departments to make strategic decisions. Even individuals can use metadata to define (through categorizations of daily activities) and map their lives.
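Here is a rough sketch of the funder scenario, assuming two hypothetical sources (an article list and a dataset list) that share identifiers. All DOIs, author IDs, and grant numbers below are made up, the function name `outputs_for_grant` is hypothetical, and a real system would query registries such as DataCite rather than hard-coded lists.

```python
# Hypothetical article metadata: each article has a DOI, an author ID,
# and the grant that funded it. All identifiers are invented.
ARTICLES = [
    {"doi": "10.1234/example.001", "author_id": "A-0001", "grant": "G-100"},
    {"doi": "10.1234/example.002", "author_id": "A-0002", "grant": "G-200"},
    {"doi": "10.1234/example.003", "author_id": "A-0001", "grant": "G-100"},
]

# Hypothetical dataset metadata from a separate source, linked to
# articles only through the shared article DOI.
DATASETS = [
    {"doi": "10.5678/data.01", "related_article": "10.1234/example.001"},
    {"doi": "10.5678/data.02", "related_article": "10.1234/example.003"},
]


def outputs_for_grant(grant: str) -> dict:
    """Follow shared identifiers from a grant to articles, authors, and datasets."""
    articles = [a for a in ARTICLES if a["grant"] == grant]
    article_dois = {a["doi"] for a in articles}
    return {
        "articles": [a["doi"] for a in articles],
        "authors": sorted({a["author_id"] for a in articles}),
        "datasets": [d["doi"] for d in DATASETS if d["related_article"] in article_dois],
    }


print(outputs_for_grant("G-100"))
```

The link from grant to dataset works only because both sources use the same article DOI. If either list recorded a free-text title instead, the connection would break, which is the interoperability point above.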

Anne J. Gilliland in “Setting the Stage” writes that metadata “provides us with the Rosetta stone that will make it possible to decode information objects and their transformation into knowledge…” I wholeheartedly agree.