Controlled Vocabulary Challenge

The article by Fred Leise was very insightful – pointing out the power of a controlled vocabulary to help a user effectively query a database.  The explanation of the need for authority files and hierarchical relationships (broader term and narrower term) was also quite clear.  However, I question the assertion that the intended meaning of even simple terms can be understood by almost everyone.

The article gives the example of The Gap (clothing store), indicating that the controlled vocabulary (along with the accompanying pictures) makes the meaning perfectly clear.  As one reader responded in the comments on the article, there is plenty of room for misinterpretation – even by speakers of the same language.  The Gap uses the term “bottoms and pants.” In the United Kingdom, “pants” refers specifically to underwear.

As I discovered one evening in London, even a simple request for “cream” for my tea had negative results when Housekeeping arrived with several small containers of “clotted cream.” This product is great for spreading on scones – but of no use in a cup of tea.


What is a controlled vocabulary?

Controlled vocabulary is described as a subset of natural language.  It is not how we speak, but instead a translation of how we speak, in order to understand the actual organization of words.  When initially reading the definition of a controlled vocabulary, I first associated it with the “spell-check” mechanism of Google.  Even when a user is performing a Google search and using incorrect spelling, the search engine will still auto-correct and generate the intended search.  By having the controlled vocabulary, Google is able to ensure that the user finds the intended results, even with errors like misspellings.  I find the idea of a controlled vocabulary particularly interesting when it comes to some other examples that the article listed, such as for synonyms.  I imagine that individuals from different geographical areas who have different dialects would each use different terms for the same thing.  In a normal situation, those words would generate different responses, but with a controlled vocabulary both users are able to receive the same results.
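
To make the synonym idea concrete for myself, here is a minimal sketch in Python of how an authority file might send different regional terms to one preferred term.  The terms and the names `AUTHORITY_FILE` and `preferred_term` are my own illustrative assumptions; they do not come from the article or from any real system.

```python
# Hypothetical authority file: every variant term points to one preferred term.
# The entries are made up for illustration only.
AUTHORITY_FILE = {
    "pants": "trousers",
    "slacks": "trousers",
    "trousers": "trousers",
    "soda": "soft drink",
    "pop": "soft drink",
    "soft drink": "soft drink",
}


def preferred_term(query: str) -> str:
    """Return the preferred term for a query.

    Terms that are not in the authority file are returned unchanged
    (apart from trimming whitespace and lowercasing).
    """
    key = query.strip().lower()
    return AUTHORITY_FILE.get(key, key)


# Two users with different dialects end up searching on the same term.
print(preferred_term("Pop"))
print(preferred_term("soda"))
```

A real controlled vocabulary would also record broader and narrower terms, but even this small lookup shows why two users who type different words can receive the same results.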


The more I read about the Dewey Decimal Classification system, the more I am in awe of Melvil Dewey, a genius and a man for all times. I did not realize that he was obsessed with decimals and had organized a committee to study the feasibility of introducing the metric system.  Imagine how much easier our lives would be today, had he succeeded!

This entertaining and “tongue in cheek” article discusses the many problems with the DDC and why it is not possible to “FIX” it. In the event that the Library of Congress does manage to “fix” it:

“What would happen next?

“Tens of thousands of librarians around the world pick up their razor blades and scrape the white numbers off the spines of millions of books muttering under their breath about those damn editors who don’t understand that every little change means that librarians inhale toxic white dust….”

In addition to angry librarians, there would be riots all over the world because:

“The Sunnis and the Shiites are upset because they have been put at the same level.”

“East Somewhere is furious because it doesn’t recognize West Somewhere as a legitimate country.”

To make matters worse, “Librarians are out buying razor blades in bulk and white ink by the gallon.”

The reason “the Dewey Decimal Classification system cannot be fixed is because knowledge is unfixed!”

However, Amazon is neither complaining nor rioting. The company has discovered how to “fix” the system and is laughing all the way to the bank!





Otlet (Forgotten Forefather, Origins of Info Science)

How interesting to read about Paul Otlet and his ideas, creations and visions with respect to information classification and retrieval (among other related topics).  It’s amazing that someone would undertake such a comprehensive task of creating a “master bibliography” of the entire world’s books and documents!  And he envisioned it as a faceted system in which topical relationships are interconnected, something that we seem to still be discussing and perfecting today.  It’s quite fascinating that someone can propose concepts that are not fully appreciated or comprehended for nearly 100 years.

I’d love to know more details about how Otlet’s work (or what was left of it) was more or less abandoned for fifty years at a university?!?  How does that happen?  It was the 1940s, and while I suppose the war going on in Europe likely had something to do with it at the time, what about afterwards?  No one bothered to clean out that room until the 1990s?  I thought space in Europe was at more of a premium than that, especially at a university!  It seems to say a lot about Otlet’s fall from recognition for his contributions, although as is the case with many big thinkers and creators, the value of his ideas seems to be more appreciated now than during his life.

Katie B.

Paul Otlet

“UDC’s most innovative and influential feature is its ability to express not just simple subjects but relations between subjects … In UDC, the universe of information (all recorded knowledge) is treated as a coherent system, built of related parts, in contrast to a specialised classification, in which related subjects are treated as subsidiary even though in their own right they may be of major importance.”

Expressing the relationships between subjects and ideas in a “web” is the entire dream of hypertext and hypermedia. The “links” between information under UDC are relational. Organizational structures for information, such as the Dewey Decimal system or anything organized in a hierarchy of distinct subjects, start from a general subject and become more specific in a top-down direction through each node. In relational information structures such as UDC or hypertext, information can be linked across numerous subject “nodes” and users can access information in a nonlinear way. UDC implies that no documents have self-evident, eternal subjects and meanings; instead, their aboutness is always being defined by new associations and amalgamations. Even subject matter from a long time ago is constantly being redefined by the present, so it seems that faceted organization is more significant than ever.

Learning about Paul Otlet’s contribution to information architecture, I especially loved hearing about his installation of index cards in a sprawling array of cabinets. This sounds bizarre and beautiful to me; Otlet literally was beginning to build a visual/physical analogue of the Internet at the beginning of the 20th century. 

Weinberger – The Geography of Knowledge

I thought this chapter was great because it touched on a lot of what we’ve gone over in class with radical cataloging, the general outdated feel of the Dewey Decimal System, and how Amazon pretty much has the coolest classification systems ever.  I liked that Weinberger pointed out that overhauling the system is much easier said than done, and it is very much a double-edged sword. It would be great if Dewey Decimal became more inclusive of other cultures and easier to understand and categorized for the contemporary age, but then you have to deal with the hundreds of thousands of libraries that now have to overhaul where their physical books go and how they are arranged.

When I worked at a library in high school I was there when they decided to give graphic novels/comic anthologies their own section separate from young adult books (which are placed in their own area in my library), and even though that was just one small section to rearrange we complained about it for weeks. I can’t imagine having to re-shelve half the library because the Dewey Decimals changed.

It’s hard to find a happy medium in a situation like this, but I think it is possible to find a solution that’s a bit better than “oh, we all know this is outdated, but it’s what we’ve got to work with.” Weinberger points out that information and knowledge are ever-changing and evolving, so it might just actually be impossible to ever have a truly Amazon-esque cataloging system for a library. I think a possible solution may be for each library to individually consider its own users and what they’re looking for, but that would also unleash a whole other pile of problems. (A lack of a universal system might mean you’re out of luck if you go to a library outside your own neighborhood, etc.)

Otlet: Forgotten Forefather

I was very surprised to read about Paul Otlet, his work, and his accurate vision for how information would be organized in the future.  So many of his ideas laid the foundation for what became Information Science, the internet, the faceted search, and linked data.  It is astonishing that Otlet’s vision, which marries “the determinism of facets with the relativism of social networks”, continues to unfold.  Dewey and Ranganathan are regarded as “founding fathers” of Information Science.  The readings rightfully recognize Otlet as having just as great an impact.

Forgotten Forefather: Paul Otlet

I enjoyed reading Alex Wright’s article, Forgotten Forefather: Paul Otlet. The fact that Otlet coined the term “links” to describe the relationship between related pieces of information really drove home for me the importance of his web of human knowledge and his innovative thinking. I also found it interesting that Wright brought up Borges’ Library of Babel in relation to Otlet’s Universal Book. As I was reading the article on Otlet, the Library of Babel also came into my mind. This story paints the idea of having a “book” that contains all of the information known to people as so impossible that it drives men mad. However, Otlet had the foresight to envision a kind of database that serves the same purpose. Although I suppose the world wide web does not contain all of human knowledge, it comes pretty close. The problem we have today is in organizing this information.

Everything is Miscellaneous

Senior researcher David Weinberger explains that there are “three orders” of order, which he describes as:

  1. 1st Order – The organization of physical objects (e.g. books on a shelf)
  2. 2nd Order – Extraction of the metadata about the object (e.g. card catalogue with various sort orders – author, title, and subject)
  3. 3rd Order – Data and Metadata coexist in the digital environment

In light of the third order, Weinberger explains why he believes that the Dewey Decimal Classification (DDC) system is flawed. Under the DDC system, there is a limit of 10 numbers and 10 subcategories within each of those numbers. He explains that someone (or some group) must exercise authority when it comes to deciding how a resource is classified. Furthermore, a resource can exist in only one physical location.
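
As a rough illustration of that “10 within 10” limit, here is a small Python sketch that walks a call number up to its broader classes.  It assumes the familiar three-digit layout, where the first digit is the main class, the second the division, and the third the section; the function name `broader_classes` is my own, not something from Weinberger.

```python
def broader_classes(ddc_number: str) -> list[str]:
    """Return [main class, division, section] for a three-digit DDC number.

    Assumes the plain three-digit form (no decimal extension).
    """
    if len(ddc_number) != 3 or not ddc_number.isdigit():
        raise ValueError("expected a three-digit string such as '516'")
    main_class = ddc_number[0] + "00"   # one of only ten top-level classes
    division = ddc_number[:2] + "0"     # one of ten divisions in that class
    return [main_class, division, ddc_number]


# The number fixes exactly one path through the hierarchy.
print(broader_classes("516"))
```

Each number sits on exactly one path from general to specific, which mirrors the point that a physical book can sit in only one place on the shelf.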

Enter the 3rd order, where data and metadata are digital and thus can be used to easily locate each other. For example, one can type in the phrase “the fault is not in our stars” and discover that the author is William Shakespeare, the play is Julius Caesar, and that the play is downloadable for free from many sites.

Weinberger underscores three points:

  1. In the digital world, an object can be placed in as many categories as desired. (He uses Amazon as an example where cameras can be placed in electronics, photography, cameras, hobbies, etc.)
  2. Messiness is a virtue since this ability to categorize more freely and to link to other categories actually makes it easier for a user to find things. We are asked to think of faceted classification, tagging, and folksonomies that give the reader/searcher more power and autonomy. Weinberger says that “we own the organization of resources.”
  3. Metadata can be used to locate data – and vice versa.
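
These three points can be sketched in a few lines of Python.  This is only a toy model under my own assumptions: the catalog contents and the names `catalog`, `items_in`, and `categories_of` are invented for illustration, loosely following the camera example, and say nothing about how Amazon actually stores its data.

```python
from collections import defaultdict

# Hypothetical catalog: each item carries its own metadata (a set of categories).
catalog = {
    "digital camera": {"electronics", "photography", "cameras", "hobbies"},
    "tripod": {"photography", "hobbies"},
    "headphones": {"electronics"},
}

# Build the reverse lookup (category -> items) from the same data.
by_category = defaultdict(set)
for item, categories in catalog.items():
    for category in categories:
        by_category[category].add(item)


def items_in(category: str) -> list[str]:
    """Use metadata (a category) to locate data (the items)."""
    return sorted(by_category.get(category, set()))


def categories_of(item: str) -> list[str]:
    """Use data (an item) to locate its metadata (the categories)."""
    return sorted(catalog.get(item, set()))


print(items_in("photography"))
print(categories_of("digital camera"))
```

The same camera turns up under every category it belongs to, and the lookup works in both directions, which is something a single spot on a physical shelf cannot offer.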

Weinberger posits that at this moment, there is no difference between data and metadata.