Tuesday, February 10, 2009

WSDM, Day 1

Today was the first day of the WSDM conference -- in beautiful Barcelona, Spain. I can't say I've seen much of Barcelona, given that I arrived this morning and was at the conference for 12 hours straight. Maybe by Thursday I'll have something to say about the city. For now, though, I'm thoroughly absorbed by what I saw at the conference today.

The early morning was devoted to Google, thanks to a keynote by Google's Jeffrey Dean that actually made the news. A real treat: I had no idea that Google's method for storing information about Web documents was so similar to the raw MARC record used in library systems, tags and all.

The rest of the morning was devoted to Web retrieval problems, and many of them seemed like problems that library and information science have been working on for a while. The "Query by Document" (QBD) paper was interested in cross-references at the document level and proposed using Wikipedia for query expansion. A paper on personalization/"group"ization was concerned with the query, the relevance of documents, and the user profile. Again, I'm reminded that these aren't concerns exclusive to libraries, and I'm glad that such a gifted community is putting resources into exploring these questions.

No one talked about "ICTs," but one paper mentioned UGC (user-generated content). I have a feeling that I speak a related language, but not quite the same one as the ACM folks here. For example, in the afternoon we had two talks on classification, but as far as I could tell, both used the term interchangeably with categorization. To me, as a cataloger, categorization and classification are NOT one and the same, and I admit that I missed some of the finer points of the talks while getting caught up in the classifying/categorizing details.

The afternoon papers on social tagging said nothing about the social nature of tags. Folksonomies were mentioned only in passing at the end of the second of the two papers on the topic, and the affective nature of tags came up only once. Again, this doesn't matter to retrieval folks, but to library and info. science folks (including knowledge organization folks), it's a big deal.

One of the most interesting papers began by discussing the possibility of using Wikipedia to drive users to books. It turns out that the authors quickly shifted focus to getting from books to Wikipedia articles (which seems much more straightforward), leaving getting to specific information in books for future work. From some of my discussions at lunch, it appears that folks modelling news are also interested in identifying entities, in a way not unlike the FRBR model. But getting to the book itself, especially by way of the surrogate (instead of the OCR of a full-text scan), remains wildly difficult.

More tomorrow.


The opinions expressed in this blog are uniquely my own; they in no way reflect the position of the U.S. Dept. of State or the Fulbright Commission.