Bias in Tagging and Indexing Schemes

Dear Julia et al. Some feedback on your article…

Wikipedia, sociology, and the promise and pitfalls of Big Data by Julia Adams and Hannah Bruckner

Have you considered the issue of gender bias in tagging (keywords and metatags)?

There was an interesting study published by the Pew Internet & American Life Project a few years ago.

Tagging by Lee Rainie

I think you’d see implicit gender bias in the vernacular of categorizations in any crowd-sourced product. The overt top-down bias due to the makeup of the editorship (predominantly male) addressed in your article may be only one manifestation of the issue. I’d wager it’s happening from the bottom up too!

I wonder if the reverse was true back in the “Wild West” pre-OCLC/Internet days of manual indexing and abstracting—which, FYI, was done primarily by women.

I picked up an MLS back in the early 1980s (in the interest of cataloging my “boring book” collection) and couldn’t help noticing I was 1 of 3 men alongside 600+ women in the program. I was the only male in most courses, such as Children’s Literature (aka “kiddie lit”). That was the norm for schools of library science back then, and I wonder if the gender balance has changed.

There were so few of us embarking on a library career, in fact, that affirmative action played in our favor. We were referred to as “future library directors,” the implication being that, by virtue of harboring a Y chromosome, we’d be fast-tracked into leadership roles. The inverse consequence is that women ended up tracked into tactical roles, like indexing and abstracting, which affirms my belief that a feminine perspective pervades the categorizations of the corpus far more than you’d think.

Thus, might your Wikipedia findings be an instance of confirmation bias?

Here’s a further consideration: though there are established cataloging rule sets, such as AACR2, the application of those rules was (and is) rather capricious, and certainly vulnerable to other frames of reference such as race, religion, etc. AACR2 isn’t limited to print resources, BTW. I remember having to catalog various non-print artifacts, stuff like globes and marble statues in library holdings.

Not surprisingly, when you move away from textual material assessed by subject matter experts (most libraries won’t hire you without a second, content-specific master’s), the variation in categorizations and descriptors increases markedly. It’s like two people arguing over the merits of a painting, which gets funky fast, and there’s no accounting for taste.

The lack of inter-rater reliability was bad enough that any two librarians–take one in LA and another in NYC–would invariably come up with bizarrely different categorizations. I hadn’t considered demographic issues like gender or race as bias factors in indexing schemes back in the ’80s.
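
For what it’s worth, inter-rater reliability can be put to a number. Here is a minimal sketch, assuming two catalogers each assign exactly one category to the same list of items, using Cohen’s kappa (my choice of statistic for illustration, not something we used at the time). The `cohens_kappa` helper and the LA/NYC labels are hypothetical, made up purely to show the arithmetic.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)

    # Observed agreement: share of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability both pick the same category at random,
    # given each rater's own category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six items, one list per cataloger.
la = ["art", "art", "history", "art", "religion", "history"]
nyc = ["art", "history", "history", "religion", "religion", "art"]

kappa = cohens_kappa(la, nyc)
print(round(kappa, 2))
```

A kappa of 1 means perfect agreement and 0 means no better than chance. In this made-up example the two catalogers agree on half the items, but a third of that agreement would be expected by chance alone, so kappa comes out well below the raw agreement rate.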

I was so taken aback by this that I considered dropping out of the program. What prompted it was feedback that I, the lone male in a course entitled Acquisitions and Organization, received from a professor (who happened to be female) on my classification of an objet d’art. While we disagreed on our respective interpretations of the “rules” (and I maintain to this day I was right), the fact is that authority rules the day.

But even as a 20-something, it was obvious that people just interpret rule sets differently. The problem, as Kripke has pointed out, is due to semantics, not syntax. In modal logic one can construct “worlds of interpretation” that render logical expressions “valid”–even if the world is something out of Alice in Wonderland.

This is also the crucial error in Schank’s AI approach to story “understanding.” Computers can be programmed to mimic human interpretive behavior based on pre-programmed schemas, but the schemas themselves reflect the biases of their human creators. To wit, there’s that old joke about the Russian natural language understanding program that took the biblical expression, “The spirit is willing but the flesh is weak” and recast it as “The vodka is good but the meat is tough.”

Luckily, with the advent of library automation, even if there’s still an issue of interpretation and (no doubt) an undercurrent of bias, at least there’s been some standardization with the authority lists promulgated by Deweyites and LCites. There you have it: resolution by fiat in a white-male-dominated world!

I’m CCing David Weinberger on this, because he would likely find your article of interest. Curiously, he’s cited in the Pew piece, which I hadn’t noticed until taking a 2nd look at it (after several years) prior to emailing you. A scream!

A further thought on this. You’re aware of the Wiki Wars… They’re still going on in the community areas. Even if there’s bias on the front-end, I’d be curious as to whether women are even in the fight on the back-end.

If not, *you* should start the movement! To borrow from Sarah Palin: if men are pigs, and no doubt many are, put lipstick on them.

My wife is a clinical researcher at Pfizer, and she recounts a phenomenon there that is probably manifest in libraries, publishing houses, and mainstream media. Men may rule the roost, but women do most of the work.

Thus, I’d really be interested to see whether the biases of the male-dominated hierarchy in these worlds are as impactful on the categorizations and curations of new media as you claim. I’ve seen incidental anecdotal evidence that women are predominant at the intersection of men and machines. Perhaps they are providing a filter layer between the mainstream content generators and the zeitgeist of 30-something nerdy males who are commonly identified as the norm in your piece. Was not your own Grace Hopper the first to point out a bug?

Speaking of which, I’m glad you renamed your college in honor of her. I’d been lobbying Yale to recognize her since attending a talk she gave at Mory’s 30 years ago. More recently, I was pushing for this with the naming of the new college.

See top-level entry at

Thankfully, they finally did right by her!

Will you be touching upon this in your talk at the Wilton Library?

