Classification

See the SharePoint blog post on SharePoint 2010 Taxonomy Limits for an introduction to taxonomy.

Content classified by purpose and re-use can usually be divided into two categories:

  • Formal: contractual, required for legal purposes, specific uses, should have a defined lifecycle
  • Informal: knowledge retention, assists decisions, may or may not be reused, unclear lifecycle, but value usually degrades over time
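
A rough sketch of how the two categories might be modelled, in Python. The names (Category, ContentItem, needs_attention) and the 365-day staleness threshold are assumptions made for illustration, not part of any SharePoint feature. The idea is that formal content gets checked against its defined lifecycle, while informal content is only flagged by age as a crude proxy for degrading value.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Category(Enum):
    FORMAL = "formal"      # contractual, legal, specific uses
    INFORMAL = "informal"  # knowledge retention, assists decisions


@dataclass
class ContentItem:
    title: str
    category: Category
    created: date
    # Formal content should have a defined lifecycle; informal usually has none.
    review_by: Optional[date] = None


def needs_attention(item: ContentItem, today: date,
                    stale_after_days: int = 365) -> bool:
    """Flag content whose lifecycle has lapsed (threshold is an assumption)."""
    if item.category is Category.FORMAL:
        # A formal item with no review date, or a passed one, needs a look.
        return item.review_by is None or today >= item.review_by
    # Informal: no defined lifecycle, so use age as a rough stand-in for value.
    return (today - item.created) > timedelta(days=stale_after_days)
```

For example, a contract with a review date in the past would be flagged, as would meeting notes that are a couple of years old.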

Case Studies

BT Intranet: distinguishes between five types of content

  • Formal: Authoritative – reliable and kept up to date
  • Team: owned by a group of people, for a target audience
  • Crowd: community owned information, anyone can contribute
  • Personal: individual content
  • Services: online processes

Source: Dorthe Jesperson, J.Boye, April 2010

Snippets

Embracing the chaos of data – by Audrey Watters, O’Reilly Radar, Jan 2012

Interview with data scientist and former Apple engineer Peter Warden (@petewarden):

Structured data is always better than unstructured, when you can get it. The trouble is that you can’t get it. Most structured data is the result of years of effort, so it is only available with a lot of strings, either financial or through usage restrictions.

The first advantage of unstructured data is that it’s widely available because the producers don’t see much value in it. The second advantage is that because there’s no “structuring” work required, there’s usually a lot more of it, so you get much broader coverage. Dealing with unstructured data puts the burden on the consuming application instead of the publisher of the information, so it’s harder to get started, but the potential rewards are much greater.

Link to presentation outline (includes link to presentation PDF) – Embrace the chaos

Metadata vs Data: a wholly artificial distinction – Terry Jones, Fluid Info, Sep 2009

Computer scientists are fond of talking about metadata. There often seems to be an assumption that drawing a distinction between metadata and data is useful and perhaps even necessary. At an architectural level, I think that’s entirely wrong. Any storage architecture that maintains a distinction between metadata and data has real problems that will limit its flexibility and usefulness.

This is the problem that schema-less databases focus on. (Note to self: metadata in SharePoint makes sense for document libraries, not for lists.)
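
To make the point concrete, here is a minimal sketch in Python of a store that draws no line between metadata and data. It is a plain in-memory list of dicts, not the Fluidinfo or SharePoint API, and the find helper and the sample records are hypothetical. Every attribute, whether it looks like content (body) or like metadata (author, rating), is just another key and is queried the same way.

```python
# Hypothetical in-memory store: each record is a free-form set of attributes.
# There is no fixed schema, so records need not share the same keys.
documents = [
    {"body": "Q3 supplier contract", "author": "alice", "type": "contract"},
    {"body": "Notes from taxonomy workshop", "author": "bob", "rating": 4},
]


def find(store, **criteria):
    """Return records where every given attribute is present and matches."""
    return [
        record for record in store
        if all(key in record and record[key] == value
               for key, value in criteria.items())
    ]


# "Metadata" and "data" are queried identically.
by_author = find(documents, author="alice")
by_body = find(documents, body="Notes from taxonomy workshop")
```

Adding a new attribute to one record needs no schema change, which is the flexibility the quote is arguing for.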

Semantic Search: The Myth and Reality – Alex Iskoid, ReadWriteWeb, May 2008

Semantic search is an upcoming technology that has set the expectations way too high. (The article describes the different types of search and the different types of search engine tackling them.)

Sergey Brin speaks with UC Berkeley class – Google video, 2006

Semantics and tagging are great as long as computers are doing it [not people]

© Copyright 2005 - 2012 Joining Dots Ltd. All rights reserved.
