Blog posts tagged: Semantics
See the SharePoint blog post on SharePoint 2010 Taxonomy Limits for an introduction to taxonomy
Content classified by purpose and re-use can usually be divided into two categories:
- Formal: contractual, required for legal purposes, specific uses, should have a defined lifecycle
- Informal: knowledge retention, assists decisions, may or may not be reused, unclear lifecycle but value usually degrades over time
Case Studies
BT Intranet: Distinguishes between 5 different types of content
- Formal: Authoritative – reliable and kept up to date
- Team: owned by a group of people, for a target audience
- Crowd: community owned information, anyone can contribute
- Personal: individual content
- Services: online processes
Source: Dorthe Jesperson, J.Boye, April 2010
Snippets
Embracing the chaos of data – by Audrey Watters, O’Reilly Radar, Jan 2012
Interview with data scientist and former Apple engineer Peter Warden (@petewarden):
Structured data is always better than unstructured, when you can get it. The trouble is that you can’t get it. Most structured data is the result of years of effort, so it is only available with a lot of strings, either financial or through usage restrictions.
The first advantage of unstructured data is that it’s widely available because the producers don’t see much value in it. The second advantage is that because there’s no “structuring” work required, there’s usually a lot more of it, so you get much broader coverage. Dealing with unstructured data puts the burden on the consuming application instead of the publisher of the information, so it’s harder to get started, but the potential rewards are much greater.
Link to presentation outline (includes link to presentation PDF) – Embrace the chaos
Metadata vs Data: a wholly artificial distinction – Terry Jones, Fluid Info, Sep 2009
Computer scientists are fond of talking about metadata. There often seems to be an assumption that drawing a distinction between metadata and data is useful and perhaps even necessary. At an architectural level, I think that’s entirely wrong. Any storage architecture that maintains a distinction between metadata and data has real problems that will limit its flexibility and usefulness. <- this is where schema-less DBs are focused (Note to self: metadata in SP makes sense for doc libraries, not for lists)
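To make the architectural point concrete, here is a minimal sketch of a store that draws no line between metadata and data. It is an illustration only, not Fluid Info's actual API: the TagStore class, its put/get/query methods and the sample object ids are all hypothetical names invented for this example. Every fact about an object, whether it is the document body or its author, is stored as a tag/value pair and is queried the same way.

```python
from typing import Any, Dict, List


class TagStore:
    """Hypothetical schema-less store: every fact is a tag/value pair on an object."""

    def __init__(self) -> None:
        # object id -> {tag name -> value}; no separate "metadata" table
        self._objects: Dict[str, Dict[str, Any]] = {}

    def put(self, object_id: str, tag: str, value: Any) -> None:
        """Attach (or overwrite) a tag on an object, creating the object if needed."""
        self._objects.setdefault(object_id, {})[tag] = value

    def get(self, object_id: str, tag: str) -> Any:
        """Return the value of one tag on one object."""
        return self._objects[object_id][tag]

    def query(self, tag: str, value: Any) -> List[str]:
        """Return the sorted ids of all objects whose tag equals value."""
        return sorted(
            oid for oid, tags in self._objects.items() if tags.get(tag) == value
        )


store = TagStore()
# "Data" and "metadata" go through exactly the same call.
store.put("doc-1", "body", "Quarterly sales report text")
store.put("doc-1", "author", "alice")
store.put("doc-2", "body", "alice")

# The same query path works on what would traditionally be either kind.
by_author = store.query("author", "alice")
by_body = store.query("body", "alice")
```

Because the store has no fixed schema, adding a new kind of tag later (a rating, a review date) needs no migration, which is the flexibility the quote is arguing for. The trade-off is that nothing enforces which tags an object must carry.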
Semantic Search: The Myth and Reality – Alex Iskoid, ReadWriteWeb, May 2008
Semantic search is an upcoming technology that has set the expectations way too high (description of the different types of search and different types of search engine tackling them)
Sergey Brin speaks with UC Berkeley class – Google video, 2006
Semantics and tagging are great as long as computers are doing it [not people]
Other links
- SharePoint 2010 Taxonomy Limits – SharePoint blog post, Nov 2011
- How to build a naive Bayes classifier – bionicspirit.com, Feb 2012
- Discovery Open Metadata Principles – a metadata ecology for UK education and research




