Thinking about the Semantic Web
Just a quick note until I have time for a longer one (which is a piece I promised to have to Kendall Clark for publication in XML.com by November, but that was November 2004, so I despair a bit of ever getting it out).
I was recently approached about a workshop called “The Semantics in the Semantic Web”, which wants to look a little less at the formal issues of semantics qua DL and the like, and instead explore how ontologies and the like on the Web can help people doing information retrieval and similar. There’s also an emphasis, which Frank van Harmelen has used in some things he has done and which will be a focus at a couple of forthcoming Semantic Web meetings, and I have used the phrase often myself (such as in my comment to this Slashdot entry). But I’m starting to think that neither of these slogans really gets at what I think is the important thing in the Semantic Web.
Basically, to me the real power of the Semantic Web is, as the name implies, the Web of Semantics. It happens when vocabularies (like those that can be expressed in SKOS) and ontologies (esp. those modeled in OWL and its future enhancements w/rules, etc.) start to refer to terms in other vocabularies and ontologies. In a recent article in AI Magazine entitled “Knowledge is power: the view from the Semantic Web” I discuss this a lot more; unfortunately, they won’t let me give it away (AAAI members can get a copy from their site). However, the slides from the talk I based that paper on are available on the Web, and there’s one in there that I think helps make the case I’m trying to make.
There’s a slide where I show a very small piece of OWL in which I relate the definition of “cat” from CYC to the definition of “leukemia” from the NCI ontology, to define “feline leukemia” as a leukemia that occurs in cats. I need only define one OWL class, but it links the tens of thousands of concepts in open CYC (including cat facts like cats usually have four legs and eat meat and can be pets) with the tens of thousands of cancer-related concepts in the National Cancer Institute’s ontology (like human trials of a specific drug were used on leukemia patients, that leukemia is a neoplastic process, etc). This probably wouldn’t be everything I would want to say about feline leukemia if I were a researcher, but it would save me a lot of knowledge engineering effort.
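To make the one-class idea a bit more concrete, here is a rough sketch in plain Python (no RDF library) of what such a definition might look like as triples. It is one possible reading of the slide, not the slide itself: the `CYC`, `NCI` and `EX` namespace URIs, the class names, and the `occursIn` property are all placeholders made up for illustration, and the helper `external_namespaces` is hypothetical.

```python
# Standard RDF/OWL namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# ASSUMPTION: placeholder namespaces, not the real Cyc or NCI identifiers.
CYC = "http://example.org/cyc#"
NCI = "http://example.org/nci#"
EX = "http://example.org/vet#"


def feline_leukemia_triples():
    """Triples for: FelineLeukemia = Leukemia AND (occursIn some Cat)."""
    restriction = "_:occursInCat"  # blank node for the restriction
    return [
        (EX + "FelineLeukemia", RDF + "type", OWL + "Class"),
        (EX + "FelineLeukemia", OWL + "intersectionOf", "_:list1"),
        # RDF list holding the two parts of the intersection.
        ("_:list1", RDF + "first", NCI + "Leukemia"),
        ("_:list1", RDF + "rest", "_:list2"),
        ("_:list2", RDF + "first", restriction),
        ("_:list2", RDF + "rest", RDF + "nil"),
        # "occurs in some cat" (occursIn is a hypothetical property).
        (restriction, RDF + "type", OWL + "Restriction"),
        (restriction, OWL + "onProperty", EX + "occursIn"),
        (restriction, OWL + "someValuesFrom", CYC + "Cat"),
    ]


def external_namespaces(triples, local_ns):
    """Namespaces referenced by the triples, other than local_ns, RDF and OWL."""
    found = set()
    for triple in triples:
        for term in triple:
            if term.startswith("http") and "#" in term:
                found.add(term[: term.index("#") + 1])
    return found - {local_ns, RDF, OWL}


if __name__ == "__main__":
    for ns in sorted(external_namespaces(feline_leukemia_triples(), EX)):
        print("links to", ns)
```

The point of the sketch is only that a handful of triples in one small vocabulary name terms that live in two much larger ones; everything those ontologies say about the terms comes along by reference.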
Of course, there’s a problem here in that having done that, most tools wouldn’t do a great job of handling this knowledge (if they could at all). Not so much because it is too big (scaling goes on as we speak, and it’s nice to see people building RDF stores and OWL reasoners that can handle much bigger ontologies) but because models of how to handle this are still mainly unexplored. The OWL DL semantics is completely braindead with respect to this issue (note that there is a formal objection to the W3C on the “imports” feature of OWL, which I wrote): it just assumes that external referents don’t mean anything (they’re essentially annotations) if there’s no import statement. But if you assume imports every time you touch this, you lose, because once more and more ontologies use terms from more and more vocabularies (and vice versa) there will be a Web of this stuff, and the “imports closure” could be humongous (ideally, the whole Semantic Web!).
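A toy illustration of why the “imports closure” grows: treat each ontology as a node and each import as an edge, and collect everything reachable from where you started. This is just a sketch under stated assumptions; the ontology names in `IMPORTS` are invented, and `imports_closure` is a hypothetical helper, not part of any OWL tool.

```python
from collections import deque


def imports_closure(start, imports):
    """Return every ontology reachable from `start` by following imports."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in imports.get(current, ()):
            if target not in seen:  # also guards against import cycles
                seen.add(target)
                queue.append(target)
    return seen


# ASSUMPTION: a made-up import graph; names are for illustration only.
IMPORTS = {
    "vet": ["cyc", "nci"],
    "cyc": ["upper"],
    "nci": ["drugs", "upper"],
    "drugs": ["chem"],
    "chem": ["drugs"],  # a cycle, which the Web will certainly contain
    "upper": [],
    "unrelated": ["chem"],
}

if __name__ == "__main__":
    closure = imports_closure("vet", IMPORTS)
    print(len(closure), "ontologies pulled in:", sorted(closure))
```

Even in this tiny graph, one small ontology with two imports ends up dragging in six; on a real Web of interlinked vocabularies the same traversal has no obvious place to stop.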
Some research has started looking at this. Bernardo Cuenca Grau and Bijan Parsia worked on extending the e-connection framework to work on the Semantic Web (one of their papers is a good starting place for learning their work; then go on to Bernardo’s thesis or their various journal papers, some linked from the MINDSWAP page, some of which I hope Bernardo will put up on a page at Manchester now that he is working there). But that work is just a starting place, and it still needs a LOT of extension (especially heuristic approaches). To really use this stuff we must go way beyond DL, and we also need to go “down” to the RDF level and think about what happens as the RDF graph on any given server starts to have more and more links to things on other servers, either by direct URI naming or via the “virtual linking” which SPARQL will provide.
In short, I worry that most of the Semantic Web community is doing work in Semantics, most of the rest are looking at Web apps, and hardly anyone is actually looking at the “Semantic Web” that I really care about….

January 28th, 2006 at 8:52 pm
I agree completely. Very little, if any, of the semantics of RDF or OWL is new or innovative. The true innovation is in the network links, the same leap that TBL took with the creation of the web from earlier hypertext systems.
I presented a little on this at the Dublin Core conf. last year: http://research.talis.com/2005/frbr-dc2005/ (slides 22/23 on Grounding Schemas)
January 30th, 2006 at 9:53 pm
Thanks for the slide pointer. I think that a very important subset of OWL (or superset of RDFS, depending how you look at it) is RDFS extended with the “property” properties of OWL - this is the subset primarily used in FOAF, it’s what you use in those examples, and I think using those to link SKOS stuff to OWL will be particularly interesting in the long run. I did mention some of this in my XML 05 talk (apologies, it’s a big PDF file) and emphasized that one of the differences between the XML view of the world and the Semantic Web view of the world was again sort of a documents vs. links perspective… now if we could get more of our Sem Web buddies to share it.