Mindswap Weblog

How many OWL ontologies are there on the Web?

by James Hendler

For a while now I have been using Google to find OWL files during my demos.  If one combines a search term with “ext:owl” or “ext:rdf”, one can find files containing that term - and since most OWL users ignore the recommendation to use “.rdf” as the extension and use “.owl” instead, the ext:owl search tends to work well.  What I started wondering a while ago, however, was how well this really does - Swoogle usually finds more documents per term than Google (but it impresses non-SW audiences far more when you show them that things can be found without going to a special engine).  I’ve yet to figure out how to evaluate this formally, but the following seemed like a good starting place: Swoogle says it searches over 10,000 ontologies (although neither the home page nor the statistics page gives more detail than that), so I thought I would try to figure out how many Google had.

I tried “ontology ext:owl”, figuring that was a good way - and a few months ago it was giving me 10,000+ results, so it seemed to concur.  However, sometime in the past few weeks (or at least since I last tried this at the beginning of summer) the number suddenly dropped to several hundred.  I was pretty sure the OWL files hadn’t all gone away, so I was worried.  I talked to a friend at Google about how I could get a better count, and he pointed out that the search key does not have to be a positive one - i.e. you can search Google for pages that don’t contain some term - so he suggested the search “-asasasasasa ext:owl” (which produces about 7,000 files today).

That seemed like a good start, but since the OWL recommendation did not endorse “.owl” and recommended using “.rdf” (something I now think was a mistake, sorry TAG), it’s clear this is an undercount.  The next trick is therefore to figure out how many OWL ontologies are in .rdf files.  There are a lot of RDF files on the web (“-asasasasasa ext:rdf” returns about 1.67M).  I tried “Owl ext:rdf”, which returns 22,000 hits - the problem is that this includes a lot of documents that aren’t actually OWL ontologies (for example, any RDF data living at a site with “owl” in the URI) and is also non-unique (one ontology may use the term owl many times, esp. as owl:Class seems to sometimes be picked up, and sometimes not).

So, if anyone has a good idea how to get a better estimate of how many of the RDF files out there use OWL, or a better way to search for files like the foaf namespace that use OWL terminology in definitions but use the .rdf extension, I’d welcome some suggestions.

-Jim H.

p.s. Oh yeah, I should mention that an obvious solution would be searching for references to the OWL namespace doc - this would be great because it is likely to happen only in ontology-related documents and only once per document - unfortunately, Googling for “http://www.w3.org/2002/07/owl” only finds 70+ hits, which I think is because the namespace declarations appear within the rdf:RDF block, and Google must not search in there…
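
For anyone who already has a pile of candidate .rdf files on disk, the per-document version of this check can be done locally rather than through a search engine. What follows is only a rough sketch, assuming the files are RDF/XML and parse as well-formed XML; the function names owl_terms_used and uses_owl are made up for illustration. It reports which terms from the OWL namespace a document actually uses (as element names, attribute names, or rdf:resource-style attribute values), so each document counts once no matter how often it mentions OWL. It does not tell you whether the document is an ontology or just instance data.

```python
import xml.etree.ElementTree as ET

# The OWL namespace URI as it appears in RDF/XML documents.
OWL_NS = "http://www.w3.org/2002/07/owl#"


def owl_terms_used(rdf_xml):
    """Return the set of OWL local names (e.g. 'Class') used in an RDF/XML string.

    Hypothetical helper: looks at element names, attribute names, and
    attribute values (such as rdf:resource) that point into the OWL namespace.
    """
    root = ET.fromstring(rdf_xml)
    clark_prefix = "{" + OWL_NS + "}"  # ElementTree writes names as {uri}local
    terms = set()
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(clark_prefix):
            terms.add(elem.tag[len(clark_prefix):])
        for name, value in elem.attrib.items():
            if name.startswith(clark_prefix):
                terms.add(name[len(clark_prefix):])
            if value.startswith(OWL_NS) and len(value) > len(OWL_NS):
                terms.add(value[len(OWL_NS):])
    return terms


def uses_owl(rdf_xml):
    """True if the document uses at least one OWL term; False if it uses none
    or is not well-formed XML."""
    try:
        return bool(owl_terms_used(rdf_xml))
    except ET.ParseError:
        return False
```

Note that a file which declares the owl prefix but never uses it comes back empty here, since the check looks at usage rather than at the namespace declaration itself.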

3 Responses to “How many OWL ontologies are there on the Web?”

  1. UMBC eBiquity Says:

    How many Semantic Web documents are on the Web?…

    ……

  2. Tim Finin Says:

    Interesting post Jim. I like the “-asasasasasa” Google trick - it was new to me. We’ve thought about this and I started a page on the topic some weeks ago, and your post spurred me on to finish it off. See “How many Semantic Web documents are on the Web?” [1]. Last month, we took up a related topic, counting “Ontologies on the Semantic Web” [2].

    Tim

    [1] http://ebiquity.umbc.edu/blogger/2006/09/08/how-many-semantic-web-documents-are-on-the-web/
    [2] http://ebiquity.umbc.edu/blogger/2006/08/20/ontologies-on-the-semantic-web/

  3. Kasper van den Berg Says:

    If you have enough computational power and bandwidth, the following might be an option:
    - Get a selection of documents that possibly are ontologies. Possibly by following Li Ding’s approach.
    - Run an OWL validator on these documents.
    - The problem (as discussed on semantic-web@w3.org) of classifying the document as an ontology, only making use of an ontology to define instances, or a combination of both, remains.

    (I estimate classifying all documents will take anywhere from some days up to a month.)

MINDSWAP is a W3C member