Mindswap Weblog

How many OWL ontologies are there on the Web?

by James Hendler

For a while now I have been using Google to find OWL files during my demos.  If one combines a search term with “ext:owl” or “ext:rdf”, one can find files containing that term, and since most OWL users ignore the recommendation to use “.rdf” as the extension and use “.owl” instead, the ext:owl search tends to work well.  What I started wondering about a while ago, however, was how well this does.  Swoogle usually finds more documents per term than Google (but it impresses non-SW audiences far more when you show them that things can be found without going to a special engine).

I’ve yet to figure out how to evaluate this formally, but the following seemed like a good starting place: Swoogle says it searches over 10,000 ontologies (although neither its home page nor its statistics page gives more detail than that), so I thought I would try to figure out how many Google had.  I tried “ontology ext:owl”, figuring that was a good way, and a few months ago it was giving me 10,000+ results, so it seemed to concur.  However, sometime in the past few weeks (or at least since I last tried this at the beginning of summer) the number suddenly dropped to several hundred.  I was pretty sure the OWL files hadn’t all gone away, so I was worried.  I talked to a friend at Google about how I could get a better count, and he pointed out that the search key does not have to be a positive one, i.e. you can search Google for pages that don’t contain some term.  He suggested the search “-asasasasasa ext:owl” (which returns about 7,000 files today).

That seemed like a good start, but since the OWL recommendation did not endorse “.owl” and recommended using “.rdf” (something I now think was a mistake, sorry TAG), it’s clear this is an undercount.  The next trick is therefore to figure out how many OWL ontologies are in .rdf files.  There are a lot of RDF files on the web (“-asasasasasa ext:rdf” returns about 1.67M).  I tried “Owl ext:rdf”, which returns 22,000 hits.  The problem is that this includes a lot of documents that aren’t actually OWL ontologies (for example, any RDF data living at a site with “owl” in the URI) and is also non-unique (one ontology may use the term owl many times, esp. as owl:Class seems to sometimes be picked up, and sometimes not).

So, if anyone has a good idea how to get a better estimate of how many of the RDF files out there use OWL, or a better way to search for files like the foaf namespace that use OWL terminology in definitions but use the .rdf extension, I’d welcome some suggestions.

-Jim H.

p.s. Oh yeah, I should mention that an obvious solution would be to search for references to the OWL namespace document.  This would be great because such a reference is likely to occur only in ontology-related documents and only once per document.  Unfortunately, Googling for “http://www.w3.org/2002/07/owl” finds only about 70+ hits, which I think is because the namespace declarations appear within the rdf:RDF block, and Google must not search in there…
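
For anyone who has a local crawl of .rdf files, the namespace check can be done offline instead of through a search engine. Here is a minimal sketch in Python, assuming the files are well-formed RDF/XML and have already been downloaded. The function names and the two sample documents are made up for illustration, and this is not how Google or Swoogle count anything.

```python
import io
import xml.etree.ElementTree as ET

# The OWL namespace URI, including the trailing '#'.
OWL_NS = "http://www.w3.org/2002/07/owl#"


def declares_owl(rdf_xml: bytes) -> bool:
    """Return True if the document declares the OWL namespace anywhere."""
    try:
        # "start-ns" events report every xmlns declaration as (prefix, uri),
        # which is exactly the information a full-text index tends to miss.
        for _event, (_prefix, uri) in ET.iterparse(
            io.BytesIO(rdf_xml), events=("start-ns",)
        ):
            if uri == OWL_NS:
                return True
    except ET.ParseError:
        # Not well-formed XML: treat it as "not an OWL document".
        return False
    return False


def count_owl_documents(documents) -> int:
    """Count documents (an iterable of bytes) that declare the OWL namespace."""
    return sum(1 for doc in documents if declares_owl(doc))


# Hypothetical sample: an .rdf file that defines a class with OWL.
SAMPLE_WITH_OWL = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/vocab#Person"/>
</rdf:RDF>"""

# Hypothetical sample: plain RDF data with no OWL namespace declared.
SAMPLE_PLAIN_RDF = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/vocab#">
  <ex:Person rdf:about="http://example.org/people#jim"/>
</rdf:RDF>"""

if __name__ == "__main__":
    docs = [SAMPLE_WITH_OWL, SAMPLE_PLAIN_RDF, b"not xml at all"]
    print(count_owl_documents(docs), "of", len(docs), "documents declare OWL")
```

A declaration counts once per document no matter how many OWL terms the file uses, which avoids the non-uniqueness problem of a plain text search for “owl”. It does still count files that declare the namespace without using it.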

3 Responses to “How many OWL ontologies are there on the Web?”

  1. UMBC eBiquity Says:

    How many Semantic Web documents are on the Web?…


  2. Tim Finin Says:

    Interesting post, Jim. I like the “-asasasasasa” Google trick; it was new to me. We’ve thought about this and I started a page on the topic some weeks ago, and your post spurred me on to finish it off. See “How many Semantic Web documents are on the Web?” [1]. Last month, we took up a related topic, counting “Ontologies on the Semantic Web” [2].

    Tim

    [1] http://ebiquity.umbc.edu/blogger/2006/09/08/how-many-semantic-web-documents-are-on-the-web/
    [2] http://ebiquity.umbc.edu/blogger/2006/08/20/ontologies-on-the-semantic-web/

  3. Kasper van den Berg Says:

    If you have enough computational power and bandwidth, the following might be an option:
    - Get a selection of documents that might be ontologies, possibly by following Li Ding’s approach.
    - Run an OWL validator on these documents.
    - The problem (as discussed on semantic-web@w3.org) remains of classifying each document as an ontology, as a document that only uses an ontology to define instances, or as a combination of both.

    (I estimate that classifying all documents will take from some days up to a month.)
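
The classification step in the list above can be illustrated with a very crude heuristic. The sketch below is an assumption-laden example, not an OWL validator and not Li Ding’s approach: it looks only at the top-level nodes of an RDF/XML document and asks whether they define classes or properties, or describe instances. The names `classify` and `DEFINITION_URIS` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Types whose presence suggests the node is a definition (assumed list).
DEFINITION_URIS = {
    OWL + "Class", OWL + "ObjectProperty", OWL + "DatatypeProperty",
    RDFS + "Class", RDF + "Property",
}
ONTOLOGY_HEADER = OWL + "Ontology"  # header node, counted as neither


def tag_to_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into 'namespace' + 'local'."""
    if not tag.startswith("{"):
        return tag
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def classify(rdf_xml: str) -> str:
    """Return 'ontology', 'instances', 'mixed', 'empty' or 'unknown'."""
    root = ET.fromstring(rdf_xml)
    if tag_to_uri(root.tag) != RDF + "RDF":
        return "unknown"
    definitions = instances = 0
    for node in root:
        if tag_to_uri(node.tag) == RDF + "Description":
            # Untyped descriptions have no rdf:type and count as instance data.
            types = [t.get("{%s}resource" % RDF)
                     for t in node.findall("{%s}type" % RDF)]
        else:
            types = [tag_to_uri(node.tag)]
        if ONTOLOGY_HEADER in types:
            continue
        if any(t in DEFINITION_URIS for t in types):
            definitions += 1
        else:
            instances += 1
    if definitions and instances:
        return "mixed"
    if definitions:
        return "ontology"
    if instances:
        return "instances"
    return "empty"


HEADER = ('<rdf:RDF xmlns:rdf="%s" xmlns:owl="%s" '
          'xmlns:ex="http://example.org/onto#">' % (RDF, OWL))

ONTOLOGY_DOC = HEADER + """
  <owl:Ontology rdf:about="http://example.org/onto"/>
  <owl:Class rdf:about="http://example.org/onto#Person"/>
  <owl:ObjectProperty rdf:about="http://example.org/onto#knows"/>
</rdf:RDF>"""

INSTANCE_DOC = HEADER + """
  <ex:Person rdf:about="http://example.org/people#jim"/>
</rdf:RDF>"""

MIXED_DOC = HEADER + """
  <owl:Class rdf:about="http://example.org/onto#Person"/>
  <rdf:Description rdf:about="http://example.org/people#tim">
    <rdf:type rdf:resource="http://example.org/onto#Person"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    for name, doc in [("ontology", ONTOLOGY_DOC), ("instance", INSTANCE_DOC),
                      ("mixed", MIXED_DOC)]:
        print(name, "->", classify(doc))
```

A real pipeline would need a proper RDF parser and validator, since the same graph can be serialized in many ways that this tag-level check does not see.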


MINDSWAP is a W3C member

