How many OWL ontologies are there on the Web?
For a while now I have been using Google to find OWL files during my demos. If one combines a search term with “ext:owl” or “ext:rdf”, one can find files containing that term, and since most OWL users ignore the recommendation to use “.rdf” as the extension and use “.owl” instead, the ext:owl search tends to work well. What I started wondering a while ago, however, was how well this works. Swoogle usually finds more documents per term than Google does (but it impresses non-SW audiences far more when you show them that things can be found without going to a special engine). I’ve yet to figure out how to evaluate this formally, but the following seemed like a good starting place: Swoogle says it searches over 10,000 ontologies (although neither the home page nor the statistics page gives more detail than that), so I thought I would try to figure out how many Google has.
I tried “ontology ext:owl”, figuring that was a good way to count, and a few months ago it was giving me 10,000+ results, so it seemed to concur. However, sometime in the past few weeks (or at least since I last tried this at the beginning of summer) the number suddenly dropped to several hundred. I was pretty sure the OWL files hadn’t all gone away, so I was worried. I talked to a friend at Google about how I could get a better count, and he pointed out that the search key does not have to be a positive one, i.e. you can search Google for pages that don’t contain some term. He suggested the search “-asasasasasa ext:owl” (which returns about 7,000 files today).
That seemed like a good start, but since the OWL recommendation did not endorse “.owl” and recommended using “.rdf” (something I now think was a mistake, sorry TAG), it’s clear this is an undercount. The next trick is therefore to figure out how many OWL ontologies are in .rdf files. There are a lot of RDF files on the web (“-asasasasasa ext:rdf” returns about 1.67M). I tried “Owl ext:rdf”, which returns 22,000 hits. The problem is that this includes a lot of documents that aren’t actually OWL ontologies (for example, any RDF data living at a site with “owl” in the URI), and it is also non-unique (one ontology may use the term owl many times, esp. as owl:Class seems to be picked up sometimes and not others).
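To make the two problems concrete, here is a minimal sketch of what a per-document count could look like if one had the .rdf files in hand rather than only a search box. It is an illustration under assumptions, not something Google or Swoogle is known to do: the function names and the two sample documents are made up, and it only handles well-formed RDF/XML. A document counts once if any element name, attribute name, or attribute value falls in the OWL namespace, so a site with “owl” in its URI is not counted and repeated OWL terms do not inflate the total.

```python
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"

def uses_owl_terms(rdf_xml: str) -> bool:
    """True if the document uses at least one term from the OWL namespace."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError:
        return False  # not well-formed XML, so it is not counted
    prefix = "{" + OWL_NS + "}"
    for elem in root.iter():
        # Element names such as owl:Class expand to "{namespace}Class".
        if elem.tag.startswith(prefix):
            return True
        # Attribute names, or values such as rdf:resource="...owl#Class".
        for name, value in elem.attrib.items():
            if name.startswith(prefix) or value.startswith(OWL_NS):
                return True
    return False

def count_owl_documents(docs: dict) -> int:
    """Count documents (not term occurrences) that use OWL terms."""
    return sum(1 for text in docs.values() if uses_owl_terms(text))

def count_keyword_hits(docs: dict) -> int:
    """Naive count: any document whose text mentions 'owl' anywhere."""
    return sum(1 for text in docs.values() if "owl" in text.lower())

# Hypothetical sample documents, for illustration only.
SAMPLE_DOCS = {
    "zoo.rdf": """<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:owl="http://www.w3.org/2002/07/owl#">
      <owl:Class rdf:about="http://example.org/zoo#Bird"/>
      <owl:Class rdf:about="http://example.org/zoo#Owl"/>
    </rdf:RDF>""",
    "sightings.rdf": """<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:ex="http://owl.example.org/terms#">
      <rdf:Description rdf:about="http://owl.example.org/page">
        <ex:title>Owl sightings</ex:title>
      </rdf:Description>
    </rdf:RDF>""",
}

if __name__ == "__main__":
    print("keyword hits:", count_keyword_hits(SAMPLE_DOCS))
    print("documents using OWL terms:", count_owl_documents(SAMPLE_DOCS))
```

The keyword count treats both sample files as hits, while the namespace-based count only accepts the one that actually uses OWL vocabulary.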
So, if anyone has a good idea how to get a better estimate of how many of the RDF files out there use OWL, or a better way to search for files like the foaf namespace that use OWL terminology in definitions but use the .rdf extension, I’d welcome some suggestions.
-Jim H.
p.s. Oh yeah, I should mention that an obvious solution would be to search for references to the OWL namespace document. This would be great because such a reference is likely to appear only in ontology-related documents and only once per document. Unfortunately, Googling for “http://www.w3.org/2002/07/owl” only finds 70+ hits, which I think is because the namespace declarations appear within the rdf:RDF block, and Google must not search in there…
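For what it’s worth, the namespace check itself is easy once a file has been fetched; the hard part is getting a search engine to do it. The following is a rough sketch using only the Python standard library. The function names are hypothetical, and it assumes the file is RDF/XML, where the declaration appears as an xmlns attribute and usually carries a trailing “#”.

```python
import io
import xml.etree.ElementTree as ET

# The namespace URI as searched for above, without the trailing "#".
OWL_NS = "http://www.w3.org/2002/07/owl"

def declared_namespaces(rdf_xml: str) -> dict:
    """Map each declared prefix to its namespace URI."""
    namespaces = {}
    source = io.BytesIO(rdf_xml.encode("utf-8"))
    # "start-ns" events report every xmlns declaration the parser meets.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces

def declares_owl(rdf_xml: str) -> bool:
    """True if the document declares the OWL namespace, with or without '#'."""
    try:
        uris = declared_namespaces(rdf_xml).values()
    except ET.ParseError:
        return False  # not well-formed XML
    return any(uri.rstrip("#") == OWL_NS for uri in uris)
```

A declaration only shows that the prefix is available, not that any OWL term is actually used, so this is best read as a cheap first filter.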

September 8th, 2006 at 6:48 pm
Interesting post, Jim. I like the “-asasasasasa” Google trick; it was new to me. We’ve thought about this and I started a page on the topic some weeks ago, and your post spurred me on to finish it off. See “How many Semantic Web documents are on the Web?” [1]. Last month, we took up a related topic, counting “Ontologies on the Semantic Web” [2].
Tim
[1] http://ebiquity.umbc.edu/blogger/2006/09/08/how-many-semantic-web-documents-are-on-the-web/
[2] http://ebiquity.umbc.edu/blogger/2006/08/20/ontologies-on-the-semantic-web/
September 11th, 2006 at 7:55 am
If you have enough computational power and bandwidth, the following might be an option:
- Get a selection of documents that might be ontologies, possibly by following Li Ding’s approach.
- Run an OWL validator on these documents.
- The problem (as discussed on semantic-web@w3.org) of classifying each document as an ontology, as instance data that only uses an ontology, or as a combination of both remains.
(I estimate that classifying all documents would take anywhere from some days up to a month.)
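The classification step could be roughed out along the following lines. This is a heuristic sketch, not an OWL validator: the labels, the choice of which terms count as “definitions”, and the function name are all assumptions made for illustration. It only looks at the top-level nodes of an RDF/XML file and skips plain rdf:Description nodes, which would need their rdf:type inspected.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Assumption: these top-level elements signal that a document defines terms.
DEFINITION_TAGS = {
    f"{{{OWL_NS}}}Ontology",
    f"{{{OWL_NS}}}Class",
    f"{{{OWL_NS}}}ObjectProperty",
    f"{{{OWL_NS}}}DatatypeProperty",
    f"{{{RDFS_NS}}}Class",
    f"{{{RDF_NS}}}Property",
}
VOCABULARY_PREFIXES = tuple("{" + ns for ns in (RDF_NS, RDFS_NS, OWL_NS))

def classify(rdf_xml: str) -> str:
    """Return 'ontology', 'instances', 'both', 'neither' or 'not-rdf-xml'."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError:
        return "not-rdf-xml"
    if root.tag != f"{{{RDF_NS}}}RDF":
        return "not-rdf-xml"
    defines_terms = False
    has_instances = False
    for node in root:  # top-level node elements only
        if node.tag in DEFINITION_TAGS:
            defines_terms = True
        elif not node.tag.startswith(VOCABULARY_PREFIXES):
            # A typed node such as <ex:Bird .../> describes an instance.
            has_instances = True
        # rdf:Description and other RDF/RDFS/OWL nodes are left undecided.
    if defines_terms and has_instances:
        return "both"
    if defines_terms:
        return "ontology"
    if has_instances:
        return "instances"
    return "neither"
```

Run over a crawl, the per-label counts would give a first estimate; a real validator would still be needed to say whether each file is valid OWL.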