CSCI 8351 Assignment 3 (due
TA’s notes:
In case you were missing some parts of the assignment Dr. Sheth gave
today, here it is again.
Take a look at the following ontologies or knowledge representations:
WordNet, GO (Gene Ontology), CYC, and
CIDOC/CRM ontology (http://jodi.ecs.soton.ac.uk/Articles/v04/i01/Doerr/
and http://www.rlg.org/events/metadata2002/gill/sld001.htm).
1. Identify one or more key
papers/references/presentations about each of the ontologies
(To grader: I jotted down some notes as I read along in order to capture the
main ideas and remember them. Please don’t be troubled by the notes.)
WordNet:
Alberto J. Cañas et al. “Using WordNet for Word Sense Disambiguation
to Support Concept Map Construction.”
GO:
“Introduction to Gene Ontology”. http://www.geneontology.org/GO.doc.html
GO terms are organized in structures called directed
acyclic graphs (DAGs), which differ from hierarchies in that a 'child'
(more specialized term) can have many 'parents' (less specialized terms).
Barry Smith et al. “The Ontology of the Gene Ontology”.
http://ontology.buffalo.edu/medo/Gene_Ontology.pdf
P.V. Ogren et al. “The compositional structure of Gene Ontology
terms”.
http://www-smi.stanford.edu/projects/helix/psb04/ogren.pdf
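The DAG note above can be made concrete with a small sketch. The Python below is an illustration only, not GO's own tooling: the term names are illustrative labels rather than real GO identifiers, and `PARENTS` and `ancestors` are hypothetical names introduced here. It shows a child term with two parents and collects every less specialized term reachable from it.

```python
# Toy DAG: each term maps to the list of its parents (less specialized terms).
# Term names are illustrative labels, not real GO identifiers.
PARENTS = {
    "hexose biosynthesis": ["hexose metabolism", "monosaccharide biosynthesis"],
    "hexose metabolism": ["monosaccharide metabolism"],
    "monosaccharide biosynthesis": ["monosaccharide metabolism"],
    "monosaccharide metabolism": [],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable from `term` via parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen
```

In a strict hierarchy (a tree) every term would have at most one parent. Here `hexose biosynthesis` has two, which is what makes the structure a DAG, and the shared grandparent is still reported only once.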
CYC:
Cycorp. “The CYC knowledge base.”
http://www.cyc.com/cyc/technology/whatiscyc_dir/whatsincyc
The Cyc knowledge base (KB)
is a formalized representation of a vast quantity of fundamental human
knowledge: facts, rules of thumb, and heuristics for reasoning about the
objects and events of everyday life. The medium of representation is the
formal language CycL. The
Cyc KB contains nearly two hundred thousand terms and several dozen
hand-entered assertions about/involving each term.
Tom O’Hara et al. “Inducing criteria for mass noun lexical mappings
using the Cyc KB, and its extension to WordNet”
http://www.cyc.com/doc/white_papers/inducing-criteria-for-mass.pdf
Abstract: This paper presents an
automatic approach for learning semantic criteria for the mass vs. count noun
distinction by inducing over the lexical mappings contained in the Cyc
knowledge base. This produces accurate results (89.5%) using a decision tree
that only incorporates semantic features (i.e. Cyc ontological types). Comparable
results (86.9%) are obtained using OpenCyc, the publicly available version of
Cyc. For broader applicability, the mass noun criteria using Cyc are converted
into criteria using WordNet, preserving the general accuracy (86.3%).
CIDOC/CRM: (The International
Committee for Documentation of the International Council of Museums, CRM:
Conceptual Reference Model )
International Council of Museums. “CIDOC Conceptual Reference Model”.
http://cidoc.ics.forth.gr/index.html
The CIDOC Conceptual Reference Model (CRM)
provides definitions and a formal structure for describing the implicit and
explicit concepts and relationships used in cultural heritage documentation.
The CIDOC CRM is intended to promote a
shared understanding of cultural heritage information by providing a common
and extensible semantic framework that any cultural heritage information
can be mapped to. It is intended to be a common language for domain
experts and implementers to formulate requirements for information systems and
to serve as a guide for good practice of conceptual modeling. In this way, it
can provide the "semantic glue" needed to mediate between
different sources of cultural heritage information, such as that published by museums,
libraries and archives.
Martin Doerr. “Mapping a Data Structure to the CIDOC Conceptual
Reference Model”.
http://cidoc.ics.forth.gr/docs/mapping.ppt
2. Classify them in terms of
- General-Purpose, Domain, Task ontology
- formal, semi-formal, informal
WordNet: General-purpose, semi-formal
GO: Domain specific, semi-formal
CYC: General-purpose, formal
CIDOC/CRM: Domain specific, informal
Uschold and Gruninger proposed “four somewhat arbitrary
points along what might be thought of as a continuum”: highly informal,
semi-informal, semi-formal, and rigorously formal.
They
believe that “the formality required from the language for the ontology is to a
large extent dependent on the degree of automation in the various tasks which
the ontology is supporting. If an ontology is a framework for communication
among people, then the representation of the ontology can be informal, as long
as it is precise and captures everyone’s intuitions. However, if the ontology
is to be used by software tools or intelligent agents, then the semantics of
the ontology must be made much more precise.” So the required degree of
formalization depends on how the ontology will be put to use.
Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, Methods and
Applications. The Knowledge Engineering Review, 11(2).
3. Compare or comment on their expressiveness, and
identify which relations they use/express formally or informally?
(Remember, OWL basically just knows ‘is_a’ and ‘instance_of’; some of the above
will be more expressive, some less expressive)
- WordNet: more expressive than OWL because it can also express relations
(for adjectives) such as:
  - synonyms/related nouns
  - antonyms
  - a value of
  - domain
  - familiarity
And for nouns, WordNet can express relations such as:
  - synonyms ordered by frequency
  - coordinate terms
  - hypernyms (pregnancy is a kind of …)
  - hyponyms (… is a kind of pregnancy), brief
  - hyponyms (… is a kind of pregnancy), full
  - meronyms (parts of pregnancy)
  - domain terms
  - familiarity
- GO: at least as expressive as OWL. GO terms are organized in structures
called directed acyclic graphs (DAGs), which differ from hierarchies in
that a 'child' (more specialized term) can have many 'parents' (less
specialized terms). So it can express the “is_a” relation with multiple
inheritance.
- Cyc: more expressive because it can represent more than OWL can. It is
designed to allow the representation of both the objective world and
agents' beliefs. For example, Cyc knows that
- CIDOC/CRM: less expressive. It relies on XML, so its authors specify a DTD
to facilitate mapping between different data sets. It’s not able to express
is-A in a hierarchical sense. Through mapping, it may be able to express is-A
in the sense of “equivalent”: for example, A in one schema “is a” B in another
schema. But this is not in the sense that A is a subclass of B or that A
inherits properties from B.
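The mapping idea in the CIDOC/CRM item above can be sketched as follows. This is a minimal illustration under assumptions: the two museum schemas, their field names, and the shared concept labels are all hypothetical and are not taken from the CIDOC CRM specification. Two fields count as equivalent when they map to the same shared concept, which is the “equivalent” reading of is-A described above rather than a subclass relation.

```python
# Hypothetical mapping from (schema, field) to a shared concept label.
# None of these names come from the CIDOC CRM specification.
MAPPING = {
    ("museum_a", "artist"): "creator",
    ("museum_b", "maker"): "creator",
    ("museum_a", "made_in"): "production_place",
    ("museum_b", "origin"): "production_place",
    ("museum_b", "accession_no"): "identifier",
}


def equivalent(field_a, field_b, mapping=MAPPING):
    """True if both (schema, field) pairs map to the same shared concept."""
    concept_a = mapping.get(field_a)
    concept_b = mapping.get(field_b)
    return concept_a is not None and concept_a == concept_b


def translate(record, source, target, mapping=MAPPING):
    """Rename the fields of `record` from the source schema to the target schema.

    Fields with no counterpart in the target schema are dropped.
    """
    concept_to_target = {
        concept: field
        for (schema, field), concept in mapping.items()
        if schema == target
    }
    result = {}
    for field, value in record.items():
        concept = mapping.get((source, field))
        if concept in concept_to_target:
            result[concept_to_target[concept]] = value
    return result
```

For instance, `translate({"artist": "Example Artist"}, "museum_a", "museum_b")` renames the field to `maker`. Nothing in this structure says that one concept specializes another, which is the limitation the item above points at.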
- WordNet: ArchiWordNet (http://www.fi.muni.cz/gwc2004/pres/101/)
A bilingual English/Italian
thesaurus for the Architecture and Construction domain. It’s structured
according to the WordNet model and fully integrated with MultiWordNet, which is
a multilingual lexical database in which the Italian WordNet is strictly
aligned with
- GO: Gene Ontology Annotation (GOA) is a project run by the European Bioinformatics
Institute (EBI) that aims to provide assignments of terms from the Gene
Ontology (GO) resource to gene products in a number of its databases
(http://www.ebi.ac.uk/GOA). In the first stage of this project, GO assignments
have been applied to a data set representing the complete human proteome by a
combination of electronic mappings and manual curation. This vocabulary has
also been applied to the nonredundant proteome sets for all other completely
sequenced organisms as well as to proteins from a wide range of organisms where
the proteome is not yet complete.
- Cyc: CycSecure (http://www.cyc.com/cyc/applications/cycsecure)
CycSecure
is a security risk analysis tool that capitalizes on the power and richness of
the Cyc Knowledge Base and reasoning system in order to provide the network
security professional with analyses of an organization's network
vulnerabilities at several levels.
CycSecure
was developed to address the dramatic increase in security breaches and network
vulnerabilities worldwide. This technology enables CycSecure to provide an
organization with a virtual representation of its networks that allows attack
and simulation modeling to occur without risking damage to, or overload of, the
real network. It provides the network security team information that affords
complete, concise risk analysis.
- CIDOC/CRM: Image Annotation using CIDOC/CRM and MPEG-7 (proposed project)
Domain-specific
ontologies have been developed by two different ISO Working Groups to
standardize the semantics associated with the description of museum objects
(CIDOC Conceptual Reference Model) and the description of multimedia content
(MPEG-7) - but no single ontology or metadata model exists for describing
museum multimedia content. This paper describes an approach which combines the
domain-specific aspects of MPEG-7 and CIDOC-CRM models into a single ontology
for describing and managing multimedia in museums. The result is an
extensible model which could lead to a common search interface and the open
exchange, sharing and integration of heterogeneous multimedia resources
distributed across cultural institutions.
- WordNet: It’s a successful ontology if we define an ontology as a
“specification of a conceptualization”. The core of the English language is
its small, everyday words, and these are the ones with multiple meanings, so
I’d say WordNet is quite an achievement. Long words from a specific domain
usually have a single meaning, but a small word such as “big” has at least 14
senses (according to WordNet), each of which has many synonyms. This is the
hardest part of semantic disambiguation, and WordNet did a good job in this
respect.
- GO: I would consider it successful. It provides a “controlled vocabulary” so
that data from different databases can be integrated, queried, and mapped
easily. Each gene or gene product has a list of associated GO terms, and each
database publishes a table of these associations. GO mainly captures the “isA”
relationship. As biology terms are structured mostly by isA relations, GO
successfully captures the most important structure.
- Cyc: I think it’s quite successful. It was designed to be a knowledge base,
so naturally it paid more attention to its inference rules than to the rest.
Cycorp designed quite a few ontologies for specific areas, such as
transportation, as well as for Cyc itself
(http://www.daml.org/ontologies/submitter.html).
- CIDOC/CRM: I wouldn’t categorize it as an ontology in the strict sense. It
cannot express the “isA” relation or other relations. It’s more about how to
map database schemas. But it’s quite successful as a means to facilitate
information retrieval and integration in the museum community.
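The GO item above notes that each database publishes a table of associations between gene products and GO terms. A minimal sketch of how a shared vocabulary lets such tables be merged and queried is given below; the database tables, gene product names, and term labels are placeholders invented for illustration and are not real annotations.

```python
from collections import defaultdict

# Placeholder association tables, one per database: (gene_product, term) rows.
# All names are invented for illustration; they are not real annotations.
DB_ONE = [("geneA", "term_x"), ("geneB", "term_y")]
DB_TWO = [("geneC", "term_x"), ("geneA", "term_z")]


def merge_associations(*tables):
    """Combine association tables into one index: term -> set of gene products."""
    index = defaultdict(set)
    for table in tables:
        for gene_product, term in table:
            index[term].add(gene_product)
    return index


def products_for(term, index):
    """Return the gene products annotated with `term`, sorted for stable output."""
    return sorted(index.get(term, ()))
```

Because both tables use the same term labels, `products_for("term_x", merge_associations(DB_ONE, DB_TWO))` finds gene products from both databases without any per-database translation step.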