CSCI 8351 Assignment 3 (due Feb. 5, 2004)
Yin Xiong

 

TA’s notes:

 

In case you missed some parts of the assignment Dr. Sheth gave today, here it is again.

Take a look at the following ontologies or knowledge representations: WordNet, GO (Gene Ontology), CYC, and the CIDOC/CRM ontology (http://jodi.ecs.soton.ac.uk/Articles/v04/i01/Doerr/ and http://www.rlg.org/events/metadata2002/gill/sld001.htm).

1. Identify one or more key papers/references/presentations about each of the ontologies

 

(To the grader: I jotted down some notes while reading in order to capture and remember the main ideas. Please don't be troubled by them.)

 

WordNet:  

 

Alberto J. Cañas et al. “Using WordNet for Word Sense Disambiguation to Support Concept Map Construction”.

http://cmap.ihmc.us/Publications/ResearchPapers/SPIRE-2003%20%20Using%20Wordnet%20for%20Word%20Sense%20Disambiguation%20to%20Support%20Concept%20Map%20Construction.pdf

GO:

“Introduction to Gene Ontology”.

http://www.geneontology.org/GO.doc.html

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.

GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from hierarchies in that a 'child' (more specialized term) can have many 'parents' (less specialized terms).
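To make the DAG point concrete, here is a minimal Python sketch, assuming a toy set of terms: the term names are hypothetical placeholders, not real GO identifiers, and each term simply stores a list of its is_a parents. Collecting all ancestors shows how one specialized term reaches the root along more than one path.

```python
from collections import deque

# Hypothetical toy terms, NOT real GO identifiers: each term maps to the list
# of its is_a parents (less specialized terms).
IS_A: dict[str, list[str]] = {
    "root": [],
    "process_A": ["root"],
    "process_B": ["root"],
    "process_A1": ["process_A"],
    # One specialized term with two parents: legal in a DAG, impossible in a tree.
    "process_A1_B": ["process_A1", "process_B"],
}


def ancestors(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following is_a links upward."""
    seen: set[str] = set()
    queue = deque(is_a[term])
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # "root" is reached by two paths but kept once
            seen.add(parent)
            queue.extend(is_a[parent])
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("process_A1_B", IS_A)))
```

Running it prints `['process_A', 'process_A1', 'process_B', 'root']`: the term inherits from both branches, which a strict hierarchy could not express.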

 

 

Barry Smith et al. “The Ontology of Gene Ontology”.

http://ontology.buffalo.edu/medo/Gene_Ontology.pdf

P.V. Ogren et al. “The compositional structure of gene ontology terms”.

http://www-smi.stanford.edu/projects/helix/psb04/ogren.pdf

CYC:

Cycorp. “The CYC knowledge base.”

http://www.cyc.com/cyc/technology/whatiscyc_dir/whatsincyc

The Cyc knowledge base (KB) is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life. The medium of representation is the formal language CycL. The Cyc KB contains nearly two hundred thousand terms and several dozen hand-entered assertions about/involving each term.

 

Tom O’Hara et al. “Inducing criteria for mass noun lexical mappings using the Cyc KB, and its extension to WordNet”.

http://www.cyc.com/doc/white_papers/inducing-criteria-for-mass.pdf

 

Abstract: This paper presents an automatic approach for learning semantic criteria for the mass vs. count noun distinction by inducing over the lexical mappings contained in the Cyc knowledge base. This produces accurate results (89.5%) using a decision tree that only incorporates semantic features (i.e. Cyc ontological types). Comparable results (86.9%) are obtained using OpenCyc, the publicly available version of Cyc. For broader applicability, the mass noun criteria using Cyc are converted into criteria using WordNet, preserving the general accuracy (86.3%).

 

CIDOC/CRM: (CIDOC: the International Committee for Documentation of the International Council of Museums; CRM: Conceptual Reference Model)

International Council of Museums. “CIDOC Conceptual Reference Model”.

http://cidoc.ics.forth.gr/index.html

 

The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.

The CIDOC CRM is intended to promote a shared understanding of cultural heritage information by providing a common and extensible semantic framework that any cultural heritage information can be mapped to. It is intended to be a common language for domain experts and implementers to formulate requirements for information systems and to serve as a guide for good practice of conceptual modeling. In this way, it can provide the "semantic glue" needed to mediate between different sources of cultural heritage information, such as that published by museums, libraries and archives.
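A minimal Python sketch of that mediating role, assuming two hypothetical museum databases whose field names differ. The CRM labels used (E22 Man-Made Object, E12 Production, E21 Person, P108 has produced, P14 carried out by) are CIDOC CRM names, but the field-to-CRM mapping, the record IDs and the plain-tuple triple format are illustrative assumptions, not the mapping method from Doerr's presentation.

```python
# Hypothetical mapping tables: each museum's local field name -> shared role.
# The CRM class/property labels are CIDOC CRM names; the records, IDs and
# the mapping itself are made up for illustration.
MAPPINGS: dict[str, dict[str, str]] = {
    "museum_a": {"id": "object", "maker": "creator"},
    "museum_b": {"accession_no": "object", "artist": "creator"},
}

Triple = tuple[str, str, str]


def to_crm(source: str, record: dict[str, str]) -> list[Triple]:
    """Rewrite one local record as CRM-style triples via the source's mapping."""
    roles = {role: record[field] for field, role in MAPPINGS[source].items()}
    obj = f"{source}:{roles['object']}"
    production = f"{obj}/production"  # the event that brought the object about
    person = f"person:{roles['creator']}"
    return [
        (obj, "rdf:type", "E22 Man-Made Object"),
        (production, "rdf:type", "E12 Production"),
        (production, "P108 has produced", obj),
        (production, "P14 carried out by", person),
        (person, "rdf:type", "E21 Person"),
    ]


def objects_made_by(name: str, triples: list[Triple]) -> list[str]:
    """One query over the shared model answers for every mapped source."""
    person = f"person:{name}"
    productions = {s for s, p, o in triples if p == "P14 carried out by" and o == person}
    return sorted(o for s, p, o in triples if p == "P108 has produced" and s in productions)


if __name__ == "__main__":
    triples = to_crm("museum_a", {"id": "A-17", "maker": "Jane Doe"})
    triples += to_crm("museum_b", {"accession_no": "1999.4", "artist": "Jane Doe"})
    print(objects_made_by("Jane Doe", triples))
```

Although the two sources call the creator "maker" and "artist", the single query prints `['museum_a:A-17', 'museum_b:1999.4']` once both are mapped to the common model.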

Martin Doerr. “Mapping a Data Structure to the CIDOC Conceptual Reference Model”.

http://cidoc.ics.forth.gr/docs/mapping.ppt

 

2. Classify them in terms of:

  • General-purpose, domain, or task ontology
  • formal, semi-formal, or informal

 

WordNet: General-purpose, semi-formal

GO: Domain-specific, semi-formal

CYC: General-purpose, formal

CIDOC/CRM: Domain-specific, informal

 

Uschold and Gruninger proposed “four somewhat arbitrary points along what might be thought of as a continuum”:

 

  • highly informal: expressed loosely in natural language
  • semi-informal: expressed in a restricted and structured form of natural language, greatly increasing clarity by reducing ambiguity
  • semi-formal: expressed in an artificial formally defined language
  • rigorously formal: meticulously defined terms with formal semantics, theorems and proofs of such properties as soundness and completeness

 

They believe that “the formality required from the language for the ontology is to a large extent dependent on the degree of automation in the various tasks which the ontology is supporting. If an ontology is a framework for communication among people, then the representation of the ontology can be informal, as long as it is precise and captures everyone’s intuitions. However, if the ontology is to be used by software tools or intelligent agents, then the semantics of the ontology must be made much more precise.” So the degree of formalization depends on the operationalization needs.

 

Uschold, M., & Gruninger, M. (1996). Ontologies: Principles, Methods and Applications. The Knowledge Engineering Review, 11(2).

 

 

3. Compare or comment on their expressiveness, and identify which relations they use/express formally or informally?
(Remember, OWL basically just knows ‘is_a’ and ‘instance_of’; some of the above will be more expressive, some less expressive)

 

- WordNet: more expressive than OWL, because it can also express relations such as the following (for adjectives):

 

  • synonyms/related nouns
  • antonyms
  • a value of
  • domain
  • familiarity

 

And for nouns, WordNet can express relations such as:

 

  • synonyms ordered by frequency
  • coordinate terms
  • hypernyms (pregnancy is a kind of …)
  • hyponyms (… is a kind of pregnancy), brief
  • hyponyms (… is a kind of pregnancy), full
  • meronyms (parts of pregnancy)
  • domain terms
  • familiarity
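The contrast with a bare is_a hierarchy can be sketched in Python, assuming a hand-built toy lexicon: the class `ToyWordNet`, the relation labels and the synset names (written in the word.pos.nn style NLTK uses) are hypothetical, and the few links are illustrative rather than copied from the WordNet database. Each relation is a labelled edge, and paired relations get their inverse recorded automatically.

```python
from collections import defaultdict

# Relations whose inverse is filled in automatically (e.g. hypernym <-> hyponym).
INVERSE: dict[str, str] = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}


class ToyWordNet:
    """Synsets joined by typed, labelled links rather than a single is_a edge."""

    def __init__(self) -> None:
        self.links: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self.links[(source, relation)].add(target)
        if relation in INVERSE:
            self.links[(target, INVERSE[relation])].add(source)

    def related(self, synset: str, relation: str) -> list[str]:
        # .get() avoids inserting empty entries into the defaultdict
        return sorted(self.links.get((synset, relation), set()))


if __name__ == "__main__":
    wn = ToyWordNet()
    # Illustrative links in the spirit of the lists above, not WordNet data.
    wn.add("ectopic_pregnancy.n.01", "hypernym", "pregnancy.n.01")
    wn.add("big.a.01", "antonym", "small.a.01")
    wn.add("big.a.01", "value_of", "size.n.01")  # no inverse registered
    print(wn.related("pregnancy.n.01", "hyponym"))  # ['ectopic_pregnancy.n.01']
    print(wn.related("small.a.01", "antonym"))      # ['big.a.01']
```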

 

- GO: at least as expressive as OWL. GO terms are organized in directed acyclic graphs (DAGs), in which a child (more specialized) term can have several parent (less specialized) terms, so GO can express the is_a relation with multiple inheritance. Besides is_a, GO also links terms with a part_of relation.

 

- Cyc: more expressive than OWL. Cyc is designed to represent not only the objective world but also agents' beliefs. For example, Cyc knows that Paris is in France (objective), but one can also represent the notion that someone thinks Paris is under water (subjective).
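Cyc keeps such assertions apart by filing them under separate contexts (Cyc calls these microtheories). Below is a toy Python sketch of the idea, assuming one objective context and one belief context for a hypothetical agent, Bob; the class, context names and predicates are made up and do not reproduce CycL syntax or semantics. Giving the belief context the world context as its parent is a simplifying assumption, so Bob's beliefs include the shared world facts plus his own.

```python
from typing import Optional

Fact = tuple[str, ...]


class ContextStore:
    """Toy assertion store, NOT CycL: every fact is filed under a context."""

    def __init__(self) -> None:
        self.facts: dict[str, set[Fact]] = {}
        self.parent: dict[str, str] = {}  # context -> more general context

    def add_context(self, name: str, parent: Optional[str] = None) -> None:
        self.facts.setdefault(name, set())
        if parent is not None:
            self.parent[name] = parent

    def assert_in(self, context: str, fact: Fact) -> None:
        self.facts.setdefault(context, set()).add(fact)

    def holds(self, context: str, fact: Fact) -> bool:
        """True if the fact is asserted in `context` or any context it inherits from."""
        current: Optional[str] = context
        while current is not None:
            if fact in self.facts.get(current, set()):
                return True
            current = self.parent.get(current)
        return False


if __name__ == "__main__":
    store = ContextStore()
    store.add_context("World")
    store.add_context("BeliefsOfBob", parent="World")
    store.assert_in("World", ("in", "Paris", "France"))      # objective
    store.assert_in("BeliefsOfBob", ("underWater", "Paris"))  # subjective
    print(store.holds("BeliefsOfBob", ("in", "Paris", "France")))  # True
    print(store.holds("World", ("underWater", "Paris")))           # False
```

The last line shows why contexts matter: Bob's belief stays in his context and does not leak into the objective one.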

 

- CIDOC/CRM: less expressive. It relies on XML, so a DTD is specified to facilitate mapping between different data sets. It cannot express is-A in a hierarchical sense. Through mapping, it may be able to express is-A in the sense of “equivalent”: for example, A in one schema “is a” B in another schema. But this is not is-A in the sense that A is a subclass of B and inherits properties from B.

 

4. Identify toy or real-world applications (if any) of the above and identify the context in which they are used.

 

- WordNet: ArchiWordNet (http://www.fi.muni.cz/gwc2004/pres/101/)

 

A bilingual English/Italian thesaurus for the Architecture and Construction domain. It is structured according to the WordNet model and fully integrated with MultiWordNet, a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton's English WordNet.

 

- GO: Gene Ontology Annotation (GOA), http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12654719&dopt=Abstract


Gene Ontology Annotation (GOA) is a project run by the European Bioinformatics Institute (EBI) that aims to provide assignments of terms from the Gene Ontology (GO) resource to gene products in a number of its databases (http://www.ebi.ac.uk/GOA). In the first stage of this project, GO assignments have been applied to a data set representing the complete human proteome by a combination of electronic mappings and manual curation. This vocabulary has also been applied to the nonredundant proteome sets for all other completely sequenced organisms as well as to proteins from a wide range of organisms where the proteome is not yet complete.

 

- Cyc: CycSecure, http://www.cyc.com/cyc/applications/cycsecure

CycSecure is a security risk analysis tool that capitalizes on the power and richness of the Cyc Knowledge Base and reasoning system in order to provide the network security professional with analyses of an organization's network vulnerabilities at several levels.

CycSecure was developed to address the dramatic increase in security breaches and network vulnerabilities worldwide. This technology enables CycSecure to provide an organization with a virtual representation of its networks that allows attack and simulation modeling to occur without risking damage to, or overload of, the real network. It provides the network security team information that affords complete, concise risk analysis.

 

 

- CIDOC/CRM: Image Annotation using CIDOC/CRM and MPEG-7

(Proposed project)

Domain-specific ontologies have been developed by two different ISO Working Groups to standardize the semantics associated with the description of museum objects (CIDOC Conceptual Reference Model) and the description of multimedia content (MPEG-7) - but no single ontology or metadata model exists for describing museum multimedia content. This paper describes an approach which combines the domain-specific aspects of MPEG-7 and CIDOC-CRM models into a single ontology for describing and managing multimedia in museums. The result is an extensible model which could lead to a common search interface and the open exchange, sharing and integration of heterogeneous multimedia resources distributed across cultural institutions.

 

 

5. Would you consider the ontology successful? Try to give reasons for success or failure.

 

- WordNet: it is a successful ontology if we define an ontology as a “specification of a conceptualization”. The core of the English language is its small, everyday words, and these are the ones with multiple meanings, so I'd say WordNet is quite an achievement. Long words from a specific domain usually have a single, unique meaning, but a small word such as “big” has at least 14 senses in WordNet, each with many synonyms. Such words are the hardest part of word sense disambiguation, and WordNet does a good job in this respect.

 

- GO: I would consider it successful. It provides a “controlled vocabulary”, so data from different databases can be integrated, queried, and mapped easily. Each gene or gene product has a list of associated GO terms, and each database publishes a table of these associations. GO mainly captures the “isA” relationship; since biological terms are structured mostly by isA, GO successfully captures the most important structure.
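The integration point can be sketched in Python, assuming two hypothetical databases that publish (gene product, GO term) association tables over a shared toy vocabulary; the term names, gene product IDs and table format are illustrative, not real GO or GOA data. Because both tables use the same terms, merging them is a simple union, and propagating each annotation up the is_a links (what GO documentation has called the "true path rule") lets a query for a general term also find products annotated to its more specific children.

```python
from collections import defaultdict

# Hypothetical toy is_a links (term -> parents); not real GO identifiers.
IS_A: dict[str, list[str]] = {
    "root": [],
    "process_A": ["root"],
    "process_A1": ["process_A"],
}

# Hypothetical association tables from two databases sharing the vocabulary:
# (gene product, GO term) pairs.
DB_ONE = [("geneX", "process_A1")]
DB_TWO = [("protY", "process_A")]


def with_ancestors(term: str) -> set[str]:
    """The term plus everything above it along is_a links."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def build_index(*tables: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Merge association tables into one term -> gene products index."""
    index: dict[str, set[str]] = defaultdict(set)
    for table in tables:
        for product, term in table:
            # An annotation to a term also holds for all of its ancestors.
            for t in with_ancestors(term):
                index[t].add(product)
    return index


if __name__ == "__main__":
    index = build_index(DB_ONE, DB_TWO)
    print(sorted(index["process_A"]))   # ['geneX', 'protY']
    print(sorted(index["process_A1"]))  # ['geneX']
```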

 

 

- Cyc: I think it is quite successful. It was designed to be a knowledge base, so naturally it paid more attention to its inference rules than to anything else. Cycorp has also designed quite a few ontologies for specific areas, such as transportation, as well as for Cyc itself (http://www.daml.org/ontologies/submitter.html).

 

 

- CIDOC/CRM: I wouldn't categorize it as an ontology in the strict sense. It cannot express the “isA” relation or other relations; it is more about how to map database schemas. But it is quite successful as a means to facilitate information retrieval and integration in the museum community.