C. Project Description C. 1 Introduction


C.5 Results From Prior NSF Support





Work on three recent NSF grants is relevant to this project.

BIR 96-30316: Database Support for Shared Ecological Research Sites
(Database Activities Program) 1996-1997, $168,000.

Drs. Cushing, Delcambre, and Maier (computer scientists) teamed with Drs. Nadkarni and Franklin (forest canopy scientists) to investigate database support for forest canopy data.

This project devised the foundation data and developed and evaluated a data management system for the Wind River Canopy Crane Research Facility (WRCCRF). This project also supported two focused research studies on the transfer of data among participating individual researchers.

Computer scientists worked closely with domain scientists throughout the research process from research design to data archiving. We also developed a web-based browser for inter-related research projects concerning global warming.

Papers citing this grant address forest canopy science [1, 2, 3, 4] as well as computer science support for forest canopy science [5, 6, 7, 8].

IRI-9502084: Content-Based Connections for Navigating on the NII
L. Delcambre and D. Maier, 1995-96, $50,000.

This project explored the use of superimposed information models to provide new, semantically meaningful access to an underlying universe of information. The work resulted in the definition of Structured Maps to provide domain-specific concepts and relationships over an underlying information set consisting of HTML and SGML documents with a series of prototype implementations. Selected papers cite this award [9, 10, 11, 12].



IIS-9817492: Tracking Footprints Through an Information Space: Leveraging the Document Selections of Expert Problem Solvers (Digital Libraries Phase 2)
P. Gorman, L. Delcambre, D. Maier, 1999-2001, $650,000.

This new project is exploring the use of another form of superimposed information using the patient medical record as the base information. The premise of this project is that when experts are engaged in a problem solving activity (e.g., diagnosing a patient), there is knowledge implicit in their selection of documents. Drs. Delcambre and Maier are working with Paul Gorman, M.D., at the Oregon Health Sciences University. Earlier collaboration of this team concerned the introduction of superimposed information to address the familiarization problem [13].


C.6 Related Work


We survey two kinds of related work: current approaches to providing information access and relevant projects in the scientific domains. The current approaches are listed in Table 4, along with the relevant bibliographic reference. Each approach is briefly described.

Table 4: Current Approaches to Providing Information Access

| Approach | Description |
| --- | --- |
| BioMedNet [14] | Stores and manages biological and medical documents and resources on-line. Documents are annotated by category and rated. Supports simple browsing by category as well as text search. |
| BUBL [15] | Provides an interface (search engine) to a catalogue of academic subject areas. Searching through subject terms accesses items in the catalogue. |
| Committee on Earth Observation Satellites International Directory Network (CEOS IDN) [16] | Helps researchers locate information on available data sets within the Global Change Master Directory (GCMD). Consists of four coordinating distribution sites, each of which holds a complete copy of the GCMD and is updated monthly. |
| Clearinghouse Mechanism of the Convention on Biological Diversity (CHM) [17] | Provides searches based on themes, organizations, and documents over a URL database. Uses BIOSEEK for finding information over the Internet. |
| CoNote [18] | Allows a group of people using a WWW browser to communicate using shared annotations. |
| Cooperative Online Resource Catalog (CORC) [19] | Explores the cooperative creation and sharing of metadata by libraries. Provides tools for link maintenance and cataloging as well as a database system. |
| DC-ROADS [20] | Uses a ROADS database (see below) and allows dynamic updates of Dublin Core [21] metadata. |
| Environmental Information Management System (EIMS) [22] | Stores, manages, and delivers metadata for data sets, databases, documents, models, projects, and spatial information. Stores and maintains most information, including metadata, in a relational DBMS. |
| Federal Geographic Data Committee (FGDC) Clearinghouse [23] | Acts as a distributed discovery mechanism for digital geospatial data. Provides a detailed catalog service for FGDC Digital Geospatial Metadata. |
| Forest Conservation Archives & Portal [24] | Uses a relational database to store and access (through basic text search) information on forest conservation issues. |
| Global Environmental Locator Service (GELOS) [25] | Allows search through text, geographical information, and keywords over a distributed information locator service for environment and natural resources. Allows registered users to contribute and edit metadata entries. |
| Global/Government Information Locator Service (GILS) [26, 27] | Specifies a standard way to share locator information across organizations. |
| Global Forest Information Service (GFIS) [28] | Details a desired set of characteristics for a network of distributed information resources. No system has been implemented to date. |
| HTTP Based Geo-Temporal Searching Protocol (HGS) [29] | Performs interoperable and simple database searches over the web. |
| Long Term Ecological Research Network (LTERnet) [30, 31, 32] | Provides central access to information derived from individual LTER site databases. Consists of a set of tools to support interoperable search and discovery over the databases. |
| Mountain Forum [33] | Searchable on-line library and database for environmental issues. |
| National Biological Information Infrastructure (NBII) [34] | Helps locate, evaluate, and access biological data and information from a distributed network of information sources. |
| National Environmental Data Index (NEDI) [35] | Acts as a search engine service that supports spatial, full-text, keyword, and fielded search over specific government information repositories. |
| Open Directory [36] | Provides a search engine similar to Yahoo that allows the general public to edit metadata entries. Uses RDF to store metadata information. |
| PowerBookmarks [37] | Organizes, discovers, maintains, and shares bookmark information through an information management system. |
| Resource Organization and Discovery in Subject-based Services (ROADS) [38] | Helps set up a web-accessed database of resources relevant to a specific subject area. Searching and browsing can be performed over the database. Handles tasks such as resource discovery, link maintenance, validation, and statistics collection. |
| Strudel [39, 40, 41] | Supports querying and restructuring web sites. |
| Third Voice [42] | Allows anyone using the service to post notes (annotations) onto a web page. Supports public, private, or group access. |

These approaches to information access can be categorized as either centralized or decentralized. Of the approaches in Table 4, CEOS IDN, DC-ROADS, EIMS, FGDC Clearinghouse, GELOS, GFIS, GILS, NBII, PowerBookmarks, and ROADS are decentralized systems. Some of these either replicate entire collections of information at various sites across the Internet or use distributed sites as large information repositories; Table 5 labels these systems Heavyweight Nodes. Other approaches store only metadata across sites; still others store both metadata and data across sites. Finally, some approaches do not specify the decentralized storage requirements for individual sites. Table 5 categorizes the decentralized approaches using these four possibilities.

Table 5: Categorizing the Decentralized Information Access Approaches

| Heavyweight Nodes | Metadata Only | Metadata and Data | Not Specified |
| --- | --- | --- | --- |
| EIMS | FGDC Clearinghouse | NBII | GILS |
| ROADS | GELOS | GFIS | |
| DC-ROADS | | PowerBookmarks | |
| CEOS IDN | | | |

Many current approaches use fixed metadata content. Some allow new metadata elements to be added to the system's standard elements (extensible), while others have no standard elements and accept arbitrary metadata elements. We will use recognized metadata standards and, when appropriate, allow extensible metadata content to set the terminology that we use as an access structure. Table 6 summarizes each approach's metadata scope and the metadata standards it uses.

Table 6: Metadata Usage of Current Information Access Approaches

| Approach | Metadata Scope | Standards Based |
| --- | --- | --- |
| BioMedNet | Fixed | No |
| CEOS IDN | Fixed | Yes – DIF |
| CoNote | Fixed | No |
| CORC | Arbitrary | Yes – Dublin Core, MARC |
| DC-ROADS | Fixed | Yes – Dublin Core |
| EIMS | Fixed | Yes – FGDC + their own |
| FGDC Clearinghouse | Fixed | Yes – FGDC |
| GELOS | Fixed | No |
| GFIS | Not Specified | Yes – a goal of GFIS is to use existing standards |
| GILS | Extensible | Yes – their own Core Set of Elements |
| LTERnet | Extensible | Yes – their own standard, the LTER metadata content |
| NBII | Fixed | Yes – FGDC + their own |
| NEDI | Fixed | Yes – GILS, DIF, FGDC |
| Open Directory | Fixed | No |
| PowerBookmarks | Fixed | No |
| ROADS | Arbitrary | No |
| Third Voice | Fixed | No |
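
The three metadata scopes listed in Table 6 (fixed, extensible, arbitrary) can be illustrated with a minimal sketch. It is not drawn from any of the surveyed systems: the `Scope` enum, the `STANDARD_ELEMENTS` set, and the `classify_elements` function are hypothetical names introduced only to show how each scope would treat a metadata record that contains a non-standard element.

```python
from enum import Enum


class Scope(Enum):
    FIXED = "fixed"            # only the standard elements are accepted
    EXTENSIBLE = "extensible"  # standard elements plus user-added ones
    ARBITRARY = "arbitrary"    # no standard elements; any element is accepted


# Hypothetical standard element set (an assumption for illustration only).
STANDARD_ELEMENTS = {"title", "creator", "subject", "date"}


def classify_elements(record, scope):
    """Split a record's element names into (standard, extensions, rejected)."""
    names = set(record)
    if scope is Scope.ARBITRARY:
        # There is no standard set, so every element counts as an extension.
        return set(), names, set()
    standard = names & STANDARD_ELEMENTS
    extra = names - STANDARD_ELEMENTS
    if scope is Scope.FIXED:
        # A fixed scope rejects anything outside the standard set.
        return standard, set(), extra
    # An extensible scope keeps the extra elements alongside the standard ones.
    return standard, extra, set()
```

Under these assumptions, a record carrying a domain-specific element such as `stand_age` would have that element rejected by a fixed scope but retained by an extensible or arbitrary one.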

Finally, the different services offered by current approaches can be divided into three general categories: locator services, search engine services, and information management systems. A Locator Service is an information and resource retrieval (and possibly information update) facility that offers no other services. An Information Management System is typically a Locator Service along with additional services such as metadata management and discovery as well as different types of search capabilities. A Search Engine Service is a resource retrieval facility that can consist of metadata search capabilities, interoperable search, and simple keyword search for centralized data. Our proposed system will be a locator service with advanced search engine services. No additional services are currently planned that would move the system into a full information management system. Table 7 shows the services offered by current approaches.

Table 7: System Classification – Locator vs. Search Engine vs. Info. Mgt. System

| Locator Services | Search Engine Services | Information Management Systems |
| --- | --- | --- |
| CEOS IDN | BioMedNet | CoNote |
| DC-ROADS | BUBL | CORC |
| GELOS | CHM | EIMS |
| GILS | HGS | FGDC Clearinghouse |
| ROADS | Forest Conservation | GFIS |
| | LTERnet | NBII |
| | Mountain Forum | PowerBookmarks |
| | NEDI | Strudel |
| | Open Directory | Third Voice |

Next we survey related research papers. We include illustrative efforts, without attempting an exhaustive survey.

Multivalent Annotations [43] and CoNote [18] both use annotations as superimposed information over digital documents. PowerBookmarks [37] allows bookmark annotation, bookmark organization, query, and navigation on collections of bookmarks.

Several systems attempt to extract structure (schema) from Web pages in order to perform more advanced queries. Nestorov et al. [44] define techniques to determine the regularity of semi-structured data. Another approach [45] defines methods for classifying Web pages based on hypertext link information. PageRank [46] uses hypertext links to determine possible rankings of importance for Web pages, similar to the Google [47] search engine. Atzeni et al. [48] describe a method for structuring Web pages in order to manage Web data.

Other approaches to supporting more advanced Web searching techniques include WebSQL [49], a high-level declarative query language for the web. Manber and Bigot [50] introduce a two-phase search that finds a specific subject database and then performs a context search on that database. In another effort [51], a rudimentary form of structure search is used on the results of a context search in order to reduce the number of irrelevant addresses returned from search engines. IRISWeb [52] retrieves, indexes, and searches a set of user defined web pages called virtual collections.

Shum et al. [53] introduce an advanced metadata description scheme similar to Topic Maps [54]. Klavans [55] suggests the need for dynamically updating and modifying ontologies. Weinstein and Alloway [56] propose an extensible ontology system and discuss the problems in reaching agreement on fixed metadata standards due to complex data and user needs.

Ceri et al. [57] define ten general principles that should be considered when implementing a Web site managing large amounts of information, some of which are relevant to the design of the Adaptive Management Portal. Malaika [58] describes accessing and querying relational data using web techniques along with some of the problems that exist in using relational databases on the web.

The Extensible Markup Language (XML) standard [59] is a restricted form of SGML for information representation and data interchange over the Internet. The Resource Description Framework (RDF) standard [60] is a simple data model that attaches metadata in the form of properties and values to Internet resources. XML Linking Language (XLink) [61] describes a format for defining links between objects in XML. A Topic Map [54] is a standard that defines a data model in which topics and topic associations are used as an index over a set of documents.

A number of attempts have been made to create vocabularies, glossaries, and dictionaries for forestry issues, e.g., sustainable forestry [62], forest products [63, 64], climate issues [65, 66], ecosystem management [67, 68], and biodiversity [69].

The goals for our research are similar to those of many existing systems: to provide improved access to information using appropriate metadata. The distinctions lie in our approach. Rather than build a specialized repository for forest information, e.g., using database technology, we embrace the Internet as an information dissemination platform. Most importantly, we introduce additional access structures in a superimposed layer, without requiring any modification to the underlying base information. This aspect is particularly important for applications, such as forest management, with data from decades of research and experimentation that may still be of high value for current problems, e.g., results of various silvicultural treatments. A distinguishing aspect of our approach is that the superimposed layer provides information access capabilities that are difficult to deliver using just the base data in isolation. Examples are structured query, navigation from the base layer through the superimposed layer back to the base layer, and definition of “virtual documents” containing information elements from multiple sources.

We will adapt ideas from Topic Maps [54] and ontologies for our superimposed information model, and we propose to implement it using standards such as XML [59] or RDF [60]. Our work extends Topic Maps by including a query capability over the base and superimposed information simultaneously. Our work goes beyond XML and RDF by defining a higher-level model, based on Topic Maps. Another significant aspect of our work is the focus on referencing base information at multiple levels of granularity, e.g., phrases, sentences, paragraphs, etc.
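
To make the idea of a superimposed layer more concrete, the following is a minimal sketch, not the proposed implementation. All names in it (`Mark`, `SuperimposedLayer`, `add_mark`, `excerpts`, `related_documents`, `virtual_document`) are hypothetical, and it assumes that base documents are plain strings keyed by URL and that a reference into a base document is a character-offset range tagged with a granularity. The base documents are only read, never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A reference into a base document at some level of granularity."""
    url: str
    start: int         # character offset where the excerpt begins
    end: int           # character offset just past the excerpt
    granularity: str   # e.g. "phrase", "sentence", "paragraph"


class SuperimposedLayer:
    """Topics with marks into base documents; the base is only read."""

    def __init__(self, base):
        self.base = base    # assumed shape: {url: document text}
        self.topics = {}    # topic name -> list of Mark

    def add_mark(self, topic, mark):
        self.topics.setdefault(topic, []).append(mark)

    def excerpts(self, topic, granularity=None):
        """Resolve a topic's marks to text, optionally filtered by granularity."""
        return [
            self.base[m.url][m.start:m.end]
            for m in self.topics.get(topic, [])
            if granularity is None or m.granularity == granularity
        ]

    def related_documents(self, url):
        """Navigate base -> superimposed -> base via shared topics."""
        related = set()
        for marks in self.topics.values():
            urls = {m.url for m in marks}
            if url in urls:
                related |= urls - {url}
        return related

    def virtual_document(self, topic):
        """Assemble excerpts from multiple sources into one text."""
        return "\n".join(self.excerpts(topic))
```

In this sketch, `related_documents` corresponds to navigation from the base layer through the superimposed layer and back, and `virtual_document` corresponds to a virtual document built from elements of multiple sources. A topic-association structure and a real query language are deliberately left out.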

A significant portion of our superimposed information will be based around domain terminology, as with a Topic Map. We expect our superimposed information to be less formal and rigid than a semantic ontology: we are not as concerned with the nuances of meaning as we are with using terminology that is commonly understood among portal users. We will attempt to utilize existing metadata or terminology standards, as appropriate, e.g., the Dublin Core [21]. While we do intend to support some form(s) of geospatial access, we will concentrate on multiple, thematic organizations of information, complementary to a spatial organization, because many interesting queries do not require spatial reference.

We admit both curated and non-curated superimposed information and distinguish between them. We envision that the choice of terminology used for access purposes will be curated. That is, some authority, such as the standards organization that approved the vocabulary or the local sites providing information, will control the vocabulary. We also allow interconnections and annotations to be either curated or non-curated. Curated information is approved or acknowledged by some authority. For example, the relationship between a watershed restoration plan and the legal activities associated with its enforcement might be curated by the agency responsible for the plan. But we believe that non-curated, open information provided by any user of the Adaptive Management Portal will also have high value, as it offers the possibility to reuse the “attention” of people who have discovered interesting linkages or supplied insightful commentary.
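
One way to picture the curated versus non-curated distinction is to record, for each annotation, the authority (if any) that approved it. The sketch below is only an illustration under that assumption; the `Annotation` class, its fields, and the `select` function are hypothetical and do not describe the portal's actual design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    target: str                    # URL or identifier of the annotated item
    text: str                      # commentary or description of a linkage
    author: str                    # any portal user may contribute
    curator: Optional[str] = None  # approving authority; None if non-curated

    @property
    def curated(self) -> bool:
        return self.curator is not None


def select(annotations, curated_only=False):
    """Return all annotations, or only the curated ones when requested."""
    return [a for a in annotations if a.curated or not curated_only]
```

Keeping both kinds in one collection, with curation as an attribute, lets a portal view show everything by default while still allowing a user to restrict a view to authority-approved material.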

We do not intend for there to be a rigid mapping between a portal and a particular site. Rather, we expect administrators and users to be able to customize a portal to achieve various vantage points of interest that might include subsets from various sites.

