More than Pharmocopaeia: Biomedical Resource Discovery in Context

D-Lib Magazine
September 1998

ISSN 1082-9873

John Kirriemuir
OMNI: Organising Medical Networked Information
Greenfield Medical Library, Queens Medical Centre
Nottingham NG7 2UH, England
jk@omni.ac.uk
http://omni.ac.uk/

Sue Welsh
IT consultant
212, High Street, Tonbridge
Kent, England
sue@braybrook.u-net.com

Introduction

This paper takes a look at some of the issues surrounding biomedical resource discovery. It begins with a brief overview of the components of a typical biomedical resource discovery system; more considered attention is given to MeSH and UMLS.

The main body of this paper deals with the issues surrounding the contexts into which a health/medical resource discovery system, such as OMNI, fits. There are three main contexts, namely:

the geographical focus of the system; in other words, which regions of the world do the resources catalogued in the gateway actually reside in.

the audience of the system; what type of people is the system provided for -- and what type of people actually use it.

the subject fit of the system; how does it complement resource discovery systems that cover other subject areas.

The paper then concentrates on "the trinity of efficiency", namely, caching, mirroring, and resource discovery, and indicates a few ways in which they can complement each other in order to provide a more holistic resource discovery and resource acquisition system. The paper concludes with a brief look at the "stuff" that really concerns end-users, namely primary content.

This paper does not present an in-depth comparison of health/medical resource systems. Readers are directed to the detailed survey carried out by a group of information scientists/librarians [megasites], which examined and compared over 20 such systems. In addition, we do not discuss technical matters, such as Dublin Core [core] or the Resource Description Framework [rdf], here, though more material and papers on these topics as applied to the biomedical and health fields would be welcome. Issues relating to quality, such as evaluation criteria, are also covered elsewhere in some detail [agec].

A side point. Many of the concepts and ideas mentioned in this paper can be applied to other subject areas, such as the social sciences, arts, engineering, and business and economics. These areas also possess quality resource discovery systems, as will be referenced later.

MeSH and UMLS in context

As with most other conventional resource discovery systems, your typical high quality biomedical system usually consists of:

an underlying database of catalogued entries, each entry pertaining to a particular biomedical resource such as a Web site, document, page or image.

facilities to search the database.

facilities to browse the database.

various other value-added services, such as a searchable version of MEDLINE, specialised subject guides, or in-house patient information.

Facilities to search resource catalogues vary from system to system. Medfinder [medfinder], for example, allows people to search across descriptions of resources of particular types, such as news articles or directories of links. OMNI [omni] allows people to search across a combination of databases, each database specialising in either a geographical region (i.e., the world, or just the UK) or a particular type of medical resource, e.g., nursing, or dental images.

Similarly, facilities to browse resource catalogues vary between systems. For example, MedWeb [medweb] allows people to browse a geographically oriented tree, so resources that are relevant to a particular region, or even a city, can be identified. OMNI provides three different types of browsing, namely:

via an alphabetically arranged tree of NLM section names,

via a section number ordered tree of NLM section names, and

via MeSH, which is also used by several other biomedical resource discovery systems.

MeSH

MeSH, short for Medical Subject Headings [mesh], is a thesaurus of medical terms developed by the National Library of Medicine of the United States of America. MeSH is continually being updated by the National Library of Medicine, and a new version is published each year.

The 1998 version of MeSH contains over 19,000 terms describing medical concepts. MeSH is a controlled thesaurus; that is, it specifies the preferred term for a given concept. But MeSH also lists many synonyms with cross-references to the preferred term. For example, the preferred term for Cancer is Neoplasms, so printed volumes of MeSH will include the entry:

Cancer see Neoplasms

MeSH also specifies relationships between preferred terms. One example of this is the expression of MeSH terms in a hierarchical arrangement -- the MeSH trees. All terms are members of one or more trees, relating the most general terms to the most specific.

As well as 19,000+ main terms, MeSH includes many other terms, known as qualifiers, that are used to qualify main headings. As the name implies, these are attached to MeSH terms and specify more closely the nature of the resource described. These include subheadings, which allow concepts relating to the main term to be expressed. For example, a disease main term such as "breast cancer" may be qualified with subheadings such as "diagnosis" or "epidemiology", for articles dealing with the diagnosis of breast cancer, and with the number of people affected and the spread of the disease, respectively.

MeSH also contains headings used to express very basic information about subject matter, such as whether it describes human disease (as opposed to research using mice for example), age ranges, gender information and the names of countries and regions.

MeSH is the thesaurus used to create MEDLINE, the most commonly used bibliographic database in medicine. All articles included in MEDLINE are indexed with several MeSH terms. Conversely, MEDLINE may be queried using MeSH terms -- indeed, this is the most precise way to form most queries. The hierarchical nature of MeSH may be exploited during the retrieval process by allowing users to "explode" a term, i.e., to search for the term together with all the (narrower) terms below it in the tree structure.
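The "explode" operation can be illustrated with a minimal sketch. The hierarchy fragment below is hand-made and simplified for illustration; it is not loaded from real MeSH data, and the names `NARROWER`, `explode` and `search` are our own. Because a term may belong to more than one tree, the traversal keeps track of the terms it has already visited.

```python
# Simplified, hand-made fragment of a subject hierarchy: each term maps to
# its immediately narrower terms. Illustrative only; not actual MeSH data.
NARROWER = {
    "Neoplasms": ["Neoplasms by Site"],
    "Neoplasms by Site": ["Breast Neoplasms", "Lung Neoplasms"],
    "Breast Neoplasms": [],
    "Lung Neoplasms": [],
}


def explode(term, narrower=NARROWER):
    """Return the term plus every narrower term below it in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term can sit in more than one tree
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return seen


def search(records, term):
    """Return titles of records indexed with the term or any narrower term.

    records is a list of (title, list_of_headings) pairs.
    """
    wanted = explode(term)
    return [title for title, headings in records if wanted & set(headings)]
```

A search on an exploded general term therefore also retrieves records that were indexed only under one of its more specific terms.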

UMLS

UMLS, the Unified Medical Language System [umls], is a product of a long-term research and development project underway at the National Library of Medicine. It has been designed with the aim of aiding the integration of disparate electronic medical information sources. Some of the achievements of the UMLS project are:

creating a "super-thesaurus" of terms from many thesauri and classifications;

defining relationships between different concepts; and

identifying and describing candidate information sources.

At present, the output of the UMLS project is divided into four Knowledge Sources: the Metathesaurus, the SPECIALIST Lexicon, the Semantic Network, and the Information Sources Map. Each comprises a different element of the overall system.

The Metathesaurus contains information about terms from many controlled vocabularies used in medicine (whether for patient records or bibliographic databases). Different terms for the same concept are linked together across vocabularies, while relationships within vocabularies are preserved. The most recent version of the Metathesaurus (1998) contains over 450,000 concepts, over 1 million terms, and more than 40 source vocabularies.

The SPECIALIST Lexicon is related to the SPECIALIST natural language processing project at NLM. It contains lexical information about words and their combinations.

The Semantic Network provides a categorisation of all the concepts contained in the Metathesaurus. It also specifies relationships that may exist between semantic types.

The Information Sources Map is a database that describes information sources. The ISM holds metadata, such as scope and access conditions, that may enable the user to assess whether a source will be able to answer a specific query. Applications to exploit the ISM are being developed by the National Library of Medicine, for example, allowing natural language querying of the database.

UMLS Knowledge Sources are available for use by system developers. There is no charge, but a licence agreement must be completed.

OMNI [omni] uses MeSH to index all resources added to the OMNI databases. MeSH terms appear in the keywords field of the OMNI record. The UMLS Metathesaurus is used to create links between MeSH headings and their broader or narrower neighbours, and to map queries to MeSH terms which are most likely to be helpful.
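As a minimal sketch of mapping a query to a preferred term, in the spirit of the "Cancer see Neoplasms" cross-reference: the lookup table and the function name `to_preferred` below are hypothetical. This is not OMNI's code, and a real system would draw its entry terms from MeSH or the Metathesaurus rather than from a hand-written dictionary.

```python
# Hypothetical cross-reference table: entry term (lower case) -> preferred term.
SEE_REFERENCES = {
    "cancer": "Neoplasms",
    "cancers": "Neoplasms",
    "tumours": "Neoplasms",
}


def to_preferred(query, see_references=SEE_REFERENCES):
    """Map a user's query to a preferred heading, or return it unchanged."""
    key = " ".join(query.lower().split())  # normalise case and spacing
    return see_references.get(key, query)
```

A query with no known cross-reference is passed through untouched, so the mapping can only help a search and never blocks one.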

The geographical context

The geographical context of a resource discovery system actually breaks down into two distinct components:

where in the world are a large proportion of the system's users sited?

where in the world are a large proportion of the resources catalogued or indexed by the system sited?

For many resource discovery systems, we suspect that the answers would be the same, especially for those based in the USA (which has the most quality resources, as well as the largest number of users for a single country). However, systems based in countries that are bereft of many high quality biomedical resources may have different answers; most of their users are still local, but most of the resources that they catalogue are based overseas.

Is this actually a problem? Well, it could be. There are a fair number of biomedical resource discovery systems around the world, most with a different target audience in terms of geographical location or type of person (see the next section). Most of these systems are cataloguing resources from around the world; only a minority focus specifically on one country or region. Some systems, such as OMNI, partition the catalogued resources into world and UK-based resources; many do not. This leads to several problems, including:

duplication of effort. For example, seven of the main biomedical resource discovery systems have catalogued the Web site of the Centers for Disease Control and Prevention.

lack of total coverage. Since cataloguing is a relatively labour-intensive process (though unbeatable in terms of quality and independence), not everything that should be catalogued is being catalogued.

Why do resource discovery systems catalogue materials based overseas, as well as those based in their own country? With many resources, it would be because no equivalent resource exists in the same country as the resource discovery system. In addition, users would rather access their faster, local, resource discovery system, instead of one or more overseas systems, in order to find relevant resources (irrespective of where they are based).

There are various ways of reducing this duplication of effort which need to be explored (and, increasingly, are being explored). Exchanging records between two gateways could, ironically, increase the total amount of duplication across the two gateways. But it would mean that both systems are more comprehensive in coverage. However, this would only work where both gateways have similar evaluation criteria. Mirroring a gateway in another country (discussed in more detail later on) is also an option, though in a field such as biomedicine, this would mean just one more system for the end-users to contend with; preferable to "more systems" would be "more records in, or accessible through, the existing systems".

Cross-searching, and cross-browsing, mentioned in a previous paper in D-Lib Magazine [cross], are two approaches to providing access to a greater depth of subject coverage in one search. As previously mentioned, the issue of the resource discovery systems having similar evaluation criteria, so there is consistency in the results, is a problem; in addition, the issue regarding duplicate hits across the systems being searched has to be dealt with. Not surprisingly, with the various biomedical resource discovery systems using a variety of database systems, access methods, and query resolution, building true cross-searching systems (as opposed to the meta search engine approach) would be tricky. However, the rewards are worthwhile, with the geographical problems of resource discovery reduced and the end-users having a "larger pool of catalogued resources in which to fish".
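The handling of duplicate hits can be sketched as follows. This is a minimal illustration under stated assumptions: each gateway is assumed to return its hits as a list of dictionaries with a `url` key (a hypothetical record shape), and two hits are treated as the same resource when their host and path match after simple normalisation. Real gateways would need more careful matching.

```python
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a URL to a comparison key: lower-case host, no trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path


def merge_results(*result_lists):
    """Merge hit lists from several gateways, keeping the first copy of each resource."""
    merged = {}
    for hits in result_lists:
        for hit in hits:
            key = normalise(hit["url"])
            merged.setdefault(key, hit)  # later duplicates are ignored
    return list(merged.values())
```

Keeping the first copy means the order in which gateways are passed in decides whose description of a shared resource the user sees, which is one place where differing evaluation criteria would surface.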

The audience context: to heal or be healed

So, who uses resource discovery systems? Well, an analysis of the usage logs for OMNI [logs] brings up some interesting results:

A significant proportion of the users who use the quick search function, i.e., do not use the knowledge aids of MeSH and UMLS, are either practitioners or other people with medical knowledge (such as medical librarians), as they are more likely to use search terms such as "neoplasms", as opposed to "cancers" or "tumours", than the general public.

"Browsing trails" show much the same effect, with evidence that a large number of users have background medical knowledge, judging by the routes (and speed) at which they navigate one or more of the browsable trees within OMNI.

Having said that, it must be stressed that there are still a large number of people who "quick search" OMNI for e.g., "cancer" or "lung cancer". While some of these searchers will be medically informed people, it is likely that a significant number will also be patients/members of the public, who are less likely to be savvy with terms such as "neoplasms" when they execute searches without the aid of MeSH and UMLS.

Looking at the access logs in terms of date and time reveals clusters of searches on notable terms. For example, in the spring of 1998, there was a sudden rise in the number of people searching on "Viagra". It is reasonable to assume that many of these people were members of the public looking for information on the drug; several search phrases indicated users more interested in where they could purchase it.

There are a significant number of searches that use words and terms almost definitely beyond the scope of OMNI. For example, someone on the day of writing this article searched OMNI for a type of ladder (unless they have heard of, e.g., someone having an accident with a ladder, and are trying to find a reference through OMNI). In addition, there are also some (thankfully not significantly) inappropriate searches by people hoping to find material of a sexual nature.

The first two points show that OMNI has a significant number of users who are practitioners, be they doctors, nurses, medical students, or medical student teachers. However, points three to five indicate that OMNI also has a user base containing non-practitioners i.e., patients and other members of the public.

This can create problems. Though members of the public could (and do) use various resource discovery systems, they are in many cases hindered by their lack of background knowledge. MeSH and UMLS can alleviate some of the problems, but not all. The user still sometimes has to have some knowledge in order to type in terms, keywords, or synonyms that can be recognised by the system. An example of a "knowledge gap" happened to an ex-general practitioner (doctor) acquaintance. One of his patients complained of blurry vision and headaches, and was convinced that he had a little-known fatal medical complaint. The doctor pressed him on how he came to this conclusion. The patient subsequently produced large reams of printouts of various medical papers he had found on the Web after searching various systems on "pain". As it transpired, the only parts of the documents he understood were the sections that described vision problems and headaches, and the word "fatal"; the rest he skipped over. Not surprisingly, he turned out not to have the fatal illness -- and his vision and headache problems were caused by staring at his computer monitor for unbroken twelve hour stretches.

How do we prevent everyday people, who do not have medical knowledge but want to acquire some, from being in this situation? Well, we cannot stop them using the Web -- and would we want to? Freedom of information, the right to know, and other reasons for being better informed (peace of mind, empowerment, the right to decide your own treatment) mean that we shouldn't be excluding people from whatever information is available -- instead, we need to make it more accessible, ensure that it is of the right quality, and that it is pitched at the right audience. To put it bluntly, practitioners get the more technical "stuff", while the public get the more lineated "stuff", otherwise known as "patient information".

Thankfully, there is already a large quantity of patient information that can be accessed over the Web. A good example is the database, Patient Education Materials [skin], provided by the American Academy of Dermatology. This is a selection of pamphlets concerning skin conditions and related diseases and illnesses. It requires no medical knowledge to read, is easy to navigate, and is accurate, unbiased and informative.

How should patient and practitioner information be treated by resource discovery systems? Different systems manage in different ways. Some systems such as Healthfinder [healthfinder] have special "topic sections" that point to major topics in which a significant number of patients or the general public would be interested, such as AIDS, Cancer, Diabetes and Medicare. Other systems allow some differentiation when browsing between patient and practitioner information. An increasing number of the actual (content-centric) biomedical resources on the Web are themselves splitting their content and access control/navigation between patient and practitioner information. For example, the Cancerhelp UK [cancerhelp] site has clearly directed sections for adults, and for practitioners, from its home page.

OMNI is in an interesting position. Our target, or core audience, is the UK biomedical academic and research communities. However, a significant minority of our users may be patients, or other members of the public. OMNI already catalogues various high quality patient information resources, as some of these resources are of use to patients and practitioners, and many of them are located through OMNI by a relatively large number of searches. However, as OMNI grows in terms of the number of resources catalogued, and as the mass of Internet users expands -- thus leading to a greater number of users who are public/patients, as opposed to practitioners (as long as access stays open to both communities) -- the issue of "profiling" the users, or channelling them towards the most appropriate area of the OMNI catalogue is an interesting and increasingly pressing one. We could, for example:

provide completely separate databases/catalogues of resources -- one for patients and one for practitioners. Users can either search one, or both combined (after being made aware of the target audience of each catalogue). However, this may be viewed by some people as providing a patronising, or "dumbed down", service for non-practitioners.

process the incoming IP address -- if it resolves to a host in the .ac.uk domain or another recognised academic or medical domain, then the user is automatically redirected to a practitioner information area. This solution is the most seamless, but is not flawless; some academics are just patients, while some doctors require access to patient information themselves.

indicate the level of medical knowledge required to fully understand the contents of a resource located through the OMNI catalogue. Again, this could be seen as being patronising to patients or the general public.

lose all of the patient information resources altogether and concentrate on just core practitioner information. This would be an unpopular move with many of the practitioners who use OMNI, as they also use patient information sources for a variety of reasons.
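The domain-based option above can be sketched as follows. This is an illustration only, not a description of anything OMNI does: the list of suffixes (including `.nhs.uk`) and the function name `choose_area` are assumptions, and the visitor's IP address is assumed to have been resolved to a hostname already (for example by a reverse DNS lookup), which is not shown here.

```python
# Hypothetical list of domain suffixes treated as academic or medical.
PRACTITIONER_SUFFIXES = (".ac.uk", ".nhs.uk")


def choose_area(hostname, suffixes=PRACTITIONER_SUFFIXES):
    """Suggest a starting area for a visitor from their resolved hostname."""
    host = hostname.lower().rstrip(".")  # ignore case and a trailing dot
    if host.endswith(suffixes):
        return "practitioner"
    return "general"
```

Given the flaws already noted, the result is probably better treated as a default that the user can override than as a hard redirect.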

Other options exist. However, for a resource discovery system to be widely used both within and beyond the core medical fields, it seems likely and sensible to retain some patient information scope and content. The matter of how information about patient information resources is explicitly presented to the patients is one on which we are working.

The subject context

The areas of medicine and health are, of course, only two subject areas amongst many. Many (though not all) of these other subject areas have associated resource discovery systems, in the same way that the UK biomedical area has OMNI. For example, the social sciences are covered by SOSIG [sosig], the arts by ADAM [adam], History by -- not surprisingly -- HISTORY [history], engineering by EEVL [eevl], and business and economics by Biz/ed [biz/ed]. But how many subject areas are there? And how do these areas relate to each other, or overlap?

Yahoo [yahoo], though not possessing a relatively high degree of quality control, is one of the most well-known resource discovery systems. Yahoo allows people to search or browse on the descriptions of resources. However, as opposed to most of the major search engines, such as AltaVista, the system is more overtly geared towards browsability. The central feature on the home page is a menu of a number of subject areas; each area is linked to an associated tree of resources; users generally browse the tree, progressively descending through lower and narrower subject areas until they arrive at a list of resources. What is of interest is the top few layers of the tree, which give an idea of the range and partitioning of subjects and topics covered and, thus, of primary relevance to Yahoo's audience.

Here in the UK, an initiative is underway to build a network of gateways for the academic and research communities. Such a network will accommodate learned subjects, such as biomedicine, the social sciences, engineering, and business and economics. However, it is less likely to accommodate, at high levels, some of the topics that can be found in high levels of Yahoo, such as outdoor recreation, royalty, and television. In an academic context, these topics would be placed much lower down a subject tree, probably in sub-sections for environmental studies, social sciences and media studies, respectively.

The proposed network already has a head start in many ways, as the gateways mentioned at the start of this section cover a significant part of the subject base among them, both in terms of subjects (of relevance to academics) and, increasingly, of actual resources (due to the steadily increasing amount of resources catalogued by said gateways). What is needed now are additional resource discovery systems that:

are suitable for academics

are of a high enough quality

can "fit in" the subject classification in a particularly snug manner

This last point is the one of most interest. Many combinations of subject areas overlap in some manner. For example, resources concerning "alcohol" could be of relevance to the social sciences (alcoholism), history (history of alcohol), health and medicine (damage to the liver), and even engineering (fuels) and business (exports of spirits). The subject areas of health and medicine, in particular, have many areas which overlap with traditionally partitioned subjects. The alcohol example is but one of several that spring to mind. This presents us with issues surrounding the inclusion of resources into multiple resource discovery systems, or, more interestingly, the inclusion of certain sub-sections of resources into multiple resource discovery systems.

Not surprisingly, most of the significant differences between UK Yahoo, and its USA equivalent, are those which are regional in scope; for example, UK Yahoo has several well-filled sub-sections dedicated to football in the soccer sense of the word, while the USA version has a much larger section concerned with American football. Other differences also become clear when we examine sub-sections such as "government", where we see that the largest sections pertain to the government local to that version of Yahoo e.g., USA government.

It is therefore obvious, and again not surprising, that the appropriateness and arrangement of subject trees can vary, mainly according to the two contexts previously covered in this paper:

the type of audience e.g., the general public or academics

the location of the audience

The actual number of subjects in a subject tree is a matter of debate. We can count the number of headings at the various levels, though out of context these counts can give a distorted view of the amount of actual content, as some areas of the subject tree will be associated with far more quality resources than others. As Yahoo grew, so it sprouted new sub-sections and areas. Taking into account the aforementioned issue of sub-sections of resources being relevant to more than one resource discovery system within a tree-like federation of such systems, it will be interesting to see how much the UK academic subject tree will need to expand and metamorphose, in terms of the numbers and positioning of the various headings and sections, and how this expansion is carried out with regard to the resource discovery systems through which the content is accessed.

The trinity of efficiency: mirroring, caching, and resource discovery

Mirroring and caching are two highly related techniques for reducing resource access time and the use of network bandwidth. Mirroring works by taking a copy of a Web site, page, or document, and placing it in another geographical location. For example, D-Lib, the magazine that you are reading, has a true physical home site in the USA, but is also mirrored in the UK and in Australia. If you are in these countries, then accessing the version held in your (local) mirror should result in quicker access for you, as well as less strain on the international networks.

The actual process of mirroring a resource can be automated by using one of a variety of freeware or shareware packages, which can be configured to mirror a certain Web site, page, or image at a pre-determined time or day of the week.

Caching, on the other hand, involves a network-based object (such as a Web page) dropping off a copy of itself in each cache through which it passes in transit from the host server to a Web browser. Other browsers that are linked to the cache will check the cache for (a dropped off copy of) a Web page which the end-user wants. If it is there (i.e., if a copy was left when someone else accessed the page), then a copy of that copy is returned to the browser. If it isn't there, then the browser will look in other caches, and work its way to the host Web server of the page until it finds a copy. The concept of caching is best explained in more pictorial terms [caching].
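The lookup order described above can be sketched in a few lines. This is a simplified model under stated assumptions: each cache is represented by a plain dictionary, `origin` is a hypothetical callable standing in for the host Web server, and questions of freshness and expiry are ignored entirely.

```python
def fetch(url, caches, origin):
    """Return (content, source) for a URL, trying each cache before the origin.

    caches is an ordered list of dicts, nearest to the user first; origin is
    a callable that stands in for the host Web server.
    """
    for level, cache in enumerate(caches):
        if url in cache:
            content, source = cache[url], f"cache {level}"
            break
    else:
        # No cache held a copy, so go all the way to the host server.
        content, source = origin(url), "origin"
        level = len(caches)
    # Drop off a copy in every cache nearer to the user than the source.
    for cache in caches[:level]:
        cache[url] = content
    return content, source
```

The first request for a page travels to the origin and populates the caches on the way back; later requests stop at the nearest cache that holds a copy.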

It is, therefore, obvious that mirroring and caching are useful mechanisms for speeding up resource acquisition (as opposed to resource discovery). In both cases, the end-user is accessing a copy of a resource that is positioned a lot closer than the actual canonical resource itself. Not only is this useful in terms of speeding up access to resources; said mechanisms also reduce the amount of network traffic, especially on links between countries.

So why isn't everything mirrored and cached, leading to quick access to everything for everyone? Well, there are some problems. In both cases, hardware resources and fast network links are required in order to provide the infrastructure. Making sure that mirrored and cached copies are the same as their associated original resource, when said resource is updated in some way, is also a problem; mirrored resources need updating on a regular basis, and caches need flushing and repopulating. However, the most serious problem is that many Web sites cannot be mirrored or cached due to their poor design, or because they have complex underlying structures and facilities, such as database systems which mirroring software often cannot handle. There are some exceptions; the SOSIG [sosig] gateway to high quality social science resources (which uses advanced ROADS [roads] technology) will be, by the time this paper is public, mirrored in the USA. The data held within the SOSIG resource catalogue is periodically and automatically sent to the USA mirror, thus maintaining parity between the UK and USA versions of SOSIG. OMNI is also exploring the possibilities, advantages, and costs involved in mirroring itself overseas.

Many biomedical resources can be mirrored in an automated manner, especially those that consist of just a collection of Web pages. We looked at 40 Web-based resources, chosen at random from the OMNI catalogue (and therefore of a decent quality). Seventeen of the resources appeared to be relatively easy to mirror; another 9 had characteristics which indicated that they could probably be mirrored with some effort, while the other 14 appeared to be difficult or impossible to mirror in an automated or semi-automated manner. An interesting observation, confirmed with further investigations of quality biomedical resources, was that a large majority of patient information Web sites could be mirrored with little human effort. This is mainly due to the simpler format of patient information resources, which usually consist of just a set of hierarchically arranged Web pages and a few graphics or pictures.

Here in the United Kingdom, the problem of network traffic between the UK and the USA has become so acute that a sizeable amount of funding is spent on providing extra bandwidth. As of August of this year, academic institutions are charged for usage of the transatlantic link [trans]. However, observation of the charging mechanism shows that use of the transatlantic link in certain hours of the night is free. Therefore, this is an ideal time for not only high quality biomedical resources to be mirrored, but for the national and other caches to be populated with the same resources, so as to cover as many end-user accesses as possible. OMNI is beginning to mirror selected high quality biomedical resources. By analysing the access logs [logs], we can determine which non UK-based resources are accessed more than others through OMNI; these resources will be likely candidates for mirroring.
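Selecting candidates from the access logs might look something like the sketch below. It assumes that the URLs users followed out of the catalogue have already been extracted from the logs into a list (the log format itself is not described here), and it uses the crude heuristic that a hostname ending in `.uk` is UK-based. Both the heuristic and the function name `mirror_candidates` are assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def mirror_candidates(accessed_urls, top=10, local_suffix=".uk"):
    """Rank non-local hosts by how often their resources were followed."""
    counts = Counter()
    for url in accessed_urls:
        host = (urlsplit(url).hostname or "").lower()
        # Crude assumption: a host outside the local suffix is overseas.
        if host and not host.endswith(local_suffix):
            counts[host] += 1
    return counts.most_common(top)
```

The hosts at the head of the returned list are those whose mirroring would be expected to save the most traffic on the international link, subject to each site being mirrorable at all.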

It is, therefore, a potentially useful function to investigate mirroring remotely sited resources, not only those in the UK, but also resources located in other countries, in order to assess coverage of the systems. In addition, the effective use of caches further reduces bandwidth use and the time between resource discovery and acquisition. It would be interesting for someone to do some calculations on the likely impact, in terms of network usage and access speed to high quality resources, that could be produced by co-ordination between the three systems of mirroring, caching, and resource discovery -- a trinity of efficiency indeed?

"More primary content, Vicar?"

What do users really want? Well, resource discovery systems, such as OMNI, provide a method of quickly locating quality resources. In addition, as previously mentioned, techniques such as mirroring and caching help the end-users to quickly reach the resources themselves, i.e., they reduce the amount of time from the "discovery moment" to the "reading" or "acquisition moment". However, this all makes one large assumption: that there is actually some suitable content to access in the first place.

The assumption that "all information is on the net" is one that has grown -- especially in the public domain -- in the last few years. It has grown for several reasons: claims by large search engines that they index many millions of Web pages; courses that offer to sell/show you ways of "finding everything you need to know" on the Web; and inaccurate reporting by sections of the media on what can be found on the Internet. This assumption is incorrect. One of the authors of this paper has an interest in big cats that live wild in the United Kingdom; there is, however, a general absence of Internet-accessible information on this subject. While a large amount of information such as sightings, theories, photographs and anecdotal evidence can be found across a number of journal, book, newspaper and specialist magazine-based material, the information available via the Internet is far less comprehensive.

Mapping this concept of incompleteness onto the health and medical fields soon leads us to two questions:

How can we determine the "comprehensiveness" of quality/relevant information on some subject?

What can be done to "fill the gaps" in content?

The first of these questions is an interesting one. Innumerable small-scale projects have been carried out by students and other people, along the lines of "see how many high quality resources there are in subject x". Over the years, several larger initiatives undertaken mostly by commercial organisations have taken a more rigorous approach to determining the "comprehensiveness" of the Web as a whole, but all the figures produced have been estimates of a very rough form, due to the near-impossible nature of the task. However, rigorously determining the number of relevant resources for a particular type of audience (e.g., medical students in some niche area of biomedicine) would be a very useful exercise. From the results, we could identify not only the "quality depth" of the niche area, but also the comprehensiveness of resource discovery systems that cover this area.

Unfortunately, determining accurate figures for subject areas is notoriously difficult. Trawling a combination of quality-catalogued resource discovery systems and search engines, such as AltaVista, would be very time consuming; in addition, resources not indexed by any system would be missed. This task is, therefore, probably only feasible for very narrowly defined subject areas. Should any health/medical informatics researchers reading this paper have significant resources available, then they could:

take several very narrowly defined areas in the biomedical field (e.g., pick a rare or little-known disease or illness).

construct some generic selection criteria, similar to the selection criteria of several of the better resource discovery systems.

attempt to locate as many resources as possible in the defined area by using a combination of search engines, resource discovery systems, printed indexes and other systems.

This would give a rough idea (but only a rough one, and almost certainly an underestimate) of how many resources there are in a particular niche. In addition, we could work out roughly how well various resource discovery systems (with similar selection criteria to those used in the exercise) cover the same niches.
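
The comparison step of such an exercise can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name, the system names and the resource identifiers are all hypothetical, and it assumes that each located resource has already been judged against the selection criteria and reduced to a single identifier (for example, a normalised URL).

```python
def coverage_report(found_by_system):
    """Return each system's share of the pooled set of located resources.

    found_by_system maps a system name to the set of resource identifiers
    located through it. The pooled set is the union over all systems; it
    stands in for "all resources in the niche" and is an underestimate,
    because resources that no system lists are missed entirely.
    """
    pooled = set().union(*found_by_system.values())
    if not pooled:
        return {}
    return {
        name: len(set(resources)) / len(pooled)
        for name, resources in found_by_system.items()
    }


# Hypothetical results for one narrowly defined niche.
found = {
    "gateway_a": {"r1", "r2", "r3"},
    "search_engine_b": {"r2", "r4"},
}
print(coverage_report(found))
```

Because the pooled set is itself an underestimate of the niche, each share is best read as an upper bound on that system's true coverage.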

Filling areas that are lacking in quality primary content is another matter altogether. Would this be a task for the national libraries to co-ordinate, or for library schools, biomedical research units, or the private sector? Perhaps the wider publication of surveys (such as the one outlined above) that rigorously examine the comprehensiveness of coverage in distinct areas would spur people, organisations, institutions, and companies into providing high quality content for those same areas.

References

[adam] ADAM; the Art, Design, Architecture and Media information gateway, http://www.adam.ac.uk/

[agec] The OMNI advisory group for evaluation criteria, which has been active in developing selection criteria and evaluation, especially in the area of biomedical informatics, http://omni.ac.uk/agec/agec.html

[biz/ed] Biz/ed, a dedicated business and economics information gateway for students, teachers and lecturers, http://www.bized.ac.uk/

[caching] Cashing in on Caching, Jon Knight and Martin Hamilton. This article, despite being over two years old, explains in clear detail how caching works, http://www.ariadne.ac.uk/issue4/caching/intro.html

[cancerhelp UK] The CancerHelp UK site directs people to either patient (adult) or practitioner tailored information, http://medweb.bham.ac.uk/cancerhelp/indexg.html

[core] The Dublin Core resources page maintained by Andy Powell, http://www.ukoln.ac.uk/metadata/resources/dc.html

[cross] Cross-searching Subject Gateways: The Query Routing and Forward Knowledge Approach; John Kirriemuir, Dan Brickley, Sue Welsh, Jon Knight, Martin Hamilton. The paper discussed cross-searching and cross-browsing subject-based information gateways that covered a number of subject areas, http://www.dlib.org/dlib/january98/01kirriemuir.html

[eevl] EEVL, the Edinburgh Engineering Virtual Library, is a gateway (with a large number of value-added services) to high quality engineering resources, http://www.eevl.ac.uk/

[healthfinder] Healthfinder is a resource discovery system that leans more towards patient information than most other biomedical resource discovery systems. The home page links to a wide variety of services; conditions that affect large numbers of people (and are therefore of interest to a large proportion of end users), such as AIDS, cancer and diabetes, are linked directly from the home page, http://www.healthfinder.org/default.htm

[history] HISTORY, a gateway to quality resources in the wider field of historical studies, http://ihr.sas.ac.uk/

[logs] It should be stressed that OMNI takes a responsible and ethical approach to the use of its access log files. Individual IP addresses are never looked at or reproduced. Publicly released information is only general in nature, showing accesses from a particular domain (e.g., the USA), or network (e.g., JANET [.ac.uk]). Our log files are never released to any organisation outside of OMNI, and use of, and access to, them within OMNI is strictly controlled.

[medweb] MedWeb, a biomedical gateway that has a geographically oriented browse facility, enabling users to "home in" on resources associated with some geographic area, http://www.gen.emory.edu/MEDWEB/medweb.html

[medfinder] Medfinder, a biomedical gateway with an associated emphasis on images, http://www.netmedicine.com/medfinder.htm

[megasites] Comparison of Health Information Megasites, P. F. Anderson, Nancy Allee, Jean Chung, Brian Westra, Virginia Lingle. A comparison of 25 health and medical resource discovery systems, http://henry.ugl.lib.umich.edu/megasite/toc.html

[mesh] The National Library of Medicine guide to MeSH, http://www.nlm.nih.gov/mesh/meshhome.html

[omni] Organising Medical Networked Information. A gateway to quality resources in the biomedical fields. The gateway is primarily targeted at the UK academic and research communities, though it is currently freely available for all to access, http://omni.ac.uk/

[rdf] The World Wide Web Consortium RDF (Resource Description Framework) Web site, http://www.w3.org/Metadata/RDF/

[roads] The ROADS (Resource Organisation And Discovery in Subject-based services) Web information service is shared amongst the three project partners across the interconnected ROADS Web sites. ROADS is in use by a large community of high-profile, quality subject-based gateways spread across several countries; these gateways cover such topics as health, social science, electronic texts, maritime information and sociology and anarchy. The ROADS software can be picked up from the Loughborough http://www.roads.lut.ac.uk/ partner site, which also contains pointers to related mailing lists. The UKOLN ROADS site http://www.ukoln.ac.uk/roads/ contains the ROADS template registry and various ROADS software tools, such as data format conversion scripts. The ILRT ROADS site http://www.ilrt.bris.ac.uk/roads/ contains various background information about the project, a list of those gateways that use ROADS, and various guides to the ROADS software and project.

[skin] Patient Education Materials from the American Academy of Dermatology. A collection of illustrated Web pages, each one describing a particular skin-affecting complaint or disease, http://www.aad.org/aadpamphrework/pampleti.html

[sosig] SOSIG, the Social Science Information Gateway, is one of the most established subject gateways/resource discovery systems available over the Web, http://www.sosig.ac.uk/

[trans] The circular on "Usage-related Charges for the JANET Network", which details the charges on academic institutions for use of the main UK academic network, http://www.jisc.ac.uk/pub98/c3_98.html

[yahoo] Yahoo, a browse-oriented gateway, http://www.yahoo.com/

[umls] The National Library of Medicine guide to UMLS, http://www.nlm.nih.gov/research/umls/umlsmain.html

Acknowledgements

The authors acknowledge the ideas, thoughts and utterances of a large number of people, which have been subsumed into this paper. The authors especially wish to thank (in alphabetical order) Betsy Anagnostelis, Daniel Brickley, Dave Cook, Lorcan Dempsey, Lisa Gray, Nicky Ferguson, Debra Hiom, Bruce Madge, Frank Norman, Bob Parkinson, Frances Singfield, Philip Tuck (BT), Norman Wiseman and Emma Worsfold.

© 1998 John Kirriemuir and Sue Welsh


hdl:cnri.dlib/september98-kirriemuir
