I attended a two-day NISO meeting on the development of standards for metasearching. In my opinion, the meeting was very successful, resulting in a number of key recommendations. It is clear that additional standards will be required in order to complete projects like Scholars Portal. This report describes the meeting.
The morning of the first day began with presentations from a variety of people representing the key stakeholders in this area. The presentations are available on the NISO page. The following are some highlights.
Brenda Bailey-Hainer talked about the state/public library perspective. The slides from her presentation are very close to the points that she made. She talked about the broad customer base that she needed to serve and the variety of services that needed to be integrated into a Portal. There were no big revelations here - just confirmation of what we are learning in the Scholars Portal project.
George Machovec talked about the academic library perspective. He is from the Colorado Alliance of Research Libraries. He talked almost completely from his slides so I won't repeat the content here.
Jenny Walker from ExLibris talked about her perspective as a library systems vendor (more than as a metasearch provider). She raised some interesting questions regarding trends and standard approaches. What is the role of the traditional library catalog? What is the balance between Union Catalog and federated searching? She sees many libraries moving forward with RFPs for metasearching, but with metasearching in combination with other services like ILL, Electronic Reserves and local content management. She mentioned that the University of Amsterdam is using MetaLib as the primary access point to the library. She also mentioned that AARLIN (our Australian counterpart, which moved from FD to ExLibris) is focusing on researchers. (The director from Iowa said that the Scholars Portal project is focusing on undergraduates - I commented that each library is selecting its own focus.)
Peter Noerr from Muse Global talked about issues from the metasearch system provider's perspective. Again, he talked from his slides and seemed to hold back any insight that he has gained - it seemed like an advertisement more than a presentation.
Ed Moura from Gale talked about the content aggregators' perspective. His presentation followed his slides. He feels that metasearch should be part of the ILS and wonders why another level of system architecture would be required. Somebody asked whether the publishers planned for access via these metasearch engines. He said that they do plan for standard access but are currently having problems because of the increased use of their resources from portals. He said that they have a page that provides information about standard access. A discussion ensued about the increased load on databases caused by metasearch tools: a single user who doesn't know what they are looking for begins acting like 50 users. For example, somebody looking for flea collars in PsycINFO, ERIC, etc. blocks access to the system for others, like a neurosurgeon trying to put in a grant application. Screen scraping seems to be more 'expensive' for vendors. Searches have gone up tenfold, but retrieval hasn't gone up at a commensurate rate. He mentioned that you can get a rough estimate of the impact on indexes by considering how many resources are in each profile. For example, if you have 10 resources in a profile, you can estimate that the index provider will see a tenfold increase in searching, because a single search actually produces 10 separate searches.
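The rough estimate described above can be written down as a small calculation. This is only a sketch of the arithmetic; the function name and the sample numbers other than the 10-resource profile are my own and were not part of the presentation.

```python
def fanned_out_searches(user_searches: int, resources_in_profile: int) -> int:
    """Number of searches the targets receive when every user search
    is sent to every resource in the profile (hypothetical helper)."""
    return user_searches * resources_in_profile


# One user search against a profile of 10 resources becomes 10 searches.
single = fanned_out_searches(1, 10)

# Assumed figure for illustration: 50 user searches against the same profile.
busy_hour = fanned_out_searches(50, 10)
```

The point of the estimate is that search volume grows with the size of the profile even when the number of users, and the number of records actually retrieved, stays the same.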
Marc Krellenstein talked from the publishers' perspective. He admitted that they were, at best, ambivalent about this technology. They have branding issues and, in addition, consider themselves a primary search provider. They do have a standard interface that is XQuery-based rather than based on library standards. They would be happy to make their 'standards' available to this group for adoption. They are moving to an XML repository approach for all of their data. There are details related to this on his slides. They don't want to expose their information in a way that is not optimal. They provide primary search as well as content, and they see the entire package as being important. They want to be sure that there is appropriate branding and resolution of duplicates (will the appropriate content be delivered - or, more essentially perhaps, will their content be delivered). Questions from the audience: ExLibris asked whether customers are asking for this, and also whether they were limiting this type of access in their licenses. The library community has raised only limited interest in this functionality with them, and they do limit access based on their contracts. The slides reflect that Elsevier feels it is important that they be able to deliver the information that is most relevant. Somebody mentioned that it was odd that the publisher would be able to determine relevance - that seemed like a function for the customer. This raised the question of where relevancy should be determined - in the search or in the results.
The individual groups met for 5½ hours over the course of two days. My initial group was Metasearch ID, which completed its work within 2 hours. I then joined the Result Set group.
Groups reported out intermediate results. See Appendix A. Overall it seemed like groups spent a great deal of time identifying scope and issues. Only the Metasearch ID group reached a recommendation at this point.
Statistics:
Recommendations:
The core of the problem is that once an entity (library, university) has authenticated a person, the rest of the world needs to know that this person is certified to use the appropriate resources. We also need to know that the entity (library, university) can be trusted to authenticate. The metasearch system does not have to be involved in authentication or authorization; it just needs to be able to pass the certification through.
CNI did a white paper a couple of years ago describing the current state of authentication, authorization and certification.
In simple terms, the recommendation is to understand the problem and see which existing standards and practices will solve it.
Recommendations:
Two methods of identifying the metasearch engine to the target:
Will create a guideline and put it on the NISO website for approval.
Goal:
Open exchange of:
Requires:
Actions needed:
Other work:
Recommendations:
Searching: Short-term recommendations:
Information exchange between Metasearch vendor and content vendor
Information exchange between metasearch vendor and content vendor
Best practices
Long-term Recommendations: Standards work:
We don't want to deprecate Z39.50.
Context:
General recommendations:
May need a more informal short-term process to define metadata needs, not necessarily tied to a specific protocol.
Short-term:
Result Set Long-term:
Single Record Short-term:
Single-Record Long-term
Open Issues
Discussion:
It seems that Metasearch ID, search and result set are very closely intertwined. The question was raised whether the groups could get together and write short-term best practices.
Four major working areas that result from the recommendation (according to Pat Stevens):
NISO is a member-driven organization. Activities happen because people are willing to participate in them. Organizations that have large staffs have higher membership fees. They asked if people would be willing to work in the following areas:
The planning committee will be meeting to turn this into a formal recommendation.
Access Management:
Like an onion. Every time we feel like we are beginning to understand the issues, we come up with something that complicates them.
Problem: Understand the roles and responsibilities of the actors in the meta-search delivery continuum. There are many differing functions and no clear path to a solution; the group is beginning to think that not every function may be conducive to a standard.
Definitions:
o Authentication - validation of the user credentials.
o Certification - communicates results of previous authentication (authentication, authenticator, organization, attributes). May be accomplished with digital certificates.
o Validation - confirmation that the entity has the right to use the service.
Assumptions.
Originating organization is responsible for authentication. An authenticated user is certified by the library to access remote services including meta-search systems. The certification is passed through intermediaries (meta-search systems) to the target. The intermediary may make use of the certification attributes. One sign-on provides for certification to many data service providers to execute federated search.
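The flow in these assumptions can be sketched in a few lines of code. This is only an illustration of the idea as I understood it, not anything the group specified: the class names, field names and example organizations are all hypothetical, and a real certification would presumably be something like a digital certificate rather than a plain object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Certification:
    """Communicates the result of an earlier authentication (hypothetical shape)."""
    authenticator: str                      # system that checked the credentials
    organization: str                       # originating library or university
    attributes: dict = field(default_factory=dict)


class Target:
    """A data service provider that performs its own validation."""

    def __init__(self, name, trusted_organizations):
        self.name = name
        self.trusted_organizations = set(trusted_organizations)

    def search(self, cert, query):
        # Validation: does this organization have the right to use the service?
        if cert.organization not in self.trusted_organizations:
            return f"{self.name}: access denied"
        return f"{self.name}: results for {query!r}"


def metasearch(cert, query, targets):
    """Intermediary: passes the certification through to every target unchanged."""
    return [target.search(cert, query) for target in targets]


# One sign-on at the library yields one certification used for all targets.
cert = Certification("library-login", "Example University", {"role": "student"})
targets = [
    Target("Index A", ["Example University"]),
    Target("Index B", ["Another College"]),
]
responses = metasearch(cert, "neural networks", targets)
```

Note that the intermediary never sees the user's credentials; it only forwards the certification, and each target decides for itself whether to honor it.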
Used for...
o Purchasing decision
o Measuring effectiveness and efficiency (quality of service, relevance)
o Gauge value of library to funding agencies
Effects on stats of adding metasearch to the information environment: Does this make us non-compliant? One user can do multiple searches with multiple sessions. Or the metasearch engine can create a session and hold it, allowing many users to use it. How the metasearch engine is tweaked will create differences in search activity and statistics.
What do libraries really want us to measure? Demographics, comparative data, outcomes. Support analyzing use of product: database selected, searched, used, hits, etc. How many targets selected (automatic vs. manual). Support setting of relevance of targets. Search time? Track quality of services? Content providers may want some metrics provided to them.
Problem: Resource providers are doing a lot of work to build screens that are immediately discarded by the metasearch providers. There is also increased demand from the multiple search requests generated by metasearch engines.
Can we set up a mechanism that will allow a metasearch engine to identify itself?
Set up a special address for metasearch. Use special parameters.
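To make the two options concrete, here is a minimal sketch of what each might look like from the metasearch engine's side. The host names, the `metasearch_id` parameter and the engine identifier are all made up for illustration; the group did not define any of them.

```python
from urllib.parse import urlencode


def special_address_url(query: str) -> str:
    """Option 1: the target offers a dedicated address for metasearch traffic
    (hypothetical host name)."""
    return "https://metasearch.target.example/search?" + urlencode({"q": query})


def special_parameter_url(query: str, engine_id: str) -> str:
    """Option 2: the normal address is used, with an extra parameter that
    identifies the metasearch engine (hypothetical parameter name)."""
    params = {"q": query, "metasearch_id": engine_id}
    return "https://www.target.example/search?" + urlencode(params)


by_address = special_address_url("flea collars")
by_parameter = special_parameter_url("flea collars", "example-engine")
```

Either way, the target can tell that the request comes from a metasearch engine and could, for example, skip building display screens that would be discarded anyway.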
The group also developed a diagram. The user uses a metasearch engine as an agent. The metasearch agent uses a collection description service that holds records about collections and how they are accessed. That gives the metasearch agent some information about which collections to present to the user. It might also be used to suggest additional searches when the user gets sub-standard results. They have a pilot underway in the UK to build a repository of these kinds of collection descriptions.
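As a rough illustration of the idea in the diagram, a collection description service could be thought of as a set of records that the metasearch agent filters. The record fields and sample entries below are my own assumptions; the group did not specify what a collection description contains.

```python
from dataclasses import dataclass


@dataclass
class CollectionDescription:
    """One record about a collection and how it is accessed (hypothetical fields)."""
    name: str
    subjects: set
    access_protocol: str
    address: str


def collections_for(subject, descriptions):
    """Return the names of collections the agent might present for a subject."""
    return [d.name for d in descriptions if subject in d.subjects]


# Made-up sample records for illustration only.
descriptions = [
    CollectionDescription("Index A", {"psychology", "education"}, "Z39.50", "a.example:210"),
    CollectionDescription("Index B", {"education"}, "HTTP", "https://b.example/search"),
    CollectionDescription("Index C", {"medicine"}, "Z39.50", "c.example:210"),
]

suggested = collections_for("education", descriptions)
```

The same lookup could be run again with a related subject to suggest additional searches when the first results are poor.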