DYNAMIC REFERENCE SIFTING: A CASE STUDY IN THE HOMEPAGE DOMAIN
Jonathan Shakes, Marc Langheinrich & Oren Etzioni
Department of Computer Science and Engineering
University of Washington
Seattle, Washington 98195-2350, USA
{jshakes|marclang|etzioni}@cs.washington.edu
(in Proceedings of the Sixth International World Wide Web Conference, pp.189-200, 1997)
Q: How is the indexing performed?
A: Indexing is the process of creating a Conceptual Fingerprint from a text. In Collexis, the automated indexing mechanism performs the following steps on the text: removing stop words, normalizing the text, selecting concepts by comparison with the thesaurus, clustering the concepts, and attaching a relative weight to each concept by means of a set of algorithms that measure its specificity, similarity and frequency.
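The indexing steps described above can be illustrated with a minimal sketch. This is not Collexis code: the stop-word list, the tiny thesaurus, and the function name `fingerprint` are all hypothetical, clustering is omitted, and plain relative frequency stands in for the specificity, similarity and frequency measures the answer mentions.

```python
import re
from collections import Counter

# Hypothetical stop-word list and thesaurus; real ones would be far larger.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to"}
THESAURUS = {
    "tumor": "neoplasm",
    "tumour": "neoplasm",
    "neoplasm": "neoplasm",
    "heart": "heart",
    "cardiac": "heart",
}

def fingerprint(text):
    """Return {concept: relative weight} for the given text."""
    # Normalize: lowercase and split into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Select concepts: keep only tokens the thesaurus maps to a concept,
    # so synonyms collapse onto a single concept label.
    concepts = [THESAURUS[t] for t in tokens if t in THESAURUS]
    counts = Counter(concepts)
    total = sum(counts.values())
    # Weight by relative frequency (an assumed stand-in for the real weighting).
    return {c: n / total for c, n in counts.items()} if total else {}
```

Calling `fingerprint("The tumour in the heart and a cardiac tumor")` maps the two spellings of "tumour" and the terms "heart" and "cardiac" onto two concepts with equal weight.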
Q: How does Collexis generate its search results?
A: Collexis employs vector matching: comparing a search query with the Fingerprints from the records in a Collexion. The outcome is a very accurate and relevant list of content items and/or experts in the form of a list of records. There also exists the possibility of over-specifying a query (i.e., using a considerable piece of text), thus adding context to the query. This context will help the system to improve the accuracy of the query and return references to those content items that are contextually related. The system administrator can enlarge or reduce the set of returned documents by entering a threshold that indicates the minimum “distance” between the records returned and the query. Matching of a search query with Collexion records can be performed on multiple Collexions at the same time.
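As a rough illustration of vector matching with a threshold, the sketch below compares a query fingerprint with record fingerprints from several collections at once. It assumes fingerprints are dictionaries of concept weights and uses cosine similarity as the measure; the source does not say which measure Collexis uses, and the names `cosine` and `search` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {concept: weight} vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_fp, collections, threshold=0.5):
    """Return (score, collection, record_id) tuples, best match first.

    `collections` maps a collection name to {record_id: fingerprint}.
    Records scoring below `threshold` are dropped, so raising the
    threshold shrinks the result set and lowering it enlarges it.
    """
    hits = []
    for name, records in collections.items():
        for rec_id, fp in records.items():
            score = cosine(query_fp, fp)
            if score >= threshold:
                hits.append((score, name, rec_id))
    return sorted(hits, reverse=True)
```

A longer query text would simply produce a fingerprint with more concepts, which is one way to read the remark about adding context by over-specifying a query.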
Q: What makes Collexis different?
A: Initially, Collexis differentiates itself from full-text search engines by making use of thesauri for information retrieval. The high-quality search is based on semantics that have been defined in a thesaurus or ontology: synonymous terms and terms in different languages are linked to a single concept. Hierarchical relations between concepts, links between definitions and terms, and other semantic relationships are utilized in the search applications. This process helps to highlight those terms most relevant to the searcher’s query.
STATISTICAL STRATEGIES ENVIRONMENTAL EPIDEMIOLOGY
report is in three parts, general problems in environmental epidemiology, prototypical problems, and statistical strategies. Emphasis ...
projecteuclid.org/DPubS/Repository/1.0/Disseminate?handle=euclid.bsmsp/1200514698&view=body&content-type=pdf_1
by JR GOLDSMITH
Entrez Programming Utilities are tools that provide access to Entrez data outside of the regular web query interface and may be helpful for retrieving search results for future use in another environment.
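A minimal sketch of what access outside the web interface might look like: the function below only builds a search URL and performs no network request. The base URL, the `esearch.fcgi` endpoint, and the `db`, `term` and `retmax` parameters are given here as assumptions from memory of NCBI's public documentation, not from the text above, so check them against the current documentation before use. The helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL for the E-utilities; verify against NCBI's documentation.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term, db="pubmed", retmax=20):
    """Build a search URL for the given query term (no request is made)."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the query term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"
```

The resulting URL can be stored or passed to any HTTP client, which is what makes the results reusable in another environment.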
Google Librarian Central - Article 12/2006 - 3
When I interned at Google last summer after getting my MSI degree, I worked on projects for the Book Search and Google Scholar teams. I didn’t know it at the time, but in completing my research over the course of the summer, I would become the resident expert on how universities were approaching Google Scholar as a research tool and how they were implementing Scholar on their library websites. Now working at an academic library, I seized a recent opportunity to sit down with Anurag Acharya, Google Scholar’s founding engineer, to delve a little deeper into how Scholar features are developed and prioritized, what Scholar’s scope and aims are, and where the product is headed.
-Tracey Hughes, GIS Coordinator, Social Sciences & Humanities Library, University of California San Diego
SumoBrain is FREE! SumoBrain offers cross-collection searching, portfolios, alerts, and other collaboration tools, as well as bulk PDF download capabilities. SumoBrain caters to intellectual property professionals, attorneys, and users in the corporate world. While SumoBrain was conceived as a subscription service, we have decided to take the radical step of making SumoBrain completely free. As long as we can support its costs without subscription fees, it will remain free indefinitely.
Principles of categorized search result visualization
We are developing a set of search result visualization principles, based on the premise that consistent, comprehensible visual displays built on meaningful and stable classifications will better support user understanding of search results.
1. Provide overviews of large sets of results (100-1000+)
2. Organize overviews around meaningful categories
3. Clarify and visualize category structure
4. Tightly couple category labels to result list
5. Ensure that the full category information is available
6. Support multiple types of categories and visual presentations
7. Use separate facets for each type of category
8. Arrange text for scanning/skimming
9. Visually encode quantitative attributes on a stable visual structure
The FacetedDBLP search interface allows users to search computer science publications in the DBLP collection starting from a keyword, and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. It is the first large-scale application that uses GrowBag graphs to create a computer-science-specific topic facet, with which a user can characterize the result set in terms of main research topics and filter it according to certain subtopics.
FacetedDBLP builds upon the DBLP++ data set, which is an enhancement of DBLP (as of 2008-11-21) plus additional keywords and abstracts as available on public web pages. We have also corrected some of the links to electronic editions that were broken in DBLP. A brief description of the GrowBag facet within FacetedDBLP can be found in our JCDL paper; a detailed description of the algorithm is available on the GrowBag project page.
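The keyword-then-facets interaction described above can be sketched in a few lines. This is an illustration only, not FacetedDBLP code: the sample records and the function names are hypothetical, and the GrowBag topic facet is not modeled; only simple year, venue and author facets are shown.

```python
from collections import Counter

# Hypothetical publication records used only for illustration.
PUBS = [
    {"title": "Search result visualization", "year": 2008,
     "venue": "CHI", "authors": ["A. Author", "B. Author"]},
    {"title": "Faceted search over publications", "year": 2008,
     "venue": "JCDL", "authors": ["B. Author"]},
    {"title": "Topic graphs for browsing", "year": 2006,
     "venue": "JCDL", "authors": ["C. Author"]},
]

def keyword_search(pubs, keyword):
    """Return publications whose title contains the keyword."""
    kw = keyword.lower()
    return [p for p in pubs if kw in p["title"].lower()]

def facet_counts(pubs, field):
    """Count how many results fall under each value of a facet field."""
    counts = Counter()
    for p in pubs:
        value = p[field]
        # Multi-valued fields (e.g. authors) contribute one count per value.
        counts.update(value if isinstance(value, list) else [value])
    return counts

def filter_by_facet(pubs, field, value):
    """Keep only the results that match the selected facet value."""
    def matches(p):
        v = p[field]
        return value in v if isinstance(v, list) else v == value
    return [p for p in pubs if matches(p)]
```

A typical flow is `results = keyword_search(PUBS, "search")`, then `facet_counts(results, "year")` to display the facet, then `filter_by_facet(results, "venue", "JCDL")` once the user picks a value.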
We offer a quick way to compare the prices of any in-print and many out-of-print books at over a dozen online bookstores. You can view the results with or without the shipping costs of a single book, and also find the fastest source for a book from ordering to delivery.
Ilial, Inc. is a Los Angeles-based Internet startup backed by blue-chip Northern and Southern California venture capitalists. We have developed technologies that will redefine the $20B online advertising industry. After several years of stealth development, the company is poised to launch its service.
Search Engines
Individual Search Engines | Meta Search Engines
Search Engine Collections
Visit How to Choose a Search Engine or Directory, a more extensive list of search tools organized by features.
To keep up with the world of new search engines, visit the blog Alt Search Engines.
Y. Tan, M. Kan, and D. Lee. JCDL '06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 314-315. New York, NY, USA, ACM (2006)