Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

Cited by: 35
Authors
Bader, Judith L. [1]
Theofanos, Mary Frances [1]
Affiliations
[1] NCI, Off Commun Canc Informat Prod & Serv, Commun Technol Branch, Bethesda, MD 20852 USA
Keywords
Cancer; Internet; search engines; natural language processing
DOI
10.2196/jmir.5.4.e31
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Searching for health information is one of the most common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results.
Objective: To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used.
Methods: The National Cancer Institute partnered with AskJeeves, Inc, to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine that receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. Of these 500 terms, only 37 appeared >= 5 times/day over the trial test week, in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented.
Results: Counting multiple occurrences of the same question, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall, 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized.
Conclusions: Natural-language searching affords users the opportunity to fully express their information needs and can aid users who are naive to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
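The sampling procedure described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' or AskJeeves' implementation, which the abstract does not describe. All function names are hypothetical. Case-insensitive substring matching is an assumption, and so is reading ">= 5 times/day" as an average over the trial week.

```python
import random
from collections import Counter


def select_frequent_terms(queries_by_day, terms, min_per_day=5):
    """Keep terms whose average daily query count meets the threshold.

    Assumption: ">= 5 times/day" is read as an average over the trial days.
    """
    totals = Counter()
    for day_queries in queries_by_day:
        for query in day_queries:
            text = query.lower()
            for term in terms:
                if term in text:  # assumed: simple substring match on word roots
                    totals[term] += 1
    n_days = len(queries_by_day)
    return [t for t in terms if totals[t] / n_days >= min_per_day]


def matching_queries(queries, terms):
    """Return every query instance that contains at least one term."""
    return [q for q in queries if any(t in q.lower() for t in terms)]


def sample_distinct(queries, k, seed=0):
    """Randomly pick up to k distinct query strings from the pool."""
    distinct = sorted(set(queries))
    rng = random.Random(seed)
    return rng.sample(distinct, min(k, len(distinct)))


def coverage(sample, queries):
    """Fraction of all query instances accounted for by the sampled strings."""
    freq = Counter(queries)
    return sum(freq[q] for q in sample) / len(queries)


# The share reported in the Results: 76077 of 204165 query instances.
reported_share = 76077 / 204165
print(f"{reported_share:.1%}")
```

The last lines reproduce the reported arithmetic: 76077 of 204165 instances is roughly 37% of the 3-month pool.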
Pages: 15
Related papers
50 in total
  • [1] Understanding Search Queries in Natural Language
    Neverilova, Zuzana
    Kvassay, Matej
    RASLAN 2018: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2018, : 85 - 93
  • [2] Searching a Video Database using Natural Language Queries
    Shubha, M.
    Kapoor, Kritika
    Shrutiya, M.
    Mamatha, H. R.
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 190 - 196
  • [3] A Movie Search System with Natural Language Queries
    Wang, Xin
    Zhan, Huayi
    Yang, Lan
    Li, Zonghai
    Zhong, Jiying
    Zhao, Liang
    Sun, Rui
    Tan, Bin
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2018), PT II, 2018, 10828 : 791 - 796
  • [4] Searching for music using natural language queries and relevance feedback
    Knees, Peter
    Widmer, Gerhard
    ADAPTIVE MULTIMEDIAL RETRIEVAL: RETRIEVAL, USER, AND SEMANTICS, 2008, 4918 : 109 - 121
  • [5] Translating Web Search Queries into Natural Language Questions
    Kumar, Adarsh
    Dandapat, Sandipan
    Chordia, Sushil
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 944 - 947
  • [6] Mathematical Formula Search using Natural Language Queries
    Yang, Seon
    Ko, Youngjoong
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2014, 14 (04) : 99 - 104
  • [7] Are the Code Snippets What We Are Searching for? A Benchmark and an Empirical Study on Code Search with Natural-Language Queries
    Yan, Shuhan
    Yu, Hang
    Chen, Yuting
    Shen, Beijun
    Jiang, Lingxiao
    PROCEEDINGS OF THE 2020 IEEE 27TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER '20), 2020, : 344 - 354
  • [8] Mapping Natural Language Questions to SPARQL Queries for Job Search
    Karim, Naila
    Latif, Khalid
    Ahmed, Nabeel
    Fatima, Mishall
    Mumtaz, Atif
    2013 IEEE SEVENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2013), 2013, : 150 - 153
  • [9] Searching for information on the Internet using the UMLS and Medical World Search
    Suarez, HH
    Hao, XL
    Chang, IF
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 1997, : 824 - 828
  • [10] The ambiguity of negation in natural language queries to information retrieval systems
    McQuire, AR
    Eastman, CM
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1998, 49 (08) : 686 - 692