Watson Technical Innovations Murthy Devarakonda, Ph.D. Watson Innovations IBM Research and Watson Group
With Precision, Accurate Confidence and Speed, the rest was History
Watson answers by finding, reading, scoring and combining evidence
Example clue: IN 1698, THIS COMET DISCOVERER TOOK A SHIP CALLED THE PARAMOUR PINK ON THE FIRST PURELY SCIENTIFIC SEA VOYAGE
Question Analysis
  Important Terms: 1698, comet, paramour, pink
  Answer Types: comet discoverer
  Date(1698), Took(discoverer, ship), Called(ship, Paramour Pink)
Primary Search over Content (Structured & Unstructured)
Candidate Answer/Hypothesis Generation: Isaac Newton, Wilhelm Tempel, HMS Paramour, Christiaan Huygens, Halley's Comet, Edmond Halley, Pink Panther, Peter Sellers
High-Speed Evidence Retrieval
Diverse and Extensible Evidence Scoring: 100s of NLP scoring algorithms (Term Overlap, Classification, Relations, Temporal)
  Example score vectors: [0.58 0.5 -1.3 0.97] [0.71 1 13.4 0.60] [0.42 0 2.0 0.90] [0.84 0.5 10.6 0.88] [0.33 0 6.3 0.83] [0.21 1 11.1 0.92] [0.91 0 -8.2 0.31] [0.91 0 -1.7 -.20]
Merging & Ranking Based on Statistical Machine Learning
  1) Edmond Halley (0.85)
  2) Christiaan Huygens (0.20)
  3) Peter Sellers (0.05)
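The merging and ranking step can be illustrated with a small logistic model that turns per-candidate evidence scores into one confidence. This is only a sketch of the general idea, not Watson's actual model: the feature names, weights, bias, and evidence scores below are all invented for the example.

```python
import math

# Hypothetical weights for three evidence features (assumed, not from Watson).
WEIGHTS = {"term_overlap": 1.5, "type_match": 2.0, "temporal": 1.0}
BIAS = -2.5


def confidence(scores):
    """Combine evidence scores into a single confidence with a logistic model."""
    z = BIAS + sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Return (answer, confidence) pairs, most confident first."""
    scored = [(answer, confidence(scores)) for answer, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented evidence scores for three of the candidates named on the slide.
candidates = {
    "Edmond Halley": {"term_overlap": 0.9, "type_match": 1.0, "temporal": 0.9},
    "Christiaan Huygens": {"term_overlap": 0.3, "type_match": 1.0, "temporal": 0.2},
    "Peter Sellers": {"term_overlap": 0.4, "type_match": 0.0, "temporal": 0.0},
}

ranking = rank(candidates)
for answer, conf in ranking:
    print(f"{answer}: {conf:.2f}")
```

In a trained system the weights would be learned from question/answer pairs with known correct answers, so that the reported confidence tracks the probability of being right.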
DeepQA: The architecture underlying Watson
Using a highly parallelized architecture, DeepQA generates many hypotheses, collects a wide range of evidence, and balances the combined confidences of over 100 different analytics that analyze the evidence from different dimensions.
Pipeline: Question -> Question & Topic Analysis (Multiple Interpretations) -> Question Decomposition -> Hypothesis Generation -> Hypothesis and Evidence Scoring -> Synthesis -> Final Confidence Merging & Ranking -> Answer & Confidence
Hypothesis Generation: Primary Search over Answer Sources (100s of sources), then Candidate Answer Generation (100s of possible answers)
Hypothesis and Evidence Scoring: Answer Scoring; Evidence Retrieval from Evidence Sources (1000s of pieces of evidence); Deep Evidence Scoring (100,000s of scores from many deep analysis algorithms)
Balance & Combine: Learned Models help combine and weigh the evidence
Hypothesis Generation and Hypothesis and Evidence Scoring are repeated in parallel (Hypothesis Generation ... Hypothesis and Evidence Scoring)
Taking Watson beyond Jeopardy! (Medical Domain)
Understanding: from Specific Questions to Rich Problem Scenarios
  From specific questions to rich, incomplete problem scenarios (e.g. EHR)
  Before: "The type of murmur associated with this condition is harsh, systolic, and increases in intensity with Valsalva". After: Entire Medical Record
Interacting: from Question-In/Answer-Out to Interactive Dialog
  Evidence analysis and look-ahead drive interactive dialog to refine answers and evidence
  Input, Responses; Dialog; Refined Answers, Follow-up Questions
Explaining: from Precise Answers & Accurate Confidences to Comparative Evidence Profiles
  Move from quality answers to quality answers and evidence
Learning: from Batch Training Process to Continuous Training & Learning Process
  Scale domain learning and adaptation rate and efficiency
  Teach Watson: Answers, Corrections, Judgements; Responses, Learning Questions
Adapting Watson to Medicine: An Example of the Adaptation
Results by evaluation date:
  J!: 21.3%
  May 11: 33.0%*
  May 11: 45.2%*
  Nov 11: 52.1%*
  May 12: 53.7%
  Nov 12: 63.3%
Stages labeled on the chart: Original Jeopardy! Watson, Content Update, Algorithm Adaptation, Automatic Training
* 2011 results are inflated by an estimated 6% due to use of ACP PIER, from which DD questions are often derived
NEJM Medical Concept Annotations: Diseases, Symptoms, Medications, Modifiers. We use UMLS (Unified Medical Language System) CUIs (Concept Unique Identifiers), but the vocabulary is noisy.
Providing Insights from Electronic Medical Records
An EMR contains plain text & semi-structured data
An EMR (up to 50 MB for a patient): Encounter (Clinical) Notes, Medications, Laboratory
Medical Concepts Identification
  disease or syndrome: CUI = C0011849
  sign or symptom: CUI = C0041834
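A minimal sketch of what concept identification produces, using plain dictionary lookup. The two CUIs and their semantic types are the ones shown on the slide; the surface strings attached to them, the sample note, and the `annotate` helper are assumptions made for illustration. A production annotator would handle synonyms, abbreviations, and negation rather than exact string matches.

```python
import re

# Toy lexicon. CUIs and semantic types come from the slide; the surface
# strings mapped to them are assumptions made for this sketch.
LEXICON = {
    "diabetes mellitus": ("C0011849", "disease or syndrome"),
    "erythema": ("C0041834", "sign or symptom"),
}


def annotate(text):
    """Return (start, end, surface, cui, semantic_type) for each lexicon hit."""
    hits = []
    for term, (cui, sem_type) in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), cui, sem_type))
    return sorted(hits)  # ordered by position in the text


# Invented clinical note fragment.
note = "Patient with Diabetes Mellitus presents with erythema on the left arm."
annotations = annotate(note)
for annotation in annotations:
    print(annotation)
```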
Watson Electronic Medical Record Analysis: Cognitive Challenges of a Physician
Our goal is to significantly enhance a clinician's cognitive process in patient care
Patient EMR Summarization
Text summarization is a well-known problem in Artificial Intelligence.
The basic idea of summarization is to take a body of information and reduce its size and content to its important points.
Important properties of a summary are:
  reduce the workload for the interpreter/understander (simpler)
  maintain coherence
  maintain coverage
  include important events of the story
Alterman, Richard. "Understanding and summarization." Artificial Intelligence Review 5.4 (1991): 239-254.
Clinical summarization takes it to a new dimension of complexity:
  Content: an EMR is both text and non-text
  Interpretation: use of vast medical knowledge
  Purpose: help the physician in patient care
Identifying Patient's Medical Problems
Pipeline: EMR -> Candidate Generation -> Feature Generation -> Scoring / Weighting -> Grouping
Candidate Generation:
  Notes: Clinical Factors Extraction (Information Extraction, Text Segmentation) yields CUIs of unique Disorders (O(100))
  Structured Data (Medications, Orders, Lab, etc.) yields CUIs of unique Medications, Orders, Lab, etc.
Feature Generation (each feature is mapped to a score between 0 and 1.0):
  CUI Confidence
  Term Frequency
  Note Section (e.g. PMH)
  Relationship to Medicines and Lab Tests: LSA score, CUI Path (path pattern such as "A may treat B")
Output: Candidate Problems (O(10)), shown as a graph of grouped Problems
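The scoring and weighting stage can be sketched as a weighted sum over the features named on the slide. This is a simplified illustration under stated assumptions: the weights, the threshold, the term-frequency cap, the placeholder CUI labels, and the reading of PMH as a Past Medical History note section are all hypothetical, and the real system presumably learns its weighting from gold-standard records.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cui: str
    cui_confidence: float  # confidence of the concept annotation, 0..1
    term_frequency: int    # number of mentions across the patient's notes
    in_pmh_section: bool   # mentioned in a PMH note section
    lsa_med_score: float   # LSA similarity to the patient's medications, 0..1


# Hypothetical weights; a real system would learn them from labeled records.
WEIGHTS = {"conf": 0.3, "tf": 0.3, "pmh": 0.2, "lsa": 0.2}


def score(c: Candidate) -> float:
    """Map each feature to 0..1 and combine with fixed weights."""
    tf = min(c.term_frequency, 10) / 10.0  # cap at 10 mentions, then normalize
    return (WEIGHTS["conf"] * c.cui_confidence
            + WEIGHTS["tf"] * tf
            + WEIGHTS["pmh"] * (1.0 if c.in_pmh_section else 0.0)
            + WEIGHTS["lsa"] * c.lsa_med_score)


def select_problems(candidates, threshold=0.5):
    """Keep candidates scoring at or above the threshold, best first."""
    kept = [c for c in candidates if score(c) >= threshold]
    return sorted(kept, key=score, reverse=True)


# Invented candidates with placeholder identifiers.
strong = Candidate("CUI-A", 0.9, 8, True, 0.7)
weak = Candidate("CUI-B", 0.6, 1, False, 0.1)
problems = select_problems([weak, strong])
print([c.cui for c in problems])
```

Filtering on a combined score is what reduces the O(100) extracted disorders to the O(10) candidate problems the slide describes.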
Problem List Accuracy
Definitions:
  Recall (sensitivity) = correctly predicted problems / actual problems
  Precision = correctly predicted problems / predicted problems
  Specificity = correctly discarded candidates / non-problems
As of Mar 2014:
  Recall upper bound is 90% (the remaining 10% of problems do not appear in the Gold Standard EMRs)
  Recall (sensitivity) = 69.5%
  Precision (PPV) = 45.2%
  TNR (specificity) = 95.5%
  F1 = 0.545, F2 = 0.628
What about the entered problems in EMRs?
  Retrieved     Problem list   Unique Diagnoses
  Precision     50%            40.8%
  Recall        15.7%          19.6%
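The definitions above translate directly into code. The sketch below uses the standard confusion-matrix formulas and the standard F-beta formula; the counts in the example are invented. Applying F-beta to the reported aggregate precision and recall gives roughly 0.548 for F1 and 0.628 for F2. The small gap from the reported F1 of 0.545 may come from rounding or from averaging per record rather than over the pooled counts, which is a guess rather than something the slide states.

```python
def recall(tp, fn):
    """Sensitivity: correctly predicted problems / actual problems."""
    return tp / (tp + fn)


def precision(tp, fp):
    """PPV: correctly predicted problems / predicted problems."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """TNR: correctly discarded candidates / non-problems."""
    return tn / (tn + fp)


def f_beta(p, r, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Invented counts: 7 true positives, 3 false negatives, 8 false positives,
# 92 true negatives.
toy_recall = recall(7, 3)
toy_precision = precision(7, 8)
toy_specificity = specificity(92, 8)

# Precision and recall as reported on the slide.
f1 = f_beta(0.452, 0.695, beta=1.0)
f2 = f_beta(0.452, 0.695, beta=2.0)
print(round(f1, 3), round(f2, 3))
```

F2 is a reasonable headline metric here because a missed problem is usually costlier to the physician than an extra candidate that can be dismissed.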
Problem-Oriented Patient Record Summary
Generated Problems List
  treated by: Medications, Procedures
  measured by: Lab tests, Vitals
  discussed in: Clinical Notes & timeline
  also: allergies, social history, and demography
EMRA determines these relationships & the grouping of their listing
As the physician selects Diabetes Mellitus, the screen changes to show related active medications and labs
  When a problem is selected, related labs & meds are shown
  Current and related meds are highlighted
  Related clinical notes are highlighted
  Labs show elevated glucose and A1C, among others
Continuous Training & Learning Through Inference Chaining
Missing Links
Category: Common Bonds. Clue: Shirts, TV remote controls, Telephones. Answer: Buttons
Clue: On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first. Missing link: Mt Everest ("He was first"). Answer: Edmund Hillary
Clinical Reasoning - WatsonPaths
Start with a scenario description. Extract statements. Generate questions. Ask Watson for answers.
Use statements, questions and answers to generate assertions (opinions about relations, with confidence).
Propagate belief from what we know (the scenario) to what we're considering (our hypotheses).
Example:
  Scenario: A 63-year-old patient is sent to the neurologist with resting tremor.
  Scenario Analysis, extracted statement: patient exhibits resting tremor
  Q: Watson, what causes resting tremor? A: Parkinson's Disease (62%)
  Inferred statement: patient has Parkinson's Disease
(Simplified) Example
WatsonPaths
  1 Scenario Analysis (input: Scenario)
  2 Candidate Generation
  3 Statement Prioritization
  4 Question Generation
  5 Watson QA (several instances: Watson QA, Watson QA, ... Watson QA)
  6 Process Results to Create New Assertions
  7 Find and Evaluate Inferences
  8 ...
Proceed through the numbered steps. Repeat steps 2 through 8 until complete. Completion may be determined differently. One example is to stop after the factors from scenario analysis are connected to the candidates.
Inference Graphs
Scenario Description: "A 63-year-old patient is sent to the neurologist with... resting tremor... What part of his nervous system is most likely affected?" (USMLE Question)
Example chain: Scenario Description states "patient exhibits resting tremor" (Input Factor), which indicates "patient has Parkinson's Disease" (Inferred Factor), which indicates "patient's Substantia Nigra is affected" (Hypothesis). More inferences follow the same pattern.
A node represents a statement that can be judged true or false. Types of statements are input factors (gray), inferred factors (white) and hypotheses or answers (green). Border strength visually represents Watson's belief that a factor is true.
An edge represents a relation between the connected statements. Agents make assertions about their confidence in a relation. Belief propagation uses these confidences in deriving belief in relations for the given scenario. Visual edge strength represents both agent confidence and the propagated belief.
Inference Graph:
  A platform for integrating assertions about a scenario, whether from Watson QA, other structured and semi-structured knowledge sources, or from the users themselves.
  A platform for orchestrating the decision process.
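A toy version of belief propagation over such a graph can make the node and edge roles concrete. The sketch below uses a noisy-OR combination, which is an assumption chosen for simplicity and not necessarily what WatsonPaths uses. The 0.62 edge confidence echoes the 62% answer confidence for "what causes resting tremor?" shown earlier in the deck; the 0.80 edge confidence and the `propagate` helper are invented for illustration.

```python
from collections import defaultdict


def propagate(inputs, edges, order):
    """Noisy-OR belief propagation over a directed acyclic graph.

    inputs: {node: belief} for the input factors taken from the scenario
    edges:  (source, target, confidence) assertions made by agents
    order:  all nodes in topological order (sources before targets)
    """
    parents = defaultdict(list)
    for src, dst, conf in edges:
        parents[dst].append((src, conf))

    belief = dict(inputs)
    for node in order:
        if node in belief:
            continue  # input factors keep the belief they were given
        not_true = 1.0
        for src, conf in parents[node]:
            # Each parent independently supports the node with probability
            # (belief in the parent) * (confidence in the relation).
            not_true *= 1.0 - belief.get(src, 0.0) * conf
        belief[node] = 1.0 - not_true
    return belief


tremor = "patient exhibits resting tremor"
parkinsons = "patient has Parkinson's Disease"
nigra = "patient's Substantia Nigra is affected"

beliefs = propagate(
    inputs={tremor: 1.0},
    edges=[(tremor, parkinsons, 0.62), (parkinsons, nigra, 0.80)],
    order=[tremor, parkinsons, nigra],
)
for statement, value in beliefs.items():
    print(f"{value:.3f}  {statement}")
```

With more than one path into a hypothesis, noisy-OR raises its belief above any single path, which matches the intuition that independent lines of evidence reinforce each other.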
The (Less Simplified) Result
Enhanced User Interface for WatsonPaths
Dialoguing to an answer in Clinical Decision Support
Q: What diagnosis explains the patient's condition?
Present Factors: Red, painful eye; Blurred vision; Family history of arthritis
Absent Factors: Circular rash; Fatigue; Headache
Source: Medical Record
Evidence passages:
  The first symptom of Lyme disease for about 50% of people is a small, red bull's-eye rash, called erythema migrans, at the site of an infected tick bite. Other early, acute Lyme symptoms are flu-like: fatigue, achy muscles or joints, fever, chills, stiff neck, swollen glands, and a headache.
  Lyme disease (also called Lyme's disease) can affect different body systems, such as the nervous system, joints, skin, and heart. Symptoms are often described as happening in three stages (although not everyone experiences all three): 1. A circular rash, typically within 1-2 weeks of infection, often is the first sign of infection. 2. Along with the rash, the person may have flu-like symptoms such as swollen lymph nodes, fatigue, headache, and muscle aches.
  Lyme disease is caused by a bacterium, Borrelia burgdorferi, and is transmitted to humans through the bite of infected blacklegged ticks. Typical symptoms include high temperature, headache, fatigue, and a characteristic skin rash called erythema migrans.
Thank You