Shining A Light On Consumer Feedback. Luminoso In Action. Case Study


Case Study

Use Case: Customer Analytics
Segment: Voice Of The Customer
Products: Rosette
Functions: Text Analytics
Availability: API or SDK

Shining A Light On Consumer Feedback

Spun out of the MIT Media Lab in 2010, Luminoso quickly drew the attention of major consumer brands. Its flagship product, Luminoso Analytics, digests large volumes of text-based customer feedback, such as online reviews, surveys, and customer service interactions. Instead of just finding keywords, Luminoso identifies the key concepts, ideas, thoughts, and sentiments that drive consumer choices. With its multilingual capabilities, Luminoso empowers its clients to understand, measure, and act on consumer feedback across any number of channels.

Luminoso In Action

A multinational consumer goods company was designing a new variant of its most popular men's personal care product. The million-dollar question was: what design features actually matter most to its consumers?

Thousands of testers filled out a survey about the new product variant, which asked them to rate attributes on a scale of 1 to 5 and to leave free-form comments. These comments amounted to more text than could be accurately processed manually.

The company chose Luminoso to analyze the free-form comments. Luminoso uncovered distinct product feature discussions and then measured each one for positive and negative feedback. It also looked at reviewer rating tiers separately to see which company and product attributes drove both high and low review scores.

Surprisingly, the color of the item was mentioned in comments as a significant factor in the product experience. This key feature would not have been discovered through guided queries alone.

The Challenge

Every time Luminoso adds a new human language to its portfolio, finding a reliable linguistic analyzer for that language is step one. For Luminoso's algorithms to tease out ideas, thoughts, and sentiment from unstructured text, the raw text needs to be enriched and tagged with exacting accuracy. This enrichment makes it possible to tune out the noise and reveal the finely tuned messages within.

Noise includes:

- Single words with multiple surface forms. The single word behind each set of forms is the lemma, the dictionary form of a word:
    child/children (English singular/plural)
    beau/beaux/belle/belles (French adjective forms: masculine singular/masculine plural/feminine singular/feminine plural)
- Dialectal variations:
    colour/color (British English vs. American English)
    nonante/quatre-vingt-dix ("90" in Belgian French vs. the French of France)
- Pronouns that need to be recognized: attributes linked to pronouns out of context are unhelpful

The process of standardizing words begins with careful analysis of the words. Natural Language Processing (NLP) means:

- Finding word boundaries (needed for Chinese, Japanese, and Korean, which don't reliably use spaces between words)
- Tagging parts of speech
- Finding the dictionary form (lemma) of each word
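To make the effect of this noise concrete, here is a minimal Python sketch. It is not Luminoso's or Rosette's code: the CANONICAL table and the normalize and concept_counts helpers are hypothetical, hand-written stand-ins for a real linguistic analyzer. The sketch only illustrates why mapping surface forms and dialect spellings to one form lets mentions of the same concept be counted together.

```python
from collections import Counter

# Hypothetical hand-written table mapping surface forms and dialect
# spellings to one canonical form. A real analyzer would derive this
# from a lexicon and context, not from a fixed dictionary.
CANONICAL = {
    "children": "child",
    "beaux": "beau",
    "belle": "beau",
    "belles": "beau",
    "colour": "color",
}


def normalize(token):
    """Lowercase a token and map it to its canonical form if one is listed."""
    token = token.lower()
    return CANONICAL.get(token, token)


def concept_counts(tokens):
    """Count tokens after normalization so that variants are merged."""
    return Counter(normalize(t) for t in tokens)


tokens = ["Colour", "color", "child", "children", "color"]

# Without normalization, "colour" and "color" are counted separately.
raw_counts = Counter(t.lower() for t in tokens)

# With normalization, the variants collapse into one entry each.
merged_counts = concept_counts(tokens)
```

In this toy input, the raw counts split one concept across "colour" and "color", while the merged counts report a single "color" entry, which is the kind of meaningless distinction the enrichment step is meant to remove.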

"Whenever someone asks us, 'Does your system work in language x?' the answer is 'Yes, it does,' to the extent that you can put your text into our system and we'll see how it does," said Lance Nathan, Senior Linguistics Developer at Luminoso. "The question really isn't 'does it work in language x,' but how well it works in language x."

When the linguistic analysis isn't up to snuff, meaningless distinctions like "color" vs. "colour" creep in, muddying the results. Nathan's job of delivering the quality results that Luminoso users expect is easier when the linguistic enrichment is of higher quality.

The Old Solution

Luminoso started by using open source solutions for linguistic analysis, but in some cases the results were simply not accurate enough. In early experiments, Luminoso used stemming, one of the most basic techniques, which removes suffixes from words. However, errors like turning "Mount Everest" into "Mount Ever" and stemming "installed" or "installing" (but not "install") to "instal" were unacceptable. More sophisticated tools made more sophisticated mistakes, like deciding that "crew" is the past tense of "crow" or that "cola" should be singularized to "colon."

The Rosette Solution

Luminoso is a successful alumnus of Basis Technology's Startup Program, which puts its professional-grade multilingual text analytics into the hands of early-stage, high-impact firms to help them quickly realize their vision.
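The stemming errors described under The Old Solution are easy to reproduce with a toy suffix stripper. The sketch below is an illustration only: naive_stem and its SUFFIXES list are hypothetical and are not the stemmer Luminoso actually tried. It strips a known suffix and then undoubles a trailing consonant, two rules that are reasonable for words like "running" but misfire on the examples from the text.

```python
# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "est", "ed")
VOWELS = "aeiou"


def naive_stem(word):
    """Strip one known suffix, then undouble a trailing double consonant.

    The rules know nothing about vocabulary or context, so proper nouns
    and words that merely look inflected are damaged.
    """
    for suffix in SUFFIXES:
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # "runn" -> "run" is the intended use of this rule;
            # "install" -> "instal" is the unintended one.
            if stem[-1] == stem[-2] and stem[-1] not in VOWELS:
                stem = stem[:-1]
            return stem
    # No suffix matched, so the word is returned unchanged.
    return word


phrase = " ".join(naive_stem(w) for w in "Mount Everest".split())
stems = {w: naive_stem(w) for w in ("installed", "installing", "install")}
```

Here "Everest" loses its "est" as if it were a superlative, and "installed" and "installing" end up with a different stem from "install" because the undoubling rule only fires after a suffix has been removed.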

Things that Rosette excelled at, according to Nathan, include finding lexicalized phrases: knowing that "nouveaux riches" is a single unit that should be singularized to "nouveau riche," for instance, or that "according to" is just a preposition and not an instance of the verb "accord."

A case in point is the Portuguese analyzer Luminoso had been using, which sometimes had difficulty splitting contractions such as "pelas" and "pela," which combine the preposition "por" with an article, in a context such as "reforçada pelas mesas fartas e pela moda de viola." That analyzer would lemmatize "pela" to "pelar," meaning to scale a fish.

Another reason Rosette replaced the open source solution for Portuguese was its better recognition of Brazilian Portuguese spellings, which allowed it to return the correct dictionary form more often.

Lemmatization Example: English

Linguistic analysis is useful for every language; lemmatization for English improves recall and precision.

am, are, is -> to be

Most search engines use a crude method of chopping off characters at the end of a word in the hope of removing unimportant differences. This method, called stemming, often results in extra recall and poor precision. Instead, RBL finds the true dictionary form of each word, known as a lemma, by using vocabulary, context, and advanced morphological analysis. Indexing the root form increases search relevancy and slims the search index by not indexing all inflected forms. Alternative lemmas are also made available to supplement indexing.

CHALLENGE                                     QUERY              STEM    LEMMA
Two unrelated words may share a stem.         animals, animated  anim    animal, animate
Stemming may deliver unintended results.      several            sever   several
Irregular verbs and nouns stump the stemmer.  spoke              spoke   speak (v.), spoke (n.)

Part Of Speech Tagging

As part of the lemmatization process, statistical modeling is used to determine the correct part of speech, even with ambiguous words. Each token is then tagged for enhanced comprehension and search relevancy.

(Figure labels: Pronoun, Verb, Adjective, Noun)
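The link between part-of-speech tagging and lemmatization can be sketched in a few lines of Python. This is a toy lookup, not how RBL works internally: the LEMMAS lexicon, the tag names, and the lemmatize helper are hypothetical. The TABLE_STEMS values are copied from the STEM column of the table above and are included only for contrast.

```python
# Hypothetical toy lexicon keyed by (surface form, part of speech).
# A real analyzer would use a full vocabulary plus statistical tagging.
LEMMAS = {
    ("am", "VERB"): "be",
    ("are", "VERB"): "be",
    ("is", "VERB"): "be",
    ("spoke", "VERB"): "speak",
    ("spoke", "NOUN"): "spoke",
    ("animals", "NOUN"): "animal",
    ("animated", "VERB"): "animate",
    ("several", "ADJ"): "several",
}

# Stems as listed in the table's STEM column, kept for comparison.
TABLE_STEMS = {
    "animals": "anim",
    "animated": "anim",
    "several": "sever",
    "spoke": "spoke",
}


def lemmatize(word, pos):
    """Return the lemma for a (word, part-of-speech) pair.

    Unknown pairs fall back to the lowercased word itself.
    """
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())


# The same surface form gets a different lemma depending on its tag.
verb_lemma = lemmatize("spoke", "VERB")
noun_lemma = lemmatize("spoke", "NOUN")

# Stemming merges two unrelated words; lemmatization keeps them apart.
stems_collide = TABLE_STEMS["animals"] == TABLE_STEMS["animated"]
lemmas_collide = lemmatize("animals", "NOUN") == lemmatize("animated", "VERB")
```

The design point is that the lookup key includes the tag: without a part of speech there is no way to choose between "speak" and "spoke" for the surface form "spoke".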

In French, Rosette's part-of-speech tagger usually distinguishes "pas" the adverb, which marks something as negative, from "pas" the noun, which is just a word meaning "step."

"If we had to work solely off tokenized surface forms [in French], we'd never be able to draw that distinction," Nathan said.

Rosette's ability to identify ambiguous parts of speech with greater accuracy meant that swapping in Rosette increased Luminoso's robustness in the face of spelling errors such as missing accents in French and Spanish.

For the languages Luminoso has tested so far, Rosette has replaced the pre-existing solution for linguistic processing. Even so, for every new language Nathan performs a battery of tests to determine which linguistics solution to go with. For a language Luminoso recently started working on, Nathan began by finding a public corpus in the language that was already tagged with parts of speech. Then he ran the untagged corpus through Rosette to see how it did. If Rosette wasn't accurate enough, he was going to keep looking. Fortunately, Rosette's output turned out to be a 97-98% match, so he looked no further.

"Using Rosette gets [the results] to a point that I wouldn't be embarrassed to show to a native speaker of the language," Nathan said.

The Result

Although a few alternatives might be better for single languages, Rosette's accuracy across more than 40 languages and the ease of a unified API for all languages are very valuable to Luminoso, Nathan says.

"I need to be convinced we are doing at least an adequate job in any new language," Nathan said. "Our English is the gold standard. The new language has to be at least pretty good to very good, or at a level approaching English."
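The corpus test Nathan describes, running an untagged corpus through a tagger and comparing the output with the tags the corpus already carries, comes down to a match rate. The sketch below is a hedged illustration of that calculation. The tag_match_rate helper, the tag names, and the five-token sample are hypothetical, and it assumes both tag sequences use the same tokenization and the same tagset, which in practice usually requires a mapping step first.

```python
def tag_match_rate(gold_tags, predicted_tags):
    """Return the fraction of tokens whose predicted tag equals the gold tag.

    Assumes the two sequences are aligned token by token.
    """
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")
    if not gold_tags:
        raise ValueError("tag sequences must not be empty")
    matches = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return matches / len(gold_tags)


# Hypothetical sample: the tagger mistakes one adverb for a noun, the
# kind of confusion the "pas" example in the text describes.
gold = ["PRON", "VERB", "ADV", "DET", "NOUN"]
predicted = ["PRON", "VERB", "NOUN", "DET", "NOUN"]

rate = tag_match_rate(gold, predicted)  # 4 matches out of 5 tokens
```

On a real corpus the same function would be applied to every token in the collection, and a result in the range the text reports would correspond to a value between 0.97 and 0.98.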

The feedback from Luminoso customers is loud and clear. In less than four years, Luminoso has gone from startup to a portfolio of 60 to 70 customers, including Fortune 1000 companies such as MARS, Sony, Intel, Scotts, and ConAgra. With a recent sales relationship with Basis Technology, Luminoso is poised to reach new users worldwide.

"Our relationship with Basis Technology means we have a ready source of high-quality text analytics in a broad range of languages," Luminoso founder and CEO Dr. Catherine Havasi said. "Using Rosette positions Luminoso well for faster entry into new markets and languages."

Rosette provides businesses and government agencies with text analytics in 55 languages.