CSA4020. Multimedia Systems:


CSA4020 Multimedia Systems: Adaptive Hypermedia Systems
Lecture 7: Term Relationships & Grouping

Problems with Single-Term Indexing
- Single terms are either too specific or too broad
- Single terms carry no context
- Single terms are more ambiguous

Generation of Complex Identifiers
- Manual content analysis and indexing
- Automatic:
  - Linguistic analysis (to generate linguistically related terms)
  - Term clustering (based on term co-occurrence statistics)
  - Probabilistic analysis (incorporating term-dependence information)

Automatic Term Classification
Construct a term matrix from an existing document collection:

        T_1      T_2      ...  T_t
D_1     d_{1,1}  d_{1,2}  ...  d_{1,t}
D_2     d_{2,1}  d_{2,2}  ...  d_{2,t}
...
D_n     d_{n,1}  d_{n,2}  ...  d_{n,t}

- Similar terms tend to be used in the same documents: group terms based on similarity amongst columns
- Similar documents contain related terms: group docs into doc classes based on similarity between rows, then group terms with a high frequency of co-occurrence within a doc class
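As a rough illustration of the matrix on this slide, the sketch below builds a term-document matrix from a toy collection using raw term counts as the entries d_{i,j}. The three documents, the whitespace tokenisation, and the helper names `term_document_matrix` and `column` are assumptions made for the example, not part of the lecture.

```python
from collections import Counter

# Hypothetical toy collection, invented for illustration only.
docs = [
    "image retrieval uses image features",
    "video retrieval uses audio features",
    "hypertext links connect documents",
]


def term_document_matrix(documents):
    """Return (terms, matrix); matrix[i][j] is the count of terms[j] in document i."""
    counts = [Counter(doc.split()) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def column(matrix, j):
    """Column j describes term j by its occurrences across all documents."""
    return [row[j] for row in matrix]


terms, matrix = term_document_matrix(docs)
for term in ("image", "retrieval"):
    print(term, column(matrix, terms.index(term)))
```

Rows of `matrix` describe documents and columns describe terms, so grouping terms compares columns while grouping documents compares rows.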

Problems
- Co-occurring terms may not be related!
- Statistical methods may not be reliable (low precision and recall)

Linguistic Methods
- Identify syntactic classes and construct word phrases based on patterns of syntactic markers (such as noun-noun, adjective-noun)
- Problems:
  - Ambiguous words and syntactic structures
  - Unreliable
- Solution:
  - Develop good parsers/semantic analysers
  - Use statistical methods to resolve ambiguity
  - Accept the fact that automatic analysis is not perfect
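The pattern-based phrase construction mentioned on this slide can be sketched as below, assuming the text has already been tagged with syntactic classes. The tag names, the hand-tagged sentence, and the function `candidate_phrases` are hypothetical; in practice a tagger or parser would supply the tags, and its errors are the reliability problem the slide refers to.

```python
# Syntactic patterns accepted as phrases (noun-noun and adjective-noun, as on the slide).
PATTERNS = {("NOUN", "NOUN"), ("ADJ", "NOUN")}


def candidate_phrases(tagged):
    """Return adjacent word pairs whose tags match one of the accepted patterns."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1, t2) in PATTERNS
    ]


# Hand-tagged example sentence (tags assigned manually for illustration).
tagged = [
    ("adaptive", "ADJ"), ("hypermedia", "NOUN"), ("systems", "NOUN"),
    ("support", "VERB"), ("individual", "ADJ"), ("users", "NOUN"),
]
phrases = candidate_phrases(tagged)
print(phrases)
```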

Term Phrase Formation
Provides more specific information than single terms, e.g.:
1. Choose a phrase head (high-frequency term or term with negative discriminatory value)
2. Add to this other terms with low/medium frequency (can limit terms to occur in the same sentence, etc.)
3. Eliminate stop words
The more restrictions in step 2, the fewer phrases.
Can combine with linguistic analysis. Term phrases:
- must conform to specific syntactic patterns
- must occur within the same sentence unit
- can be augmented with domain-specific semantic analysis
  - conceptual graphs (semantically similar, but syntactically different)
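The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: heads are chosen purely by collection frequency (the discrimination-value criterion is not modelled), the step 2 restriction is a word-distance `window` within one sentence, and the stop list, sentences, and function name `term_phrases` are invented for the example.

```python
from collections import Counter

STOP_WORDS = {"the", "of", "in", "a", "and"}  # hypothetical tiny stop list


def term_phrases(sentences, head_min_freq, window):
    """Form two-word phrases around high-frequency heads within each sentence."""
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(word for sent in tokenized for word in sent)
    phrases = set()
    for sent in tokenized:
        for i, head in enumerate(sent):
            if freq[head] < head_min_freq:  # step 1: heads are high-frequency terms
                continue
            for j, other in enumerate(sent):
                # Step 2: add lower-frequency terms at most `window` words away.
                if j != i and abs(i - j) <= window and freq[other] < head_min_freq:
                    phrases.add((other, head) if j < i else (head, other))
    # Step 3: eliminate phrases that contain a stop word.
    return {p for p in phrases if not STOP_WORDS & set(p)}


sentences = [
    "the retrieval of image data",
    "image retrieval in hypermedia systems",
    "the indexing of image collections",
]
narrow = term_phrases(sentences, head_min_freq=3, window=1)
wide = term_phrases(sentences, head_min_freq=3, window=2)
print(sorted(narrow))
print(sorted(wide))
```

With the tighter window the sketch yields fewer phrases than with the looser one, which matches the remark that more restrictions in step 2 produce fewer phrases.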

Thesaurus Group Generation
- A thesaurus can be used to broaden the scope of terms
- Can convert every term in the same class to the name of the class (controlled vocabulary)
- Can also stem to reduce the size of the thesaurus (but must ensure that different word senses are maintained)
- Domain-specific thesauri are usually created manually
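Converting terms to the name of their thesaurus class might look like the sketch below. The thesaurus contents and the names `THESAURUS`, `TERM_TO_CLASS`, and `to_controlled_vocabulary` are hypothetical, and terms outside the thesaurus are simply left unchanged, which is one possible policy rather than the only one.

```python
# Hypothetical thesaurus: class name -> member terms (invented for illustration).
THESAURUS = {
    "IMAGE": {"image", "picture", "photo", "photograph"},
    "VIDEO": {"video", "film", "movie"},
}

# Invert the thesaurus so each term points at the name of its class.
TERM_TO_CLASS = {term: cls for cls, members in THESAURUS.items() for term in members}


def to_controlled_vocabulary(terms):
    """Replace each term by its class name; terms outside the thesaurus are kept."""
    return [TERM_TO_CLASS.get(term, term) for term in terms]


indexed = to_controlled_vocabulary(["photo", "movie", "archive"])
print(indexed)
```

Because "photo" and "picture" map to the same class name, a query using one term can match a document indexed under the other, which is how the thesaurus broadens the scope of terms.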

Thesaurus Group Generation based on term co-occurrence
Given the term-document matrix:

        T_1      T_2      ...  T_t
D_1     d_{1,1}  d_{1,2}  ...  d_{1,t}
D_2     d_{2,1}  d_{2,2}  ...  d_{2,t}
...
D_n     d_{n,1}  d_{n,2}  ...  d_{n,t}

Compute the similarity between terms T_j and T_k:

\[
\mathrm{sim}(T_j, T_k) = \frac{\sum_{i=1}^{n} d_{i,j}\, d_{i,k}}{\sqrt{\sum_{i=1}^{n} d_{i,j}^{2} \cdot \sum_{i=1}^{n} d_{i,k}^{2}}}
\]

- Single-link classification: two words are put into the same group if sim > threshold
- Complete-link: sim of each pair of words in a group > threshold
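The sketch below illustrates the two grouping rules, reading the similarity on this slide as the cosine between two term columns (an assumption about the intended formula). The term columns, the threshold, and the greedy merging loop in `group_terms` are invented for the example; the complete-link result of a greedy procedure can depend on the order in which groups are tried.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term columns of the term-document matrix."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_terms(columns, threshold, linkage=any):
    """Greedily merge term groups.

    linkage=any gives single-link (one similar cross pair is enough);
    linkage=all gives complete-link (every cross pair must be similar).
    """
    groups = [{term} for term in columns]
    changed = True
    while changed:
        changed = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                similar = (cosine(columns[x], columns[y]) > threshold
                           for x in groups[a] for y in groups[b])
                if linkage(similar):
                    groups[a] |= groups.pop(b)
                    changed = True
                    break
            if changed:
                break
    return groups


# Hypothetical term columns over four documents (counts invented for illustration).
columns = {
    "image": [2, 1, 0, 0],
    "picture": [1, 1, 1, 0],
    "photo": [0, 1, 2, 0],
    "link": [0, 0, 0, 3],
}
single = group_terms(columns, threshold=0.5, linkage=any)
complete = group_terms(columns, threshold=0.5, linkage=all)
print(single)
print(complete)
```

In this toy data "image" and "photo" are each similar to "picture" but not to each other, so single-link chains all three into one group while complete-link keeps "photo" apart.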

Pseudo Classification
- Given a sample collection and a sample set of queries with relevance judgements: if D and Q are judged relevant, two terms T_j in Q and T_k in D are placed in the same group
- Such an assignment will increase the similarity between D and Q
- A similar principle is used in relevance feedback
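A deliberately crude sketch of this idea follows: for every query-document pair judged relevant, all query terms and document terms are merged into one group, and similarity is then measured as the number of shared groups. The union-find bookkeeping, the overlap measure, the toy judgements, and the names `pseudo_classify` and `overlap` are assumptions for illustration; a real procedure would be more selective about which term pairs to merge.

```python
def pseudo_classify(judged_pairs):
    """Group terms using relevance judgements; return a function term -> group id.

    judged_pairs: iterable of (query_terms, doc_terms, relevant) triples.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for query_terms, doc_terms, relevant in judged_pairs:
        if not relevant:
            continue
        for tj in query_terms:
            for tk in doc_terms:
                parent[find(tj)] = find(tk)  # place T_j and T_k in the same group
    return find


def overlap(query_terms, doc_terms, find=lambda term: term):
    """Number of term groups shared by a query and a document."""
    return len({find(t) for t in query_terms} & {find(t) for t in doc_terms})


# Hypothetical judgements, invented for illustration.
query = ["photo", "search"]
relevant_doc = ["image", "retrieval"]
other_doc = ["cooking", "recipes"]
find = pseudo_classify([(query, relevant_doc, True), (query, other_doc, False)])

before = overlap(query, relevant_doc)
after = overlap(query, relevant_doc, find)
print(before, after, overlap(query, other_doc, find))
```

Before grouping, the query and the relevant document share no terms; after grouping they share a group, while the document judged non-relevant still shares nothing with the query.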