ISyE 6416: Computational Statistics Spring Lecture 1: Introduction

ISyE 6416: Computational Statistics, Spring 2017
Lecture 1: Introduction
Prof. Yao Xie
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

What this course is about
- The interface between statistics and computer science
- Closely related to machine learning, data mining, and data analytics
- Aims at the design of algorithms for implementing statistical methods on computers

Major components
- Optimization tools for statistics: first-order and second-order methods for likelihood; expectation-maximization methods
- Parametric methods: Gaussian mixture model (GMM); hidden Markov model (HMM); model selection and cross-validation
- Non-parametric methods: principal component analysis and low-rank models; splines and approximation of functions
- Bootstrap and resampling
- Monte Carlo methods

Statistics
Data: images, video, audio, text, etc., from sensor networks, social networks, the internet, and genomes.
Statistics provides tools to
- model data, e.g. distributions, Gaussian mixture models, hidden Markov models
- formulate problems or ask questions, e.g. maximum likelihood, Bayesian methods, point estimators, hypothesis tests, how to design experiments

My Research Interests
My research is motivated by data-analytic problems in big sensor networks:
- physical sensors: engine prognostics, geophysical and environmental sensor arrays, power systems
- social sensors: social networks, social media, GDELT event streams, citation networks

Statistics needs computing
Once the problem has been formulated, we have to solve it, and this relies on computing. The form of the mathematical problem does not tell us how to solve it; computing means finding efficient algorithms to solve it, e.g. maximum likelihood requires finding the maximum of a cost function.
Before there were computers, there were algorithms. But now that there are computers, there are even more algorithms, and algorithms lie at the heart of computing.
Algorithm (loosely speaking): a method or a set of instructions for doing something.
A program is a set of computer instructions that implements the algorithm.

Computational statistics vs. optimization
Choosing a decision parameter value to minimize the decision risk.
Example: linear regression with data $(x_i, y_i)$, $i = 1, \dots, n$.
Risk function: $R(a, b) = \sum_{i=1}^{n} \big(y_i - (a x_i + b)\big)^2$
Estimate: $(\hat{a}, \hat{b}) = \arg\min_{a, b} R(a, b)$
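For this linear regression example the minimizer of $R(a, b)$ has a closed form. A minimal Python/NumPy sketch (the function names `fit_line` and `fit_line_lstsq` are hypothetical) computes it from the normal equations and cross-checks it against NumPy's general least-squares solver:

```python
import numpy as np

def fit_line(x, y):
    """Minimize R(a, b) = sum_i (y_i - (a x_i + b))^2 in closed form."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    # Setting dR/da = dR/db = 0 gives the usual centered formulas
    a_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b_hat = ybar - a_hat * xbar
    return a_hat, b_hat

def fit_line_lstsq(x, y):
    """Same fit via a generic least-squares solve on the design matrix [x, 1]."""
    x = np.asarray(x, dtype=float)
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]

x = np.arange(10.0)
print(fit_line(x, 2.0 * x + 1.0))  # noiseless line: recovers a = 2, b = 1
```

The closed form is fine for two parameters; the generic solver is what generalizes to many regressors.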

Choosing a parameter value according to maximum likelihood
Example: maximum likelihood, with parameter $\theta$ and data $x$.
Log-likelihood function: $\ell(\theta \mid x) = \log f(x \mid \theta)$, and $\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} \ell(\theta \mid x)$.
We drop the dependence on $x$ and write $\ell(\theta)$, but remember that $\ell(\theta)$ is a function of the data $x$.
Simplest setting: maximize the log-likelihood function by setting $d\ell(\theta)/d\theta = 0$.
How do we find a solution to the optimization problem? Is there a global solution, or are there many local solutions?
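To illustrate the question about global versus local solutions, here is a minimal sketch that applies Newton's method (a second-order method) to solve $\ell'(\theta) = 0$ for a Cauchy location model, $\ell(\theta) = -\sum_i \log\big(1 + (x_i - \theta)^2\big)$ up to a constant. The Cauchy model, the two data points, and the helper names are illustrative assumptions, not from the slide. With two observations far apart, the log-likelihood has two maxima of equal height, and the starting point decides which one Newton's method reaches:

```python
import numpy as np

# Cauchy location model (illustrative assumption): log-likelihood up to a constant
def loglik(theta, x):
    return -np.sum(np.log1p((x - theta) ** 2))

def score(theta, x):
    """First derivative l'(theta)."""
    d = x - theta
    return np.sum(2.0 * d / (1.0 + d ** 2))

def hessian(theta, x):
    """Second derivative l''(theta)."""
    d = x - theta
    return np.sum(2.0 * (d ** 2 - 1.0) / (1.0 + d ** 2) ** 2)

def newton_mle(x, theta0, tol=1e-10, max_iter=100):
    """Plain (unsafeguarded) Newton iteration for a root of l'(theta) = 0."""
    theta = float(theta0)
    for _ in range(max_iter):
        step = score(theta, x) / hessian(theta, x)
        theta -= step
        if abs(step) < tol:
            break
    return theta

x = np.array([-5.0, 5.0])
right = newton_mle(x, 4.5)   # converges to the maximum near 4.9
left = newton_mle(x, -4.5)   # converges to the mirror-image maximum near -4.9
print(right, left, loglik(right, x), loglik(left, x))
```

Plain Newton has no safeguard: started where $\ell''(\theta) \ge 0$ (for example at $\theta = 4$ here) it can head toward a minimum, which is why practical implementations add step-size control or several starting points.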

Computational statistics vs. linear algebra
A common data structure for statistical analysis is the rectangular array: a matrix whose rows are observations and whose columns are variables. The properties of the matrix say a lot about the structure of the data.
- Common statistics: many more observations than variables (a tall matrix).
- High-dimensional statistics: more variables than observations (a wide matrix).

How to solve large linear systems $y = Ax$
Linear regression: $A$ is the data matrix and $y$ the vector of response variables; we need to compute $(A^\top A)^{-1} A^\top y$.
Directly computing the matrix inverse may not be practical, and various forms of regularization are needed to obtain a good solution.
Example: big data challenge. The Human Genome Project has made great progress toward the goal of identifying all the 100,000 genes in human DNA. With 10 patients, $A$ is of size $10 \times 100{,}000$, so $A^\top A$ has rank at most 10 and cannot be inverted.
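When there are far more variables than observations, $A^\top A$ is singular. One common regularization is ridge regression, $\hat{x} = (A^\top A + \lambda I)^{-1} A^\top y$, which can be rewritten as $\hat{x} = A^\top (A A^\top + \lambda I)^{-1} y$, so only an $n \times n$ system has to be solved. The sketch below assumes ridge as the regularizer (the slide does not name one) and uses a random $10 \times 500$ matrix in place of real genomic data. Both forms call `np.linalg.solve` rather than forming an explicit inverse:

```python
import numpy as np

def ridge_primal(A, y, lam):
    """Solve the p x p system (A^T A + lam I) x = A^T y."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

def ridge_dual(A, y, lam):
    """Same estimate via the n x n system: x = A^T (A A^T + lam I)^{-1} y."""
    n = A.shape[0]
    alpha = np.linalg.solve(A @ A.T + lam * np.eye(n), y)
    return A.T @ alpha

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 500))  # 10 observations, 500 variables (stand-in data)
y = rng.standard_normal(10)
x_dual = ridge_dual(A, y, lam=1.0)  # only a 10 x 10 solve
```

With $p = 100{,}000$ the primal system would be $100{,}000 \times 100{,}000$, while the dual stays $10 \times 10$.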

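Statistics needs computing - II
Many realistic models are not mathematically tractable, so we may use computationally intensive methods involving simulation, resampling of data, etc.
Example: simple Bayesian inference. If $x \sim N(\mu, \sigma^2)$ and $\mu \sim N(\theta, \tau^2)$, the posterior distribution is
$\mu \mid x \sim N\left( \frac{\sigma^2}{\sigma^2 + \tau^2}\,\theta + \frac{\tau^2}{\sigma^2 + \tau^2}\,x,\ \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2} \right)$.
But in other cases, e.g. $x \sim N(\mu, \sigma^2)$ with $\mu \sim \mathrm{Unif}[0, 1]$, the posterior distribution of $\mu \mid x$ is no longer normal (it is a normal density truncated to $[0, 1]$), so its summaries are no longer given by simple formulas like the one above.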
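When the posterior has no convenient closed form, a one-dimensional posterior can still be summarized numerically. A minimal sketch, assuming an equally spaced grid over $\mu$ (the helper names are hypothetical): it first checks the grid approach against the conjugate normal formula, then applies the same code unchanged to the $\mathrm{Unif}[0, 1]$ prior:

```python
import numpy as np

def normal_pdf(z, mean, sd):
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def conjugate_posterior(x, sigma, theta, tau):
    """Closed-form posterior mean and variance for x ~ N(mu, sigma^2), mu ~ N(theta, tau^2)."""
    s2, t2 = sigma ** 2, tau ** 2
    mean = s2 / (s2 + t2) * theta + t2 / (s2 + t2) * x
    var = s2 * t2 / (s2 + t2)
    return mean, var

def grid_posterior_mean(x, sigma, prior_pdf, grid):
    """Posterior mean of mu on an equally spaced grid (the grid spacing cancels)."""
    weights = normal_pdf(x, grid, sigma) * prior_pdf(grid)  # likelihood * prior
    return float(np.sum(grid * weights) / np.sum(weights))

x_obs, sigma = 1.2, 1.0
exact_mean, _ = conjugate_posterior(x_obs, sigma, theta=0.0, tau=2.0)
grid = np.linspace(-10.0, 10.0, 20001)
approx_mean = grid_posterior_mean(x_obs, sigma, lambda mu: normal_pdf(mu, 0.0, 2.0), grid)

# Uniform prior on [0, 1]: same code, different prior and grid
grid01 = np.linspace(0.0, 1.0, 10001)
unif_mean = grid_posterior_mean(x_obs, sigma, lambda mu: np.ones_like(mu), grid01)
```

Grids only scale to a few dimensions; beyond that, the simulation methods mentioned above (e.g. Monte Carlo) take over.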

Statistics needs computing - III
To discover structure in the data: gaps, clusters, principal components, rank, linear relationships between variables, etc.
Example: change-point detection for high-dimensional streaming data, developing new computationally efficient and powerful algorithms for detecting changes.
[Figure: change detection in a high-dimensional data stream, e.g. swarm behavior changing from full rank to low rank.]
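As a minimal illustration of sequential change detection, the sketch below implements the classical CUSUM procedure for a shift in the mean of a univariate Gaussian stream, with known pre-change mean `mu0`, post-change mean `mu1`, noise level `sigma`, and alarm threshold `h`. This is a swapped-in textbook method, not the high-dimensional algorithm the slide refers to, and all names and parameters are assumptions:

```python
def cusum_detect(stream, mu0, mu1, sigma, h):
    """Return the first index t at which the CUSUM statistic reaches h, or None."""
    s = 0.0
    for t, x in enumerate(stream):
        # Log-likelihood ratio of N(mu1, sigma^2) vs N(mu0, sigma^2) for one sample
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence for a change, resetting at zero
        s = max(0.0, s + llr)
        if s >= h:
            return t
    return None

# The mean jumps from 0 to 1 at index 50; the alarm is raised a few samples later
alarm = cusum_detect([0.0] * 50 + [1.0] * 50, mu0=0.0, mu1=1.0, sigma=1.0, h=5.0)
```

A larger `h` gives fewer false alarms at the cost of a longer detection delay. This trade-off carries over to the high-dimensional setting, where the per-sample statistic itself becomes the expensive part.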

Example: Netflix problem
Netflix database: about 1,000,000 users and 25,000 movies.
Quantized movie ratings (e.g., 1, 2, 3, 4, 5).
We observe only a subset of the entries (sparsely sampled).

Guess the missing ratings?
[Figure: the observed users × movies rating matrix, with most entries missing, next to the matrix of true preferences (e.g. 3.5, 1.3, 4.43, ...).]

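Regularized maximum-likelihood estimator
Log-likelihood function for categorical matrix completion, where $\Omega$ is the set of observed entries and each observed rating $Y_{ij}$ takes one of $K$ values $a_1, \dots, a_K$:
$F_{\Omega,Y}(X) = \sum_{(i,j) \in \Omega} \sum_{k=1}^{K} \mathbb{I}[Y_{ij} = a_k] \log f_k(X_{ij})$
Nuclear norm regularization:
$\hat{M} = \arg\max_{X \in \mathcal{S}} F_{\Omega,Y}(X)$, where $\mathcal{S} = \{ X \in \mathbb{R}^{d_1 \times d_2} : \|X\|_* \le \alpha \sqrt{r d_1 d_2},\ -\alpha \le X_{ij} \le \alpha,\ \forall (i,j) \in [d_1] \times [d_2] \}$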

Optimization problem
Non-convex optimization problem: $\min_{M \in \Gamma} f(M) + \lambda \|M\|_*$
Matrix completion: $f(M) = -\sum_{(i,j) \in \Omega} \log p(Y_{ij} \mid M_{ij})$
$\Gamma$: set of feasible estimators
Exact algorithm: semidefinite program (SDP), $O(d^4)$
Approximate algorithm: singular value thresholding, $O(d^3)$
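The singular value thresholding step can be sketched as follows. To keep it short, this sketch assumes a squared-error loss on the observed entries in place of the likelihood $f(M)$. That gives a SoftImpute-style iteration: fill the missing entries with the current estimate, then shrink every singular value by $\lambda$. The function names and the synthetic rank-one test matrix are hypothetical:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: shrink each singular value of X by tau (floor at 0)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt  # scales column j of U by s_shrunk[j]

def soft_impute(Y, mask, lam, n_iter=300):
    """Squared-loss matrix completion; mask[i, j] is True where Y[i, j] is observed."""
    Z = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, Y, Z)  # keep observed entries, impute the rest
        Z = svt(filled, lam)           # one O(d^3) SVD per iteration
    return Z

rng = np.random.default_rng(0)
M = np.outer(rng.uniform(1, 2, 30), rng.uniform(1, 2, 30))  # rank-one "ratings"
mask = rng.random(M.shape) < 0.7                            # observe about 70% of entries
M_hat = soft_impute(M, mask, lam=0.1)
```

Each iteration costs one full SVD, which matches the $O(d^3)$ figure above. The likelihood-based version replaces the "fill in" step with a gradient step on $f(M)$.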

Another example: HMM algorithm

Hidden Markov Model

Formalism

Decoding: Viterbi algorithm
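The slide's HMM details did not survive transcription, so the following is a generic sketch of Viterbi decoding under standard assumptions. It uses initial probabilities `pi`, a transition matrix `A` with `A[i, j]` the probability of moving from state `i` to state `j`, and an emission matrix `B` with `B[s, v]` the probability of emitting symbol `v` in state `s`. It works with log-probabilities to avoid numerical underflow. The two-state toy parameters are illustrative only:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence for observations obs (list of symbol indices)."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, S = len(obs), len(pi)
    delta = np.empty((T, S))           # delta[t, s]: best log-prob of a path ending in s at t
    back = np.zeros((T, S), dtype=int)  # back[t, s]: best predecessor of state s at time t
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: come from i, move to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_B[:, obs[t]]
    # Trace back from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(np.max(delta[-1]))

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
path, log_prob = viterbi(pi, A, B, [0, 1, 2])  # path == [0, 0, 1]
```

Dynamic programming reduces the search over all $S^T$ state sequences to $O(T S^2)$ work.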

Computing needs statistics

The age of big data

The danger of big data

Uncertainty quantification for algorithms
There are many machine learning algorithms, but few tools for uncertainty quantification ("error bars"). Many open research problems remain.

Example: bootstrap
Idea: in statistics, we learn about characteristics of the population by taking samples. Bootstrapping learns about the sample characteristics by taking resamples, and uses that information to make inferences about the population.
Resample: we retake samples, with replacement, from the original sample.
Uses: calculating the standard error of an estimator, constructing confidence intervals, and many others.
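A minimal sketch of the nonparametric bootstrap described above, assuming NumPy's `default_rng` for resampling with replacement. It returns the bootstrap standard error and a percentile confidence interval for any statistic passed in; the function names and the default of 2000 resamples are hypothetical choices:

```python
import numpy as np

def bootstrap(data, stat, n_boot=2000, seed=0):
    """Return n_boot bootstrap replicates of stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]  # n draws with replacement
        reps[b] = stat(resample)
    return reps

def bootstrap_se_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Bootstrap standard error and percentile confidence interval for stat."""
    reps = bootstrap(data, stat, n_boot, seed)
    alpha = 1.0 - level
    lo, hi = np.quantile(reps, [alpha / 2.0, 1.0 - alpha / 2.0])
    return reps.std(ddof=1), (lo, hi)

sample = np.random.default_rng(1).normal(10.0, 2.0, size=50)
se_median, ci_median = bootstrap_se_ci(sample, np.median)  # no closed-form SE needed
```

For the sample mean, the bootstrap standard error should land close to the textbook $s/\sqrt{n}$. The same code works unchanged for statistics such as the median, where no simple formula exists.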