State of Machine Learning and Future of Machine Learning
(based on the vision of T. M. Mitchell)
Rémi Gilleron
Mostrare project, Lille University and INRIA Futurs
www.grappa.univ-lille3.fr/mostrare
Collège scientifique FT - January 9, 2007
Outline
Utopian View
Let us imagine:
- Computers learning from medical records which treatments are more effective for new diseases
- Houses learning from experience to optimize energy costs based on the particular usage patterns of their occupants
- Personal software assistants learning from past usage the evolving interests of their users in order to highlight relevant news
- Personal software assistants learning from the Web in order to organize a journey by using Web services
Application success: Data Mining
Objective: extract knowledge from databases.
State of data mining:
- Industrial systems are available
- A toolbox of algorithms for extracting, transforming and loading data (ETL), and reporting tools
- but the core algorithms are Machine Learning algorithms
Some applications: business intelligence, marketing.
Application success: speech recognition
Definition: speech recognition is the process of converting a speech signal into a sequence of words.
State of speech recognition:
- Commercial systems are available
- They use machine learning techniques because training yields greater accuracy than programming by hand
- Two learning phases: (1) before purchase, in a speaker-independent fashion; (2) after purchase, in a speaker-dependent fashion
Machine Learning
A tentative definition by Tom M. Mitchell.
Central question: how can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?
Definition: learning = improving performance at some task through experience.
A basic task: Supervised Classification
Classifying objects into a finite set of groups.
The problem:
- Task: classifying real-valued vectors into two groups, i.e. searching for a boolean-valued function
- Experience: a training set of examples, which are pairs (input value, output value)
- Performance: ability to correctly classify unseen real-valued input vectors
A set of data records:
A    B    ...  J    Group
5    t    ...  1.5  Y
15   f    ...  3.2  N
4    t    ...  3.5  Y
...  ...  ...  ...  ...
A set of rules:
- If A ≥ 7 and B = t then Group = Y
- If B = f and J > 3 then Group = N
- ...
An unseen input: (A = 2, B = f, ..., J = 2)?
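The rule set above can be sketched directly in code. This is a minimal illustration, not the slide's actual system; the threshold in the first rule (A ≥ 7) is an assumption, since the comparison operator is not legible in the original, and the records are the toy rows from the slide.

```python
def classify(record):
    """Apply the slide's two toy rules; return 'Y', 'N', or None if no rule fires."""
    if record["A"] >= 7 and record["B"] == "t":
        return "Y"
    if record["B"] == "f" and record["J"] > 3:
        return "N"
    return None  # the rule set does not cover this input

training = [
    {"A": 5,  "B": "t", "J": 1.5, "Group": "Y"},
    {"A": 15, "B": "f", "J": 3.2, "Group": "N"},
    {"A": 4,  "B": "t", "J": 3.5, "Group": "Y"},
]

unseen = {"A": 2, "B": "f", "J": 2}
print(classify(unseen))  # no rule fires: prints None
```

The unseen input falls outside both rules, which is exactly the slide's point: a finite rule set learned from few examples may leave parts of the input space undecided.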
Why is it difficult (1)?
[figure: labeled data points with a linear separator]
Searching in the set of linear functions:
- In the figure, the empirical error rate is 7/34 ≈ 20%
- Algorithms exist for linear separation, but it seems that there is no good hypothesis in the set of linear functions
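The empirical error rate of a linear hypothesis is straightforward to compute. A minimal sketch, with made-up weights and a made-up toy sample (the slide's 34-point figure is not reproduced):

```python
def predict(w, b, x):
    """Linear hypothesis h(x) = sign(w·x + b), returning +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def empirical_error(w, b, sample):
    """Fraction of labeled examples (x, y) that h misclassifies."""
    errors = sum(1 for x, y in sample if predict(w, b, x) != y)
    return errors / len(sample)

# a toy sample that this particular hypothesis cannot separate perfectly
sample = [((0.0, 0.0), -1), ((1.0, 1.0), 1), ((2.0, 2.0), 1), ((0.5, 0.2), 1)]
w, b = (1.0, 1.0), -1.0
print(empirical_error(w, b, sample))  # → 0.25 (one of four points misclassified)
```

In the slide's figure the analogous computation gives 7 errors out of 34 points, about 20%.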
Why is it difficult (2)?
[figure: labeled data points]
Searching in the set of all functions:
- In the figure, the empirical error rate is 0
- Algorithms must deal with complexity issues, noisy examples seem influential, and the ability to correctly classify unseen examples is doubtful
Why is it difficult (3)?
[figure: labeled data points]
A trade-off between the empirical error rate and the complexity of the function:
- In the figure, the empirical error rate is low: 2/34 ≈ 6%
- The function is simple, and such a function would generalize well
State of supervised classification
Theory:
- Why is it difficult? The true error rate (on correctly classifying new, unseen inputs) depends on an unknown probability distribution
- Statistical learning theory: empirical risk, VC dimension, structural risk and regularized risk, bounds
- PAC learning: polynomial time complexity, learnability
- The real situation is even worse
Off-the-shelf algorithms: support vector machines (SVMs), kernel methods, decision trees, logistic regression, neural networks, ...
Applications: a basic task in many application domains: texts, biological sequences, medical records, ...
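The bounds mentioned above quantify how the empirical risk controls the true risk. As an illustration (stated here from memory; the constants vary across textbooks), a classical VC-style bound says that for a hypothesis class of VC dimension $d$ and an i.i.d. sample of size $n$, with probability at least $1-\delta$ over the draw of the sample, every hypothesis $h$ in the class satisfies

```latex
R(h) \;\le\; \hat{R}_n(h) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

where $R(h)$ is the true error under the unknown distribution and $\hat{R}_n(h)$ the empirical error on the sample. The second term grows with the complexity $d$ and shrinks with $n$, which is the trade-off of slides (1)-(3) made precise.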
Discipline of Machine Learning
Theory, algorithms and applications.
Designing an ML system:
- Modeling the problem: defining the task (representation of inputs and outputs), defining the experience, defining the performance
- Defining an ML algorithm: set of hypotheses, regularization, solving an optimization problem, complexity issues
- Evaluation
The place of ML:
- An outgrowth of the intersection of Computer Science and Statistics
- Human learning
- Empirical sciences: biology, economics, social sciences, control theory, ...
Machine Learning within Computer Science
Application successes:
- data mining, text mining, Web mining
- speech recognition, computer vision, robot control
- accelerating empirical sciences
A growing niche in software engineering:
- the application is too complex to manually write a successful algorithm, and it is easy to collect training data
- the software must be customized to its operational environment
State of ML for other tasks
A large toolbox of algorithms:
- many off-the-shelf algorithms for classification and regression
- Hidden Markov Models and Conditional Random Fields for labeling sequences: texts (named entity recognition), biological sequences
- algorithms for searching frequent patterns in sets of records
- algorithms for reinforcement learning (learning control strategies for autonomous agents)
With some restrictions:
- objects are often described by real-valued vectors
- sets of training data should be quite large
- the right model must be chosen
- it is difficult to reuse systems from one application in another
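Sequence labeling with an HMM, as mentioned above for NER-style tagging, reduces at prediction time to the Viterbi algorithm. A minimal sketch; the two tags, the tiny vocabulary and all probabilities are invented for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations."""
    # V[t][s] = (probability of the best path ending in state s at step t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1] + [s])
                for prev in states)
            V[t][s] = (prob, path)
    return max(V[-1].values())[1]

states = ("O", "NAME")  # outside / named-entity tags
start_p = {"O": 0.8, "NAME": 0.2}
trans_p = {"O": {"O": 0.7, "NAME": 0.3}, "NAME": {"O": 0.6, "NAME": 0.4}}
emit_p = {"O": {"met": 0.9, "Alice": 0.1}, "NAME": {"met": 0.1, "Alice": 0.9}}
print(viterbi(("met", "Alice"), states, start_p, trans_p, emit_p))  # → ['O', 'NAME']
```

In a real system the transition and emission probabilities would be estimated from annotated training sequences; CRFs replace the generative probabilities with discriminatively trained feature weights but use the same dynamic-programming decoding.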
Less preprocessing and less postprocessing
Complex inputs: can we avoid encoding and deal with complex objects?
- sequences (texts, biological sequences)
- trees (HTML documents, phylogenetic trees)
- graphs (XML and Web documents, social networks, gene networks)
- combining views (texts and images)
Complex outputs: can we go beyond classification and regression?
- a hierarchy of groups, not necessarily defining partitions
- a set of sequences (annotation), a set of trees (parsing), ...
Less training data
Semi-supervised learning: can unlabeled data be helpful for supervised learning?
- labeling can be costly (labeling or annotating texts or images) or impossible (fraudulent actions, sick patients)
- co-training and EM algorithms should help
- one-class learning algorithms
Active learning: what is the best strategy for actively collecting training data?
- intelligently asking for the label of an unlabeled example
- intelligently choosing patients for drug testing
- intelligently exploring the domain for an autonomous robot
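The "intelligently asking for a label" strategy can be sketched as uncertainty sampling: query the unlabeled example the current model is least sure about. The probabilistic model here is a made-up one-dimensional logistic scorer, purely for illustration:

```python
import math

def predict_proba(x, threshold=0.5, slope=10.0):
    """Toy model: probability of the positive class via a logistic curve."""
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))

def most_uncertain(unlabeled, model=predict_proba):
    """Pick the example whose predicted probability is closest to 0.5."""
    return min(unlabeled, key=lambda x: abs(model(x) - 0.5))

pool = [0.1, 0.45, 0.9, 0.52]
print(most_uncertain(pool))  # → 0.52, the point nearest the decision boundary
```

The labeled answer is then added to the training set and the model retrained, so each query is spent where it is most informative rather than on examples the model already classifies confidently.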
Less effort in designing ML systems
Model selection: which learning algorithm should be used, and when?
- theory: characterize properties of learning algorithms: convergence properties, relative strengths and weaknesses
- practice: choosing the right model under limited resources (for instance limited training data or limited time)
Transfer learning: how can what is learned for one task be transferred to other tasks?
- transferring a learned model from one family of genes to another
- combining learning for multi-objective tasks: for instance, annotation of texts with named entities, semantic roles, relations
- reusing expertise from one domain in another: for instance, from handwriting analysis systems to face recognition systems
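Model selection under limited training data is commonly done with k-fold cross-validation: estimate each candidate's held-out error and keep the best. A minimal sketch; the two "models" are toy predictors standing in for learning algorithms of different complexity:

```python
def kfold_error(data, k, fit, loss):
    """Average held-out loss of a learner over k contiguous folds."""
    fold = len(data) // k
    total = 0.0
    for i in range(k):
        held = data[i * fold:(i + 1) * fold]
        train = data[:i * fold] + data[(i + 1) * fold:]
        model = fit(train)  # train on everything except the held-out fold
        total += sum(loss(model, x, y) for x, y in held) / len(held)
    return total / k

data = [(float(x), float(x)) for x in range(8)]          # toy data where y = x
fit_identity = lambda train: (lambda x: x)               # candidate 1: predict x
fit_zero = lambda train: (lambda x: 0.0)                 # candidate 2: predict 0
sq_loss = lambda model, x, y: (model(x) - y) ** 2

print(kfold_error(data, 4, fit_identity, sq_loss))  # → 0.0
print(kfold_error(data, 4, fit_zero, sq_loss))      # → 17.5
```

The candidate with the lower cross-validated error (here the identity predictor) is selected, trading a k-fold increase in training time for a more reliable estimate when data is scarce.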
Future of machine learning
Let us imagine:
- computers learning from medical records ...
- houses learning from experience ...
- personal software assistants learning from past usage ...
- personal software assistants learning from the Web ...
These lead to open questions:
- never-ending learners learn in a cumulative way
- self-supervised learners define their experience
- intelligent learners may use their domain knowledge in learning
and to the relations:
- between Machine Learning and computer perception
- between Machine Learning and human learning