Short Course on Machine Learning for Web Mining - Course Introduction - (academic year 2009-2010). Roberto Basili (University of Roma, Tor Vergata)
Overview. MLxWM: motivations and perspectives; a tentative syllabus; introduction to Machine Learning.
WM&R: Motivations. What is Web Mining? Why IR? Why Machine Learning? What is the IR contribution to Social Web practices? What are the prospects for the adoption of these technologies?
What is Web Mining? Web Mining currently brings together a number of different technologies needed to exploit the huge amount of information made available on the Web. Contents: data, but also people, locations, events and concepts. Relations: links and Web structure; thematic, conceptual and interpersonal links. Redundancies (duplicates, quasi-duplicates). Multilinguality. Trends and social behaviours. Opinions.
Why IR? The size of the information space involved poses a localization problem. Automatic access is possible only if a suitable notion of relevance is available. Web search proceeds by computing a stochastic function, i.e. a mapping between information needs and the useful data/resources.
Machine Learning vs IR? The information involved in Web search scenarios is heterogeneous and intrinsically uncertain, characterized by: incompleteness; rich data models, complex formats and access modes; vague requirements; subjectivity; timeliness.
ML vs. IR. The pervasiveness of uncertainty in the information distributed on the Web makes the search for globally exact (or exhaustive) solutions impractical. "Finding diamonds in the rough" (Fan Chung, UCSD).
ML vs. IR. ML offers a wide set of methods, algorithms and strategies able to locate and develop effective sub-optimal solutions. In the learning process the data themselves suggest the proper representation (or mapping) that corresponds to a given hypothesized solution. This hypothesis is expected to improve the overall performance of a base system in terms of accuracy and computational efficiency.
Tentative Syllabus. Introduction to Machine Learning: between statistics and knowledge engineering; Automatic Classification: decision trees and performance evaluation; Probabilistic Text Classification; Sequence labeling tasks: Hidden Markov Models; Introduction to PAC Learning: the VC dimension; Support Vector Machines; Kernel-based Learning: sequence kernels; Kernel-based Learning: geometrical embeddings and kernels.
Machine Learning. "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Mitchell, 1997) Critical issues: Task, Experience, Performance (P).
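The definition can be made concrete with a toy sketch. Everything here is an assumption chosen for illustration: the task T is deciding whether a number is at least 50, the experience E is a growing list of labelled numbers, and the performance measure P is accuracy on a fixed test set. The learner (`learn_threshold`, a hypothetical name) simply places its decision threshold midway between the largest negative and the smallest positive example seen so far.

```python
def learn_threshold(experience):
    """Place the threshold midway between the largest negative
    and the smallest positive example seen so far."""
    largest_negative = max(x for x, label in experience if not label)
    smallest_positive = min(x for x, label in experience if label)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test items classified correctly."""
    correct = sum((x >= threshold) == label for x, label in test_set)
    return correct / len(test_set)


# Task T (assumed for this sketch): decide whether x >= 50.
TRUE_THRESHOLD = 50
test_set = [(x, x >= TRUE_THRESHOLD) for x in range(100)]

# Experience E: labelled examples, made available a few at a time.
examples = [(10, False), (70, True), (30, False),
            (60, True), (48, False), (52, True)]

accuracies = []
for n in (2, 4, 6):
    threshold = learn_threshold(examples[:n])
    accuracies.append(accuracy(threshold, test_set))

print(accuracies)  # [0.9, 0.95, 1.0]
```

In this hand-picked data set, P improves as E grows, which is exactly what the definition asks for; with other data the improvement need not be monotonic.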
Experience and Learning. In chess, for example, experience can be provided as: data about games won (or lost) in the past, which suggest the positive or negative impact of the strategies followed; suggestions (or guidance) provided by an external observer (also called an oracle); self-observation, that is, the analysis of our own previous games according to an explicit model of the match, strategies and behaviours.
Experience and Learning (2). Three forms of learning: experience-based or inductive learning (past matches plus a utility function, i.e. the success score); supervised learning (matches annotated by the oracle); knowledge-based learning, where an explicit task model is available and guides the development of suitable process and behaviour models.
Unsupervised learning. Even when neither an oracle nor a task model is available, methods to improve performance can still be developed. A better world/task model can be learned (knowledge acquisition/discovery). Better performance: some form of optimization can be promoted. Caching vs. case-based reasoning.
Unsupervised Learning. Example: the MP3 collection. Clustering according to audio properties can be applied to develop a hierarchical organization. Search efficiency increases, and the expressiveness of the system's knowledge base also improves.
Unsupervised Learning. Future interaction between the system and its operational environment is greatly improved. The semantic transparency of the KBs for traditional (and naive) users increases significantly.
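A minimal sketch of how such a hierarchy could be built, assuming single-linkage agglomerative clustering (the slides do not name a specific algorithm). The track names and the two audio features (tempo in BPM, an energy score in [0, 1]) are invented for illustration.

```python
import math


def cluster_distance(c1, c2, features):
    """Single linkage: distance between the two closest members."""
    return min(math.dist(features[a], features[b])
               for a in c1["members"] for b in c2["members"])


def agglomerate(features):
    """Repeatedly merge the two closest clusters; return a nested-tuple tree."""
    clusters = [{"members": [name], "tree": name} for name in features]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], features)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "tree": (clusters[i]["tree"], clusters[j]["tree"]),
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]["tree"]


# Hypothetical audio features: (tempo in BPM, energy in [0, 1]).
# Features are left unscaled, so tempo dominates the distance.
tracks = {
    "ballad.mp3": (70.0, 0.2),
    "lullaby.mp3": (65.0, 0.1),
    "techno.mp3": (140.0, 0.9),
    "dance.mp3": (128.0, 0.8),
}

hierarchy = agglomerate(tracks)
print(hierarchy)
```

The resulting tree groups the two slow tracks and the two fast tracks before joining them at the root, so a search can descend one branch instead of scanning the whole collection.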
Machine Learning. Learning a function from examples. Continuous case: regression. Discrete case: classification. Example: we need a discrete function able to distinguish 2 classes, cats vs. dogs: f : X → {cats, dogs}. Given a set of examples E for the two classes, we can extract (visible) features (height, has_whiskers, type_of_coat, number_of_legs). The learning algorithm is applied to E and a function h (the hypothesis for f) is generated.
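The step from E to h can be sketched as follows. This is only an illustration under stated assumptions: the learner is a 1-nearest-neighbour rule (one of many possible choices), only two of the listed features are used (height in cm and has_whiskers encoded as 0/1), and the example values are invented.

```python
import math

# E: hypothetical labelled examples; features = (height_cm, has_whiskers).
E = [
    ((25.0, 1), "cats"),
    ((30.0, 1), "cats"),
    ((60.0, 0), "dogs"),
    ((45.0, 0), "dogs"),
]


def learn(examples):
    """Return a hypothesis h that labels x like its nearest stored example."""
    def h(x):
        _, label = min(examples, key=lambda ex: math.dist(ex[0], x))
        return label
    return h


h = learn(E)
print(h((28.0, 1)))  # cats
print(h((55.0, 0)))  # dogs
```

Here `learn` plays the role of the learning algorithm and the returned closure is the hypothesis h; how well h approximates the unknown f depends entirely on the examples in E.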
Learning Algorithms and Function Classes. Boolean functions (e.g., decision trees). Probability functions (e.g., Bayesian classifiers, NB). Analytical functions in vector spaces (half-planes): linear case: perceptrons, Support Vector Machines; non-linear case: k-NN, multilayer neural nets. Geometrical approaches, space transformations: embeddings, spectral analysis.
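For the linear case, a perceptron learns a separating half-plane w·x + b > 0. The sketch below uses the classic mistake-driven update rule; the four training points and the function names are assumptions made for this example, and the data are chosen to be linearly separable so that training terminates.

```python
def train_perceptron(examples, max_epochs=100):
    """Mistake-driven perceptron training; labels must be +1 or -1."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b


def predict(w, b, x):
    """Return +1 on the positive side of the half-plane, -1 otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable training data.
data = [
    ((2.0, 1.0), 1),
    ((3.0, 2.0), 1),
    ((-1.0, -1.0), -1),
    ((-2.0, 0.0), -1),
]

w, b = train_perceptron(data)
print(w, b)
```

If the data were not linearly separable, the loop would simply stop after `max_epochs` without a guarantee of zero training errors.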
Decision Trees (Cats vs. Dogs). [Figure: a decision tree whose internal nodes test "Height > 50 cm?", "Has fur (coat)?" and "Has whiskers?", each with Yes/No branches, and whose leaves are "Output: Dog" or "Output: Cat".]
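A decision tree of this kind can be written down as a nested data structure and evaluated by walking from the root to a leaf. The three tests come from the slide, but the exact branch layout below is an assumption: the original figure is not fully recoverable, so this is one plausible arrangement, not necessarily the one drawn.

```python
# Each internal node is (feature_name, {answer: subtree}); leaves are strings.
# ASSUMPTION: this branch layout is one plausible reading of the figure.
TREE = (
    "height_over_50cm", {
        True: "Dog",
        False: ("has_fur", {
            True: "Cat",
            False: ("has_whiskers", {
                True: "Cat",
                False: "Dog",
            }),
        }),
    },
)


def classify(node, animal):
    """Follow the branch matching each tested feature until a leaf is reached."""
    while not isinstance(node, str):
        feature, branches = node
        node = branches[animal[feature]]
    return node


big = {"height_over_50cm": True}
small_furry = {"height_over_50cm": False, "has_fur": True}
small_bare = {"height_over_50cm": False, "has_fur": False, "has_whiskers": False}

print(classify(TREE, big))          # Dog
print(classify(TREE, small_furry))  # Cat
print(classify(TREE, small_bare))   # Dog
```

Only the features along the path actually taken are consulted, which is why the first example needs nothing but the height test.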
MLxWM: Technological Perspectives. Exponential growth of the problem size. Increasing focus on heterogeneous (e.g. multimedia) data. Social Web: Web 2.0. Software systems are going to play an increasingly important role: Software as a Service; personalization.
The Long Tail. [Figure: Maren Jinnett, from data compiled by the UK's Civil Aviation Authority (Wired Blog network, Oct 2009).]
Social Web
[Figure: Hype Cycle for Social Software 2008 (Source: Gartner[1])]
References. Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill. Frasconi, P., Sperduti, A., Starita, A. 2007. Kernel machines, neural networks and graphical models. Rivista AI*IA, special issue for the 50 years of AI. Ng, Andrew (Stanford). Video lectures on machine learning. http://academicearth.org/courses/machine-learning