AERFAISS'06 AERFAI Summer School on Action and Object Classification Techniques in Digital Images Granada, June 12-17, 2006 ABSTRACT
Supervised classification: S.V.M., by F. Perez-Cruz, Univ. Carlos III, Spain
Supervised machine learning has experienced a great advance in the last decade with the extension of learning theory and nonparametric Bayesian statistics for kernel methods. In this tutorial, we will first review some basic concepts in learning theory and Bayesian machine learning, together with the concept of kernels for computing in Hilbert spaces. We will then study support vector machines (SVMs) in detail. We will start with the optimal hyperplane decision rule (OHDR), which will help us introduce the margin concept in a natural and direct form. We will show how the OHDR can be solved using Lagrange multipliers and that its solution depends only on inner products. This will lead us directly to the kernel formulation, namely the support vector machine, in which the inner products are replaced by kernels in a feature space. We will also deal with inseparable problems and multiple instance classification. We will complete our presentation of SVMs with their extension to one-class problems and regression. To complete the tutorial, we will also review other supervised learning algorithms that can be of great interest to machine learning practitioners. We will discuss Gaussian processes for regression and classification. These methods can be seen as extensions of kernel methods that provide not only point-wise predictions but also confidence intervals over them. We will conclude the presentation with kernel conditional random fields, a novel tool for solving general classification problems in which the output is not a scalar but a more complex structure, such as a string or a vector.
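As a reminder of why the solution depends only on inner products, the textbook hard-margin problem and its Lagrangian dual can be written as below. This is a sketch in assumed notation (training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1,+1\}$, multipliers $\alpha_i$, kernel $k$); the tutorial itself may use different symbols.

```latex
\begin{align}
\text{Primal:}\quad & \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\|\mathbf{w}\|^{2}
  \quad \text{s.t.}\quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,
  \quad i = 1,\dots,n \\
\text{Dual:}\quad & \max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{n} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n}
    \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i^{\top}\mathbf{x}_j
  \quad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^{n} \alpha_i y_i = 0 \\
\text{Decision:}\quad & f(\mathbf{x}) = \operatorname{sign}\!\Big(
    \sum_{i=1}^{n} \alpha_i\, y_i\, k(\mathbf{x}_i, \mathbf{x}) + b \Big)
\end{align}
```

The data enter the dual only through $\mathbf{x}_i^{\top}\mathbf{x}_j$, so replacing that product with a kernel $k(\mathbf{x}_i, \mathbf{x}_j)$ gives the kernel formulation. With the constraints above, the distance from the hyperplane to the closest training point is $1/\|\mathbf{w}\|$.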
Semi-supervised classification, by X. Zhu, Univ. Wisconsin-Madison, USA
A fundamental problem in machine learning is supervised learning, in particular classification. Traditionally, classifiers are trained using, and only using, labeled data (input feature/output label pairs). As anyone who has manually labeled image datasets knows, labeled data can be painfully slow and hard to obtain. In many applications, however, unlabeled data (input features only) may be abundant and easy to get. Semi-supervised learning attempts to use a small amount of available labeled data, together with a large amount of unlabeled data, to build better classifiers than could be built from the labeled data alone. In this lecture we introduce some state-of-the-art semi-supervised learning methods, present the algorithms, and discuss their assumptions.
1. Introduction to semi-supervised learning
2. Probabilistic generative models
3. Co-training
4. Transductive Support Vector Machines
5. Graph-based manifold regularization
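To give a flavour of the graph-based family listed above, here is a minimal sketch of label propagation by repeated neighbour averaging. It is a simplified illustration, not necessarily the algorithm presented in the lecture; the function name `propagate_labels`, the toy chain graph, and the iteration count are all assumptions made for this example.

```python
def propagate_labels(weights, labels, iterations=200):
    """Iteratively average neighbour scores on a graph, clamping labeled nodes.

    weights: symmetric n x n list of non-negative edge weights.
    labels:  list of length n with +1.0 / -1.0 for labeled nodes, None otherwise.
    Returns real-valued scores; the sign of a score is the predicted class.
    """
    n = len(labels)
    scores = [lab if lab is not None else 0.0 for lab in labels]
    for _ in range(iterations):
        updated = list(scores)
        for i in range(n):
            if labels[i] is not None:
                continue  # labeled nodes keep their given label
            total = sum(weights[i])
            if total > 0:
                # weighted mean of the current scores of the neighbours
                updated[i] = sum(weights[i][j] * scores[j] for j in range(n)) / total
        scores = updated
    return scores


# Toy chain graph 0 - 1 - 2 - 3 - 4 - 5; only the two end nodes are labeled.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(6)] for i in range(6)]
scores = propagate_labels(chain, [1.0, None, None, None, None, -1.0])
predicted = [1 if s > 0 else -1 for s in scores]
```

The unlabeled nodes take the label of the nearer labeled end, which reflects the assumption behind such methods: points that are strongly connected in the graph tend to share a label.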
Unsupervised clustering techniques, by A.L. Fred, IST, Portugal.
Short-Term Human action recognition - insights from and to biology, by J. Santos-Victor, IST, Portugal.
The talk is divided into two main parts.
PART 1: We discuss how to design a computer vision system able to characterize human actions from video sequences. The approach is based on the use of different features and on the selection of the features that are most relevant for recognition. The feature selection is made together with the classifier design. We will also show how the structure of the classifier can have a significant impact on overall performance. Results on a large number of images are presented and compared.
PART 2: When designing artificial systems for human action recognition, one cannot avoid admiring how easy such tasks seem to be for humans. Recent findings in neuroscience suggest the existence of specific structures in the human brain that are used for producing and recognizing goal-oriented actions. We will discuss some of these experiments and see how to build artificial models that help us understand the functioning of specific parts of the brain. Finally, understanding these models may help in designing better and more flexible artificial systems. At the end of the second part of the talk, we will discuss the construction of a humanoid robot following biologically plausible principles, and how it can be used to better understand human cognition and sensorimotor coordination.
Some related articles:
* Human Activity Recognition from Video: modeling, feature selection and classification architecture, Pedro Ribeiro and José Santos-Victor, VisLab-TR 12/2005, HAREM 2005 - BMVC Workshop on Human Activity and Modelling, Oxford, UK, September 2005. PDF file http://www.isr.ist.utl.pt/vislab/publications/05-harem.pdf
* Extracting Motion Features for Visual Human Activity Representation, Filiberto Pla, Pedro Ribeiro, José Santos-Victor, Alexandre Bernardino, VisLab-TR 03/2005, IBPRIA - 2nd Iberian Conference on Pattern Recognition and Image Analysis, Estoril, Portugal, June 7-9, 2005. PDF file http://www.isr.ist.utl.pt/vislab/publications/05-ibpria-flow.pdf
* Visual Learning by Imitation With Motor Representations, Manuel Lopes and José Santos-Victor, VisLab-TR 08/2005, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, June 2005. PDF file http://www.isr.ist.utl.pt/vislab/publications/05-ieee-smcb.pdf
* A Developmental Roadmap for Task Learning by Imitation in Humanoid Robots: Baltazar’s Story, M. Lopes, A. Bernardino and J. Santos-Victor, VisLab-TR 01/2005, AISB 2005 Symposium on Imitation in Animals and Artifacts, University of Hertfordshire, UK, 12-14 April 2005. PDF file http://www.isr.ist.utl.pt/vislab/publications/05-aisb-imitation.pdf
Large Scale Actions Detection, by R. Fisher, Univ. Edinburgh, UK.
The lecture series will give an introduction to human behaviour recognition in video sequences, from detection to recognizing short-term and long-term activities.
1. Introduction: Intro Lecture
2. Detecting and tracking moving humans: Person Detection Lecture
3. Maintaining persistence when human trajectories overlap: Persistence Lecture
4. Identifying short-term actions: Action Recognition Lecture
5. Representing and recognizing action sequences: Probabilistic Recognition Lecture
There will be a small behaviour understanding practical activity for teams of students.
Image Sequence Interpretation, by Bill Triggs, INRIA Rhone-Alpes, France.
Invariant feature estimation, by C. Schmid, INRIA Rhone-Alpes, France.
Local photometric features have become popular as a practical and effective approach to image matching and recognition. They are distinctive as well as robust to occlusion and clutter. Recently these features have been extended to be scale and affine invariant, which allows matching and recognition in the presence of large scale and viewpoint changes. In this course we present these features and evaluate their performance. We also describe their application to image matching and recognition of particular objects. We will demo a system that uses these features to perform object recognition in real time.
Probabilistic models of visual object categories, by A. Zisserman, Univ. Oxford, UK.
There has been much recent research activity, and much recent success, in recognizing visual object categories (such as cars, faces, motorbikes) in images. The success has come from representing objects by sets of local iconic image fragments, where each fragment may be thought of as a "visual word" describing part of the object. Surprisingly, object categories can be recognized without including the spatial organization or location of the patches; these models are referred to as a "bag of words", in analogy with similar models in the statistical text understanding literature. However, including a probabilistic representation of the configuration of the words enables object location to be determined, and in turn this enables an accurate segmentation of the image. This course will cover the learning and application of these visual object category models.
Overview
- bag of visual words model. Demo: recognizing particular objects.
- classifying images using bag of words model
- probabilistic configuration models
- class specific segmentation
- research challenges
Reading material
Sivic & Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", ICCV 2003. http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf
Csurka et al, "Visual Categorization with Bags of Keypoints", ECCV International Workshop on Statistical Learning in Computer Vision, Prague, 2004. http://www.xrce.xerox.com/Publications/Attachments/2004-010/2004_010.pdf
Zhang et al, "Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study", 2005. http://lear.inrialpes.fr/pubs/2005/ZMLS05/Zhang05-RR-5737.pdf
Leibe & Schiele, "Interleaved object categorization and segmentation", BMVC 2003. http://www.mis.informatik.tu-darmstadt.de/People/leibe/papers/leibe-interleaved-bmvc03.pdf
Fergus et al, "A Sparse Object Category Model for Efficient Learning and Exhaustive Recognition", CVPR 2005. http://www.robots.ox.ac.uk/~vgg/publications/papers/fergus05.pdf
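The bag of visual words representation described in this abstract can be sketched in a few lines: each local descriptor is assigned to its nearest entry in a codebook, and the image is summarised by the histogram of those assignments, with patch positions discarded. The two-dimensional toy descriptors, the hand-written codebook, and the function names below are illustrative assumptions; real systems typically learn the codebook by clustering high-dimensional descriptors.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to descriptor (squared Euclidean)."""
    distances = [sum((d - c) ** 2 for d, c in zip(descriptor, word))
                 for word in codebook]
    return distances.index(min(distances))


def bag_of_words(descriptors, codebook):
    """Normalised histogram of visual-word counts; patch positions are ignored."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook of three visual words and four patch descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
image_descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.1, 0.9]]
histogram = bag_of_words(image_descriptors, codebook)
```

The resulting histogram is a fixed-length vector regardless of how many patches the image contains, so it can be passed to any standard classifier.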
Biologically inspired models, by M. Riesenhuber, Univ. Georgetown, USA
Object recognition is fundamental to the behavior of higher primates, whose visual systems rapidly and apparently effortlessly recognize a large number of diverse objects in cluttered, natural scenes at a level far beyond that of current machine vision systems. In recent years, significant progress has been made in elucidating the neural mechanisms underlying object recognition in cortex. I will review data from monkey electrophysiology, human brain imaging and behavior that argue for a simple model of cortical object recognition based on a hierarchy of neurons tuned to increasingly complex image features and showing increasing tolerance to stimulus transformations. In particular, the data suggest that the visual system solves the specificity-invariance trade-off at the heart of the problem of object recognition through an architecture consisting of an iterative combination of just two operations, one creating more complex features from simpler ones (through a template match operation), and one increasing the tolerance to stimulus transformations by performing a pooling operation over units tuned to transformed versions of the same feature. I will then present recent machine vision systems based on this biological architecture. These systems have been found to produce the currently best results on hard object detection problems (the Caltech 101 database). Further qualities of these biologically-inspired systems include the ability to learn from few examples and a high robustness to clutter, indicating the usefulness of concepts from biological vision to build better machine vision systems.
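The two operations named in this abstract, template matching and pooling, can be illustrated on a one-dimensional signal. This is a toy sketch under stated assumptions: Gaussian tuning for the template match and a maximum for the pooling step are choices made here for illustration, and the function names are hypothetical rather than taken from the systems discussed in the lecture.

```python
import math


def template_match(patch, template, sigma=1.0):
    """Gaussian-tuned response: 1.0 when patch equals template, lower otherwise."""
    squared_distance = sum((p - t) ** 2 for p, t in zip(patch, template))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def max_pool(responses):
    """Pool over units tuned to the same feature at different positions."""
    return max(responses)


def feature_response(signal, template):
    """Match the template at every position, then pool over all positions."""
    width = len(template)
    responses = [template_match(signal[i:i + width], template)
                 for i in range(len(signal) - width + 1)]
    return max_pool(responses)


template = [0.0, 1.0, 0.0]
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # same pattern, moved two samples
original_response = feature_response(signal, template)
shifted_response = feature_response(shifted, template)
```

The template match provides specificity (the response drops when the pattern is absent), while pooling provides tolerance: the pooled response is the same for the original and the shifted signal.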