Interests

Scalability issues in Machine Learning and Classification

Currently, I am working on several topics related to scalability issues in indexing, classification, machine learning and knowledge discovery. I am particularly interested in tasks like

  • kNN search on giga-record databases;
  • unsupervised learning (clustering) on large (>10^6 records), high-dimensional (>100 dimensions) databases;
  • multimodal information retrieval over millions of documents.

High-dimensional indexing

The subject of my thesis and of my recent research is the indexing of high-dimensional multimedia descriptors in order to accelerate the k-nearest-neighbours search (also known as kNN search or similarity search). I am studying how the use of multiple moderate-dimensional indexes can help to tame the "curse of dimensionality" and how the use of space-filling curves can help to store the indexes in fast, convenient and easy-to-update data structures like the one-dimensional B-tree.
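
To make the idea concrete, here is a minimal sketch in Python, not the method of the thesis itself. It assumes descriptors already quantised to small non-negative integers, uses the Z-order (Morton) curve as one possible space-filling curve, and uses a sorted list with `bisect` as a stand-in for a one-dimensional B-tree. The names `morton_key`, `CurveIndex`, `knn` and the `width` parameter are hypothetical, chosen only for illustration. Each index covers a subset of the dimensions; candidates gathered from all indexes are then re-ranked by exact distance, so the result is approximate.

```python
import bisect
import random


def morton_key(coords, bits):
    """Interleave the bits of non-negative integer coordinates (Z-order curve)."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key


class CurveIndex:
    """Sorted list of (curve key, point id) over a subset of the dimensions.

    The sorted list is only a stand-in for a one-dimensional B-tree.
    """

    def __init__(self, dims, bits):
        self.dims = list(dims)
        self.bits = bits
        self.entries = []

    def _key(self, point):
        return morton_key([point[d] for d in self.dims], self.bits)

    def insert(self, point_id, point):
        bisect.insort(self.entries, (self._key(point), point_id))

    def candidates(self, query, width):
        """Return ids of up to 2 * width entries around the query's key."""
        pos = bisect.bisect_left(self.entries, (self._key(query), -1))
        lo = max(0, pos - width)
        hi = min(len(self.entries), pos + width)
        return [pid for _, pid in self.entries[lo:hi]]


def knn(query, points, indexes, k, width):
    """Approximate kNN: union the candidates, then rank by exact distance."""
    ids = set()
    for index in indexes:
        ids.update(index.candidates(query, width))

    def dist2(pid):
        return sum((a - b) ** 2 for a, b in zip(points[pid], query))

    return sorted(ids, key=dist2)[:k]


# Toy data: 1000 random 8-dimensional descriptors with 8-bit components,
# indexed by two 4-dimensional curves.
random.seed(0)
points = [[random.randrange(256) for _ in range(8)] for _ in range(1000)]
indexes = [CurveIndex(range(0, 4), bits=8), CurveIndex(range(4, 8), bits=8)]
for pid, point in enumerate(points):
    for index in indexes:
        index.insert(pid, point)

print(knn(points[42], points, indexes, k=3, width=20))
```

Splitting the descriptor across several lower-dimensional curves is one way to keep each curve's neighbourhoods meaningful; larger values of `width` trade speed for recall.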


Content-based information retrieval (CBIR) and Image Identification

My thesis was also concerned with the application of CBIR to image identification (also known as copy detection or near-duplicate detection, terms which I avoid because the target images, in my case, may have suffered strong modifications). This technique finds application in the detection of copyright violations and also, which was my main interest, in many Cultural Heritage activities.


Computer Science and Cultural Heritage

I am very fond of research at the interface between Computer Science and Cultural Heritage. My M.Sc. research was related to the serious (and still unanswered) problem of Digital Longevity: how to preserve digital data for decades and even centuries. During my Ph.D., my emphasis shifted from preservation to access: the use of CBIR to allow the retrieval of images whose metadata are missing. I am still interested in all problems pertaining to this rich interface: digital preservation, digitisation of collections, digital libraries, asset management for Cultural Heritage, and digital techniques for conservation and restoration.


Last update 23 July, 2009 23:29 GMT-3. © Eduardo A. do Valle Jr., 1995–2009.