In this tutorial, we look at Computer Vision approaches developed to investigate Digital Humanities data and, more specifically, fine art and cultural heritage. We will explain what these approaches can achieve and how, with a basic understanding of Python, to train and apply them to different types of visual data. Broken down into five parts (one per presenter), the tutorial provides an overview of research within Computer Vision and its current and future applications within Digital Humanities. We also offer some reflections on the use of Asian data and its limitations and challenges, against the backdrop of the ongoing discussion within the Computer Vision research community on bias in datasets and collections.
11:00 - 11:05 CEST / 18:00 - 18:05 JST
Welcome remarks and tutorial structure
11:05 - 11.35 CEST / 18:05 - 18:35 JST
The use of Computer Vision for distant reading in image collections generally falls within the setting of retrieval: searching a collection via an example image. To do this, a computational description of the image needs to be generated so that one image can be compared to another. How such a representation is learned is key to building a powerful retrieval system. While pre-trained approaches such as neural networks are useful, they often fail to bridge the visual difference between photo-realistic images and typical art or humanities image collections. In this part, we explore how anyone can train a neural network representation specific to their dataset with varying degrees of supervision, and specifically by exploiting supervision provided through Knowledge Graphs (or the Semantic Web) to enhance the discriminative power of the representations.
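As a minimal illustration (not the presenters' own pipeline), the sketch below embeds images with an off-the-shelf torchvision ResNet and ranks a collection by cosine similarity; the file names are placeholders. In practice one would fine-tune the backbone on the collection, for instance with a metric-learning loss whose positive and negative pairs are derived from Knowledge Graph relations.

```python
# Minimal retrieval sketch: embed images with a pre-trained CNN and rank
# the collection by cosine similarity. Paths and backbone are placeholders.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone; the classification layer is dropped so the
# network outputs a generic feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return torch.nn.functional.normalize(backbone(img), dim=1)

# Hypothetical collection; replace with your own image files.
collection = ["a.jpg", "b.jpg", "c.jpg"]
index = torch.cat([embed(p) for p in collection])

query = embed("query.jpg")
scores = (index @ query.T).squeeze(1)  # cosine similarity on unit vectors
for p, s in sorted(zip(collection, scores.tolist()), key=lambda x: -x[1]):
    print(f"{p}: {s:.3f}")
```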
11:35 - 12:10 CEST / 18:35 - 19:10 JST
Most Deep Learning image techniques rely on annotated collections. While these might be available for some well-studied types of documents, they cannot be expected for more specialized studies or sources. Instead, one would have to rely on techniques that do not require training data. This part will discuss several such techniques to establish links between artworks and historical documents, including the use of generic local features, synthetic data, self-supervised learning, and object discovery techniques. In addition, it will include examples of applications in repeated pattern discovery in artwork collections, fine artwork alignment, document image segmentation, historical watermark recognition, scientific illustration propagation analysis, and unsupervised Optical Character Recognition. In all cases, it will be shown that standard approaches can give useful baseline results when tuned adequately, but that developing dedicated approaches that take into account the specificity of the data and the problem significantly improves the results.
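As an example of one such training-free baseline, the sketch below matches generic local SIFT features between two artwork photographs and keeps only geometrically consistent matches; the image names are placeholders, and the inlier count serves as a crude link score.

```python
# Training-free baseline: match local SIFT features between two artwork
# photographs and keep geometrically consistent matches via RANSAC.
import cv2
import numpy as np

img1 = cv2.imread("artwork_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("artwork_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test to filter ambiguous matches.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# Estimate a homography; the number of inliers is a crude link score.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(f"{int(mask.sum())} geometrically consistent matches")
```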
12:10 - 12:25 CEST / 19:10 - 19:25 JST
12:25 - 13:00 CEST / 19:25 - 20:00 JST
Multi-Task Learning (MTL) is an increasingly prominent paradigm in Computer Vision and in Artificial Intelligence in general. It centers on the ability to perform multiple tasks based on a single input. For instance, it is possible to predict, from a single image of an artwork, when it was made, by whom, and with what materials. Jointly performing these tasks involves specific modeling choices, resulting in clear benefits (robustness, improved performance), but it also has potential downsides (negative interference, increased complexity). In this part, we show when and how we might want to apply MTL through a number of use cases, as well as give an overview of the technical underpinnings. In addition, we highlight the possibilities MTL provides for interpretability by shedding light on relations between tasks.
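A minimal sketch of the idea, assuming a PyTorch/torchvision setup: a single shared backbone feeds one classification head per task, and the joint training loss is a (possibly weighted) sum of the per-task losses. The class counts, head names, and dummy batch below are all placeholders.

```python
# Multi-task sketch: one shared backbone, one head per task
# (e.g. period, artist, material). Class counts are placeholders.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiTaskArtModel(nn.Module):
    def __init__(self, n_periods=10, n_artists=500, n_materials=30):
        super().__init__()
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()  # shared representation
        self.period_head = nn.Linear(feat_dim, n_periods)
        self.artist_head = nn.Linear(feat_dim, n_artists)
        self.material_head = nn.Linear(feat_dim, n_materials)

    def forward(self, x):
        h = self.backbone(x)
        return self.period_head(h), self.artist_head(h), self.material_head(h)

model = MultiTaskArtModel()
x = torch.randn(4, 3, 224, 224)  # dummy batch of 4 images
period_logits, artist_logits, material_logits = model(x)

# The joint loss is typically a (possibly weighted) sum of per-task losses.
ce = nn.CrossEntropyLoss()
targets = (torch.randint(0, 10, (4,)), torch.randint(0, 500, (4,)),
           torch.randint(0, 30, (4,)))
loss = (ce(period_logits, targets[0]) + ce(artist_logits, targets[1])
        + ce(material_logits, targets[2]))
```

Because all heads share the backbone, gradients from each task shape the same representation, which is the source of both the robustness benefits and the negative interference mentioned above.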
13:00 - 13:35 CEST / 20:00 - 20:35 JST
In Computer Vision, visual arts are often studied from an aesthetics perspective, mainly by analyzing the visual appearance of an art reproduction to infer its attributes (author, year of creation, theme, etc.), its representative elements (objects, people, locations, etc.), or to transfer the style across different images. However, understanding an artistic representation involves mastering complex comprehension processes, such as identifying the socio-political context of the artwork or recognizing the artist’s main influences. In this part, we will explore fine-art paintings from both a visual and a language perspective. The aim is to bridge the gap between the visual appearance of an artwork and its underlying meaning by jointly analyzing its aesthetics and semantics. We will explore multimodal techniques for interpreting artworks, and we will show how Computer Vision approaches can learn to automatically generate descriptions for fine-art paintings in natural language.
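As a simple hedged illustration of jointly analyzing an artwork's appearance and a textual reading of it (full caption generation would require a dedicated captioning model), the sketch below uses CLIP via the Hugging Face transformers library to score candidate descriptions against a painting; the model name, image path, and candidate texts are placeholders.

```python
# Multimodal sketch: rank candidate textual descriptions of a painting by
# their CLIP image-text similarity. Paths and texts are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("painting.jpg")
candidates = [
    "a portrait of a nobleman in dark clothing",
    "a pastoral landscape at dusk",
    "a still life with flowers and fruit",
]

inputs = processor(text=candidates, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=1)

for text, p in zip(candidates, probs[0].tolist()):
    print(f"{p:.2f}  {text}")
```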
13:35 - 13:50 CEST / 20:35 - 20:50 JST
13:50 - 14:25 CEST / 20:50 - 21:25 JST
Most models in Computer Vision research are built to solve specific problems with measurable outcomes (often tied to a set of reference datasets): pixelwise segmentation, object detection, image captioning, keypoint detection, etc. With many open-source Computer Vision models for each kind of task, we have a wide horizon of powerful tools at our disposal, yet most of them don't easily fit with research questions in art history, visual culture studies, or the visual humanities more generally. By dissecting a series of previous projects in this area, this part will look at how researchers have negotiated these connections, including complex and difficult questions of bias, interpretability, and the epistemology of computational results within the humanities (and especially within cultural history). We will look at several methodological modes compatible with the affordances of Computer Vision, including image replication, computational iconography, and the study of visual phenomena captured through notational systems.
Our SIG AVinDH is meant to be a venue for exchanging knowledge, expertise, methods and tools by scholars who make use of audiovisual data types that can convey a certain level of narrativity: spoken audio, video and/or (moving) images.