In this tutorial, we look at Computer Vision approaches developed to investigate Digital Humanities data and, more specifically, fine art and cultural heritage. We will explain what these approaches can achieve and how, with a basic understanding of Python, to train and apply them to different types of visual data. Broken down into five parts (one per presenter), the tutorial provides an overview of research within Computer Vision and its current and future applications within Digital Humanities. We also offer some reflections on the use of Asian data and its limitations and challenges, against the backdrop of the ongoing discussion within the Computer Vision research community on bias in datasets and collections.
11:00 - 11:05 CEST / 18:00 - 18:05 JST
Welcome remarks and tutorial structure
11:05 - 11.35 CEST / 18:05 - 18:35 JST
The use of Computer Vision for distant reading in image collections generally falls within the setting of retrieval: searching a collection via an example image. To do this, a computational description of the image needs to be generated so that one image can be compared to another. How such a representation is learned is key to building a powerful retrieval system. While pre-trained approaches such as neural networks are useful, they often fail to bridge the visual difference between photo-realistic images and typical art or humanities image collections. In this part, we explore how anyone can train a neural network representation specific to their dataset with varying degrees of supervision, and specifically by exploiting supervision provided through Knowledge Graphs (or the Semantic Web) to enhance the discriminative power of the representations.
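As a minimal illustration (not the presenters' own pipeline), the sketch below embeds images with an off-the-shelf torchvision ResNet and ranks a collection by cosine similarity; the file names are placeholders. In practice one would fine-tune the backbone on the collection, for instance with a metric-learning loss whose positive and negative pairs are derived from Knowledge Graph relations.

```python
# Minimal retrieval sketch: embed images with a pre-trained CNN and rank
# the collection by cosine similarity. Paths and backbone are placeholders.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained backbone; the classification layer is dropped so the
# network outputs a generic feature vector.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return torch.nn.functional.normalize(backbone(img), dim=1)

# Hypothetical collection; replace with your own image files.
collection = ["a.jpg", "b.jpg", "c.jpg"]
index = torch.cat([embed(p) for p in collection])

query = embed("query.jpg")
scores = (index @ query.T).squeeze(1)  # cosine similarity on unit vectors
for p, s in sorted(zip(collection, scores.tolist()), key=lambda x: -x[1]):
    print(f"{p}: {s:.3f}")
```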
11:35 - 12:10 CEST / 18:35 - 19:10 JST
Most Deep Learning image techniques rely on annotated collections. While these might be available for some well-studied types of documents, they cannot be expected for more specialized studies or sources. Instead, one would have to rely on techniques that do not require training data. This part will discuss several such techniques to establish links between artworks and historical documents, including the use of generic local features, synthetic data, self-supervised learning, and object discovery techniques. In addition, it will include examples of applications in repeated pattern discovery in artwork collections, fine artwork alignment, document image segmentation, historical watermark recognition, scientific illustration propagation analysis, and unsupervised Optical Character Recognition. In all cases, it will be shown that standard approaches can give useful baseline results when tuned adequately, but that developing dedicated approaches that take into account the specificity of the data and the problem significantly improves the results.
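As an example of one such training-free baseline, the sketch below matches generic local SIFT features between two artwork photographs and keeps only geometrically consistent matches; the image names are placeholders, and the inlier count serves as a crude link score.

```python
# Training-free baseline: match local SIFT features between two artwork
# photographs and keep geometrically consistent matches via RANSAC.
import cv2
import numpy as np

img1 = cv2.imread("artwork_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("artwork_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test to filter ambiguous matches.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# Estimate a homography; the number of inliers is a crude link score.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(f"{int(mask.sum())} geometrically consistent matches")
```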
12:10 - 12:25 CEST / 19:10 - 19:25 JST
12:25 - 13:00 CEST / 19:25 - 20:00 JST
Multi-Task Learning (MTL) is an increasingly prominent paradigm in Computer Vision and in Artificial Intelligence in general. It centers on the ability to perform multiple tasks based on a single input. For instance, it is possible to predict, from a single image of an artwork, when it was made, by whom, and with what materials. Jointly performing these tasks involves specific modeling choices, resulting in clear benefits (robustness, improved performance), but it also has potential downsides (negative interference, increased complexity). In this part, we show when and how we might want to apply MTL through a number of use cases, as well as give an overview of the technical underpinnings. In addition, we highlight the possibilities MTL provides for interpretability by shedding light on relations between tasks.
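A minimal sketch of the idea, assuming a PyTorch/torchvision setup: a single shared backbone feeds one classification head per task, and the joint training loss is a (possibly weighted) sum of the per-task losses. The class counts, head names, and dummy batch below are all placeholders.

```python
# Multi-task sketch: one shared backbone, one head per task
# (e.g. period, artist, material). Class counts are placeholders.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiTaskArtModel(nn.Module):
    def __init__(self, n_periods=10, n_artists=500, n_materials=30):
        super().__init__()
        self.backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        feat_dim = self.backbone.fc.in_features
        self.backbone.fc = nn.Identity()  # shared representation
        self.period_head = nn.Linear(feat_dim, n_periods)
        self.artist_head = nn.Linear(feat_dim, n_artists)
        self.material_head = nn.Linear(feat_dim, n_materials)

    def forward(self, x):
        h = self.backbone(x)
        return self.period_head(h), self.artist_head(h), self.material_head(h)

model = MultiTaskArtModel()
x = torch.randn(4, 3, 224, 224)  # dummy batch of 4 images
period_logits, artist_logits, material_logits = model(x)

# The joint loss is typically a (possibly weighted) sum of per-task losses.
ce = nn.CrossEntropyLoss()
targets = (torch.randint(0, 10, (4,)), torch.randint(0, 500, (4,)),
           torch.randint(0, 30, (4,)))
loss = (ce(period_logits, targets[0]) + ce(artist_logits, targets[1])
        + ce(material_logits, targets[2]))
```

Because all heads share the backbone, gradients from each task shape the same representation, which is the source of both the robustness benefits and the negative interference mentioned above.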
13:00 - 13:35 CEST / 20:00 - 20:35 JST
In Computer Vision, visual arts are often studied from an aesthetics perspective, mainly by analyzing the visual appearance of an art reproduction to infer its attributes (author, year of creation, theme, etc.), its representative elements (objects, people, locations, etc.), or to transfer the style across different images. However, understanding an artistic representation involves mastering complex comprehension processes, such as identifying the socio-political context of the artwork or recognizing the artist’s main influences. In this part, we will explore fine-art paintings from both a visual and a language perspective. The aim is to bridge the gap between the visual appearance of an artwork and its underlying meaning by jointly analyzing its aesthetics and semantics. We will explore multimodal techniques for interpreting artworks, and we will show how Computer Vision approaches can learn to automatically generate descriptions for fine-art paintings in natural language.
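As a simple hedged illustration of jointly analyzing an artwork's appearance and a textual reading of it (full caption generation would require a dedicated captioning model), the sketch below uses CLIP via the Hugging Face transformers library to score candidate descriptions against a painting; the model name, image path, and candidate texts are placeholders.

```python
# Multimodal sketch: rank candidate textual descriptions of a painting by
# their CLIP image-text similarity. Paths and texts are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("painting.jpg")
candidates = [
    "a portrait of a nobleman in dark clothing",
    "a pastoral landscape at dusk",
    "a still life with flowers and fruit",
]

inputs = processor(text=candidates, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=1)

for text, p in zip(candidates, probs[0].tolist()):
    print(f"{p:.2f}  {text}")
```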
13:35 - 13:50 CEST / 20:35 - 20:50 JST
13:50 - 14:25 CEST / 20:50 - 21:25 JST
Most models in Computer Vision research are built to solve specific problems with measurable outcomes (often tied to a set of reference datasets): pixelwise segmentation, object detection, image captioning, keypoint detection, etc. With many open-source Computer Vision models for each kind of task, we have a wide horizon of powerful tools at our disposal, yet most of them don't easily fit with research questions in art history, visual culture studies, or the visual humanities more generally. By dissecting a series of previous projects in this area, this part will look at how researchers have negotiated these connections, including complex and difficult questions of bias, interpretability, and the epistemology of computational results within the humanities (and especially within cultural history). We will look at several methodological modes compatible with the affordances of Computer Vision, including image replication, computational iconography, and the study of visual phenomena captured through notational systems.
Our SIG AVinDH is meant to be a venue for exchanging knowledge, expertise, methods and tools by scholars who make use of audiovisual data types that can convey a certain level of narrativity: spoken audio, video and/or (moving) images.