Programme

Here you can find the current version of the programme. A CSV version is also available for download.

Monday 6.7.

  • 9:00 (Morning 1): Opening; introduction of participants, introduction to the data set in use during the school (Georg Vogeler / Stephan Kurz)
  • 10:00 (Morning 2): Prosopography and sources: The Factoid approach (Michele Pasin)
  • 11:00 (Morning 3): Prosopography and Social Network Analysis (Marcella Tambuscio)
  • Lunch
  • 14:00 (Afternoon 1): Data capture I: From original source to text via edition (tei:persName) (Richard Hadden)
  • 15:00 (Afternoon 2): Data capture II: From original source to text via OCR/HTR (Stephan Kurz)
  • 16:00 (Afternoon 3): Data capture III: annotation via regex and Recogito (Gioele Barabucci / Rebecca Kahn / Rainer Simon)
  • 19:00 (Evening): Keynote: To be or not to be. Identifying names, persons and groups in unstructured data of the Dutch Golden Age (Charles van den Heuvel)

Tuesday 7.7.

  • 9:00 (Morning 1): Data modelling I: from research question and source to data model (ER); Factoid (IPIF) (Gioele Barabucci)
  • 10:00 (Morning 2): Data modelling II: ontologies, data models and standards; RDFS/OWL: BIO vocabulary, Schema.org, Wikidata, FOAF (Gioele Barabucci)
  • 11:00 (Morning 3): Data modelling III: controlled vocabularies; SKOS; HISCO, Geonames, GND, VIAF, Getty (Georg Vogeler)
  • Lunch
  • 14:00 (Afternoon 1): Data modelling IV: Historical information and foundational ontologies (Francesco Beretta)
  • 15:00 (Afternoon 2): Data modelling V: Exploring and extending the CIDOC CRM (Francesco Beretta)
  • 16:00 (Afternoon 3): Your own database: Blazegraph, MariaDB, ResearchSpace, Wikibase, MongoDB, WissKI, PDR, Papilotte, APIS, Cosmotool (Georg Vogeler)
  • 19:00 (Evening): Reception

Wednesday 8.7.

  • 9:00 (Morning 1): Data modelling VI: standards, XML: TEI, tei:person (Richard Hadden)
  • 10:00 (Morning 2): Data capture IV: NLP/NER (Matthias Schlögl / Renato Rocha Souza)
  • 11:00 (Morning 3): Data capture V: NLP/word embeddings (Matthias Schlögl / Renato Rocha Souza)
  • Lunch
  • 14:00 (Afternoon 1): Project presentations (participants)
  • 15:00 (Afternoon 2): Project presentations (participants)
  • 16:00 (Afternoon 3): Virtual poster session

Thursday 9.7.

  • 9:00 (Morning 1): Analysis I: Query languages; XQuery/XPath, SQL intro (Georg Vogeler / Richard Hadden)
  • 10:00 (Morning 2): Analysis II: Query via SPARQL (Georg Vogeler)
  • 11:00 (Morning 3): Analysis III: reuse of existing datasets via REST and SPARQL endpoints (Matthias Schlögl)
  • Lunch
  • 14:00 (Afternoon 1): Analysis IV: basic statistics, plotting, unsupervised learning (Sabine Laszakovits)
  • 15:00 (Afternoon 2): Analysis V: Networks, creation and basic analytics (Marcella Tambuscio)
  • 16:00 (Afternoon 3): Analysis VI: Networks extended, SpaceTime Cube (Florian Windhager / Eva Mayer)

Friday 10.7.

  • 9:00 (Morning 1): Visualisation I: basics; plotting (Florian Windhager / Eva Mayer)
  • 10:00 (Morning 2): Visualisation II: maps and GIS (Rainer Simon)
  • 11:00 (Morning 3): Visualisation III: maps and GIS (Rebecca Kahn)

Opening: Introduction of participants, introduction to the data set in use during the school

Georg Vogeler, Stephan Kurz 6.7.2020 9-9:50

In the opening session Georg Vogeler will briefly introduce the field of prosopography and the changes it has undergone since the digital turn. In addition to some organizational matters, he will introduce the staff of the summer school and give participants the opportunity to introduce themselves. Stephan Kurz will then introduce the dataset that will be used throughout the summer school.

Prosopography and sources: The Factoid approach

Michele Pasin 6.7.2020 10-10:50

Introduction to the Factoid model and its implementation at King’s College London. Slides
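As a rough illustration of the idea, the following sketch represents a single factoid as a Python data structure. The field names and values are invented for demonstration and do not follow the exact IPIF JSON schema.

    # Illustrative sketch of the factoid idea: each factoid ties a person,
    # a source, and a statement together. Field names are simplified and
    # do not follow the exact IPIF JSON schema.
    factoid = {
        "id": "factoid-001",
        "person": {"id": "person-042", "label": "Anton von Schmerling"},
        "source": {"id": "source-017", "label": "Ministerial protocol, 1861-03-02"},
        "statement": {
            "role": "Minister of Justice",
            "date": "1861-03-02",
            "place": "Vienna",
        },
    }

    # The same person can appear in many factoids, each backed by a source;
    # assertions stay separate instead of being merged into one "biography".
    print(factoid["person"]["label"], "->", factoid["statement"]["role"])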

Prosopography and Social Network Analysis

Marcella Tambuscio 6.7.2020 11-11:50

Introduction to the basic ideas of Social Network Analysis (SNA). Slides

Data capture I: From original source to text via edition (tei:persName)

Richard Hadden 6.7.2020 14-14:50

In this session, Richard Hadden will briefly focus on scholarly editions before looking at the TEI encoding used by the Ministerial Protocols dataset.
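For a first impression of what such encoding looks like in practice, here is a minimal Python sketch that extracts tei:persName elements with lxml. The sample XML fragment is invented for illustration, not taken from the actual dataset.

    # Minimal sketch: extracting tei:persName elements from a TEI snippet
    # with lxml. The sample XML is invented for illustration.
    from lxml import etree

    TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

    xml = b"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
      <text><body><p>
        <persName ref="#pers-1">Erzherzog Rainer</persName> presided;
        <persName ref="#pers-2">Schmerling</persName> was present.
      </p></body></text>
    </TEI>"""

    root = etree.fromstring(xml)
    for name in root.xpath("//tei:persName", namespaces=TEI_NS):
        print(name.get("ref"), name.text)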

Data capture II: From original source to text via OCR/HTR

Stephan Kurz 6.7.2020 15-15:50

The session will focus on data capture with the Transkribus HTR engine that the MRP edition project is using for upcoming TEI edition volumes. Example images to be used are available in the dataset/images folder and in a Transkribus collection (ready to be shared). Slides are in Data capture with Transkribus.

Data capture III: annotation via regex and Recogito

Regex: Gioele Barabucci 6.7.2020 16-16:30
Recogito: Rainer Simon, Rebecca Kahn 6.7.2020 16:40-17:10

Regular expressions (regexes) are a concise notation for expressing complex search patterns. Rooted in theoretical computer science, regular expressions are employed in hundreds of tools (e.g., LibreOffice, sed, oXygen) and languages (e.g., XSLT, XML Schema, Python, Ruby) to provide users with a powerful pattern-matching mechanism. Slides
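As a minimal illustration, the following Python sketch uses a deliberately simple pattern to find date-like strings in running text; the pattern and the sample sentence are invented for demonstration.

    # A small regex illustration: find date-like patterns such as "6.7.1861"
    # in running text and normalize them to ISO order.
    import re

    text = "The council met on 6.7.1861 and again on 13.7.1861 in Vienna."
    pattern = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")

    for match in pattern.finditer(text):
        day, month, year = match.groups()
        print(f"{year}-{int(month):02d}-{int(day):02d}")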

Recogito, developed and maintained by the digital humanities initiative Pelagios Commons, is a tool for collaboratively annotating source text, tabular data, and images. In this session, we will introduce the tool, focusing specifically on its capabilities for semantic annotation: annotating entities (places, people) and the relations between them. The tool is browser-based and requires no prior installation. For the hands-on exercises, participants should ideally create an account beforehand, by going to ACDH’s Recogito instance at https://recogito.acdh-dev.oeaw.ac.at/

Data Modelling I/II: Entity Relationship Diagrams, RDFS, OWL

Gioele Barabucci 7.7.2020 9:00-11:00

Data Modelling III: Controlled Vocabularies

Georg Vogeler 7.7.2020 11:00-12:00

People and the information about them occur in variant descriptions and in variant locations. To find the same person in different resources, and to identify places of birth, death, study, etc. in an interoperable way, information science has developed “controlled vocabularies”. Reference lists of places are often labelled “gazetteers”, lists of intellectual concepts run under “thesaurus”, and computer scientists sometimes call them “ontologies”. Many of them are published online, and every prosopographer is well advised to reuse as many of them as feasible. The session will present some of them (VIAF, SNAC, UAN, Geonames), illustrate their purpose with examples, introduce the main technical standards used to store them, and encourage prosopographers to create their own.

Some links:

Exercises:

Slides
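For a sense of how one entry in such a vocabulary looks in practice, here is a minimal Python sketch that expresses a single concept in SKOS with rdflib; the URIs and labels are invented for illustration.

    # Hedged sketch: one controlled-vocabulary entry as a SKOS concept.
    # The URIs below are invented for illustration.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import SKOS

    g = Graph()
    scheme = URIRef("https://example.org/vocab/offices")
    concept = URIRef("https://example.org/vocab/offices/minister-of-justice")

    g.add((concept, SKOS.inScheme, scheme))
    g.add((concept, SKOS.prefLabel, Literal("Minister of Justice", lang="en")))
    g.add((concept, SKOS.prefLabel, Literal("Justizminister", lang="de")))
    g.add((concept, SKOS.altLabel, Literal("Justice Minister", lang="en")))

    print(g.serialize(format="turtle"))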

Data modelling IV: Historical information and foundational ontologies

Francesco Beretta 7.7.2020 14:00-15:00

Modelling prosopographical data and making them re-usable for new research agendas according to the FAIR principles requires an in-depth analysis of how to develop an interoperable conceptualization in the field of historical research. After introducing the symogih.org project’s pattern-based ontology approach, we’ll analyse the role that factual information plays in the process of historical knowledge production. We’ll then develop an epistemological and semantic analysis of conceptual data modelling based on the foundational ontologies Constructive Descriptions and Situations and DOLCE, and discuss the reasons for adopting the CIDOC CRM as a core ontology in the field of historical research.

Data modelling V: Exploring and extending the CIDOC CRM

Francesco Beretta 7.7.2020 15:00-16:00

The CIDOC CRM (an ISO norm since 2006) has been defined as a “formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information” (Martin Doerr). Although there are significant intersections between the two fields of cultural heritage preservation in museums and historical research, the CRM has to be extended with some relevant, missing high-level classes in order to meet the needs of the latter. Furthermore, there is a need to enrich the ontology with the modelling patterns required for the fine-grained data production in all the different fields of historical research, and notably in prosopography. We will see how collaborative data modelling carried out in the ontology management environment OntoME makes it possible to elaborate a communal and adaptive conceptualization of the domain.
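As a minimal illustration of event-based, CRM-style modelling, the following Python/rdflib sketch links an invented person (E21) to a birth event (E67). The URIs for the person and the event are made up; the class and property identifiers are the standard CRM ones.

    # Hedged sketch of CIDOC CRM style event modelling with rdflib:
    # a birth event (E67_Birth) bringing a person (E21_Person) into life.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
    EX = Namespace("https://example.org/")  # invented example namespace

    g = Graph()
    person = EX["person/schmerling"]
    birth = EX["event/schmerling-birth"]

    g.add((person, RDF.type, CRM["E21_Person"]))
    g.add((birth, RDF.type, CRM["E67_Birth"]))
    g.add((birth, CRM["P98_brought_into_life"], person))

    print(g.serialize(format="turtle"))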

Your own database

Georg Vogeler 7.7.2020 16:00-17:00 slides

The session will introduce you to basic concepts of databases and the main database types (SQL, RDF triple store, graph database). It will present some major database management systems and give an insight into “ready-made” offerings for historical research. In the exercises we will set up a SQL database based on SQLite <http://sqlite.org>, access it via one possible front end (https://sqlitebrowser.org/, yes, please download and install it), and add some data. Finally, we will use a triple store (https://summer2020.acdh-dev.oeaw.ac.at/) and add some data to it. We will query the data on Thursday. If you want to set up your own triple store, I can recommend Apache Jena/Fuseki <https://jena.apache.org/>.
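As a taste of the SQL part of the exercise, the following Python sketch creates a small table in SQLite and inserts a row. The table layout and sample data are invented for illustration, not the schema used in the session.

    # Minimal SQLite sketch: create a person table, insert a row, query it.
    import sqlite3

    con = sqlite3.connect("summerschool.db")
    cur = con.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS person (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            birth TEXT  -- ISO date stored as text; SQLite has no date type
        )
    """)
    cur.execute("INSERT INTO person (name, birth) VALUES (?, ?)",
                ("Anton von Schmerling", "1805-08-23"))
    con.commit()

    for row in cur.execute("SELECT id, name, birth FROM person"):
        print(row)
    con.close()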

Data modelling VI: standards, XML: TEI, tei:person

Richard Hadden 8.7.2020 09:00-10:00

This session will look at the TEI text encoding guidelines and their potential use for encoding structured and semi-structured prosopographical data.

Data capture IV: NLP/NER

Matthias Schlögl, Renato Rocha Souza 8.7.2020 10:00-10:50

In the first of our two NLP sessions, Renato will start with a quick overview of NLP methods. Subsequently, we will take a closer look at Named Entity Recognition (NER) by running through an interactive Python/Jupyter notebook.

Slides

NLP notebook
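For a first impression of NER in Python, here is a hedged sketch using spaCy, one common NLP library; the session's notebook may use different tools and models.

    # Hedged NER sketch with spaCy. Requires:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Anton von Schmerling was appointed minister in Vienna in 1861.")

    # Print each recognized entity with its predicted type.
    for ent in doc.ents:
        print(ent.text, ent.label_)  # e.g. PERSON, GPE, DATE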

We will be using Google Colab to run through interactive Python/Jupyter notebooks. To follow the notebooks on your own computer (and change cells on your own), please do the following:

  • Head over to GitHub and get yourself an account if you don’t have one yet.

  • Go to https://github.com/acdh-oeaw/summerschool2020-notebooks and fork [1] the repo into your own account (there is a “fork” button in the upper right corner).

  • Go to Google Colab and in the “open” dialog choose GitHub. Colab will ask you for authorization to access your GitHub account.

  • Once you have granted access, Colab will show you all your GitHub repos. Choose the one you just forked and, in the folder session_3-2_NLP, open the 10am notebook.

Data capture V: NLP/word embeddings

Matthias Schlögl, Renato Rocha Souza 8.7.2020 11:00-11:50

In the second part of the NLP sessions we will look at more recent developments in the field. In particular, Renato will show in another Colab notebook what you can do with a relatively new method called word embeddings.

NLP notebook
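As a minimal illustration of the technique, the sketch below trains Word2Vec embeddings with gensim on a toy corpus. Real embeddings require a large corpus, and the session's notebook may use different tools; everything here is for demonstration only.

    # Hedged word-embedding sketch with gensim's Word2Vec on a toy corpus
    # (far too small for meaningful vectors).
    from gensim.models import Word2Vec

    sentences = [
        ["minister", "council", "vienna", "protocol"],
        ["emperor", "minister", "decree", "vienna"],
        ["council", "session", "protocol", "decree"],
    ]

    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

    # Vector for a word and its nearest neighbours in the toy vector space.
    print(model.wv["minister"][:5])
    print(model.wv.most_similar("minister", topn=3))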

Analysis I/II: XPath, XQuery, SQL, SPARQL

Richard Hadden, Georg Vogeler 9.7.2020 9:00-11:00 Slides · Slides · Slides

The session will teach you how to query your data: XML with XPath and XQuery, relational databases with SQL, and RDF data with the RDF query language SPARQL (http://www.w3.org/TR/sparql11-overview/).

For XPath/XQuery, use the MRP XML dataset and the Oxygen XML editor. For SQL, use the SQLite browser (https://sqlitebrowser.org/) or the sqlite command-line interface and the SQL dataset. For the SPARQL queries, use the school’s ResearchSpace installation or your local Jena/Fuseki installation.
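As a small self-contained illustration of the SPARQL part, the following Python sketch runs a query with rdflib against a tiny in-memory graph; in the session you would query the school's triple store instead, and the data below is invented.

    # Hedged SPARQL sketch with rdflib on an in-memory graph.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("https://example.org/")  # invented example namespace
    g = Graph()
    g.add((EX["p1"], RDF.type, FOAF.Person))
    g.add((EX["p1"], FOAF.name, Literal("Anton von Schmerling")))

    query = """
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?person ?name
        WHERE { ?person a foaf:Person ; foaf:name ?name . }
    """

    for person, name in g.query(query):
        print(person, name)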

Analysis III: reuse of existing datasets via REST and SPARQL-Endpoints

Matthias Schlögl 9.7.2020 11:00-11:50

Session on the reuse of existing SPARQL and REST endpoints for enriching data.

REST / SPARQL reuse notebook
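As an illustration of the general pattern, the sketch below queries Wikidata's public SPARQL endpoint over HTTP with the requests library; the query itself is just an example, not the one used in the session.

    # Hedged sketch: querying a public SPARQL endpoint (Wikidata) over HTTP.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    query = """
        SELECT ?person ?personLabel WHERE {
          ?person wdt:P106 wd:Q82955 .   # occupation: politician
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        } LIMIT 5
    """

    resp = requests.get(ENDPOINT, params={"query": query, "format": "json"},
                        headers={"User-Agent": "summerschool-example/0.1"})
    resp.raise_for_status()

    for row in resp.json()["results"]["bindings"]:
        print(row["personLabel"]["value"])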

Analysis IV: basic statistics, plotting, unsupervised learning

Sabine Laszakovits 9.7.2020 14:00-14:50

Session on basic statistics in Jupyter notebooks, including basic plotting.

Basic statistics notebook
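As a minimal illustration, the following sketch computes descriptive statistics and a simple plot with pandas and matplotlib; the numbers are invented for demonstration.

    # Hedged sketch of basic descriptive statistics and plotting.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({
        "year": [1861, 1862, 1863, 1864, 1865],
        "sessions": [42, 55, 48, 61, 39],  # invented counts
    })

    print(df["sessions"].describe())  # count, mean, std, quartiles

    df.plot(x="year", y="sessions", kind="bar", legend=False)
    plt.ylabel("council sessions")
    plt.tight_layout()
    plt.show()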

Analysis V: Networks, creation and basic analytics

Marcella Tambuscio 9.7.2020 15:00-15:50

The hands-on Social Network Analysis (SNA) session uses network data from the MRP dataset to show basic SNA analytics.

GraphML file produced during the session

R notebook used during the session

R notebook to view
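The session itself works in R; as an equivalent sketch in Python, the example below builds a small invented network with networkx, computes basic SNA metrics, and writes a GraphML file like the one shared above.

    # Hedged SNA sketch with networkx; nodes and edges are invented.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("Schmerling", "Rainer"),
        ("Schmerling", "Plener"),
        ("Rainer", "Plener"),
        ("Plener", "Degenfeld"),
    ])

    # Two basic centrality measures.
    print("degree:", dict(G.degree()))
    print("betweenness:", nx.betweenness_centrality(G))

    # Export in GraphML format for reuse in other tools (e.g. Gephi).
    nx.write_graphml(G, "mrp_network.graphml")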

Analysis VI: Networks extended, SpaceTime Cube

Florian Windhager / Eva Mayer

This session will explore the evolving field of visualization methods and discuss the general strategy of using multiple views for the visual analysis of complex phenomena. With specific regard to bio- and prosopographical data, we will also argue for the development of novel ways and means of information integration, and introduce the PolyCube framework as one specific solution (https://danubevislab.github.io/polycube/cga2020/).

Slides

Visualisation I: basics; plotting

Florian Windhager / Eva Mayer

The hands-on session will introduce a variety of open visualization tools and showcase some specific visualization options with Palladio (http://hdlab.stanford.edu/palladio/) and Timeline Storyteller (https://timelinestoryteller.com/).

Slides

Visualisation II: maps and GIS

Rainer Simon 10.7.2020 10:00-11:00

First part of the hands-on session on map visualization.

Slides for Vis II and III

Visualisation III: maps and GIS

Rebecca Kahn 10.7.2020 11:00-12:00

The second part of this session will explore the use of linked open geodata as a mechanism for connecting objects in cultural heritage collections, using Peripleo, the linked data search engine created by the Pelagios Network. We will also explore how to use GIS and geodata to create multimedia data stories, using ArcGIS Story Maps, a browser-based tool for creating and sharing map-based narratives. Participants who are interested in this part of the session should create a free account on the ArcGIS Story Maps page.

Slides for Vis II and III

[1] A fork is basically a copy of the repo in your own account.