Programme¶
Here you can find the current version of the programme. A CSV version is also available for download.
Time |
Monday 6.7. |
taught by |
Tuesday 7.7. |
taught by |
Wednesday 8.7. |
taught by |
Thursday 9.7. |
taught by |
Friday 10.7. |
taught by |
|
---|---|---|---|---|---|---|---|---|---|---|---|
9:00:00 am |
Morning 1 |
Opening; introduction of participants, introduction to the data set used during the school |
Data modelling: from research question and source to data model (ER); Factoid (IPIF) |
Analysis I: Query (languages); XQuery/XPath, SQL intro |
Georg Vogeler / Richard Hadden |
Visualization I: basics; plotting etc. |
|||||
10:00:00 am |
Morning 2 |
Data modelling II: ontologies, data models and standards; RDFS/OWL: BIO Vocab, Schema.org, Wikidata, FOAF |
Matthias Schlögl / Renato Rocha Souza |
Analysis II: Query via SPARQL |
Georg Vogeler |
Visualization II: maps and GIS |
Rainer Simon |
||||
11:00:00 am |
Morning 3 |
Data modelling III: controlled vocabularies; SKOS; HISCO, Geonames, GND, Viaf, Getty |
Matthias Schlögl / Renato Rocha Souza |
Analysis III: reuse of existing datasets via REST and SPARQL endpoints |
Matthias Schlögl |
||||||
Lunch |
|||||||||||
2:00:00 pm |
Afternoon 1 |
Data capture I: From original source to text via edition (tei:persName) |
Data modelling IV: Historical information and foundational ontologies |
Project presentations |
Participants |
Analysis II: basic statistics, plotting, unsupervised learning |
Sabine Laszakovits |
||||
3:00:00 pm |
Afternoon 2 |
Project presentations |
Participants |
||||||||
4:00:00 pm |
Afternoon 3 |
Virtual poster session |
|||||||||
7:00:00 pm |
Evening |
||||||||||
Reception |
Opening: Introduction of participants, introduction to the data set used during the school¶
Georg Vogeler, Stephan Kurz 6.7.2020 9-9:50
In the opening session, Georg Vogeler will briefly introduce the field of prosopography and the changes it has undergone since the digital turn. In addition to some organizational matters, he will introduce the staff of the summer school and give participants the opportunity to introduce themselves. Finally, Stephan Kurz will introduce the dataset that will be used throughout the summer school.
Prosopography and sources: The Factoid approach¶
Michele Pasin 6.7.2020 10-10:50
Introduction to the Factoid model and its implementation at King's College London.
Slides
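In the factoid approach, the unit of data is not a person record but an assertion that a particular source makes about one or more persons, so conflicting claims from different sources can coexist side by side. The Python sketch below illustrates only that idea; the class names, field names and data are hypothetical and are not the schema used at King's College London or in IPIF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    ref: str  # hypothetical person identifier


@dataclass(frozen=True)
class Source:
    ref: str  # hypothetical identifier of the source (document, register entry, ...)


@dataclass(frozen=True)
class Factoid:
    """One assertion that one source makes about one or more persons."""
    source: Source
    persons: tuple[Person, ...]
    statement: str


def factoids_about(person: Person, factoids: list[Factoid]) -> list[Factoid]:
    """Collect every factoid that mentions the given person, whatever its source."""
    return [f for f in factoids if person in f.persons]


# Hypothetical data: two sources make different claims about the same person.
p1 = Person("person-1")
facts = [
    Factoid(Source("source-A"), (p1,), "born in 1820"),
    Factoid(Source("source-B"), (p1,), "born in 1821"),
    Factoid(Source("source-B"), (Person("person-2"),), "appointed minister"),
]

for f in factoids_about(p1, facts):
    print(f.source.ref, "->", f.statement)
```

Both birth years are kept, each tied to the source that asserts it; deciding which one to believe is left to the historian rather than baked into the data.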
Prosopography and Social Network Analysis¶
Marcella Tambuscio 6.7.2020 11-11:50
Introduction to basic ideas of Social Network Analysis (SNA)
Slides
Data capture I: From original source to text via edition (tei:persName)¶
Richard Hadden 6.7.2020 14-14:50
In this session, Richard Hadden will briefly focus on scholarly editions, before looking at the TEI encoding used by the Ministerial Protocols dataset.
Data capture II: From original source to text via OCR/HTR¶
Stephan Kurz 6.7.2020 15-15:50
The session will focus on data capture with the Transkribus HTR engine that the MRP edition project is using for upcoming TEI edition volumes. Example images to be used are available in the dataset/images folder and in a Transkribus collection (ready to be shared). Slides are in Data capture with Transkribus.
Data capture III: annotation via regex and Recogito¶
Regular expressions (regexes) are a concise notation for expressing complex search
patterns. Rooted in theoretical computer science, regular expressions are
employed in hundreds of tools (e.g. LibreOffice, sed, oXygen) and languages
(e.g. XSLT, XML Schema, Python, Ruby) to provide users with a powerful
pattern-matching mechanism. slides
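As an illustration of what such a pattern looks like, the sketch below uses Python's `re` module to pick dates written in the day.month.year notation used on this page out of a string. The pattern and the function name are illustrative assumptions; regex dialects differ slightly between tools, so an expression may need adjusting before it works in another program.

```python
import re

# Dates written as day.month.year, e.g. "6.7.2020". The pattern checks the
# shape only: it would also accept "31.2.2020" or "99.99.2020".
DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def find_dates(text: str) -> list[tuple[int, int, int]]:
    """Return (day, month, year) tuples for every date-like match in text."""
    return [(int(d), int(m), int(y)) for d, m, y in DATE.findall(text)]


print(find_dates("Opening on 6.7.2020, final session on 10.7.2020."))
# [(6, 7, 2020), (10, 7, 2020)]
```

Each pair of parentheses is a capture group, which is why `findall` returns tuples of day, month and year rather than the whole matched string.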
Recogito, developed and maintained by the digital humanities initiative Pelagios Commons, is a tool for collaboratively annotating source texts, tabular data, and images. In this session, we will introduce the tool, focusing specifically on its capabilities for semantic annotation: annotating entities (places, people) and the relations between them. The tool is browser-based and requires no installation. For the hands-on exercises, participants should ideally create an account beforehand on ACDH’s Recogito instance at https://recogito.acdh-dev.oeaw.ac.at/
Data Modelling I/II: Entity Relationship Diagrams, RDFS, OWL¶
Gioele Barabucci 7.7.2020 9:00-11:00
Data Modelling III: Controlled Vocabularies¶
Georg Vogeler 7.7.2020 11:00-12:00
People and the information about them occur in variant descriptions and in variant locations. To find the same person in different resources, or to identify places of birth, death, study etc. in an interoperable way, information science has developed “controlled vocabularies”. Reference lists of places are often labeled “gazetteers”, lists of intellectual concepts go by “thesaurus”, and computer scientists sometimes call them “ontologies”. Many of them are published online, and every prosopographer is well advised to reuse as many of them as feasible. The session will present some of them (VIAF, SNAC, UAN, GeoNames), describe their purpose with examples, introduce the main technical standards for storing them, and encourage prosopographers to create their own.
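Controlled vocabularies usually record one preferred label and any number of alternative labels per concept (in SKOS: `skos:prefLabel` and `skos:altLabel`), which is what allows variant names to be resolved to a single identifier. The sketch below mimics that lookup in Python with an in-memory dictionary; the `example.org` URIs are placeholders rather than real authority records, and a real project would resolve against VIAF, GeoNames and similar services instead.

```python
import unicodedata
from typing import Optional

# Placeholder concept URIs (hypothetical), each with one preferred label and
# several alternative labels, loosely following skos:prefLabel / skos:altLabel.
VOCABULARY = {
    "https://example.org/place/vienna": {
        "prefLabel": "Vienna",
        "altLabels": ["Wien", "Vindobona", "Bécs"],
    },
    "https://example.org/place/prague": {
        "prefLabel": "Prague",
        "altLabels": ["Praha", "Prag"],
    },
}


def normalise(label: str) -> str:
    """Strip accents and case so that 'Bécs', 'becs' and 'BECS' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold().strip()


def build_index(vocabulary: dict) -> dict[str, str]:
    """Map every normalised label (preferred or alternative) to its concept URI."""
    index = {}
    for uri, concept in vocabulary.items():
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            index[normalise(label)] = uri
    return index


INDEX = build_index(VOCABULARY)


def lookup(label: str) -> Optional[str]:
    """Return the concept URI for a variant label, or None if it is unknown."""
    return INDEX.get(normalise(label))


print(lookup("Wien"))  # https://example.org/place/vienna
print(lookup("PRAG"))  # https://example.org/place/prague
print(lookup("Graz"))  # None
```

Storing the identifier, not the label, in your own data is what makes it interoperable: any resource that uses the same URI refers to the same place.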
Some links:
Exercises:
Describe the data from https://en.wikipedia.org/wiki/M%C3%B3ric_Esterh%C3%A1zy with properties from bio-vocab.
Describe your own data with properties from http://schema.org
Describe the data found at https://en.wikipedia.org/wiki/M%C3%B3ric_Esterh%C3%A1zy in the IPIF model (don’t worry too much about the syntax; JSON is supported by Oxygen, for instance).
How many authority file entries can be traced back to http://www.idref.fr/029638437/id?
Describe the biographical information found in https://en.wikipedia.org/wiki/M%C3%B3ric_Esterh%C3%A1zy with controlled vocabularies.
Discuss: Which parts of your data should be exposed as Linked Data? And which IDs should be used?
Data modelling IV: Historical information and foundational ontologies¶
Francesco Beretta 7.7.2020 14:00-15:00
Modelling prosopographical data and making them re-usable for new research agendas according to the FAIR principles requires an in-depth analysis of how to develop an interoperable conceptualization in the field of historical research. After introducing the symogih.org project’s patterns-based ontology approach, we’ll analyse the role that factual information plays in the process of historical knowledge production. We’ll then develop an epistemological and semantic analysis of conceptual data modelling based on the foundational ontologies DOLCE and Constructive Descriptions and Situations, and discuss the reasons for adopting the CIDOC CRM as a core ontology in the field of historical research.
Data modelling V: Exploring and extending the CIDOC CRM¶
Francesco Beretta 7.7.2020 15:00-16:00
The CIDOC CRM (an ISO standard since 2006) has been defined as a “formal ontology intended to facilitate the integration, mediation and interchange of heterogeneous cultural heritage information” (Martin Doerr). Although there are significant intersections between the two fields of cultural heritage preservation in museums and historical research, the CRM has to be extended with some relevant, missing high-level classes in order to meet the needs of the latter. Furthermore, there is a need to enrich the ontology with the modelling patterns required for fine-grained data production in all the different fields of historical research, and notably in prosopography. We will see how collaborative data modelling carried out in the ontology management environment OntoME makes it possible to elaborate a communal and adaptive conceptualization of the domain.
Data Capture III: Your Own Database¶
Georg Vogeler 7.7.2020 16:00-17:00
slides
The session will introduce you to basic concepts of databases and the main database types (SQL, RDF triple store, graph database). It will present some major database management systems and give an insight into “ready-made” offers for historical research. In the exercises we will set up an SQL database based on SQLite <http://sqlite.org>, access it via one possible front-end (https://sqlitebrowser.org/; yes, please download and install it) and add some data. Finally, we will use a triple store at https://summer2020.acdh-dev.oeaw.ac.at/ and add some data to it. We will query the data on Thursday. If you want to set up your own triple store, I can recommend Apache Jena/Fuseki <https://jena.apache.org/>.
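A minimal version of the SQLite exercise, written with Python's built-in `sqlite3` module instead of the graphical front-end, might look like the sketch below. The two tables (`person`, `office`), their columns and the rows are a hypothetical schema for illustration, not the structure of the school's data set; the SQL statements themselves are the kind you could also run in the SQLite browser.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file name instead
# (e.g. "summer_school.db", a hypothetical name) to get a database file
# you can open in the SQLite browser.
conn = sqlite3.connect(":memory:")

# SQLite only enforces REFERENCES constraints when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    birth_year INTEGER
);
CREATE TABLE office (
    id         INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(id),
    title      TEXT NOT NULL,
    start_year INTEGER
);
""")

# "?" placeholders let sqlite3 quote the values safely.
conn.executemany(
    "INSERT INTO person (id, name, birth_year) VALUES (?, ?, ?)",
    [(1, "Person A", 1820), (2, "Person B", 1831)],
)
conn.executemany(
    "INSERT INTO office (person_id, title, start_year) VALUES (?, ?, ?)",
    [(1, "Minister X", 1860), (2, "Minister Y", 1865)],
)
conn.commit()

# Join the two tables: who held which office, in chronological order.
rows = conn.execute(
    "SELECT p.name, o.title FROM person p "
    "JOIN office o ON o.person_id = p.id ORDER BY o.start_year"
).fetchall()
print(rows)
```

Keeping offices in their own table, linked by `person_id`, lets one person hold any number of offices without repeating the person's data.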
Data modelling VI: standards, XML: TEI, tei:person¶
Richard Hadden 8.7.2020 09:00-10:00
This session will look at the TEI text encoding guidelines and their potential use for encoding structured and semi-structured prosopographical data.
Data capture IV: NLP/NER¶
Matthias Schlögl, Renato Rocha Souza 8.7.2020 10:00-10:50
In the first of our two NLP sessions Renato will start with a quick overview of NLP methods. Subsequently we take a closer look at Named Entity Recognition (NER) by running through an interactive Python/Jupyter notebook.
We will be using Google/Colab to run through interactive Python/Jupyter notebooks. To follow the notebooks on your own computer (and change cells yourself), please do the following:
Head over to GitHub and get yourself an account if you don’t have one yet.
Go to https://github.com/acdh-oeaw/summerschool2020-notebooks and fork [1] the repo into your own account (there is a “fork” button in the upper right corner).
Go to Google/Colab and in the “open” dialog choose GitHub. Colab will ask you for authorization to access your GitHub Account.
Once you have granted access Colab will show you all your GitHub repos. Choose the one you just forked and in folder session_3-2_NLP open the 10am notebook.
Data capture V: NLP/word embeddings¶
Matthias Schlögl, Renato Rocha Souza 8.7.2020 11:00-11:50
In the second part of the NLP sessions we will look at more recent developments in the field. In particular, Renato will use another Colab notebook to show what you can do with word embeddings, a relatively new method.
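Word embeddings represent each word as a dense vector learned from a corpus, so that words used in similar contexts end up close together; closeness is commonly measured with cosine similarity. The sketch below shows only that comparison step, using three hand-made 3-dimensional vectors with hypothetical values. Trained embeddings typically have hundreds of dimensions and come from a model rather than being typed in.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy vectors (hypothetical); real embeddings are learned, not typed in.
vectors = {
    "minister": [0.9, 0.1, 0.3],
    "chancellor": [0.8, 0.2, 0.35],
    "river": [0.1, 0.9, 0.0],
}


def most_similar(word: str, vecs: dict[str, list[float]]) -> str:
    """Return the other word whose vector has the highest cosine similarity to word."""
    return max((w for w in vecs if w != word), key=lambda w: cosine(vecs[word], vecs[w]))


print(most_similar("minister", vectors))  # chancellor
```

Cosine similarity ignores vector length and compares only direction, which is why it is the usual choice for comparing embeddings.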
Analysis I/II: XPath, XQuery, SQL, SPARQL¶
Richard Hadden, Georg Vogeler 9.7.2020 9:00-11:00
Slides
Slides
Slides
The session will teach you how to query your data: XML with XPath and XQuery, relational databases with SQL and RDF data with the RDF query language SPARQL (http://www.w3.org/TR/sparql11-overview/).
For XPath/XQuery, use the MPR XML data set and the Oxygen XML Editor. For SQL, use the SQLite browser (https://sqlitebrowser.org/) or the sqlite command-line interface with the SQL data set. For the SPARQL queries, use the school's research space installation or your local Jena/Fuseki installation.
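To get a feel for XPath outside Oxygen, the sketch below runs path expressions over a tiny TEI fragment with Python's `xml.etree.ElementTree`. ElementTree implements only a limited subset of XPath (for example, no functions such as `count()` or `contains()`), so it is no substitute for Oxygen's XPath/XQuery support; the sample text and `ref` values are hypothetical, not taken from the MPR data set.

```python
import xml.etree.ElementTree as ET

# ElementTree needs the TEI namespace bound to a prefix of our choosing.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment in the style of a meeting protocol
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Present: <persName ref="#p1">Person A</persName>,
       <persName ref="#p2">Person B</persName>.</p>
    <p><persName ref="#p1">Person A</persName> reports on the budget.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Counterpart of the XPath //tei:persName: every persName anywhere below the root
mentions = [(el.get("ref"), el.text) for el in root.findall(".//tei:persName", NS)]
print(mentions)

# Attribute predicate, as in //tei:persName[@ref='#p1']
print(len(root.findall(".//tei:persName[@ref='#p1']", NS)))  # 2
```

The namespace mapping matters: without it, a path like `.//persName` finds nothing, because every TEI element lives in the TEI namespace.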
Analysis III: reuse of existing datasets via REST and SPARQL-Endpoints¶
Matthias Schlögl 9.7.2020 11:00-11:50
Session on reusing existing SPARQL and REST endpoints to enrich data.
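Reusing an endpoint usually means sending a SPARQL query over HTTP and reading back the results in the SPARQL 1.1 JSON results format. The sketch below assumes the public Wikidata Query Service (`https://query.wikidata.org/sparql`) and Wikidata's GND ID property (`P227`) as one example of enrichment via an authority identifier; the function names, the User-Agent string and the sample response are hypothetical. It only builds the request and parses a results document, so it runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the public Wikidata Query Service.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def gnd_lookup_query(gnd_id: str) -> str:
    """SPARQL query for Wikidata items whose GND ID (property P227) equals gnd_id."""
    if '"' in gnd_id or "\\" in gnd_id:
        raise ValueError("unexpected character in GND ID")
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f'?item wdt:P227 "{gnd_id}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        "}"
    )


def build_request(query: str) -> Request:
    """GET request following the SPARQL protocol: the query goes in the 'query' parameter."""
    url = WIKIDATA_SPARQL + "?" + urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical client name; Wikimedia asks clients to identify themselves.
        "User-Agent": "summer-school-example/0.1",
    }
    return Request(url, headers=headers)


def item_uris(results: dict) -> list[str]:
    """Extract the ?item bindings from a SPARQL 1.1 JSON results document."""
    return [b["item"]["value"] for b in results["results"]["bindings"] if "item" in b]


# Offline demo with a hand-made (hypothetical) response body
sample = {
    "head": {"vars": ["item"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}},
    ]},
}
print(item_uris(sample))
```

To actually send the request, pass it to `urllib.request.urlopen` and read the response with `json.load`; the returned item URIs can then be stored next to your own records as links to external data.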
Analysis IV: basic statistics, plotting, unsupervised learning¶
Sabine Laszakovits 9.7.2020 14:00-14:50
Session on basic statistics in Jupyter notebooks, including basic plotting.
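The summary measures usually computed first (count, mean, median, standard deviation, range) are all available in Python's standard-library `statistics` module, as the sketch below shows. The ages are invented for illustration; counting values per decade is a typical step before plotting a histogram.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical ages of persons at their first appointment (invented values)
ages = [42, 51, 38, 47, 55, 44, 51]


def describe(values: list[float]) -> dict[str, float]:
    """Basic descriptive statistics for a list of numbers (needs at least two values)."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(ages)
print({k: round(v, 2) for k, v in summary.items()})

# Count values per decade (30s, 40s, 50s): the counts a histogram would show
by_decade = Counter(age // 10 * 10 for age in ages)
print(sorted(by_decade.items()))  # [(30, 1), (40, 3), (50, 3)]
```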
Analysis III: Networks, creation and basic analytics¶
Marcella Tambuscio 9.7.2020 15:00-15:50
The hands-on Social Network Analysis (SNA) session uses network data from the MPR dataset to show basic SNA analytics.
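One of the most basic SNA measures is degree centrality: the number of a node's neighbours divided by n - 1, where n is the number of nodes. The sketch below computes it in plain Python for a hypothetical co-occurrence network (two persons are linked if they appear in the same protocol); libraries such as NetworkX provide the same measure as `networkx.degree_centrality`.

```python
from collections import defaultdict

# Hypothetical undirected edges: persons who appear together in the same protocol
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degree_centrality(edge_list):
    """Degree of each node divided by (n - 1); self-loops and duplicate edges are ignored.

    Isolated nodes never appear in an edge list, so they are not counted in n.
    """
    neighbours = defaultdict(set)
    for u, v in edge_list:
        if u == v:
            continue
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


centrality = degree_centrality(edges)
for node, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, value)
```

Here "A" is linked to three of the four other persons and therefore has the highest centrality; in a protocol network such a node is a candidate for a particularly well-connected actor.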
Analysis IV: Networks extended, SpaceTime Cube¶
Florian Windhager / Eva Mayer
This session will explore the evolving field of visualization methods, and will discuss the general strategy to utilize multiple views for the visual analysis of complex phenomena. With specific regard to bio- and prosopographical data, we will also argue for the development of novel ways and means of information integration, and introduce the PolyCube framework as one specific solution (https://danubevislab.github.io/polycube/cga2020/).
Visualization I: basics; plotting¶
Florian Windhager / Eva Mayer
The hands-on session will introduce a variety of open visualization tools, and showcase some specific visualization options with Palladio (http://hdlab.stanford.edu/palladio/) and Timeline Storyteller (https://timelinestoryteller.com/).
Visualisation II: maps and GIS¶
Rainer Simon 10.7.2020 10:00-11:00
First part of the hands-on session on map visualization.
Visualisation III: maps and GIS¶
Rebecca Kahn 10.7.2020 11:00-12:00
The second part of this session will explore the use of linked open geodata as a mechanism for connecting objects in cultural heritage collections, using Peripleo, the linked data search engine created by the Pelagios Network. We will also explore how to use GIS and geodata to create multimedia data stories with ArcGIS Story Maps, a browser-based tool for creating and sharing map-based narratives. Participants who are interested in this part of the session should create a free account on the ArcGIS Story Maps page.
[1] A fork is basically a copy of the repo in your own account.