136010 UE Introduction to DH Tools and Methods (2021S)
Continuous assessment course
Registration/Deregistration
Note: The time of your registration within the registration period has no effect on the allocation of places (no "first come, first served").
- Registration from Mon 01.02.2021 09:00 to Thu 25.02.2021 23:59
- Deregistration until Wed 31.03.2021 23:59
Details
max. 25 participants
Language: English
Lecturers
Dates (iCal) - the next date is marked with N
Topics/Dates
1) Overview of the digital ecosystem for DH: OCR & NLP Pipelines, Visualization & Dashboards, Spatial Analysis, Image Analysis, Social Network Analysis, Sentiment Analysis, Temporal Series Analysis, SQL and NoSQL, Database Management, RDF triplestores, etc.
2) The basics of Programming:
IDEs and Digital Research Frameworks
Programming Languages: why Python?
First steps into programming
Hands on: Installing Anaconda and programming the first “Hello World”
3) The basics of Versioning:
Versioning Code: Git, GitHub, GitLab
Versioning Data: DoltHub
Hands on: Installing GitHub Desktop. Full cycle on versioning files.
4) The basics of Python language I:
Python objects and methods
Python native structures: data containers
Hands on: small scripts with native structures.
5) The basics of Python language II:
Python control flow
Python functions
Python scopes
Hands on: Mining text with simple functions.
6) The basics of Python language III:
Python packages and modules
Python data persistence
Hands on: Exploring common packages from the standard library (os, time, sys, string, pickle, random) and the third-party dill package
Suggested readings:
Python generators and comprehensions
7) Advanced topics of Python language:
Python classes and OOP
Hands on: Creating classes to represent DH entities
Suggested readings:
Python decorators
Code Optimization
8) Topics in Exploratory Data Analysis (EDA) I
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn
9) Topics in Exploratory Data Analysis (EDA) II
Basic Statistics
Data Visualization
Data Wrangling
Hands on: Python EDA with NumPy, Pandas, Matplotlib, Seaborn
10) NLP Intro:
Motivations, Tasks, Goals and Challenges
NLP and the Humanities
Python NLP Packages (NLTK, spaCy, TextBlob, WordNet)
Hands on: Python NLP Pipeline for Corpus Acquisition and Cleaning - Information extraction and Scraping, Regular Expressions, Frequency Analysis, N-grams, POS Tagging, Syntax parsing, NER, Summarization
11) Topic Modeling
Text Classification
Sentiment Analysis
Python NLP Packages (Gensim, pyLDAvis)
Hands on: Python NLP Pipelines for Text Representation (BoW, TF-IDF)
12) Word Embeddings
Dimensionality Reduction
Text Visualization
Hands on: Detection of Biases in Corpora
13) Semantic Web Technologies
Knowledge Organization Systems
Thesauri and Ontologies
Hands on: Python and Text Annotation - TEI/XML (XML, LXML, XLPATH)
14) Social Network Analysis
Graphs
Knowledge Graphs
Hands on: SNA with Python Graph Structures (NetworkX, Pyvis)
15) Where to go from here:
Temporal Series Analysis
Image analysis and Computer Vision
Machine Learning and Deep Learning for NLP
Text Classification
Text Clustering
Stylistic Analysis
Information Extraction
Information Retrieval Systems
- Friday 05.03. 09:45 - 11:15 Digital
- Friday 19.03. 09:45 - 11:15 Digital
- Friday 26.03. 09:45 - 11:15 Digital
- Friday 16.04. 09:45 - 11:15 Digital
- Friday 23.04. 09:45 - 11:15 Digital
- Friday 30.04. 09:45 - 11:15 Digital
- Friday 07.05. 09:45 - 11:15 Digital
- Friday 14.05. 09:45 - 11:15 Digital
- Friday 21.05. 09:45 - 11:15 Digital
- Friday 28.05. 09:45 - 11:15 Digital
- Friday 04.06. 09:45 - 11:15 Digital
- Friday 11.06. 09:45 - 11:15 Digital
- Friday 18.06. 09:45 - 11:15 Digital
- Friday 25.06. 09:45 - 11:15 Digital
Information
Aims, contents and method of the course
The course aims to give students the skills needed to understand the potential of digital methods for the humanities, using the Python programming language for a handful of common tasks in the domain. It presents a broad overview of methods and tools, specifically covering OCR & Natural Language Processing (NLP) pipelines, visualization & dashboards, spatial analysis, image analysis, Social Network Analysis (SNA), sentiment analysis, and SQL and NoSQL database management. The approach is both theoretical and practical, with a heavy load of hands-on exercises. Students are expected to be familiar with digital environments; previous programming practice is desirable but not mandatory.
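To give a rough idea of what a "common task" in Python can look like, the following is a minimal sketch of word-frequency and n-gram counting, two items listed among the hands-on topics. It uses only the standard library. The sample text and the function names tokenize and ngrams are hypothetical and are not taken from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all consecutive n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, not taken from the course materials.
sample = "The archive holds the letters. The letters describe the archive."

tokens = tokenize(sample)
word_counts = Counter(tokens)               # frequency of each word
bigram_counts = Counter(ngrams(tokens, 2))  # frequency of each word pair

print(word_counts.most_common(3))
print(bigram_counts.most_common(2))
```

Packages named in the topic list, such as NLTK or spaCy, offer more robust tokenization; the regular expression here is a deliberately simple stand-in.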
Mode of assessment and permitted materials
Course evaluation will be a combination of in-class participation (30%), weekly homework assignments (40%), and the final project (30%).
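The weighting can be illustrated with a short worked example. This is only a sketch: the course description gives the three weights, while the 0-100 scale for each component, the example scores, and the function name weighted_score are assumptions made for illustration. How the combined score maps to a final grade is not specified here.

```python
# Weights in percent, as stated in the assessment description.
WEIGHTS = {"participation": 30, "homework": 40, "final_project": 30}


def weighted_score(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale) into one weighted score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"participation": 80, "homework": 90, "final_project": 70}
print(weighted_score(example))  # (30*80 + 40*90 + 30*70) / 100 = 81.0
```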
Minimum requirements and assessment criteria
Attendance is required, and regular participation is key to completing the course. All students must provide their own computing environment. Homework assignments must be submitted on time; some can be completed later as part of the final project, but this must be discussed with the instructor whenever the issue arises. The final project must be submitted on time.
Examination topics
There is no examination for the course.
Reading list
Learning Python, 5th Edition by Mark Lutz, O'Reilly Media, 2013. ISBN 978-1-4493-5573-9.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney, O'Reilly Media, 2012. ISBN 978-1-4493-1979-3.
GitHub Repository - https://github.com/rsouza/Python_Course
Programming Historian → relevant courses - https://programminghistorian.org/en/lessons/
TED Talk - https://www.ted.com/talks/reshma_saujani_teach_girls_bravery_not_perfection
Association in the course directory
DH-S I
Last modified: Thu 04.07.2024 00:13