Event Calendar

Courses, seminars, guided tours and other events of the KIT Library
Online Seminar

Introduction to Text Mining with the Natural Language Toolkit (NLTK) (TU9-Seminar)

Monday, 16 May 2022, 13:30-15:00
Online

Note: This event is offered and run by the Universitäts- und Landesbibliothek Darmstadt (University and State Library Darmstadt).
As part of a cooperation within TU9, the alliance of leading Technical Universities in Germany, a limited number of places is also open to interested participants from TU9 partner institutions. Register via the link given below.

 

Workshop for participants with no prior knowledge of Python or text mining.

Text mining methods are used to automatically extract structured information from large amounts of text. The workshop provides a first, practical introduction to the topic. Together we will analyze the abstracts of scientific articles. Using the Python library Natural Language Toolkit, we will tokenize the texts, remove stop words and finally generate visualizations of the words that are characteristic of these abstracts. As the development environment we will use Jupyter Notebook, open source software popular in data science, to run our code and display its results.
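
The preprocessing chain described above (tokenize, drop stop words, count what remains) can be sketched in plain Python. This is a minimal stand-in, not the workshop material: the workshop uses NLTK, where `nltk.word_tokenize`, the `stopwords` corpus in `nltk.corpus` and `nltk.FreqDist` fill these roles. Here a simple regular expression replaces the tokenizer, and the short stop word list is a hypothetical example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; NLTK ships a much longer
# English list in nltk.corpus.stopwords.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "we", "is",
              "are", "for", "with", "on", "from"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens.

    A crude regex stand-in for nltk.word_tokenize, which also keeps
    punctuation as separate tokens.
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def top_words(texts: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-word tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(remove_stop_words(tokenize(text)))
    return counts.most_common(n)


if __name__ == "__main__":
    abstracts = [
        "We study text mining methods for scientific abstracts.",
        "Text mining extracts structured information from text.",
    ]
    print(top_words(abstracts, 3))
```

The resulting (word, count) pairs are the kind of data a bar chart or word cloud is drawn from.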

 

  • How do I use a Jupyter notebook to run Python code while documenting my workflow meaningfully at the same time?
  • Where can I find suitable scientific text material that I can analyze automatically?
  • How do I extract the contents of a specific column from a CSV file for subsequent analysis?
  • How do I use the Python library Natural Language Toolkit (NLTK) to prepare texts for a text mining analysis?
  • How do I determine word frequencies with NLTK and then visualize them as a diagram or word cloud?
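
For the CSV question above, Python's standard `csv` module is enough to pull one column out of an export. This is a minimal sketch, assuming a comma-separated file with a header row and a column named `Abstract`; the column name and the sample rows are hypothetical, so adjust them to your own data source.

```python
import csv
import io
from typing import TextIO


def read_column(handle: TextIO, column: str) -> list[str]:
    """Return all non-empty values of one named column from CSV data.

    Assumes the first row is a header; csv.DictReader maps each following
    row to a dict keyed by those header names.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found")
    # Skip rows where the field is empty (or missing, which yields None).
    return [row[column] for row in reader if row[column]]


if __name__ == "__main__":
    # Hypothetical sample data; with a real file use
    # open("export.csv", newline="", encoding="utf-8") instead of StringIO.
    sample = io.StringIO(
        "Title,Abstract\n"
        'Paper A,"Text mining of abstracts, a first look."\n'
        "Paper B,\n"
        "Paper C,Word clouds for literature review.\n"
    )
    print(read_column(sample, "Abstract"))
```

Quoted fields such as the first abstract may contain commas; `csv` handles the quoting, which a plain `line.split(",")` would not.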

 

You will get answers to these questions in the workshop and can apply your new knowledge directly to practical examples. After the workshop, you can use the Jupyter notebook you created to repeat the analyses on your own text documents.
Please install the Python distribution Anaconda (https://www.anaconda.com/products/individual) on your computer before the workshop starts. Anaconda serves as a platform for managing the required Python libraries nltk, numpy, matplotlib and wordcloud, as well as the Jupyter Notebook software. Instructions can be found here: https://hessenbox.tu-darmstadt.de/getlink/fiCPbdzbLkfZMYZdMdAKqZ3P/Installationsanleitung_Freigabe. The installation guide is available as a PDF (German and English) and as a video (German), all with identical content. Additional documents will be sent to you in a separate message before the workshop. If you have any questions, please feel free to contact tdm@ulb.tu-darmstadt.de at any time.
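
Before the workshop you can check whether the libraries named above are available in your Anaconda environment. This is a small sketch using only the standard library; the names checked are the import names of the packages listed above (`nltk`, `numpy`, `matplotlib`, `wordcloud`), and Jupyter itself is not checked.

```python
import importlib.util

# Import names of the libraries the workshop relies on.
REQUIRED = ["nltk", "numpy", "matplotlib", "wordcloud"]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Map each top-level module name to True if it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name:12} {'OK' if found else 'MISSING'}")
```

Run it in a notebook cell or with the Python interpreter of your Anaconda environment; any line marked MISSING points to a package that still needs to be installed.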

 

This seminar is conducted via Zoom. Registration via Cituro is open until 24 hours before the workshop starts.

Tag(s): BIB-S, BIB-N, Data Literacy, Text Mining, TU9

 

Speaker(s)
Andre Pfeifer, Jens Freund

Universitäts- und Landesbibliothek Darmstadt
Organizer
KIT-Bibliothek
Straße am Forum 2
76131 Karlsruhe
Tel: +49 721 608-43109 / -43111
Email: infokompetenz@bibliothek.kit.edu
https://www.bibliothek.kit.edu