Organiser: Prof. Dr. Timm Teubner (TU Berlin)
This beginner’s guide to web crawling covers the basics of automatically extracting information from static and dynamic websites. The course includes a fair share of hands-on work, in which we will write and run Java code ourselves. If you plan to code along (highly recommended), please have a Java IDE (e.g. Eclipse) ready for the workshop. Beyond the basic principles of accessing websites, we will learn how to navigate the retrieved HTML code (JSoup) and how to deal with dynamic and interactive pages (Selenium), and we will consider some important legal aspects.