Golden Nuggets at Berlin Buzzwords 2014
If you haven’t heard of Berlin Buzzwords, it’s the conference covering all the latest buzzwords surrounding the biggest buzzword of them all: Big Data. And boy is it buzzwordy! I’m …
Artificial Intelligence, Large Language and Foundation Models, Knowledge Graphs
I work at SAP as CTO of the AI Unit, leading a group developing Generative AI and Foundation Models on linked business data. Before that, I led a research team on Natural Language Processing and Knowledge Graphs at the Goldman Sachs AI research group. Earlier, I co-founded two startups (Ambiverse and nudge:nudge) with the goal of creating machines that organize knowledge with AI methods. I received my PhD from the Max Planck Institute for Informatics, on the topic of discovering and disambiguating named entities in text.
Have a look at my publications and projects.
My research interest is in Generative AI and Foundation Models on multiple modalities, with a focus on business data.
@article{Kampik:2024aa,
abstract = {The continued success of Large Language Models (LLMs) and other generative artificial intelligence approaches highlights the advantages that large information corpora can have over rigidly defined symbolic models, but also serves as a proof-point of the challenges that purely statistics-based approaches have in terms of safety and trustworthiness. As a framework for contextualizing the potential, as well as the limitations of LLMs and other foundation model-based technologies, we propose the concept of a Large Process Model (LPM) that combines the correlation power of LLMs with the analytical precision and reliability of knowledge-based systems and automated reasoning approaches. LPMs are envisioned to directly utilize the wealth of process management experience that experts have accumulated, as well as process performance data of organizations with diverse characteristics, e.g., regarding size, region, or industry. In this vision, the proposed LPM would enable organizations to receive context-specific (tailored) process and other business models, analytical deep-dives, and improvement recommendations. As such, it would allow to substantially decrease the time and effort required for business transformation, while also allowing for deeper, more impactful, and more actionable insights than previously possible. We argue that implementing an LPM is feasible, but also highlight limitations and research challenges that need to be solved to implement particular aspects of the LPM vision.},
author = {Kampik, Timotheus and Warmuth, Christian and Rebmann, Adrian and Agam, Ron and Egger, Lukas N. P. and Gerber, Andreas and Hoffart, Johannes and Kolk, Jonas and Herzig, Philipp and Decker, Gero and van der Aa, Han and Polyvyanyy, Artem and Rinderle-Ma, Stefanie and Weber, Ingo and Weidlich, Matthias},
date = {2024-07-26},
date-added = {2024-09-08 14:58:27 +0200},
date-modified = {2024-09-08 14:58:27 +0200},
doi = {10.1007/s13218-024-00863-8},
id = {Kampik2024},
issn = {1610-1987},
journal = {KI - K{\"u}nstliche Intelligenz},
title = {Large Process Models: A Vision for Business Process Management in the Age of Generative AI},
url = {https://doi.org/10.1007/s13218-024-00863-8},
year = {2024},
bdsk-url-1 = {https://doi.org/10.1007/s13218-024-00863-8}}
@inproceedings{Bastos:2023,
author = {Bastos, Anson and Singh, Kuldeep and Nadgeri, Abhishek and Hoffart, Johannes and Singh, Manish and Suzumura, Toyotaro},
booktitle = {Proceedings of the ACM Web Conference 2023},
date-added = {2024-01-13 21:42:59 +0100},
date-modified = {2024-01-13 21:45:06 +0100},
pages = {2455--2466},
title = {Can Persistent Homology provide an efficient alternative for Evaluation of Knowledge Graph Completion Methods?},
year = {2023}}
@inproceedings{Nadgeri:2021,
author = {Nadgeri, Abhishek and Bastos, Anson and Singh, Kuldeep and Mulang, Isaiah Onando and Hoffart, Johannes and Shekarpour, Saedeh and Saraswat, Vijay},
booktitle = {Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021},
date-added = {2021-11-11 17:17:00 +0100},
date-modified = {2021-11-11 17:18:09 +0100},
pages = {535--548},
title = {{KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction}},
year = {2021}}
@inproceedings{Bastos:2021b,
author = {Bastos, Anson and Singh, Kuldeep and Nadgeri, Abhishek and Shekarpour, Saedeh and Mulang, Isaiah Onando and Hoffart, Johannes},
booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management, CIKM 2021},
date-added = {2021-11-11 17:15:03 +0100},
date-modified = {2021-11-11 17:16:10 +0100},
pages = {89--99},
title = {{HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations}},
year = {2021}}
@inproceedings{Gupta:2021,
author = {Gupta, Harsh and Del Corro, Luciano and Broscheit, Samuel and Hoffart, Johannes and Brenner, Eliot},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021},
date-added = {2021-11-11 17:13:42 +0100},
date-modified = {2021-11-11 17:16:37 +0100},
pages = {8647--8652},
title = {{Unsupervised Multi-View Post-OCR Error Correction With Language Models}},
year = {2021}}
@inproceedings{Prabhakar:2021,
author = {Ravi, Manoj Prabhakar Kannan and Singh, Kuldeep and Mulang, Isaiah Onando and Shekarpour, Saedeh and Hoffart, Johannes and Lehmann, Jens},
booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021},
date-added = {2021-04-24 22:27:17 +0200},
date-modified = {2021-05-29 16:26:12 +0200},
pages = {504--514},
title = {{CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata}},
year = {2021}}
@inproceedings{Bastos:2021,
author = {Bastos, Anson and Nadgeri, Abhishek and Singh, Kuldeep and Mulang, Isaiah Onando and Shekarpour, Saedeh and Hoffart, Johannes},
booktitle = {Proceedings of the 30th Web Conference, WWW 2021},
date-added = {2021-04-24 22:21:23 +0200},
date-modified = {2021-04-24 22:33:35 +0200},
title = {{RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network}},
year = {2021}}
@inproceedings{DelCorro:2020jb,
author = {Del Corro, Luciano and Hoffart, Johannes},
booktitle = {Economics and Natural Language Processing (ECONLP), Workshop at the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021},
date-modified = {2021-11-11 17:13:28 +0100},
title = {{From Stock Prediction to Financial Relevance: Repurposing Attention Weights to Assess News Relevance Without Manual Annotations}},
year = {2021}}
@inproceedings{Mulang:2020jb,
author = {Mulang, Isaiah Onando and Singh, Kuldeep and Prabhu, Chaitali and Nadgeri, Abhishek and Hoffart, Johannes and Lehmann, Jens},
booktitle = {Proceedings of the 29th ACM International Conference on Information and Knowledge Management, CIKM 2020},
date-modified = {2021-04-24 22:32:26 +0200},
pages = {2157--2160},
title = {{Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models}},
year = {2020}}
@article{Weikum:2019is,
author = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian M},
journal = {Computing and Software Science},
number = {1},
pages = {217--235},
title = {{Knowledge Harvesting: Achievements and Challenges}},
volume = {10000},
year = {2019}}
@techreport{Balada:2018vn,
author = {Balada, Christoph and Bellanova, Alexandra and Bruss, Michael and Buchberger, Stefan and Cirullies, Jan and Del Corro, Luciano and Festag, Reinhard and Fuhs, Gregor and Goetze, Stephan and Gressling, Thorsten and Havemann, Maike and Hoffart, Johannes and Holtel, Stefan and Hufenstuhl, Andreas and Kraus, Wener and Pfleger, Norbert and Pikus, Yevgen and Altran, Robin and Plumbaum, Till and Rolletschek, Gerhard and Satow, Lars and Schnakenburg, Igor and Siepmann, Ralph and Steffner, Rupert and Shozo, Moritz Takaya and Weber, Matthias and Wieczorek, Sebastian and Wittenburg, Georg},
title = {{Digitalisierung gestalten mit dem Periodensystem der K{\"u}nstlichen Intelligenz}},
year = {2018}}
@inproceedings{Agarwal:2018tb,
author = {Agarwal, Prabal and Str{\"o}tgen, Jannik and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, ACL 2018},
pages = {686--693},
title = {{diaNED: Time-Aware Named Entity Disambiguation for Diachronic Corpora}},
year = {2018}}
@inproceedings{Seyler:2018vc,
author = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia},
pages = {241--246},
title = {{A Study of the Importance of External Knowledge in the Named Entity Recognition Task}},
year = {2018}}
@article{Seyler:2017ww,
author = {Seyler, Dominic and Dembelova, Tatiana and Del Corro, Luciano and Hoffart, Johannes and Weikum, Gerhard},
journal = {CoRR},
title = {{KnowNER - Incremental Multilingual Knowledge in Named Entity Recognition}},
volume = {cs.CL},
year = {2017}}
@inproceedings{Rebele:2016vx,
address = {Cham},
author = {Rebele, Thomas and Suchanek, Fabian M and Hoffart, Johannes and Biega, Joanna and Kuzey, Erdal and Weikum, Gerhard},
booktitle = {Proceedings of the 15th International Semantic Web Conference, ISWC 2016},
pages = {177--185},
publisher = {Springer International Publishing},
title = {{YAGO - A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames}},
year = {2016}}
@inproceedings{Hoffart:2016bp,
author = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard and Anand, Avishek and Singh, Jaspreet},
booktitle = {Proceedings of the 25th International Conference Companion on World Wide Web, WWW 2016, Montreal, Canada},
month = apr,
pages = {203--206},
publisher = {International World Wide Web Conferences Steering Committee},
title = {{The Knowledge Awakens: Keeping Knowledge Bases Fresh with Emerging Entities}},
year = {2016}}
@article{Weikum:2016vn,
author = {Weikum, Gerhard and Hoffart, Johannes and Suchanek, Fabian},
journal = {IEEE Data Eng. Bull.},
number = {3},
pages = {41--50},
title = {{Ten Years of Knowledge Harvesting: Lessons and Challenges}},
volume = {39},
year = {2016}}
@inproceedings{Schmidt:2016kr,
author = {Schmidt, Andreas and Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
booktitle = {Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - Systems Demonstrations},
pages = {1097--1100},
publisher = {ACM Press},
title = {{Context-Sensitive Auto-Completion for Searching with Entities and Categories}},
year = {2016}}
@inproceedings{Ernst:uv,
author = {Ernst, Patrick and Siu, Amy and Milchevski, Dragan and Hoffart, Johannes and Weikum, Gerhard},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics - System Demonstrations},
pages = {19--24},
title = {{DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences}},
year = {2016}}
@inproceedings{Singh:2016du,
author = {Singh, Jaspreet and Hoffart, Johannes and Anand, Avishek},
booktitle = {Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, USA},
pages = {1331--1340},
publisher = {ACM Press},
title = {{Discovering Entities with Just a Little Help from You}},
year = {2016}}
@phdthesis{Hoffart:2015wk,
author = {Hoffart, Johannes},
month = feb,
title = {{Discovering and Disambiguating Named Entities in Text}},
year = {2015}}
@inproceedings{Hoffart:2015dr,
author = {Hoffart, Johannes and Preda, Nicoleta and Suchanek, Fabian M and Weikum, Gerhard},
booktitle = {Tutorial at WWW 2015, Florence, Italy},
pages = {1--1},
title = {{Knowledge Bases for Web Content Analytics}},
year = {2015}}
@inproceedings{Hoffart:2014dt,
author = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
booktitle = {The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, Gold Coast, QLD, Australia},
pages = {1247--1248},
title = {{STICS: Searching with Strings, Things, and Cats}},
year = {2014}}
@inproceedings{Hoffart:2014cy,
author = {Hoffart, Johannes and Milchevski, Dragan and Weikum, Gerhard},
booktitle = {Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China},
title = {{AESTHETICS: Analytics with Strings, Things, and Cats}},
year = {2014}}
@inproceedings{Nguyen:2014wl,
author = {Nguyen, Dat Ba and Hoffart, Johannes and Theobald, Martin and Weikum, Gerhard},
booktitle = {Linked Data on the Web at WWW2014},
title = {{AIDA-light: High-Throughput Named-Entity Disambiguation}},
year = {2014}}
@inproceedings{Hoffart:2014hp,
author = {Hoffart, Johannes and Altun, Yasemin and Weikum, Gerhard},
booktitle = {Proceedings of the 23rd international conference on World wide web, WWW 2014, Seoul, South Korea},
pages = {385--396},
title = {{Discovering emerging entities with ambiguous names}},
year = {2014}}
@inproceedings{Seufert:2013tx,
author = {Seufert, Stephan and Bedathur, Srikanta J and Hoffart, Johannes and Gubichev, Andrey and Berberich, Klaus},
booktitle = {Posters and Demonstrations Track of the 12th International Semantic Web Conference, ISWC 2013, Sydney, Australia},
pages = {1--4},
title = {{Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs}},
year = {2013}}
@inproceedings{Wang:2013wx,
author = {Wang, Yafang and Jiang, Lili and Hoffart, Johannes and Weikum, Gerhard},
booktitle = {SIGIR 2013, Dublin, Ireland},
title = {{YaLi: a Crowdsourcing Plug-In for NERD}},
year = {2013}}
@inproceedings{Yosef:2013vb,
author = {Yosef, Mohamed Amir and Bauer, Sandro and Hoffart, Johannes and Spaniol, Marc and Weikum, Gerhard},
booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Sofia, Bulgaria},
pages = {133--138},
title = {{HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text}},
year = {2013}}
@inproceedings{Jiang:2013tw,
author = {Jiang, Lili and Wang, Yafang and Hoffart, Johannes and Weikum, Gerhard},
booktitle = {CrowdSem Workshop at the 12th International Semantic Web Conference, ISWC 2013, Sydney, Australia},
title = {{Crowdsourced Entity Markup}},
year = {2013}}
@inproceedings{Hoffart:2013wk,
author = {Hoffart, Johannes},
booktitle = {PhD Symposium at ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York City, USA},
pages = {43--48},
title = {{Discovering and Disambiguating Named Entities in Text}},
year = {2013}}
@inproceedings{Suchanek:2013vd,
author = {Suchanek, Fabian M and Hoffart, Johannes and Kuzey, Erdal and Lewis-Kelham, Edwin},
booktitle = {15. GI-Fachtagung Datenbanksysteme f{\"u}r Business, Technologie und Web},
title = {{YAGO2s: Modular High-Quality Information Extraction with an Application to Flight Planning}},
year = {2013}}
@inproceedings{Hoffart:2013ww,
author = {Hoffart, Johannes and Suchanek, Fabian M and Berberich, Klaus and Weikum, Gerhard},
booktitle = {23rd International Joint Conference on Artificial Intelligence, IJCAI 2013, Beijing, China},
pages = {3161--3165},
title = {{YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract}},
year = {2013}}
@article{Hoffart:2013hn,
author = {Hoffart, Johannes and Suchanek, Fabian M and Berberich, Klaus and Weikum, Gerhard},
journal = {Artificial Intelligence},
month = jan,
pages = {28--61},
title = {{YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia}},
volume = {194},
year = {2013}}
@inproceedings{Yosef:2012tz,
author = {Yosef, Mohamed Amir and Bauer, Sandro and Hoffart, Johannes and Spaniol, Marc and Weikum, Gerhard},
booktitle = {Proceedings of the 24th International Conference on Computational Linguistics, Coling 2012, Mumbai, India},
pages = {1361--1370},
title = {{HYENA: Hierarchical Type Classification for Entity Names}},
year = {2012}}
@inproceedings{Hoffart:2012vx,
author = {Hoffart, Johannes and Seufert, Stephan and Nguyen, Dat Ba and Theobald, Martin and Weikum, Gerhard},
booktitle = {Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Hawaii, USA},
pages = {545--554},
title = {{KORE: Keyphrase Overlap Relatedness for Entity Disambiguation}},
year = {2012}}
@article{Weikum:2012wb,
author = {Weikum, Gerhard and Hoffart, Johannes and Nakashole, Ndapandula and Spaniol, Marc and Suchanek, Fabian and Yosef, Mohamed Amir},
journal = {IEEE Data Eng. Bull.},
pages = {46--55},
title = {{Big Data Methods for Computational Linguistics}},
volume = {35},
year = {2012}}
@inproceedings{Hoffart:2011a,
author = {Hoffart, Johannes and Yosef, Mohamed Amir and Bordino, Ilaria and F{\"u}rstenau, Hagen and Pinkal, Manfred and Spaniol, Marc and Taneva, Bilyana and Thater, Stefan and Weikum, Gerhard},
booktitle = {Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, Scotland},
pages = {782--792},
title = {{Robust Disambiguation of Named Entities in Text}},
year = {2011}}
@inproceedings{Hoffart:2011,
author = {Hoffart, Johannes and Suchanek, Fabian M and Berberich, Klaus and Lewis-Kelham, Edwin and de Melo, Gerard and Weikum, Gerhard},
booktitle = {Proceedings of the 20th International Conference Companion on World Wide Web, WWW 2011, Hyderabad, India},
pages = {229--232},
publisher = {ACM},
title = {{YAGO2: Exploring and Querying World Knowledge in Space, Context, and Many Languages}},
year = {2011}}
@inproceedings{Yosef:2011,
author = {Yosef, Mohamed Amir and Hoffart, Johannes and Bordino, Ilaria and Spaniol, Marc and Weikum, Gerhard},
booktitle = {Proceedings of the 37th International Conference on Very Large Databases, VLDB 2011, Seattle, WA, USA},
pages = {1450--1453},
title = {{AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables}},
year = {2011}}
@techreport{Hoffart:2010,
address = {Saarbr{\"u}cken, Germany},
author = {Hoffart, Johannes and Suchanek, Fabian M and Berberich, Klaus and Weikum, Gerhard},
month = nov,
title = {{YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia}},
year = {2010}}
@inproceedings{Hoffart:2009,
author = {Hoffart, Johannes and Zesch, Torsten and Gurevych, Iryna},
booktitle = {Proceedings of the 5th International Symposium on Wikis and Open Collaboration, WikiSym 2009, Orlando, FL, USA},
month = oct,
publisher = {ACM},
title = {{An Architecture to Support Intelligent User Interfaces for Wikis by Means of Natural Language Processing}},
year = {2009}}
@inproceedings{Hoffart:2009a,
author = {Hoffart, Johannes and B{\"a}r, Daniel and Zesch, Torsten and Gurevych, Iryna},
booktitle = {INEX 2009 Workshop Preproceedings, 2009, Brisbane, Australia},
pages = {314--325},
title = {{Discovering Links Using Semantic Relatedness}},
year = {2009}}
AIDA is a framework and online tool for entity detection and disambiguation. Given a natural-language text, such as a news article, it maps mentions of ambiguous names onto canonical entities (e.g., individual people or places) registered in the YAGO knowledge base. A description and a demo are available on the AIDA website. AIDA is now part of the AmbiverseNLU suite.
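To make the idea of mapping an ambiguous mention onto a canonical entity concrete, here is a minimal toy sketch in Python. It is not AIDA's algorithm or API: the candidate table, the entity identifiers, the keyword sets, and the `disambiguate` function are all hypothetical and made up for illustration. It simply picks the candidate whose keywords overlap most with the words around the mention.

```python
import re
from typing import Optional

# Hypothetical mini "knowledge base": mention -> candidate entities -> keywords.
# Identifiers and keywords are invented for this sketch, not taken from YAGO.
CANDIDATES = {
    "JFK": {
        "John_F._Kennedy": {"president", "united", "states", "senator"},
        "John_F._Kennedy_International_Airport": {
            "airport", "flight", "new", "york", "terminal",
        },
    },
}


def disambiguate(mention: str, context: str) -> Optional[str]:
    """Return the candidate entity whose keywords overlap most with the context.

    Returns None if the mention has no known candidates. Ties are resolved
    in favor of the candidate listed first.
    """
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None
    # Lowercase the context and keep alphabetic tokens only.
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(candidates, key=lambda entity: len(candidates[entity] & words))


if __name__ == "__main__":
    print(disambiguate("JFK", "My flight landed at JFK airport in New York"))
    print(disambiguate("JFK", "JFK was the 35th president of the United States"))
```

A real system would draw its candidates and context features from a knowledge base and weigh far richer evidence than raw keyword overlap; the sketch only shows the shape of the input and output.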
YAGO is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities. All the data as well as several interfaces to browse and query the data are available on the YAGO website.
STICS is an entity-centric search engine that makes use of AIDA and YAGO. By extending the Google slogan of “things, not strings” to also support entity categories, STICS provides powerful functionality for querying and analyzing news and other text corpora in terms of entities, semantic classes, and text phrases. You can search, for example, for presidents of the United States and the JFK airport, and see how STICS distinguishes between JFK the president and JFK the airport.
I just read the latest blog post on the Heidelberg Laureate Forum, and I want to say that I very much share the feelings of Adrian Dudek, who wrote it. …
When coding for a long time in a black-on-white editor, my eyes always become unfocused and sleepy. Going white-on-black helps keep my eyes focused, however reading the code …