Postdoctoral Researcher position in Advancing Reliable LLM-based Data Curation Systems

About the project

We invite applications for a postdoctoral research position in the Foundations of Algorithmic Verification group led by Prof. Joël Ouaknine. The successful candidate will work in close collaboration with an industrial partner on the verification of software built on Large Language Models (LLMs), and will help bridge scientific research and industrial applications.

Project Insight: We are embarking on a pioneering project that aims to develop reliable LLM-based data curation systems for data verification and data enrichment tasks such as verifying or discovering entity relationships from textual documents and/or the Web.

An LLM-based data curation system deconstructs complex data problems into manageable sub-problems, each addressed using LLMs. However, these models can introduce uncertainties and errors, including hallucinations, which hinder their adoption in industrial production environments where high accuracy is critical.

Consider a knowledge graph enrichment system designed to identify or infer relationships between two entities within a document. This system may utilize a long-context LLM, capable of processing the entire document, or employ a Retrieval Augmented Generation (RAG) process, including GraphRAG, to pinpoint and analyze the most relevant information. However, research suggests that both strategies can yield inaccuracies, presenting challenges for their deployment in production environments.
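To make the two strategies concrete, here is a minimal sketch, not the project's actual method. It assumes a toy lexical-overlap scorer in place of a real retriever, and every name in it (`retrieve`, `build_context`, `max_chars`) is hypothetical. A short document is passed through whole, as a long-context model would receive it, while a longer one is reduced to its most relevant sentences, as in a RAG process.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(document):
    """Split a document into sentence-level passages (a crude stand-in for chunking)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def score(query_tokens, passage):
    """Count how often the distinct query tokens occur in the passage."""
    counts = Counter(tokenize(passage))
    return sum(counts[t] for t in set(query_tokens))


def retrieve(document, query, k=2):
    """Return up to k passages with non-zero overlap, best first (toy retriever)."""
    q = tokenize(query)
    ranked = sorted(split_passages(document), key=lambda p: score(q, p), reverse=True)
    return [p for p in ranked[:k] if score(q, p) > 0]


def build_context(document, query, max_chars=200, k=2):
    """Hypothetical policy: whole document if it fits, retrieved passages otherwise."""
    if len(document) <= max_chars:
        return document  # long-context style: pass everything
    return " ".join(retrieve(document, query, k))  # RAG style: pass only top passages


DOC = (
    "Acme Corp acquired Widget Ltd in 2021. "
    "The weather was mild that year. "
    "Widget Ltd is based in Leeds."
)
QUERY = "Acme Corp Widget Ltd"

if __name__ == "__main__":
    print(build_context(DOC, QUERY, max_chars=1000))  # whole document
    print(build_context(DOC, QUERY, max_chars=10))    # retrieved passages only
```

Neither branch guarantees a correct answer: the retriever can miss a relevant passage, and the full document can dilute it. That is the trade-off the paragraph above refers to.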

This project aims to propose a verification methodology that ensures the reliability and accuracy of an LLM-based data curation system at both the sub-component and whole-program levels.
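As a purely illustrative sketch of what checking at both levels could look like, the following assumes a two-step pipeline in which plain Python functions stand in for LLM calls. The `Step` class, the check functions, and the example document are all hypothetical and are not taken from the project. Each step's output is checked before it is passed on, and the final result is checked once more against the original input.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]          # stand-in for an LLM call
    check: Callable[[Any, Any], bool]  # sub-component check: (input, output) -> ok?


def run_pipeline(steps: List[Step], data: Any, final_check: Callable[[Any, Any], bool]):
    """Run each step, verify its output, then verify the end-to-end result."""
    current = data
    for step in steps:
        out = step.run(current)
        if not step.check(current, out):
            return {"ok": False, "failed_at": step.name, "result": None}
        current = out
    if not final_check(data, current):
        return {"ok": False, "failed_at": "whole-program", "result": None}
    return {"ok": True, "failed_at": None, "result": current}


DOC = "Acme Corp acquired Widget Ltd in 2021. Widget Ltd is based in Leeds."


def extract_entities(doc):
    """Stub for LLM entity extraction: look up a fixed list of names."""
    known = ["Acme Corp", "Widget Ltd"]
    return {"doc": doc, "entities": [e for e in known if e in doc]}


def entities_grounded(doc, out):
    """Sub-component check: at least two entities, each literally in the text."""
    return len(out["entities"]) >= 2 and all(e in doc for e in out["entities"])


def propose_relation(state):
    """Stub for LLM relation inference between the first two entities."""
    a, b = state["entities"][:2]
    return {"doc": state["doc"], "triple": (a, "acquired", b)}


def triple_well_formed(state, out):
    """Sub-component check: subject and object are distinct extracted entities."""
    s, _, o = out["triple"]
    return s in state["entities"] and o in state["entities"] and s != o


def triple_supported(doc, out):
    """Whole-program check: some sentence mentions subject, predicate, and object."""
    s, p, o = out["triple"]
    return any(s in sent and p in sent and o in sent for sent in doc.split("."))


STEPS = [
    Step("extract_entities", extract_entities, entities_grounded),
    Step("propose_relation", propose_relation, triple_well_formed),
]

if __name__ == "__main__":
    print(run_pipeline(STEPS, DOC, triple_supported))
```

The string-matching checks here are deliberately naive. A triple can pass every sub-component check and still be unsupported by the document, which is why the sketch includes a separate whole-program check.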

Additionally, the project will focus on several critical research areas:

Effective retrieval of pertinent information from documents.
Balanced integration of RAG and long-context LLMs to mitigate trade-offs.
Detection and correction of "hallucinations" or incorrect inferences by LLMs.
Verification of LLM-based reasoning to ensure result accuracy.
Optimization of overall system efficiency.

The postdoctoral researcher will help define, develop, and refine this methodology, and will assist in building a system optimized for data curation using LLMs.

Focus of the position:

Research and development of innovative verification methods to ensure the reliability and accuracy of LLM-based data curation programs.
Active collaboration with industrial partners, including creative design and development of an LLM-based data curation system.

While the successful candidate will be hired by, and work at, the Max Planck Institute for Software Systems in Saarbrücken, the position requires frequent collaboration with, and visits to, research partners, in particular TU Wien (Vienna, Austria), UCL (London), and the University of Calabria (Rende, Cosenza, Italy), as well as industrial partners. In addition, the successful candidate is expected to complete one or more internships in industry. The project will build on methods and software provided by our industrial partners. We are thus looking for a candidate who is keen and able to liaise with industry, and who is interested in transformational research on practical problems of industrial relevance.

Your qualifications and responsibilities

Required:

A PhD degree (earned or near completion) in algorithmic verification, machine learning, information extraction, large language models, databases, knowledge graphs, or a related field.
Strong algorithm design and coding skills, along with proficiency in popular ML development frameworks such as TensorFlow and PyTorch, and in frameworks for building LLM-based applications such as LangChain and LlamaIndex.
A thorough understanding of Large Language Models' underlying techniques and experience in fine-tuning or customizing such models.
Research publications in top-tier journals or conferences. In exceptional cases, a solid background in industrial software engineering that has led to highly innovative products or results could partially or fully replace the publication requirement.
Ability and willingness to liaise with industrial partners and to work on problems of practical relevance.
Ability to supervise students and/or research assistants.
Proficiency in written and spoken English. (Knowledge of German is not necessary.)

Beneficial:

Proficiency in RAG or GraphRAG-related techniques, along with experience in building RAG-based applications.
Research experience in topics relevant to generating accurate results with LLMs, including hallucination detection and correction.
Relevant experience in fields such as Information Extraction from Unstructured Text, Knowledge Graph Enrichment, Databases, or Fuzzy Logic.
Industrial experience, particularly in areas such as Big Data Engineering and MLOps, and familiarity with cloud services such as AWS.
A product-oriented mindset and product design capabilities.
Experience with software verification.
Experience leading teams or projects, as well as supervising junior developers or researchers.

For informal enquiries, please contact Prof. Joël Ouaknine (joel@mpi-sws.org).

To apply, please send a cover letter and CV by email to Ms. Lena Schneider (lschneid@mpi-sws.org).

Applications will be reviewed until a suitable candidate is found. To ensure full consideration, please submit your application on or before 25 Nov. 2024. We expect to hold online interviews in early December 2024.
