Postdoctoral Researcher position in Advancing Reliable LLM-based Data Curation Systems
About the project
We invite applications for a postdoctoral research position in the
Foundations of Algorithmic Verification group led by Prof. Joël
Ouaknine. The successful candidate will work in close collaboration
with an industrial partner, investigating the verification of
software programs based on Large Language Models (LLMs) and helping
to bridge scientific research and applications.
Project Insight:
We are embarking on a pioneering project that aims to develop reliable
LLM-based data curation systems for data verification and data
enrichment tasks such as verifying or discovering entity relationships
from textual documents and/or the Web.
An LLM-based data curation system deconstructs complex data problems
into manageable sub-problems, each addressed using LLMs. However,
these models can introduce uncertainties and errors, including
hallucinations, which hinder their adoption in industrial production
environments where high accuracy is critical.
Consider a knowledge graph enrichment system designed to identify or
infer relationships between two entities within a document. This
system may utilize a long-context LLM, capable of processing the
entire document, or employ a Retrieval Augmented Generation (RAG)
process, including GraphRAG, to pinpoint and analyze the most relevant
information. However, research suggests that both strategies can yield
inaccuracies, presenting challenges for their deployment in production
environments.
This project aims to propose a verification methodology that ensures
the reliability and accuracy of an LLM-based data curation system at
both the sub-component and whole-program levels.
Additionally, the project will focus on several critical
research areas:
- Effective retrieval of pertinent information from documents.
- Balanced integration of RAG and long-context LLMs to mitigate trade-offs.
- Detection and correction of "hallucinations" or incorrect inferences by LLMs.
- Verification of LLM-based reasoning to ensure result accuracy.
- Optimization of overall system efficiency.
The postdoctoral researcher will help define, develop, and refine
this methodology, and will assist in building a system optimized for
data curation using LLMs.
Focus of the position:
- Research and development of innovative verification methods to
ensure the reliability and accuracy of LLM-based data curation
programs.
- Active collaboration with industrial partners, and creative design
and development of an LLM-based data curation system.
The successful candidate will be hired by, and work at, the Max
Planck Institute for Software Systems in Saarbrücken. Frequent
collaboration with, and visits to, research partners, in particular
TU Wien (Vienna, Austria), UCL (London), and the University of Calabria
(Rende, Cosenza, Italy), as well as industrial partners, will be
necessary. In addition, the successful candidate is expected to
complete one or more internships in industry. The project will build on methods and
software provided by our industrial partners. We are thus looking for
a candidate who is keen and able to liaise with industry, and who is
interested in transformational research, working on practical problems
of industrial relevance.
Your qualifications and responsibilities
Required:
- A PhD degree (earned or near completion) in algorithmic
verification, machine learning, information extraction, large
language models, databases, knowledge graphs, or a related field.
- Strong algorithm design and coding skills, along with
proficiency in popular ML development frameworks such as TensorFlow
and PyTorch, and in frameworks for building LLM-based applications,
such as LangChain and LlamaIndex.
- A thorough understanding of the techniques underlying Large
Language Models, and experience in fine-tuning or customizing such
models.
- Research publications in top-tier journals or conferences. In
exceptional cases, a solid background in industrial software
engineering that has led to highly innovative products or results
could partially or fully replace the publication requirement.
- Ability and willingness to liaise with industrial partners and to
work on problems of practical relevance.
- Ability to supervise students and/or research assistants.
- Proficiency in written and spoken English. (Knowledge of German is not necessary.)
Beneficial:
- Proficiency in RAG or GraphRAG-related techniques, along with
experience in building RAG-based applications.
- Research experience in topics relevant to generating accurate
results with LLMs, including hallucination detection and
correction.
- Relevant experience in fields such as Information Extraction
from Unstructured Text, Knowledge Graph Enrichment, Databases, or
Fuzzy Logic.
- Industrial experience, particularly in areas such as Big Data
Engineering and MLOps, coupled with familiarity with cloud
services such as AWS.
- A product-oriented mindset and product design capabilities.
- Experience with software verification.
- Experience leading teams or projects, as well as supervising
junior developers or researchers.
For informal enquiries, please contact Prof. Joël Ouaknine (joel@mpi-sws.org).
To apply, please send a cover letter and CV by email to
Ms. Lena Schneider (lschneid@mpi-sws.org).
Applications will be reviewed until a suitable candidate is found. To
ensure full consideration, please submit your application on or before
25 Nov. 2024. We expect to hold online interviews in
early December 2024.