Current Research

Projects

ADAAT/DAF

Automatic Data Acquisition and Annotation Tool (ADAAT)

author: Lorenz Dumanski

Introduction

AI has rapidly grown into an industry-changing and, in many ways, life-changing technology. Despite this progress, there has been growing discontent within the European Union about which data is used to train AI models. Recent statements suggest that AI can only become better and, above all, more productive if it delivers usable results. Usable results, in turn, depend on high-quality training data. This applies not only to the relatively young field of AI, but also to digital forensics. In digital forensics, especially with regard to communication data, there has long been an acute shortage of data. Forensic data is legally sensitive per se owing to its typical characteristics, namely its personal or crime-related nature. Added to this are the wide variance in the nature of the data, which makes standardisation even more difficult, and the research community's frequent unwillingness to share its data sets for further research.

ADAAT

ADAAT is envisioned as a platform that simplifies the collection of communication data. The communication data is generated using an approach based on human-machine interaction, in compliance with the Data Acquisition Framework (DAF), which is being developed in parallel. A unique selling point is the ability to automatically link quantitative surveys, in the form of questionnaires, with the communication data. The focus is on scenario-based collection: questions from the questionnaires provided can be transferred to a communication scenario using DAF, yielding a DAF item. During the communication phase, the test subjects chat with an LLM persona adapted to the corresponding DAF item.
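To make the link between questionnaire and communication data more concrete, the following is a minimal sketch of how one session might be represented. All class names, field names and example texts (DafItem, Session, item_id and so on) are hypothetical and chosen for illustration only; they do not describe the actual ADAAT data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DafItem:
    """A questionnaire item mapped to a communication scenario (hypothetical fields)."""
    item_id: str
    question: str   # wording taken from a question inventory
    scenario: str   # scenario the LLM persona is briefed with


@dataclass
class Session:
    """One test subject's questionnaire answer together with the linked chat."""
    subject_id: str   # pseudonymous identifier, not a real name
    item: DafItem
    answer: int       # e.g. a response on a Likert scale
    messages: list[tuple[str, str]] = field(default_factory=list)

    def add_message(self, sender: str, text: str) -> None:
        # Store each chat turn as a (sender, text) pair in order of arrival.
        self.messages.append((sender, text))


# Invented example data, for illustration only.
item = DafItem(
    item_id="q01",
    question="I find it easy to trust strangers online.",
    scenario="A stranger offers help with a technical problem in a chat.",
)
session = Session(subject_id="s-001", item=item, answer=4)
session.add_message("persona", "Hi, I saw your post. Need a hand?")
session.add_message("subject", "Sure, what do you suggest?")
```

Keeping the questionnaire answer and the chat turns in the same record is what would allow both to be analysed together later on.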

ADAAT System Design

In addition, ADAAT actively involves the test subjects in an evaluation loop. Downstream evaluation surveys can be used to determine the authenticity, quality and integrity of the DAF items. This not only enables the immediate evaluation of the communication data sent for further processing, but also allows for continuous improvements to the DAF items. Furthermore, ADAAT can be used for the evaluation of LLMs or LLM-based agents. Finally, purely quantitative or qualitative surveys without the LLM layer will be possible, so that ADAAT can also serve as a plain questionnaire tool. The platform's modular design makes this possible and allows for a wide range of applications.

DAF

DAF provides a framework and methodology for transferring items from established question inventories, for example from social research, into communication scenarios. This allows the connection between item and scenario to be evaluated in advance, thus enabling targeted data generation. Another strength is the ability to pursue exploratory research approaches. Even if the validity of an item-scenario connection has not been evaluated in advance, these connections can be examined retrospectively using correlation analyses or other methods.
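A retrospective correlation analysis of this kind could look like the sketch below. It computes a Pearson correlation between questionnaire scores and a numeric feature derived from the chats. The feature (message count per subject) and all values are invented for illustration; DAF does not prescribe this particular measure or feature.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented example: one questionnaire score and one chat feature per subject.
item_scores = [1, 2, 3, 4, 5]
messages_per_chat = [12, 15, 11, 20, 22]

r = pearson(item_scores, messages_per_chat)
print(f"r = {r:.3f}")
```

With real data, a coefficient alone would not establish the validity of an item-scenario connection; sample size and significance would have to be considered as well.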

Status

Even though the specific implementation details still need to be worked out, ADAAT/DAF already offers advantages in its current form:

  • ADAAT/DAF produces data that is standardised across all ADAAT applications.
  • ADAAT can be used locally and without internet access. This ensures that the data does not leave the organisation.
  • Thanks to the database-driven backend, users from non-IT fields can orchestrate and deploy complex surveys after only a short training period.
  • DAF presents an exemplary evaluation flow for DAF items. However, the evaluation strategies can be adapted, replaced or omitted as required, for example when exploratory approaches or analytical methods are to be applied to the data collected.
  • To ensure that the synthesised data complies with data protection regulations, the inputs and outputs of the components are cleaned up. This ensures that no personal data is present in an ADAAT data set.
  • Since ADAAT data is standardised via a schema, the hurdles to further programmatic processing are low.
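The points on data cleaning and schema-based standardisation can be sketched as follows. The field names, placeholder tokens and regular expressions are assumptions made for illustration; they are not the ADAAT schema or its actual cleaning step, and simple patterns like these would not be sufficient on their own to guarantee that no personal data remains.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not ADAAT's rules).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s/-]{6,}\d")

# Hypothetical minimal schema: field name -> expected type.
REQUIRED_FIELDS = {"subject_id": str, "item_id": str, "text": str}


def scrub(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def validate(record):
    """Raise ValueError if a required field is missing or has the wrong type."""
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return record


record = validate({
    "subject_id": "s-001",
    "item_id": "q01",
    "text": scrub("Mail me at jane.doe@example.org or call +49 30 1234567."),
})
print(record["text"])  # Mail me at [EMAIL] or call [PHONE].
```

Because every record passes the same validation, downstream scripts could rely on the presence and type of each field without per-study special cases.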