OpenAI’s ChatGPT cannot reliably provide accurate responses to evidence-based questions, according to a recent study that found the chatbot goes off track when answering medical queries.
Researchers from the Commonwealth Scientific and Industrial Research Organisation (CSIRO) and The University of Queensland (UQ), Australia, tested the popular chatbot on almost 100 medical questions, posed either as simple queries or as prompts biased with supporting or contrary evidence.
The research team found that the chatbot was 80 per cent accurate when answering questions in the simple format. However, its accuracy fell to 63 per cent when responding to prompts biased with evidence.
The researchers are not certain why the chatbot’s accuracy declined. However, they noted that large language models (LLMs) are capable of producing natural-language content because they are trained on massive amounts of textual data.
Further, the team suggested that it is important to train LLMs to respond to users’ health queries more accurately, given that, amid the digital boom, a large number of individuals rely on the Internet, and especially AI chatbots, to resolve their doubts.
The study was presented at the Empirical Methods in Natural Language Processing (EMNLP) conference, a natural language processing venue, in December 2023.