DeepSeek’s Chatbot Scores Low in NewsGuard Audit, Trails Western Rivals

DeepSeek, a Chinese AI startup, saw its chatbot underperform in a recent NewsGuard audit, providing accurate answers to news-related queries just 17% of the time. The audit compared DeepSeek's chatbot with Western AI models, including OpenAI's ChatGPT and Google's Gemini, and ranked it tenth out of eleven. DeepSeek's chatbot repeated false claims 30% of the time and gave vague or unhelpful answers 53% of the time in response to news-related queries, for an overall fail rate of 83%. By comparison, its Western competitors had an average fail rate of 62%.

This performance raises questions about the quality of DeepSeek's AI technology, which the company has touted as on par with or superior to OpenAI's models at a fraction of the cost. Despite its low accuracy score, DeepSeek's chatbot quickly became the most downloaded app on Apple's App Store, fueling doubts about the United States' dominance in AI and contributing to a market downturn that wiped roughly $1 trillion off U.S. tech stocks.

NewsGuard assessed DeepSeek and its Western counterparts with the same 300 prompts, including 30 based on false claims circulating online. The prompts covered topics such as the killing of UnitedHealthcare executive Brian Thompson and the downing of Azerbaijan Airlines flight 8243. DeepSeek's chatbot also echoed the Chinese government's stance on certain issues, even when the topic was unrelated to China, as in the case of the Azerbaijan Airlines crash.

Despite its poor accuracy, some analysts suggest the significance of DeepSeek's breakthrough lies in its affordability, with D.A. Davidson's Gil Luria noting that it can answer questions at 1/30th the cost of comparable models. Like other AI models, however, DeepSeek proved most susceptible to repeating false claims when prompted in ways designed to create or spread misinformation.