OpenAI’s transcription tool hallucinates. Hospitals are using it anyway

On Saturday, an Associated Press investigation found that OpenAI’s Whisper transcription tool creates fabricated text in medical and business settings despite warnings against such use. The AP interviewed more than a dozen software engineers, developers, and researchers who found that the model regularly makes up text that speakers never said, a phenomenon often called “confabulation” or “hallucination” in the AI field.

When it released Whisper in 2022, OpenAI claimed that the model approached “human-level robustness” in audio transcription accuracy. However, a University of Michigan researcher told the AP that Whisper fabricated text in 80 percent of the public meeting transcripts he reviewed. Another developer, unnamed in the AP report, said he found fabricated content in nearly all of his 26,000 test transcripts.

These fabrications pose particular risks in health care settings. Despite OpenAI’s warnings against using Whisper in “high-risk domains,” more than 30,000 medical workers now use Whisper-based tools to transcribe patient visits, according to the AP report. Minnesota’s Mankato Clinic and Children’s Hospital Los Angeles are among 40 health systems using a Whisper-powered AI copilot service from medical technology company Nabla that has been fine-tuned on medical terminology.

Nabla acknowledges that Whisper can confabulate, but the company also reportedly erases the original audio recordings “for data safety reasons.” That could cause additional problems, since doctors cannot verify a transcript’s accuracy against the source material. And deaf patients may be especially affected by mistaken transcriptions, since they would have no way of knowing whether the text of a medical transcript is accurate.

Potential problems with Whisper extend beyond health care. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found that Whisper added nonexistent violent content and racial commentary to neutral speech. They found that 1 percent of the samples included “entire hallucinated phrases or sentences that did not exist in any form in the underlying audio” and that 38 percent of those hallucinations included “explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority.”

In one instance from the study cited by the AP, when a speaker described “two other girls and a lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it as, “He took a big piece of the cross, a very small piece … I’m sure he didn’t have a terrorist knife, so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and is actively studying how to reduce fabrications, incorporating feedback into model updates.

Why Whisper confabulates

The key to Whisper’s unsuitability in high-risk areas comes from its propensity to sometimes confabulate, or plausibly invent, inaccurate results. The AP report says, “Researchers aren’t sure why Whisper and similar tools hallucinate,” but that’s not true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.
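
To make that concrete, here is a minimal sketch of transcribing a file with the open-source openai-whisper Python package. The file name patient_visit.wav and the choice of the “base” checkpoint are illustrative assumptions, not details from the AP report.

```python
# Minimal sketch, assuming the open-source "openai-whisper" package
# (pip install -U openai-whisper). The audio file name is a hypothetical
# placeholder, not something from the AP report.
import whisper

# Load one of the published checkpoints; "base" is small and fast.
model = whisper.load_model("base")

# Whisper encodes the audio and then predicts the transcript one token at a
# time, each token conditioned on the audio plus the tokens already emitted.
# temperature=0 makes decoding greedy (always pick the most likely next
# token), which reduces, but does not eliminate, fabricated output.
result = model.transcribe("patient_visit.wav", temperature=0)
print(result["text"])
```

Because the decoder must always emit some “most likely” next token, stretches of silence, music, or garbled audio can still yield fluent-looking text, which is one way plausible but unspoken sentences end up in a transcript.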
