A Journal Editor’s Perspective on AI

Artificial intelligence has already transformed how scientific information is created, reviewed, and consumed. As AI tools become more deeply integrated into medical publishing and clinical practice, many of us are asking the same question: How can we take advantage of these technologies while maintaining trust, transparency, and scientific rigor? In this episode, we explore both the opportunities and the challenges that come with AI’s growing role in medical communications.

Joining us for this conversation is Raja-Elie Abdulnour, Editor-in-Chief of NEJM Clinician and Chief Clinical Innovation Officer at NEJM Group. Together, we discuss how AI is being used across editorial workflows, why disclosure and accountability matter, and what role editors play in safeguarding scientific honesty and rigor.

The published paper referred to in this episode, Can AI Say “I Don’t Know”?, appeared in the New England Journal of Medicine.

To join ISMPP, visit our website at https://www.ismpp.org/

Disclosure: NEJM Group has licensing agreements with several AI companies, including OpenEvidence, Abridge, and Perplexity.


Rob: Whether we’re ready for it or not, artificial intelligence is already reshaping how we work across medical communications and publishing. For many of us, the question is no longer whether to use it, but how to use it well. And what many of you want to know is 1) how to leverage these tools to your benefit at work and 2) how to make sure the work stays grounded in rigor, transparency, and trust.

This is In Plain Cite, a podcast exploring the biggest questions and trends facing medical publication and communication professionals. I'm your host, Rob Matheis, President and CEO of ISMPP.

On our episode today, to discuss AI from the perspective of a journal editor, we have Raja-Elie Abdulnour. Raja-Elie is the Editor-in-Chief of NEJM Clinician. He is also the Chief Clinical Innovation Officer of NEJM Group, where he does a lot of thinking about how AI will be integrated into the medical profession. Additionally, he is an associate physician in the Pulmonary and Critical Care Medicine Division at Brigham and Women’s Hospital, as well as a part-time assistant professor of medicine at Harvard Medical School.

This is a really impressive biography. We have a really good conversation ahead. Let's jump right in.

Rob: When I saw that we were gonna have the opportunity to chat today, I was really, really excited to have you here with us to give us some perspectives. I always like to start at the very beginning and ask our guests to level set. In your mind, what is artificial intelligence, and how is it being used overall in medical publishing? Maybe just get us started there.

Raja-Elie: For listeners, a bit of context is that I wear many hats. I'm a practicing clinician, and AI is now at the bedside. I'm an editor, a researcher, a software developer, a clinical reasoning aficionado, a technologist, and a video gamer. What I found interesting is that AI sits at the intersection of the Venn diagram of all of these hats, which is why I've been thinking a lot about it. Full disclosure, I'm not a computer scientist; I haven't done a PhD in computer science. I've read and learned a lot about reasoning, and about intelligence in general, and I'm a technologist. And so I will confabulate, just like AI does. So hold me to my words, Rob. Fact-check me, and I promise to tell you when I don't know something.

The way I think about artificial intelligence, there are three types of definitions. There's the technical definition: if you go on YouTube and look up what AI is, you'll find a whole bunch of talks on different neural networks and so on, which I think is interesting but maybe not immediately relevant.

Then there are the ability definitions, meaning AI defined by what it does: for example, visual AI, natural language processing, scribes. These are all different types of AI, and that definition is a bit more relevant.

But the definition I really like is the relational definition of AI, which defines AI through the interaction: I'm interacting with a computer system whose judgment affects my own, I don't really know how it got to that judgment, and therefore I need to decide whether to trust it.

What I like about the relational definition is that, a bit like the Turing test, it's independent of the technology. I don't need to know how the thing works. All I need to know is that I'm working with a computer system that's affecting how I think. So, for example, for a five-year-old, a calculator is an AI, right? Because it's magic. How does this thing work? But once you understand math, it becomes a tool. Same thing if a robot walks into my office right now from the year 2500. That's an AI, right? Because it interacts with me in a way that affects me, and I don't know how it works.

And so in that regard, the reason I like it is because, again, I don't need to worry about the technology. All I need to worry about is that it's a computer system, it's gonna affect how I think, and it's gonna affect my judgments, and I need to decide if I can trust it. And so if we start there, and you think about how we go through our day interacting with all these tools that did not exist two years ago and are now everywhere, you can start seeing how we're engaging with these computer systems. They're certainly affecting how we think, whether it's in the clinic, whether it's giving us a cocktail recipe, whether it's teaching us about something that's outside our expertise. We're starting to make these leaps of faith sometimes, and that's where there's opportunity but there's also risk.

Rob: So let's go back to that word trust, right? Trust is at the heart of what our listeners do. They're medical publication, medical communication professionals, and their job is to really make sure that scientific evidence is published in peer-reviewed journals and can be trusted at the end of the day.

If I asked you to put your editor hat on, editor in chief hat on, what levels of trust do you have when you're using artificial intelligence? Can you talk a little bit about that for us?

Raja-Elie: We've been thinking a lot about trust because, again, if you define AI as a computer system that you need to decide whether to trust or not, then it makes me think, okay, what does it take for me to trust a tool, let alone a person?

And I think about it simply. In my mind, there are three things that matter. One is effectiveness. If I'm gonna trust someone or something, I need to make sure that entity, so let's for now focus on AI, I need to make sure that AI is good at what it does. How do I determine if a tool is good? By practice, by working with it, but also by seeing if there is evidence that shows that particular tool is good at what it does. And I think that's relevant, especially if you're working with a high-stakes task like, you know, taking care of patients.

The second pillar of trust, in my mind, is transparency. I need to know a little bit about how a particular AI was developed, who developed it, how it was trained. And then, when I'm working directly with an AI, whether it's ChatGPT or Claude or others, I always press it. I always ask, "Explain your reasoning," "Flesh out your reasoning," to really surface how it got to a particular answer. Keep in mind that these tools are always predicting the next word. So even when it's explaining its reasoning, it's actually doing so by predicting the next word as opposed to truly explaining its reasoning. But that doesn't matter, because it's so good that it actually works. And if you ask it to explain its reasoning and it's all over the place, I'm not gonna trust this tool, right? Even if it's giving me the right answer. If it tells me that the patient has pneumonia, and then explains that the patient has pneumonia because they have abdominal pain or a rash on their knee, then I know, okay, this tool is not good for me.

And then the third pillar, which is actually, in my mind, the most important one, is accountability. When you're working with a tool or a person, knowing that the entity or the AI you're working with is accountable makes a whole lot of difference. And accountability requires two things. One is benevolence, meaning that the AI has my best interest in mind. The second is non-malfeasance, which means that if it does mess up, it's gonna get penalized, right? If you think about how we navigate with colleagues, with doctors, with anyone, accountability is a very important pillar of trust. And the problem is that AI is unaccountable. The only accountable parties are the manufacturers of AI. They do claim that they have our best interest in mind, but one could argue that it's not easily demonstrable. This is where it becomes quite a fine line, and AI is not there yet in terms of accountability.

So when I decide to trust a tool, it's mainly on its ability to be transparent and its effectiveness. And that comes through practice and through learning and getting evidence about a particular tool.

Rob: This is very interesting. My question for you is, given these various pillars and the transparency and accountability that go into using artificial intelligence, when you have your hat on as editor in chief of NEJM Clinician, do you use artificial intelligence? And if so, how?

Raja-Elie: I use artificial intelligence all day. I have my small army of AI tools. I use it in research, I use it in writing, in editing, in thinking, in brainstorming.

But I use different tools for different things, and I use them in very different ways. Fundamentally, though, I never fully trust them. My default stance is skepticism, which is good, since I'm an editor. But it's not all or nothing: I'm never fully skeptical, and I'm never not skeptical at all. What I do is calibrate my skepticism based on the tool I'm using and the task I'm doing.

So, for example, if I'm using ChatGPT or Copilot or a tool embedded in my email application to draft an email just to say hi to someone, that's very low stakes, okay? In fact, it doesn't matter much to me which tool I use, because the task doesn't require much. I'm gonna be able to very quickly edit the draft and send it away.

If I'm at the bedside, again, as a clinician taking care of a patient, and the patient's not doing well, there's a gap in my knowledge, and I don't have access to the expertise right there, right now because the expert consultant is busy somewhere else, and if I use an AI tool, an AI clinical decision support tool, my skepticism is very high, right?

I'm going to write a very carefully crafted prompt, give it as much context as possible, and then when I get the answer back, I'm gonna put my super-duper editor hat on, read every sentence, look at every reference, and then make a go/no-go decision. Either I take that input or I do not, knowing full well that I'm ultimately responsible. And so these are the two extremes.

As for my editor, writer, and researcher hats: at NEJM Clinician, we disclose that we allow the use of large language models for many aspects of the editorial workflow. But we use them very, very deliberately and, in fact, through pilots.

So we formed an internal AI task force, an AI editorial pilot team, where we've delineated different use cases, thought about which tools would work best, and how we'd implement them.

A few people test them cautiously, share what they've learned, and then come up with recommendations and suggestions.

So this is a deliberate, centralized effort. We encourage everyone to use these tools and test them.

The key thing is to always do two things. One, disclose. And second, remember that the words that are gonna be published or the words that are gonna be passed on are gonna be yours, are gonna be the human words.

With that in mind, I personally encourage the use of AI. How much to use it is up to the author or the editor and their comfort with it.

That said, I do have some concerns, and I'll share a few.

So as an editor, sometimes I receive a manuscript or a summary or a piece of content where it's obvious that it was drafted with the use of a large language model.

It may have been edited, and it may represent the opinion of the author, but it reads like a large language model.

That's not what I want. I wanna read what the author is thinking. I don't wanna read what AI is interpreting as the author's thoughts.

And second, the lack of disclosure is very concerning to me because if an author doesn't disclose the use of AI, it does not prompt an editor to be a bit more skeptical.

And then conversely, if someone says, "Hey, I've used ChatGPT or Claude or whatever to draft and/or edit this piece of text," then I know, okay, then I'm gonna be a bit more cautious here, and I'm gonna make sure that no AI slop enters the text.

Rob: It's interesting. A lot of our listeners are really struggling right now with whether or not they can support authorship with artificial intelligence, whether or not authors can use artificial intelligence in the drafting of manuscripts, and I was really curious to get your thoughts on it. I love the idea that disclosure is really one of the most important components. It's one thing if you're reading a manuscript and it is very obviously drafted by artificial intelligence, and then there's no disclosure of the same. Then it makes you suspect what else may not be trustworthy. Is there a secret way that you're able to tell if a manuscript has come through with too much AI?

Raja-Elie: Yeah, that's a great question.

So before I go there, I just wanna highlight one thing. In my personal experience, there are a few ways of using AI in authorship that work well. What's really fascinating about these models and tools is that they gain a memory of you. They learn more about you, who you are, et cetera. When I use them, I give them very clear instructions, guidelines, and editorial style guides. But the key is that I do not ask for a draft.

What I've found works really well is to use AI as a scribe. I read out, or think out loud, whatever I wanna write and then use that as a draft. What works terribly is saying, "Hey, I wanna write something about this topic. Create a draft for me." And I'm very concerned that that's probably how most people are using it.

One, there's a substantial offloading of reasoning, right? You're basically not doing the work.

And the work of writing, the work of thinking, is really so critical.

So that's one thing I would encourage listeners to experiment with: use AI as a scribe and think out loud through the manuscript that you want to write.

Now, in terms of detection: let's say someone doesn't disclose, and the piece comes in.

There are some typical tells that a large language model has been used.

But let's say there are none. I don't have a good solution.

There are tools out there that could detect if the text has been written by AI, but what I found is that they are neither sensitive nor specific.

And that's because two things are happening.

One is some authors are prompting an AI to specifically write in a way that it doesn't sound like AI, right? They're purposely trying to fool us.

And I'm not saying that the authors who are submitting to our publication are doing so. In fact, I'm thinking more about my own kids and students in K–12 education and so on and so forth.

But the second is because I'm finding that the writing style has changed in a way where unassisted writing is starting to read like AI.

It's almost as if, by interacting with these tools, we're acquiring a particular style of authoring, which just confounds the whole thing.

Rob, what I would say is that, disclosure notwithstanding, what really matters when I'm reading the text is that it passes muster: it makes sense, it's coherent, it's evidence-based.

When citations are used, citations support the text.

And to me, that's the most critical part.

In fact, it emphasizes the role of the editor, right?

When I first started interacting with these tools, it was clear to me that editors actually have the most important role in today's information ecosystem.

Editors are the gatekeepers between AI slop and AI-augmented human knowledge.

And I take this very seriously.

We at the journal, broadly, have been taking that role extremely seriously. We see ourselves and other medical editors and medical publishers as gatekeepers against AI slop.

It's a tall ask because sometimes it's hard to detect, right?

Rob: Makes good sense. Have you ever outright rejected a manuscript because it relied on too much artificial intelligence, or because it read as though it was not really the work of the author?

Raja-Elie: I have.

At NEJM Clinician, contributors send us summaries. I've rejected drafts where I've said, "Okay, this is too much like AI. I'm not even gonna read it. So please send it back; I wanna hear your thoughts."

There was even one of our contributors whose emails read like... I almost felt like I was reading ChatGPT, and it was frustrating.

And I had to tell them, "Use it. Just disclose it."

Because it just felt like AI. So maybe it's a personal pet peeve.

But I have, yeah, pushed back on submissions because I'm concerned that they haven't looked at it. They haven't reviewed it. They haven't edited it.

Rob: Yeah, and that gets really back to the heart of trust and transparency that you started talking about earlier on.

If you're reading something and it looks as though it's been drafted by a computer, then where was the human in that whole process?

Raja-Elie: One framework I've been really gravitating towards is the framework of a co-pilot.

And I mean it through the lens of the aviation industry.

If you think about a pilot working with a co-pilot, the amount of trust that is required for the crew to work together is substantial.

And so, just like the aviation industry has developed systems and methods and processes to maximize trust between a pilot and a co-pilot, but also practices to mitigate letting your guard down and over-relying on a co-pilot, I think there is value in thinking about AI in a similar way.

At the risk of anthropomorphizing AI, this framework does a few things.

First of all, it centralizes and establishes our role as pilots, and therefore the buck stops with us, right?

And then it also emphasizes the importance of trust and thinking about the determinants of trust, as opposed to replacement and full offloading.

A pilot never fully offloads a task. Even if they delegate a task, their radar is up and monitoring, making sure that things are moving smoothly.

And a successful editor, writer, who uses AI is a successful pilot who's leveraged the advantages that AI provides in authoring and editing and reviewing, and in multiple tasks.

Again, I think editors have a critical role as gatekeepers against AI slop, but they are also extremely well-positioned to leverage the benefits of these large language models because language is our bread and butter.

Language is our raw material.

And these tools are so effective with language that I think editors are especially well placed to benefit from AI.

Rob: You know, when I was thinking about this podcast, I was curious as to what perspective you might have and whether you were gonna be anti-AI or whether you thought it would be acceptable. And I'm definitely getting a read that our listeners can take away that, at least in your shop, there's no strong opposition to AI.

In fact, it can be an effective tool, as long as it's used thoughtfully, we question the model carefully, and we disclose, which I love.

So now our listeners are also medical communication professionals, so they're responsible not just for the publication, but also what happens to the publication after it's in the peer-reviewed journal.

To that extent, if I asked you to put your clinician hat back on (you talked earlier about decision support tools and AI, and there are certainly a lot of new tools available), what are your thoughts on AI summarizations and other things coming out for clinicians to use at the point of care?

Raja-Elie: Yeah. I think it's a critical question.

I'm a very cautiously optimistic person. My caution and my optimism, vis-à-vis AI and large language models, are really more about how humans are using these tools as opposed to the tools themselves.

With the tools themselves, there are some ethical ramifications to consider: how they've been developed, their environmental impact, the risk of inequity, and so on. These are very important and may need their own podcast.

But where I'm most concerned is their misuse, because the folks using them just aren't using them correctly. They're offloading their reasoning, they're offloading their tasks.

And so if I think about it as a clinician, a clinician editor, what I found when I started using these tools is that I was in awe of what I was seeing, almost in a magical sense, even though the underlying technology is pretty basic.

It's just the impact was pretty profound.

But at the same time, I found it very hard to trust in high-stakes situations, especially knowing that they confabulate.

They're terrible at citing references. The references they give are strings of numbers and letters, sometimes with no meaning. They're confabulated.

And the biggest thing to me was like, okay, where is this information coming from, right?

So this idea of transparency.

If I ask a colleague, or if an expert tells me ceftriaxone is the best antibiotic in this situation, and I ask them why, they are able to tell me, "Well, these are the guidelines that suggest it, and this is the NEJM review article that reviews that information," right?

So there is attribution to the content.

And this is where, very early on, it became apparent to us that we needed to safeguard clinicians' ability to take care of their patients as they use these tools increasingly, almost all the time, in the clinic. In fact, when I work at the hospital, all the residents are using an AI clinical decision support tool. So it became really important to us to protect their skills, protect their practice, and protect the patients.

And so we've been working a lot in helping these tools ground their recommendations in the best evidence possible.

And so this is where we license content to these tools, so that when a recommendation is given, there's a clear reference assigned to it, a reference that can allow a clinician to cross-check whether or not the recommendation is based on evidence.

Rob: Yeah. This is at the heart of where our profession is actually going. We realize that these decision support tools are being used on a day-to-day basis by residents and probably even seasoned clinicians.

I'll tell you where my concern comes in. We're having a conversation, and you're telling me about how you'll question the model, you'll question what you're reading, but I'm worried about newer clinicians or busy clinicians at the point of care who are just taking at face value what they're seeing in an AI-generated summarization and prescribing that antibiotic without questioning it.

And are you seeing that? Is that a concern that you have?

Raja-Elie: I'm definitely seeing it.

Firsthand, I see medical students entering the clinical workspace with novice skills. And as they start using these tools, they never acquire the skills of critical thinking and so on.

And they're over-relying on these tools, which actually leads to an automation paradox.

The automation paradox is that, because of automation bias, a clinician may over-rely on a particular AI tool because these tools are so fluent, right? This is the fluency trap.

And as they over-rely on an AI tool, they de-skill, or in the case of a student, they never skill.

And as they de-skill or never skill, they become even more reliant on the tool, and they become even more vulnerable to the failures of the tool.

And so it's this vicious cycle that can be pretty terrible.

And the solution to it is: train the pilot and train the co-pilot, right?

So this is where we need to do a very important job of teaching the pilot, whether it's a clinician or it's an editor or it's an author.

This is where we need to learn how to use these tools so we can recognize their risks, but also leverage their opportunities.

But I also think that there is work that needs to be done on the AI side.

Meaning, the developers of these tools need to focus on improving these co-pilot skills. Surfacing assumptions, surfacing references, using internal systems to increase accuracy.

In fact, my co-authors and I just published a perspective in the New England Journal of Medicine called "Can AI Say 'I Don't Know'?"

And it turns out it can't.

We did an experiment where Andrea Sikora, the first author on the paper, presented a bunch of AI models with a list of medications. They all sounded weird because all medications sound weird.

But within the list were names of Pokémon characters. Pokémon, the video game characters.

And we were trying to see if the AI could distinguish a Pokémon character from a medication, and it failed miserably.

And so instead of saying, "I don't know," it confabulates, right?

So there are different things that the developers must do to improve these tools.

So what does it mean for us as editors or as authors?

At NEJM, we're realizing the primacy of highly curated, highly fact-checked content, not only to inform the human reader but also to inform the AI reader, right?

So now we have two readers.

And I think that anything we do that's gonna be high quality, that's going to be impactful, sure, the primary reader is human, but we need to assume that the vast majority of the time that folks are gonna encounter our content is gonna be through these models.

And so we need to keep in mind the AI reader when we write the content.

And we need to continue making high-quality content to ground these models for the benefit of the human reader, while systems are in place to make sure that the AIs are accurate and do not spread misinformation and so on and so forth.

Rob: Yeah, I'm in 100% agreement with you, and this is really at the heart of what I like to talk about, especially when I get to the podium.

At the end of the day, we know that people aren't necessarily reading all the publications word for word anymore, and machines are translating them.

So, given that's the case, are there things that publication professionals should be doing or considering as they're helping authors prepare these manuscripts?

To be honest, we're finding that a lot of the summarizations are making errors. They're misinterpreting the figures or they're reading the tables incorrectly, and then when the summarization comes out, it's not reflecting the data accurately.

Are there things we can be doing to protect against that?

Raja-Elie: So, you know, summarization is my bread and butter, right? That's what we do at NEJM Clinician. We summarize research.

And when we tested these tools, they were terrible at summarizing, at least to date. Again, these tools are evolving, and they're evolving at such a rapid pace that we could be having a very different conversation in a few years, where the summarization abilities of these tools are better than those of 99.99% of humans.

In which case, I would tell you, we don't need to summarize anymore, right? But assuming that's not gonna be the case, and more importantly, they're never gonna be accountable, right? So we're always gonna have to check it. I would not use it for summarization except for low-stakes tasks.

So, for example, on NEJM.org, we've launched an AI companion. If you are a subscriber, you go to NEJM.org, there's an AI companion that is focused on the article, right? We've designed it in a way that it only talks about the article that you're looking at. That article has been embedded in the tool, so it's not reading the way we're reading. It's been optimized for the AI.

And the prompts and systems and checks behind the scenes are such that, when we tested it, we were satisfied with the results enough to put it on our pages, with a very important disclaimer that it should not be used at all for any high-stakes situation, whether it's making decisions, driving hypotheses, research, or care, et cetera. It's only meant to support the reading experience and engage the reader with the actual content.

I'm concerned about AI summarization. It should be a crutch or have a very specific use case. But having human oversight is gonna be critical.

Rob: Yeah, it sounds as though there's a lot of work to be done at all the different levels. Our authors need to be aware of the data summarizations. Editors need to be aware of summarizations that are going to happen downstream. There need to be appropriate caveats put in place.

But what I hear as the central thesis of our discussion today, and it's been really fascinating, is disclosure, trust, and questioning. Every time we get into any of these topics, it's really about having the human ask questions and push the model to make sure that what's in front of you is accurate and can be trusted. And I think that's a really good takeaway.

Raja-Elie: Yeah, a couple of things.

Again, as we discuss in our perspective, "Can AI Say 'I Don't Know'?", this is a critical difference between human intelligence and artificial intelligence. Humans possess a virtue called epistemic humility. Humans can recognize when they're entering an unfamiliar knowledge domain. And an epistemically humble human, which ideally is all of us, will actually say "I don't know" when they enter a space where they don't know. So they'll surface assumptions, they'll ask colleagues, they'll confirm the information. It's become the most important virtue for us to teach, develop, and nurture. And as long as we're epistemically humble, we should be okay questioning ourselves and questioning the input of AI. The key thing is teaching it, and knowing how to teach it. There are some ways to do it. But to your point, as editors, if there is one competency that defines us, it's skepticism and epistemic humility, right?

The other thing I wanted to say is I hope that listeners take away from this the relevance of high-quality, curated, written, and edited content. Because that content is gonna be used by these models. And the better the content, the better the output, right?

So we need to assume that folks are gonna be reading through LLMs as opposed to the primary data. Not always, but often. And so we need to keep making the best content we can.

Rob: Well, that's it for us today. Thank you all for listening. Please take a minute to subscribe to In Plain Cite on your favorite podcast app. Share it with your colleagues and rate our show highly if you like what you heard today.

In Plain Cite is a production of ISMPP, the International Society for Medical Publication Professionals.

Our production partner is CitizenRacecar. Our producer and editor is Hajar Eldaas. Post production by Alex Brouwer. Publication and promotion by Candice Chantalou.

To join ISMPP today, go to ismpp.org. Becoming a member means you can participate in value-packed webinars and receive instant access to exclusive tools and resources. If you're interested, just go to ismpp.org, that's ISMPP.org, to learn more.