In February 2020, the Biomarkers Consortium (BC) held its first workshop on the use of remote monitoring tools in medical development. No one had planned for it to be the last time the group would meet in person, but the Foundation for the National Institutes of Health (FNIH) could hardly have chosen a better topic to mark a plague year. The lack of rules or guidelines supporting the use of wearables and other digital technologies in clinical trials was enough of an issue before Covid-19 made it almost impossible to run studies without them. Now it’s convulsing the industry.

“What 2020 has taught us, if nothing else, is that the environment can change dramatically, very quickly,” says Joseph Menetski, the FNIH associate VP responsible for the BC. “We are now in an environment where people are looking to do more and more remote monitoring with digital tools and wearable devices. And we’re in a place where we don’t have a lot of good data on whether these things work or not.”

Take the heart monitors built into many smartwatches. Their readings, which differ significantly between brands and activities, aren’t aligned to any reference standards. A recent paper in Nature on the topic found that absolute error during activity was, on average, 30% higher than during rest, and that research-grade wearables were actually less accurate than their consumer equivalents, which showed quirks and eccentricities of their own. The Apple Watch 4 was less reliable when wearers were breathing deeply than at any other time, while all other devices worked better during deep breathing exercises than when their users were resting.
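To make the arithmetic behind a figure like "30% higher" concrete, here is a minimal sketch of how absolute error at rest and during activity might be compared against a reference signal. It is not the paper's method: the function names and every reading below are hypothetical, invented purely for illustration.

```python
from statistics import mean


def mean_absolute_error(device_bpm, reference_bpm):
    """Mean absolute difference between device and reference readings, in bpm."""
    if len(device_bpm) != len(reference_bpm):
        raise ValueError("device and reference series must be the same length")
    return mean(abs(d - r) for d, r in zip(device_bpm, reference_bpm))


def relative_increase(rest_error, activity_error):
    """Fractional rise in error from rest to activity (0.30 would mean 30% higher)."""
    return (activity_error - rest_error) / rest_error


# Hypothetical paired readings (smartwatch vs. reference monitor), not real data.
rest_device = [62, 63, 61, 66]
rest_reference = [60, 65, 63, 64]
activity_device = [123, 126, 130, 140]
activity_reference = [120, 124, 133, 138]

rest_mae = mean_absolute_error(rest_device, rest_reference)
activity_mae = mean_absolute_error(activity_device, activity_reference)
increase = relative_increase(rest_mae, activity_mae)

print(f"Error at rest: {rest_mae} bpm, during activity: {activity_mae} bpm")
print(f"Relative increase: {increase:.0%}")
```

The point of the sketch is that the comparison only means something if every device is scored against the same reference and the same definition of error, which is exactly the alignment the industry currently lacks.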

“We don’t know how to look at that data,” says Menetski. “When is it bad? And when is it good? It opens up a whole new set of parameters that are necessary to understand before you can actually make a [clinical] decision.”

As Menetski notes, the importance of partnerships and collaborations in the pharmaceutical industry is often underplayed, but this is a challenge uniquely bound up in the ability to get diverse groups working together. When it comes to approvals, company-specific measurements are no measurements at all.

A clear path

Back in the mid-2000s, the BC was set up to turn a general sense that novel biomarkers would be helpful in driving drug development into a clear path for qualifying and using them. To do so, it brought together the National Institutes of Health with the Food and Drug Administration, academics, pharma, biotech, foundations and patient advocacy groups. “That’s what makes it strong,” says Menetski: “everyone has a voice.”

Despite its successes, when Menetski joined in 2016 he had some new ideas for the direction and focus of the BC. At the time, the question on everyone’s lips was, “What amount of data, what kind of evidence do I need to make my biomarker useful, or for people to believe it?” He had begun to feel that they had it backwards.

“We don’t start now with, ‘Here’s my biomarker, how can I use it?’” he explains. “Now we start with, ‘I need to make this clinical decision. What do I need in order to do it confidently?’ Just that twist, that shift in thought process changes the whole thing.”

Helpfully, the BC isn’t the only body to have realised that. While the consortium’s executive committee was first looking into remote monitoring in 2019, Kai Langel, a director at Janssen’s clinical innovation group, was drawing up blueprints for a digital end point ‘ecosystem’, a practical framework for aligning companies with similar goals and accelerating the process of achieving them. As he sees it, researchers need to avoid the temptations of “the art of the possible”, which can lure unsuspecting scientists into devoting themselves to measurement solutions with no discernible purpose. “I think it all needs to start with a definition of what we want to measure, with evidence to show how that measurement will be relevant both for patients and the intervention in question,” he says.

Also in 2019, Jennifer Goldsack (interviewed on page 17) and others set up the Digital Medicine Society (DiMe) to support the field’s growth. The group’s recently published ‘playbook’ for developing and deploying digital clinical measures makes it clear that “I saw a cool Apple Watch at a conference” is no way to initiate a digital biomarker strategy. Instead, it lays out a framework that recommends determining a ‘meaningful aspect of health’ and identifying a practical, correlatory ‘concept of interest’ before defining the digital measure that can be used to evaluate it. Then, and only then, should developers turn to finding the right tools and technologies to record those measures.

By inverting that process, companies risk destabilising their whole development strategy. Clinical timelines are far longer than sensor life cycles. “I’ve seen researchers go to the regulators asking for scientific advice and referring to a specific device model,” says Langel. “The time it takes to get such advice and implement a solution in a trial means that by the time the trial is ready to start, the device may already have been replaced by the next generation.” It’s for this reason that the BC, DiMe, and Langel’s Digital Endpoints Ecosystem and Platform (DEEP) are all focused on driving the collaborations necessary to set and develop standards and benchmarks for evaluating and comparing digital measures – thus uncoupling outcomes and end points from specific sensor technologies.

Langel believes the DEEP is particularly apt for that challenge. Rather than leaving it up to companies to work out how to develop and validate digital end points for themselves, or centralising the management of the process, the DEEP combines a marketplace with a set of collaboration services. On the one hand, vendors can advertise both prevalidated digital measurements that are ready for use and components that could be repurposed to develop new ones. On the other, companies can publicise ‘desired solution profiles’ around which they can build collaborations or invite tenders.

The nature of these agreements is determined by the DEEP’s ‘measurement asset model’, which breaks digital measurements into the tools, technologies, definitions and evidence bases required to use them. Unlike the BC, which is focused on precompetitive collaborations, this model makes it possible for organisations to decide for themselves how to balance their strategies for intellectual property (IP) and collaboration.

“For example, because of the need for the industry to align, it makes sense to work together on the definition and clinical interpretation aspects of measures,” explains Langel. “Companies may then want to go ahead and create their own specific solutions and leverage their existing preferred providers and partnerships. There are probably a thousand different ways to measure heart rate, but whether you use a Suunto device to do it or a Garmin, it doesn’t matter that much [if] everyone’s agreed on the definition and clinical interpretation.”
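One way to picture the separation Langel describes is as a data structure in which the agreed definition and interpretation are held apart from the tool that records the reading. The sketch below is an illustrative assumption, not the DEEP’s actual measurement asset model: the class names, fields and example values are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MeasureDefinition:
    """The shared part: what is measured and how it is interpreted clinically."""
    name: str
    unit: str
    clinical_interpretation: str


@dataclass(frozen=True)
class DigitalMeasure:
    """A definition paired with a replaceable tool and its supporting evidence."""
    definition: MeasureDefinition
    tool: str
    evidence: Tuple[str, ...] = ()


def swap_tool(measure: DigitalMeasure, new_tool: str) -> DigitalMeasure:
    """Return a copy that uses a different tool but the same agreed definition."""
    return replace(measure, tool=new_tool)


# Hypothetical example values, invented for illustration.
heart_rate = MeasureDefinition(
    name="resting heart rate",
    unit="bpm",
    clinical_interpretation="placeholder for the industry-agreed interpretation",
)
with_suunto = DigitalMeasure(definition=heart_rate, tool="Suunto device")
with_garmin = swap_tool(with_suunto, "Garmin device")

print(with_garmin.definition.name, "via", with_garmin.tool)
```

In this sketch the evidence is simply carried across when the tool changes; in practice a new device would presumably need its own validation against the shared definition.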

New eyes

The DEEP is currently running a pilot collaboration between numerous large pharmaceutical and technology companies to develop a new instrument for tracking healthcare resource utilisation. “It turns out many companies have the need for new methods that go beyond the current means for generating this kind of evidence and there is great interest to see how these DEEP collaborations work,” says Langel.

Importantly, the nature of clinical development means that such collaborations need to change over time. The FDA’s latest guidance on biomarker qualification, which came directly out of a BC workshop, breaks the process down into five stages corresponding to the various phases of drug development. “It turns out at each of those stages, the type of expertise you need in the room is different,” explains Menetski. It’s here that his role as the BC’s “conductor” comes to the fore. He points to the ongoing Vol-PACT project for improving the imaging end points used to measure cancer progression and response to treatment, which involves three public-sector partners, nine pharmaceutical companies and three academic centres, each of which needs to step forward to offer relevant expertise at different times.

“It can be a limitation for every team, because you may not have the right clinician or scientist, or you may not have the statistician that you need,” Menetski continues. “But there’s a big difference between the recognition that I don’t have the right statistician and that I even need a statistician. We’re getting to the point now where we’re having the conversation around what’s the right group of people.”

This is helped by the fact that BC members “drive the bus” in a clearly defined precompetitive space. “FNIH is like Switzerland,” laughs Menetski. “We are the safe haven of science.” Members are “liberated” from concerns about company secrets and privacy rules, and can focus on having productive, open conversations with people who share their interests. “People feel much more comfortable talking and innovating, and really putting their minds together and being creative, because we’ve generated this safe place for them to share those ideas.”

Langel is aiming to achieve something similar with the DEEP by offering a predefined ‘menu’ of IP templates and workflows that mean researchers don’t need to stress and strategise over issues that get in the way of their clinical concerns. The marketplace is structured so organisations can tap into work already done by relevant experts or specifically recruit the expertise they’re missing, and a group of over 30 organisations, including the FDA, EMA and MHRA, have contributed to a set of ‘key service protocols’ designed to work for the full range of potential stakeholders and partners. These detail the inputs, outputs and core essence of each service required to define, prototype and validate a digital measure, as well as those necessary to generate evidence in support of its use. The protocols also work to standardise the ways components interact, so partners know where and how to hand over responsibility for different development phases and particular measurement devices can be replaced without difficulty. “There are great partners out there that do not understand our core business,” says Langel, “yet they have a lot to offer. These service protocols allow newcomers to offer specific services as part of this process. For example, a technical university can be a great partner for some prototyping work.”

That’s a particularly functional example, but Menetski is more effusive about the benefit of getting as many groups as possible to contribute to identifying and developing digital measures. As he puts it, opening yourself up to new perspectives and considerations is the beginning of good science. “If you look at it the same way all the time,” he says, “you’re not gonna get anything new.” The pandemic, of course, has changed how everyone is seeing. “I think what we’ve realised,” Menetski concludes, “is that we know even less than we thought we didn’t know.”

