"There's so much we don't know about rare diseases and traditional research can only partially fill the gaps. So, initiatives like applying predictive analytics to claims data, for example, can really inform us on the journeys of the patients and help us better define archetypes of care and referral pathways". Orchid Jahanshahi, Vice President, Life Sciences at ODAIA

Spoken at this year's NEXT conference, Jahanshahi's words echo a bitter truth of the medical and pharmaceutical industries: we cannot empirically know everything. As much as we hope for, depend on, and rely on experimental data to lead us to some semblance of truth, some things cannot be rigorously tested. Factors ranging from ethical concerns to a lack of appropriate data points (e.g., not enough valid subjects to conduct replicable research) make it impossible. So, is the only option to give up and concede to these limitations? Absolutely not.

When the best method falters, we ought to turn to the second-best thing: predictive modeling and extrapolation. And to make the most accurate models, we need to gather and process heaps and heaps of data. In the past, this was daunting and laborious because most of the work was done manually. The advent of Big Data, however, is bound to revolutionize modern approaches to data gathering and analysis. In turn, this allows us to create models good enough that they can sometimes exceed the benefits offered by experimental research.

It is a given, then, that pharma would be exceptionally receptive to Big Data's potential. The current pandemic exemplifies the worst-case scenario of being unprepared for a viral catastrophe. Please note: we are not diminishing the sheer speed and success of the global vaccine initiative. However, we also cannot deny that it would have been much better if the world had taken a proactive rather than a reactive stance. The ability to understand when and how large-scale viral outbreaks can happen would allow pharma companies to start preparing countermeasures beforehand, thus minimizing the damage done to the public as well as the risk of the disease becoming endemic. In this article, we'll review how Big Data impacts every step of the pharma development cycle: from initial assessments to post-launch monitoring.

Initial Assessment 

Despite all the advances, modernity is still plagued by diseases yet to be cured. Autoimmune diseases, certain genetic mutations, cancer, etc.: while scientists may understand what these diseases do to a person's body, they do not exactly understand how they act and, consequently, how to counteract them. The answer to many of these questions lies in gene expression, a domain of science that is yet to be fully understood and dissected.

For instance, we understand that smoking cigarettes increases the risk of developing lung and mouth cancers. However, how does it happen that some people can smoke a pack a day and die peacefully in their 90s, while others develop lung cancer from second-hand smoke? The answer lies in genes and their expression. The amount of data collected to understand the human genome and the millions of ways that our genes can express themselves under environmental and lifestyle factors is staggering. If you were to abstract these complexities into "data point units", you could say that such research would generate hundreds of billions of these data points. If there is a human capable of processing all of them the old-school way, shoot us an email: we would love to see that.

But we digress. Indeed, it is a gargantuan effort that would not be possible without Big Data algorithms, and researchers are very well aware of how to put them to good use. For instance, in 2016, a group of researchers led by Michelle Lynn Hall managed to create a predictive model with over 70% accuracy. The model in question was meant as an alternative to the traditional method, which targets a specific gene response. Such a paradigm, while useful, becomes obsolete in a scenario where experimenters are not exactly sure which gene expression they want to target. Their attempts become guesswork, essentially. Predictive modeling, however, opens a pathway to virtual screening and the ability to predict how certain genes will express themselves on a large scale. Thanks to the development of tools such as these, pharma companies will have a better idea of the interactions on the gene-disease axis, with hopes of eventually developing cures for the diseases that plague the 21st century.

Modelate's predictive algorithm is another example of using Big Data for predictive modeling. Specifically, the algorithm seeks to establish the existence of Disease X, a currently unknown pathogen that could cause an epidemic. The algorithm allows researchers to establish a particular set of diagnostic criteria. These criteria are extremely modular, allowing the researchers to filter results by symptoms and their duration, and to cross-examine the symptoms against already existing diseases. For example, if we assume that one of Disease X's symptoms is excessive sweating, we can ignore patients with pre-existing conditions that produce the same symptom to improve the predictive model. If we let this model crunch the numbers and then find out that, say, 10,000 patients exhibit similar symptoms that cannot be traced to an already identified pathogen, then we know that it is time to look further into the matter.
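The filtering logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Modelate's actual algorithm: the `Patient` structure, the `KNOWN_CAUSES` lookup, and the function names are all hypothetical, and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    symptoms: dict[str, int]  # symptom name -> duration in days
    known_conditions: set[str] = field(default_factory=set)


# Hypothetical lookup: conditions already known to produce each symptom.
KNOWN_CAUSES = {
    "excessive sweating": {"hyperhidrosis", "hyperthyroidism"},
    "fever": {"influenza"},
}


def unexplained_cases(patients, symptom, min_days=1):
    """Patients showing the symptom long enough, with no known condition explaining it."""
    explaining = KNOWN_CAUSES.get(symptom, set())
    return [
        p for p in patients
        if p.symptoms.get(symptom, 0) >= min_days
        and not (p.known_conditions & explaining)
    ]


def needs_investigation(patients, symptom, threshold=10_000, min_days=1):
    """True once the number of unexplained cases reaches the alert threshold."""
    return len(unexplained_cases(patients, symptom, min_days)) >= threshold


# Invented sample data for illustration only.
patients = [
    Patient("p1", {"excessive sweating": 5}),
    Patient("p2", {"excessive sweating": 7}, {"hyperhidrosis"}),
    Patient("p3", {"fever": 2}),
    Patient("p4", {"excessive sweating": 1}),
]
flagged = unexplained_cases(patients, "excessive sweating", min_days=3)
print([p.patient_id for p in flagged])
```

Here only `p1` is flagged: `p2` has a pre-existing condition that explains the sweating, `p3` does not show the symptom, and `p4` has not shown it for long enough. The 10,000-case default threshold simply mirrors the number used in the example above.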

Clinical Trials

A clinical trial is an indispensable step in any pharma development cycle. Sure, we might know that Drug X works in theory, and rats that have been injected with it seem to be doing quite fine, but what about humans? That is why companies need to run tests to measure the effectiveness of the drug as well as observe any potential side effects. As we've mentioned, the human genome is a mystery box: how one's body will react to a drug is hard to predict. The process is (relatively) straightforward for drugs with mass-market appeal. Let's say you're designing a new formulation for asthma inhalers. Find a large group of asthma patients, split them into control (placebo) and experimental groups, and observe the drug's effectiveness over a definite period of time. But what about something more obscure and time-sensitive, where finding test subjects is more challenging?
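For the straightforward case, the split into control and experimental groups is usually done at random. A minimal sketch of simple randomization, assuming nothing more than a list of participant IDs (the `randomize` function and the IDs are made up for illustration; real trials typically use more elaborate schemes such as block or stratified randomization):

```python
import random


def randomize(participant_ids, seed=None):
    """Shuffle participants and split them into two arms of (almost) equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


participants = [f"patient-{i}" for i in range(10)]
arms = randomize(participants, seed=42)
print(len(arms["control"]), len(arms["experimental"]))
```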

According to the MIT Technology Review's report, 9 out of 10 clinical trials fail to gather enough test subjects in the desired time frame. Let's say a pharma company is trying to test a drug for an aggressive, rare form of cancer. Immediately, they face the following issues:

  • Shortage of test subjects;

  • No guarantee that all subjects will live throughout the entire process;

  • The ethical problem of making a control group out of people that desperately need the cure.

These scenarios create a vicious loop: you cannot develop a drug because you cannot run clinical trials and you cannot run clinical trials because there is no drug, as illness will consume patients before you can draw any conclusions. In situations such as these, Big Data, once again, comes to the rescue. 

Returning to Technology Review's report, the authors described how Celsion, in conjunction with Medidata, used Big Data to build control groups without actually having to recruit one. They established a large database of past clinical trials and then used Big Data methods to pluck out test subjects that fit their criteria. For cancer trials, Medidata helps scan a database of over 7 million past patients to find those matching the disease, age, and any other criteria, and establishes a control group out of those records. That allows Celsion to expedite its clinical trials, as it can almost instantaneously match the experimental group with any number and variety of subjects for the hypothetical control group.
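To make the idea of a record-based control group more concrete, here is a small sketch that pairs each trial participant with a historical record of the same diagnosis and a similar age. The greedy one-to-one matching, the `Record` fields, and the age tolerance are assumptions chosen for illustration; the report does not describe how Medidata actually performs the matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str
    diagnosis: str
    age: int


def match_controls(participants, history, max_age_gap=5):
    """Greedy 1:1 matching: same diagnosis, closest age, each past record used once."""
    available = list(history)
    pairs = {}
    for p in participants:
        candidates = [
            h for h in available
            if h.diagnosis == p.diagnosis and abs(h.age - p.age) <= max_age_gap
        ]
        if not candidates:
            continue  # no suitable historical record for this participant
        best = min(candidates, key=lambda h: abs(h.age - p.age))
        pairs[p.patient_id] = best.patient_id
        available.remove(best)
    return pairs


# Invented records for illustration only.
history = [
    Record("h1", "rare cancer", 61),
    Record("h2", "rare cancer", 48),
    Record("h3", "asthma", 60),
    Record("h4", "rare cancer", 57),
]
trial = [
    Record("t1", "rare cancer", 60),
    Record("t2", "rare cancer", 50),
    Record("t3", "rare cancer", 80),
]
pairs = match_controls(trial, history)
print(pairs)
```

In this toy data set, `t3` stays unmatched because no past patient with the same diagnosis is within five years of age. With millions of records to draw from, such gaps become much less likely.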

Manufacturing

Drug manufacturing, likewise, is a complicated process with many variables that must be accounted for. If these are left unchecked, the manufacturer risks making the drug either completely unusable or, in the worst-case scenario, harmful to consumers. A few years ago, Merck & Co. ran into an issue with the production of its vaccines. The precise conditions in which vaccines must be maintained at all stages of production were failing somewhere. But where, exactly, they could not pinpoint. They had multiple sources of data to draw their conclusions from: the batch tracking system, the plant maintenance system, the building-management system, etc. All of these provided copious amounts of data, but in a compartmentalized fashion. Synthesizing data from such a myriad of sources was a time-consuming endeavor, and the conventional methods allowed Merck & Co. to parse only a couple of batches at a time. Given the scale of the losses the company was experiencing at that moment, it was necessary to devise a much better solution.

Using Big Data paradigms, Merck & Co. managed to synthesize all their data sources into one cohesive whole without spending time and effort on manually transposing the data. The results were staggering: in barely two months, the system they set up managed to match all the data involved in vaccine production (batch ID, equipment ID, time stamps, temperature parameters, etc.) to yield over 5 million batch comparisons. Such a rapid process allowed them to determine that the issue was at the vaccine fermentation stage. Solving this issue saved the company millions of dollars.
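The core of such an analysis is joining records from separate systems on shared keys (batch ID, equipment ID, time stamps) and then comparing good and bad batches. The sketch below shows the principle on a toy scale. The three data sets, their field names, and the numbers are invented for illustration and do not reflect Merck's actual systems or data.

```python
from collections import defaultdict
from statistics import mean

# Batch tracking system: which equipment handled which batch, and when (hours).
batch_steps = [
    {"batch_id": "B1", "stage": "fermentation", "equipment_id": "F-01", "start": 0, "end": 4},
    {"batch_id": "B1", "stage": "filling", "equipment_id": "L-01", "start": 4, "end": 6},
    {"batch_id": "B2", "stage": "fermentation", "equipment_id": "F-01", "start": 6, "end": 10},
    {"batch_id": "B2", "stage": "filling", "equipment_id": "L-01", "start": 10, "end": 12},
]
# Quality outcome per batch.
outcomes = {"B1": "rejected", "B2": "accepted"}
# Building-management system: temperature readings per piece of equipment.
readings = [
    {"equipment_id": "F-01", "hour": 1, "temp_c": 39.5},
    {"equipment_id": "F-01", "hour": 3, "temp_c": 40.5},
    {"equipment_id": "L-01", "hour": 5, "temp_c": 5.0},
    {"equipment_id": "F-01", "hour": 7, "temp_c": 37.0},
    {"equipment_id": "F-01", "hour": 9, "temp_c": 37.0},
    {"equipment_id": "L-01", "hour": 11, "temp_c": 5.0},
]


def mean_temp_by_stage_and_outcome(batch_steps, outcomes, readings):
    """Join readings to batch steps on equipment ID and time window, then average."""
    grouped = defaultdict(list)
    for step in batch_steps:
        for r in readings:
            same_equipment = r["equipment_id"] == step["equipment_id"]
            if same_equipment and step["start"] <= r["hour"] < step["end"]:
                key = (step["stage"], outcomes[step["batch_id"]])
                grouped[key].append(r["temp_c"])
    return {key: mean(values) for key, values in grouped.items()}


stage_means = mean_temp_by_stage_and_outcome(batch_steps, outcomes, readings)
# The stage where rejected and accepted batches differ the most is the prime suspect.
gaps = {
    stage: abs(stage_means[(stage, "rejected")] - stage_means[(stage, "accepted")])
    for stage in {step["stage"] for step in batch_steps}
}
suspect_stage = max(gaps, key=gaps.get)
print(suspect_stage)
```

The nested loop is fine for a handful of rows. At the scale of millions of batch comparisons, the same join would be pushed down to a database or a distributed processing engine.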

That is but one example of how Big Data can improve pharma's manufacturing process. Although other drugs might not be as sensitive to the multiple variables as vaccines, manufacturing is still a sensitive ordeal where avoiding errors is paramount. Take something as ubiquitous and essential as Aspirin. The annual production of the drug is ~40,000 tonnes. Imagine if one of the plants had an unknown malfunction somewhere that botched numerous batches. Doesn't seem like a scale where you can determine the error manually, does it? Big Data does not just save money for the companies, but it makes the whole process of manufacturing safer and more efficient.

Distribution

You're almost there. Initial assessments, trials, manufacturing - all done. Now it's time to distribute your new beautiful research baby across the nation or the world. But distribute how, exactly? There is a reason why people in Alaska aren't too keen to buy snow off you. It's a thing called supply and demand, you see. And, yeah, if you're producing Aspirin, perhaps you don't need complex algorithms to min-max its allocation across multiple regions. Sure, people from Wyoming might use 5% more Aspirin than people from Oklahoma, but it is safe to say that everyone needs Aspirin. What about something more niche, or even something as sensitive or scarce as a new vaccine? That's where Big Data can help. Its number-crunching magic can reveal hotspots that would immensely benefit from a surplus of a particular drug and, conversely, cold zones where the demand will be low.

Modelate's recently developed mathematical algorithm is an example of how a pharma company can leverage Big Data to assess its distribution needs. Let's say we're discussing the United States and vaccine coverage across states. By feeding data into the model, we can find states with an exceptionally low percentage of people vaccinated against novel rotaviruses. And if the disease strikes? Well, it's safe to say that such a state would become a hotbed of infection that would need not only all the vaccinations it can get but also a variety of remedies for those already afflicted. Thankfully, you don't need to guess how much of either you'll need, as Modelate's algorithm allows you to model scenarios with various degrees of viral outbreak. Thus, you can estimate demand for both the worst-case and best-case scenarios.
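A back-of-the-envelope version of this kind of scenario modeling might look as follows. It is a deliberately simple sketch, not Modelate's algorithm: the state names, populations, coverage figures, and attack rates are all invented, and a real model would account for many more factors.

```python
# Hypothetical scenarios: percentage of unvaccinated people who get infected.
SCENARIOS = {"best_case": 1, "worst_case": 20}

# Hypothetical regions: (population, vaccine coverage in %).
regions = {
    "State A": (1_000_000, 60),
    "State B": (2_000_000, 90),
}


def estimate_demand(population, coverage_pct, attack_rate_pct):
    """Estimate vaccine doses and treatment courses for one outbreak scenario."""
    unvaccinated = population * (100 - coverage_pct) // 100
    infected = unvaccinated * attack_rate_pct // 100
    return {"vaccine_doses": unvaccinated, "treatment_courses": infected}


plan = {
    name: {
        scenario: estimate_demand(population, coverage, rate)
        for scenario, rate in SCENARIOS.items()
    }
    for name, (population, coverage) in regions.items()
}

# The hotspot is the region with the highest worst-case treatment demand.
hotspot = max(plan, key=lambda name: plan[name]["worst_case"]["treatment_courses"])
print(hotspot, plan[hotspot])
```

Even though State B has twice the population, State A comes out as the hotspot because its vaccine coverage is much lower. This is exactly the kind of pattern that is hard to eyeball across dozens of regions and drugs.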

Wrapping it All Up

Big Data sounds awesome, but we don't want to peddle it as a miracle cure for all problems in the pharma industry. That would be too good to be true. Consider for a moment that the bulk of Big Data paradigms revolve around predictions and modeling, which means that it is educated guesswork. Granted, it is guesswork backed by terabytes of data that no human can manually process, but the models that drive the analysis are still designed and verified by humans. Errors are bound to happen, so it's better to make sure that the models and algorithms you're using are the best on the market, as it would be kinda difficult to double-check the calculations by hand.

Another threat that comes with Big Data adoption is the increased risk of, and damage from, cyber-attacks. Data stored on paper might not be the most convenient, but at least it is safe from hackers. Using digital data, conversely, is like drawing a large target on your back that hackers are more than happy to aim at, especially with information as lucrative as that usually stored by healthcare and pharma companies. Accenture's 2017 study suggested that healthcare companies are more likely than others to become targets of cyber-attacks, so make sure that you're bolstering your Big Data endeavors with equally potent cyber security.

That said, if you follow these precautions, Big Data will become an indispensable tool in your arsenal, allowing you to save money and distribute resources to places where they are needed the most. 
