
Recipes for Natural Language Processing: A Guide to Solving Problems using Clinical Text



Presented by David Cronkite and Will Bowers on April 2nd, 2024
Transcript:
Chloe Krakauer:
On behalf of my fellow seminar coordinators, Maricela, Leah, and Aisha, thanks so much for joining today to listen to our second scientific seminar of the 2024 series.
I’m honored to be introducing two speakers, David Cronkite and Will Bowers, who both work as computational linguists here at KPWHRI.
David’s prior and current work applies natural language processing, or NLP, across multiple scientific areas, including mental health, COVID-19, and neuropathy. His interests include scalable, portable, and reusable solutions in addition to exploring applications of machine learning in resource constrained environments.
Will’s work also uses NLP, linguistics and machine learning to extract information from medical records with prior and current projects, including the Adult Changes in Thought project, FDA Sentinel Initiative, and a project studying substance abuse.
Today, Will and Dave will be providing an overview of how they leverage natural language processing to unlock the largely untapped resource of medical records to use in health research.
Their overview will include defining NLP, how it’s uniquely employed at KPWHRI, and provide ideas for future projects that leverage NLP. I myself am lucky enough to use data that David and Will extract from massive databases using NLP for the Eye ACT study, and I’m very eager to hear more about the process.
Certain questions addressed using those data would be impossible or incredibly expensive to address without their work.
David received both his degrees, a Bachelor’s in linguistics and history and a Master’s in computational linguistics, from UW. Will also received both his degrees, a Bachelor of Science in Informatics and a Master of Science in Computational Linguistics, from UW.
The total time for the talk today will be around 40 minutes, with remaining time at the end for questions. Will and David will ask at a certain point during the talk if there are questions, but otherwise ask that folks hold on to their questions until the end to be mindful of time.
The seminar will also be recorded if you’d like to refer back to anything that’s presented today.
Thank you so much to both David and Will for agreeing to speak today and feel free to start whenever you’re ready.

David Cronkite:
Let me go ahead and share my screen.
Thank you very much for that, Chloe, Melissa.
OK. So today, Will and I are gonna introduce some recipes for NLP and without further ado, let’s go ahead and get started.
So brief overview, I will introduce what exactly is natural language processing.
We’ll describe how it works.
We’ll introduce the lifecycle of an NLP project, as well as some of the advantages and challenges here at Kaiser Washington, and highlight some current and future work that we’re doing or looking forward to. First, the big question: what exactly is natural language processing?
Well, NLP captures a broad range of applications that essentially boils down to enabling computers to make use of human language.
Some of the popular applications now are everywhere, from your Google search, which will convert some sort of natural language search query into a set of website results.
We have ChatGPT, which will turn some prompts into text.
There’s machine translation.
There’s a whole wide range of applications, and we live somewhere off to the right, doing our particular brand of brewing. If we zoom in on exactly what we’re doing here, we take advantage of clinical text. So why invest so much in NLP?
Well, we have very rich clinical text data that includes a whole bunch of information; by extracting and leveraging it, we can support various research efforts by filling in incomplete or missing structured data.
And so what this will basically look like is we take some input notes, run them through NLP, and generate some sort of structured data.
We make sense of the data for particular questions.
As I mentioned, notes are rich: when you look at any particular clinical note, there’s a whole bunch of different questions you can ask.
You can ask questions about looking at an individual’s level of social support.
So in this particular note, there’s language of being accompanied by daughter.
We might ask questions about family histories.
So in the family history section: father is diabetic. We might ask questions about side effects, or about alcohol consumption, or various other factors.
And NLP is able to harness all of these and present them for subsequent analysis, to answer various research questions.
So how exactly does that work?
Well, we now start with a simple question. Let’s say we have a set of patients and we want to ask the question about how many of them have glaucoma.
Umm, so we’ll look at a couple notes.
And so we have some notes here on the left.
The first one says patient has glaucoma. And so our first basic NLP system will be a recipe for identifying glaucoma.
And we’re just gonna search for the word glaucoma.
And for this particular note that works, there is added complexity here, however, so as we look through our examples, we may run into this one.
Where the patient doesn’t have glaucoma.
Well, that’s unfortunate given our first rule.
So we’re gonna have to add a second one, which excludes notes containing “doesn’t” or “not”.
And so once we’ve added that rule as well, we can go ahead and see that we’ve filled out the chart on the right, which says the first individual has glaucoma.
The second one does not.
We then move on to a third chart.
This one says with a family history of glaucoma.
OK.
Family history wasn’t in our set, so we have to exclude that as well and add it to our recipe.
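The three rules so far can be sketched as a toy Python function. The patterns below are illustrative stand-ins for the recipe described, not the project’s actual rules.

```python
import re

def has_glaucoma(note: str) -> bool:
    """Toy glaucoma recipe: keyword match, minus negation and family history."""
    text = note.lower()
    if "glaucoma" not in text:
        return False                                 # rule 1: keyword
    if re.search(r"\b(doesn't|does not|no|not)\b[^.]*\bglaucoma\b", text):
        return False                                 # rule 2: negation
    if re.search(r"family history[^.]*\bglaucoma\b", text):
        return False                                 # rule 3: family history
    return True

print(has_glaucoma("Patient has glaucoma."))              # True
print(has_glaucoma("Patient doesn't have glaucoma."))     # False
print(has_glaucoma("With a family history of glaucoma"))  # False
```

Each new counterexample in the notes would add another clause, which is exactly the maintenance burden discussed later.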
And as we keep looking through notes, we’ll see a certain complexity arise about all the different ways in which things can be described.
So in addition to the richness of information it leverages, it can be very complicated and take a bit of work.
So what are some recipes that we use to address these?
We’re gonna go through three, and there is some overlap here, but I think this presents several different approaches and why we might want to use one approach over another.
This will be a little bit more theoretical, and then we’ll get into some practical examples after this.
So first of all, we can begin with our rule based solutions.
Our rule-based recipes essentially involve manually developing patterns to identify or extract some target information, often using something like a regular expression.
And for the context of this, the regular expression or regex is a language that allows advanced string searching, allows repetitions, allows intervening words, things of that nature, and so we have a regular expression showing up here on the left hand side, separated by vertical bars.
We have the words good, sufficient, strong and extensive, and in this case, we’re gonna be looking for an indicator of strong social support or good social support.
And so all of those, the vertical bars indicate that there’s some sort of alternation going on.
We can pick any of those words in there.
If any of those is found, it will return a match and on the right hand side.
After this `\W*`, which basically says you can throw some stuff in between there, we’re gonna look for either “support system” or “social support”.
And so we have two pieces of input text.
One is she has strong social support.
“Strong” matches from the first part of the regular expression and “social support” matches from the second. In the second sentence we have “good support system”: “good” matches from the left-hand side and “support system” matches from the right.
So our regular expression will match both texts, and we can then use that subsequently to say this person has an indication of strong or good social support.
Given the medical record, you can imagine extending this: as in the example before, we might want to include negation.
She does not have strong social support or something along those lines.
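A hedged sketch of the pattern just described; the exact production regex may differ, and the way intervening words are allowed here is an assumption.

```python
import re

# Sketch of the social-support pattern: an alternation of positive
# adjectives, optional intervening words, then the support phrase.
pattern = re.compile(
    r"\b(good|sufficient|strong|extensive)\b"
    r"[\w\s]*?"                                # allow stuff in between
    r"\b(support system|social support)\b",
    re.IGNORECASE,
)

print(bool(pattern.search("She has strong social support.")))  # True
print(bool(pattern.search("Good support system in place.")))   # True
print(bool(pattern.search("She lives alone.")))                # False
```

As noted above, “excellent” would not match until it is explicitly added to the alternation.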
But this gives the basic idea. Now, beyond this very surface level, where we’re just looking at the letters or characters within a note, we can extend the rule-based approach to include certain meta-information about the note.
Maybe we can look for sentences.
Maybe we can look for sections.
What sections things are in, exclude particular sections, try to include other ones.
We can look at the role of these words in a sentence.
Look at their part of speech or other syntactic features.
So let’s take an example of a sentence here.
Patient is coughing, but there’s no evidence of pneumonia.
That’s the plain text; you can go ahead and use that, and that’s what we did with a regular expression. But some of these words are not normalized.
So “Patient” is capitalized because it’s the first word in the sentence.
Coughing has this ING ending, but it kind of has the same meaning as coughs or coughed, or just plain cough.
And so one form of normalization is taking the lemma of each of these words, essentially the stem or the root, and finding a standard form in which they all occur. All words get lowercased, so “Patient” becomes “patient”.
Words like coughing gets reduced to cough so it looks the same as coughs, which simplifies the rule creation.
You also see “is” and the apostrophe-S in “there’s” both get translated to “be”, which allows simpler lookup: they’re both the same word, they just have different surface forms.
We can also look at part of speech.
That is, whether an item in the sentence is a noun, a verb, or has a different role in the sentence.
Building these part of speech tags also allows us to try to look at connections between words and how different words relate to each other so that if the word “no” appears like no in no evidence of pneumonia, we can try to figure out what the no is actually referring to.
It’s not referring to coughing, it’s referring to pneumonia, that there’s no evidence of that.
And finally, another example of this kind of information: we can try to exclude stop words.
So stop words are words that, because of high frequency or relatively low semantic content, don’t have a lot of meaning to them.
They can be easily excluded because they really don’t add any value or any new information to the note or the sentence; those would be words like “is”, “of”, “and”, “the”, these articles. All of those tend not to carry much meaning.
The only one we might want to keep in this particular example is the “no”, because that probably has some significance for our use case.
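The normalization steps just described can be sketched with a toy lookup table. A real pipeline would use a proper lemmatizer and stop-word list (for example spaCy’s); the tables here are minimal stand-ins for this one sentence.

```python
# Toy normalization: lowercase, lemmatize via lookup, drop stop words.
LEMMAS = {"is": "be", "coughing": "cough", "coughs": "cough", "coughed": "cough"}
STOP_WORDS = {"be", "but", "of", "the", "and", "there"}  # "no" deliberately kept

def normalize(sentence: str) -> list[str]:
    text = sentence.lower().replace(",", "").replace("there's", "there is")
    lemmas = [LEMMAS.get(tok, tok) for tok in text.split()]
    return [tok for tok in lemmas if tok not in STOP_WORDS]

print(normalize("Patient is coughing, but there's no evidence of pneumonia"))
# ['patient', 'cough', 'no', 'evidence', 'pneumonia']
```

Note that “coughing”, “coughs”, and “coughed” all collapse to “cough”, and the significant “no” survives the stop-word filter.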
Now, a general rule-based lifecycle will begin with rule development using a corpus.
We’ll then go ahead and build a set of rules to run through notes, generate outputs, and then refine the rules over time based on their performance.
Why exactly might we want to choose a rule based recipe?
Well, first of all, it requires minimal training data, so you can kind of get started right away.
You just need a little bit of data to look through. Identify some patterns and build regular expressions.
It’s also very easily interpretable.
You can point to exactly why something has produced a particular output.
There’s a regular expression or some other reasoning behind it.
They’re computationally very efficient.
They tend towards higher precision or positive predictive value and they are deterministic.
They will always produce the same outputs given the same inputs.
Challenges, however, are they do require manual rule development and as rules become more complex and layered together, that can take a little bit of effort to maintain.
There’s also a huge effort in making things more flexible. Think back to our previous example of looking for indicators of good social support.
We didn’t have the word “excellent” in there, so “excellent” will not be found until we go back and add it; there’s no sense of semantic similarity. And if there’s a misspelling, that also won’t be picked up.
We’re limited really to the surface representations, not any sort of deeper layers of semantics.
And as with the excellent example, it’s not gonna generalize very well.
You can’t find new words. You have to introduce those and write the rules for them.
Umm, so then we can turn to a second recipe which is classical machine learning, and in this case we’re gonna use an algorithm to infer the pattern.
So rather than hand-coding all these rules, we’re going to give an algorithm a set of answers, that is, labeled training data. Say our question was “is the patient coughing”: a one for yes, a zero for no, and each label is associated with a particular note, or maybe a sentence, some layer of text.
We can then pass those to a model which will try to go ahead and create rules based on that, and then that model can be used to classify our corpus and generate the needed output for whatever research study.
So how exactly does this work?
Well, let’s take our example from before.
Patient is coughing, but there’s no evidence of pneumonia.
OK, so we’ve had a research specialist go through and check and say yes, patient is coughing.
So we have the check box, but how do we train the machine on that because it doesn’t know how to interpret these words.
It doesn’t know what that is.
We have to transform it or provide a view of what this might look like.
So first of all, we could do what are called unigrams. This is often called a bag-of-words approach, where we just take each individual word.
In this case, we’ve normalized by lowercasing, and we just throw the words all in there and say: if you find particular words together, you can go ahead and draw some sort of conclusion.
Use all these words to make patterns with it.
Umm, we may wanna normalize further through taking the lemmas.
So in that case the “is” gets transformed into “be”, and you’ll notice both “be”s: every time the word reappears, it gets the same numeric representation, which in this case is 1.
You’ll also notice “coughing” gets reduced to “cough”, so it will have the same representation to the algorithm as if the word “coughs” or “coughed” appeared later on. But we still don’t have word order.
So if we wanted to retain some semblance of order of words, we can use something like a bigram.
In this case, I’ve dropped the stop words. One of the challenges is that as you increase the length of your n-grams, so you move to bigrams, which are two words together, trigrams with three, and keep working your way up, the feature space, the number of possible combinations you can have, increases, because there are a lot more unique pairs than there are unique words.
And so here we can do our bigrams.
We’ve dropped the stop words here, so “patient is coughing” just becomes “patient coughing”, and we’re able to retain that ordering information and supply it. So if we think there’s information in word order, for example the name of an organization like Kaiser Permanente, we may want those words kept together, as opposed to being treated separately with the order not considered.
We might also use keywords or regular expressions as features, and we can include other information as well like location and the note sections we’re in.
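The unigram and bigram featurization above can be hand-rolled in a few lines; the token list here is the normalized example sentence with stop words already dropped.

```python
# Build n-grams from a normalized token list.
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return the list of contiguous n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["patient", "cough", "no", "evidence", "pneumonia"]
print(ngrams(tokens, 1))  # unigrams: one tuple per token
print(ngrams(tokens, 2))  # bigrams: ('patient', 'cough'), ('cough', 'no'), ...
```

Moving from unigrams to bigrams shrinks the count per sentence by one but blows up the space of *possible* features, which is the trade-off mentioned above.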
There’s a lot of different options of what to include. So why would we take a classical machine learning approach?
Well, in general, they’re more generalizable and they’ll be able to infer the patterns from the data.
They’ll be able to make use of a broader array of patterns, because they’re just looking for patterns within the data using a given feature set, in more complex ways than we can, and they’ll typically have better accuracy than a rule-based approach. Among the challenges with these: you tend to need a lot of high-quality data, and a lot more data than you would with other approaches.
You need the effort in labeling the data. Unless you’re fortunate enough to already have it come pre labeled from some existing process. Interpretability can be difficult as well, with most of these systems where there’s not a one to one correspondence with a particular rule, or even making sense of what a rule might be and the output, and they will typically be more computationally expensive both to train and to run.
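Putting the pieces together, here is a minimal sketch of the supervised setup just described, using scikit-learn (an assumption; the talk doesn’t name a specific library). The notes and labels are toy stand-ins for labeled training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled training data: 1 = patient is coughing, 0 = not.
notes = [
    "patient is coughing",
    "persistent cough for two weeks",
    "no cough reported",
    "denies coughing",
]
labels = [1, 1, 0, 0]

# Unigram + bigram bag-of-words features, as in the slides.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(notes)

# The model infers the patterns instead of us hand-writing rules.
model = LogisticRegression().fit(X, labels)

print(model.predict(vectorizer.transform(["patient is coughing today"])))
```

With realistic data the labeling effort dominates; this sketch only shows the shape of the pipeline, not a usable classifier.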
And the final recipe we’ll look at is deep learning. This is the realm that things like ChatGPT and other popular models fall within, and typically what you’ll do with these is take a pre-trained model which already has some embedded semantic representations.
And these pre-trained models are built on a neural network architecture; as a way to think about it, I’ve included a diagram in the lower right of the screen.
Let’s say you wanted to identify certain things in an image.
Here we have a star, a sea star, and some sort of shell, and there are a certain number of hidden layers trying to pick up relationships between all of the different pixels in the image.
Or in our cases, the words and the meanings of all these different words together.
And if you add enough hidden layers, some sort of embedded semantic representation is essentially able to take form. Then you’ll go ahead and fine-tune the model for your particular task; you take these pre-trained models so you don’t have to train them yourself.
But what they do have is that similar words will be clustered together based on the contexts in which the words appear. So you can think of “pneumonia” and “tuberculosis” appearing in the same relative contexts, as opposed to “patient” and “person”. And so in whatever space you have, in this case an X-Y plane, words with similar meanings, because they appear in similar sentence contexts, will be grouped together. The way I like to think about this is from the work of Yao and colleagues from 2017, where they looked at New York Times articles between 1990 and 2016, took four terms, brand names and people, and tracked what each word was associated with through time.
So let’s walk through this together.
If we start on the left hand side, we’re looking at the word Apple and in the early 90s the word Apple was usually associated with apple pie.
So it’s in the neighborhood of strawberry and pear.
Other fruits it’s also in the same neighborhood as ice cream.
Over time, as the brand Apple becomes more common and is talked about more within the New York Times, its context changes and shifts, so it moves into the neighborhood of Google, Microsoft, Samsung, tablet, servers: a more tech-oriented space, while the apple-pie usage has decreased. We can see the same thing with Amazon, which begins among Peruvian jungles and Brazilian forests and slowly works its way towards eBay, Walmart, Yahoo, and e-commerce, eventually landing somewhere between Netflix and Walmart.
And that’s just based on how these words are used in the New York Times over time.
With Obama, we see the word up until 2006 in a more university-related setting, and then from 2007 on it plays a more political role. As does Trump, moving from the neighborhood of owner, estate, and project development over to party, news, interview, television, joining Clinton and Obama in those spaces.
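The clustering idea above can be sketched with toy two-dimensional vectors and cosine similarity. Real embeddings have hundreds of learned dimensions; these coordinates are made up purely to illustrate “nearby means similar context”.

```python
import math

# Hypothetical 2-D "embeddings": clinically similar words sit close together.
vectors = {
    "pneumonia":    (0.90, 0.10),
    "tuberculosis": (0.85, 0.15),
    "patient":      (0.10, 0.90),
}

def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity: 1.0 means pointing the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine(vectors["pneumonia"], vectors["tuberculosis"]))  # near 1.0
print(cosine(vectors["pneumonia"], vectors["patient"]))       # much smaller
```

The temporal drift in the Yao et al. study is the same picture with the vectors re-estimated for each time slice.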
So deep learning, why would we use it?
Uh, well, it tends to be more adaptable, as I was just showing.
Words already have an embedded semantic meaning. So if you remember our example where we were looking for social support, we had a list with “good” and some other words associated with it.
We didn’t have excellent in there. With a deep learning method excellent would already have a similar embedding to good and so it would already be incorporated. As new words maybe get used, the meaning is retained, so you’re actually building an algorithm based on meaning. They tend to be more accurate, more robust.
The actual fine tuning doesn’t require that much training data.
And it has already learned, in some sense, how words in human language work.
Umm, if you wanted to train them from scratch, they do require a huge amount of data.
They’re not interpretable, they’re nondeterministic, and by not interpretable I don’t mean that you don’t get an output that’s useful, just that you can’t interpret how it has made a particular decision. And they tend to be very, very computationally expensive.
So with that, I can pause briefly if there were any questions as we as we switch presenters.
And we can also hold questions at the end as well.
Will Bowers:
Well, I’ll continue on then.
Thank you so much for walking through some recipes, David.
And so now that we know kind of how we’re using natural language processing to make these delicious dishes, let’s take a look at some example dishes or some example projects that David and I have done here at the Institute.
So the first one is going to be Eye ACT. As Chloe mentioned in the introduction, ACT here is, again, Adult Changes in Thought, and Eye ACT is principally concerned with understanding the aging brain through the aging eye.
More specifically, we’re investigating Alzheimer’s disease related ophthalmic biomarkers.
And what the NLP is specifically doing here is looking at the electronic medical record, which has a lot of this information around ocular pathology encoded as free text, and trying to extract that into a structured, usable format for the biostatistics team to take advantage of and use for modeling.
And so specifically, we’ve built a rule based pipeline.
So that’s the first recipe that David talked about, largely relying on regular expressions to extract over 100 variables. These variables are things like age-related macular degeneration, glaucoma, as mentioned previously, diabetic retinopathy, and a lot of related features.
This approach requires a lot of collaboration between myself, our research specialists, Chantelle Hess and our biostatistics team.
So working with Chantelle, I might know natural language processing and linguistics, but I really don’t know much about the eye or eye health.
And so I work closely with Chantelle to understand the ways in which the different pathology we’re looking at might be represented in a note.
So a great example of this is that diabetic retinopathy is oftentimes abbreviated as DR, but doctor is also abbreviated quite frequently as Dr.
And so I work with Chantelle to figure out which of these cases are talking about doctor and which are talking about diabetic retinopathy.
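One way the DR-versus-doctor disambiguation might work is sketched below. The heuristic (treat “Dr” followed by a capitalized name as a doctor) is an illustrative assumption, not the project’s actual rule set, which is more involved.

```python
import re

DR_PATTERN = re.compile(r"\bDR\b\.?", re.IGNORECASE)

def is_diabetic_retinopathy(text: str, match: re.Match) -> bool:
    """Assume 'Dr' followed by a capitalized name refers to a doctor."""
    after = text[match.end():]
    return re.match(r"\s+[A-Z][a-z]+", after) is None

for note in ["Seen by Dr. Smith today.", "Mild DR in the left eye."]:
    m = DR_PATTERN.search(note)
    label = "diabetic retinopathy" if is_diabetic_retinopathy(note, m) else "doctor"
    print(f"{note} -> {label}")
```

In practice, context from the surrounding note section (e.g. an exam findings section versus a provider list) would also feed into the decision.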
I also work with our Biostats team to understand if on the output level the prevalences on a patient level are what we expect them to be.
So along with the natural language processing pipeline, we have a team of Research Specialists doing paper chart abstraction.
And so we’re comparing those prevalences of let’s say diabetic retinopathy to see, OK, is this something we would expect?
And so we’re adjusting our NLP approach both on the kind of tactical level as well as that person level prevalence level.
And So what the NLP is doing here is again producing structured quantitative data for the biostatistical modeling.
So let’s take a look and walk through what we’re doing with the NLP.
So on the left here, I apologize, it’s a little cramped and there’s a lot going on here, but it’s an example of a note we might look at.
So towards the top of the note, we have some information talking about visual acuity, or how well this person is seeing. In the middle, we have information about tonometry, the macula, IRMA, CCMT, just different features of the eye, and then finally at the bottom, we have mentions of the family history of this particular patient.
So we will apply the natural language processing pipeline and we can see over on the right now we have a bunch of different parts of this note are highlighted.
So think of these as areas of the note where we’ve created a regular expression pattern and we’ve now extracted this information.
So this alone isn’t quite usable for our Biostats team.
We need to then translate these bits of text that we’ve extracted into a table.
So this is now a table of structured data of quantitative data that we can go and use.
And so let’s take a look at this table briefly.
We have different types of variables here.
Next to the dolphin, in both the note and the table, we see numerical values relating to the visual acuity: values of 20 or 40, or here a missing value, which gets encoded as a -1.
We also have examples of categorical variables: the description of the IRMA is “mild-moderate”, and we capture that as moderate in the table over here. And then finally, looking at the family history, where the cactuses are, the cacti, pardon, we can see that “glaucoma no” or “AMD yes” gets captured as a binary variable, so that’s just a simple yes or no.
Or, as David said earlier, a one or a zero.
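The three encodings just described, numeric, categorical, and binary with -1 for missing, might look like this in a toy row builder. The field names are made up for illustration; they are not the study’s actual variable names.

```python
def encode_binary(value):
    """Encode yes/no as 1/0, with -1 for a missing value."""
    if value is None:
        return -1
    return 1 if value.lower() == "yes" else 0

# Hypothetical field names for one row of the structured output table.
row = {
    "visual_acuity": 40,                      # numeric; -1 if missing
    "irma_severity": "moderate",              # categorical
    "fam_hx_glaucoma": encode_binary("no"),   # binary
    "fam_hx_amd": encode_binary("yes"),       # binary
}
print(row)
```

One row per patient (or per note) in this shape is what hands off cleanly to the biostatistics team.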
Another example of a project or a rule based project that I’ve worked on is again with the Adult Changes in Thought study.
But this is Project 2, Aim 3, and Project 2 Aim 3 is concerned with understanding the lived experiences of people with dementia. So the difference here from Eye ACT is that the NLP isn’t trying to produce quantitative data.
We’re trying to curate a sub corpus for our team of medical anthropologists to use and perform their analyses.
They’re concerned with understanding things like patient initiated communication, housing, caregiving, and so they’re going to look at the actual text itself, not at a chart, and try to answer their questions using their tool, Alice.
And so for this project, we have a cohort of ACT participants with about 160,000 notes; a corpus is just a collection of notes.
And so, first of all, there are way too many of these notes to do meaningful analysis with just a team of five in a reasonable amount of time.
And another feature is that a lot of these notes are going to be irrelevant to answering those research questions.
And so how can we use NLP to distill or filter through and figure out what are going to be helpful and relevant to answering their questions?
And we’re hoping to get a sub corpus of about 500 notes.
So a big task filtering down.
So we principally do this in two steps.
The first step is a lot like the work we do for Eye ACT, so we’re gonna be using a rule-based algorithm, largely regular-expression based.
It’s called pyTAKES, actually developed by David to go ahead and extract concepts from these notes.
We’re hoping to enrich these notes with a better understanding of what they’re talking about.
So let’s look at an example.
Here we have a little snippet and it says would walk up to two to three miles with friends.
So let’s go ahead and run pyTAKES and then look and see that we’ve extracted the concept of social support and if you’ll notice in the note here, we’ve highlighted the term friends.
And so, a little bit of how this works: there are specific terms, or phrases rather, that we’re hoping to identify and associate with particular concepts.
So in this instance we have a concept of social support being associated with friends.
There’s many other concepts we’re looking at, so concepts around the patients point of view, their transportation, exercise, caregiving, housing, all sorts of concepts that we’re hoping are going to be relevant for the qualitative analysis.
And one note on these concepts is that they’re actually leveraging prior work prior codes from a medical anthropologist perspective.
And so I’ve worked with our team to iterate on these concepts, to say: OK, is this bit of text that’s being returned actually relevant to this concept? A great example of why this might be useful is the term “pleased”.
So “pleased” in the regular expression can get converted to “please”, and “please” is going to occur all the time in instructions to patients, and that’s not what we’re hoping for.
We’re hoping to understand, OK. Is the patient pleased?
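The “pleased” pitfall above comes down to substring matching versus word-boundary matching; a minimal illustration:

```python
import re

# A pattern stemmed down to "please" fires on routine instructions,
# not just patient sentiment; anchoring the full word avoids that.
loose = re.compile(r"please", re.IGNORECASE)
strict = re.compile(r"\bpleased\b", re.IGNORECASE)

for text in ["Please call the clinic if symptoms worsen.",
             "Patient is pleased with her progress."]:
    print(bool(loose.search(text)), bool(strict.search(text)))
```

The loose pattern matches both sentences; the strict pattern matches only the sentiment sentence, which is the behavior the iteration with the qualitative team is after.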
So let’s move on over to the next step.
At this point, after the first step, we still have that big corpus of 160,000 notes, but now it’s enriched with these concepts.
So how do we use these concepts to distill that 160,000 down to the 500 notes?
Well, let’s consider these two notes.
This snippet here says “reviewed allergies and medications with the patient”. Well, that’s not really interesting to us in this instance. And on the right here we have “would walk up to two to three miles with friends”.
And as we saw in the first step, we now have the concept of social support extracted.
And so we can use that to say, OK, let’s go ahead and exclude this first note and include this second note.
This is a simplistic view of what we’re doing.
We’re looking at other things like note length and number of concepts extracted and specific concepts, and all of that, but I think this is a great example of how we’re using those concepts to distill our corpus.
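The distillation step just described can be sketched as a simple filter over concept-enriched notes. As noted, the real criteria also weigh note length and concept counts; this keeps only the inclusion-by-concept part.

```python
# Notes enriched with concepts from the first (pyTAKES) step.
notes = [
    {"text": "Reviewed allergies and medications with the patient.",
     "concepts": []},
    {"text": "Would walk up to two to three miles with friends.",
     "concepts": ["social_support", "exercise"]},
]

# Concepts the qualitative team cares about (illustrative subset).
RELEVANT = {"social_support", "caregiving", "housing", "transportation"}

# Keep a note if any of its extracted concepts is relevant.
sub_corpus = [n for n in notes if RELEVANT & set(n["concepts"])]
print(len(sub_corpus))  # 1
```

Scaled up, this is how 160,000 notes get distilled toward a reviewable sub-corpus of around 500.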
I’m going to hand it back over to David to talk about polyneuropathy.
David Cronkite:
Yeah.
So another of our classic recipe or classic dishes I guess has been with polyneuropathy and the context for this particular work was that polyneuropathy can can be … it takes a little bit of effort to diagnose and so it often remains un-diagnosed.
So back in the Group Health days, the actuary underwriting group had an interest in trying to identify and diagnose polyneuropathy.
And so they had an existing system where they would take a set of notes, or rather patients in some sense, but think of it as a set of notes, and select notes with neuropathy.
And then these were manually reviewed by physicians or clinicians, somewhere in the range of 400 individuals, I think, and the possible polyneuropathy cases would then be passed on to professional coders to actually evaluate and decide if a code should be applied.
And the question was: can we speed up this manual review?
Can we leverage technology?
Can we leverage NLP to help add efficiency there?
We tried, and the answer was yes.
We work with clinicians, first of all to identify terminology associated with polyneuropathy, and you can kind of see that on the right here.
We looked at a huge number of keywords based on our conversations and interviews with them, including hand numbness, numbness of the toe, numb forearm, paresthesia foot, burning sensation and so on, and so on.
And these were converted into regular expressions, and we built an NLP system to first of all extract these terms and we would use these as features.
So if you remember from where we discussed the classical machine learning approach, one of the ways you could supply features was with regular expressions.
So we used these identified features and trained a classical machine learner to predict whether an individual would have been passed on to coders before, basically whether they likely had polyneuropathy, replicating that process. For training data, it was very nice because we had all the historical data of the previous decisions made by these clinicians.
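The clinician-derived terminology can feed a classifier as regex features, along the lines sketched below. These three patterns are illustrative, not the project’s actual feature set, which was much larger.

```python
import re

# A few regex features in the spirit of the terminology list; the real
# system used many more patterns, developed with clinicians.
FEATURES = {
    "numbness": re.compile(r"\bnumb(ness)?\b", re.IGNORECASE),
    "paresthesia": re.compile(r"\bparesthesias?\b", re.IGNORECASE),
    "burning": re.compile(r"\bburning sensation\b", re.IGNORECASE),
}

def featurize(note: str) -> dict[str, int]:
    """Binary indicator per pattern: did it fire anywhere in the note?"""
    return {name: int(bool(p.search(note))) for name, p in FEATURES.items()}

print(featurize("Reports hand numbness and a burning sensation."))
# {'numbness': 1, 'paresthesia': 0, 'burning': 1}
```

These indicator vectors, paired with the historical clinician decisions as labels, are exactly the kind of input a classical machine learner trains on.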
So the NLP was able to take those input neuropathy notes, identify neuropathy, and reduce the number of potential patients for the clinicians to review from 400 to fewer than 100, with very high sensitivity. We didn't have the coders' final decisions, at least in the data set we had available, so we couldn't check that end point.
But we were able to significantly reduce their workload without any real effect on sensitivity.
Another project has been identifying social support within the notes of breast cancer patients, since higher social support is associated with better outcomes.
The goal was to try to do some sort of intervention to support those with lower levels of social support.
So our first approach was a rule-based, regular-expression algorithm to identify a set of categories: whether they had children, deceased family, an explicit mention of social support (that's where the social support example comes from), a partner, what their living situation was, whether there was conflict or stress, whether they had transportation issues, and so forth.
So there was a battery of this work, and it's being used right now as an input to a social support score.
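As a rough illustration of the category tagging, here is a sketch with hypothetical category names and patterns; the project's real rule set was developed and refined with the study team and is far more extensive.

```python
import re

# Illustrative category patterns (hypothetical; stand-ins for the
# project's much larger battery of rules).
CATEGORIES = {
    "partner": re.compile(r"\b(?:husband|wife|partner|spouse)\b", re.IGNORECASE),
    "children": re.compile(r"\b(?:son|daughter|children|kids)\b", re.IGNORECASE),
    "transportation": re.compile(r"\b(?:no ride|bus pass|transportation)\b", re.IGNORECASE),
    "explicit_support": re.compile(r"\bsocial support\b", re.IGNORECASE),
}

def tag_categories(note: str) -> set:
    """Return the set of social-support categories mentioned in a note."""
    return {name for name, pat in CATEGORIES.items() if pat.search(note)}

print(tag_categories("Lives with her husband; daughter drives her to appointments."))
# both "partner" and "children" fire for this note
```

Per-note category tags of this sort are the kind of output that can feed into a downstream social support score.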
Then we had a question; there was interest in saying, hey, can we leverage technology to do more with this?
So on the next slide, we'll see the question that was asked: can we improve on this rule-based algorithm by using its output as training data for a deep learning model?
So we were fine tuning a deep learning model called BioClinical BERT.
What it takes as input is the output of our rule-based system, and the goal was to leverage the semantic associations built into the model: because it has already been pre-trained, it already understands a lot about how words work, even medical terminology.
So the question was, can we leverage this system?
And the answer looks like yes; our evaluation of this is actually still ongoing, but it looks promising so far.
We could then use the same approach for future work: after a rule-based algorithm is developed, we can try to extend it and see if we're missing anything by fine-tuning a deep learning model, without too much overhead.
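A minimal sketch of the weak-labeling step described above: turning the rule-based system's output into training examples for fine-tuning a model such as BioClinical BERT. The helper name and the binary label scheme are hypothetical; the fine-tuning itself would be done with a deep learning library and is omitted here.

```python
import json

def to_training_examples(rule_output):
    """Convert rule-based output into weakly labeled training examples.

    rule_output: list of (note_text, matched_categories) pairs, where
    matched_categories is the set of categories the rules fired on.
    A note is labeled positive (1) if any category fired, else 0.
    These weak labels then serve as training data for a fine-tuned model.
    """
    return [
        {"text": text, "label": 1 if categories else 0}
        for text, categories in rule_output
    ]

rule_output = [
    ("Lives with her husband.", {"partner"}),
    ("Routine follow-up.", set()),
]
print(json.dumps(to_training_examples(rule_output), indent=2))
```

The appeal of this setup is that no new manual annotation is needed: the existing rules bootstrap the labeled corpus, and the pre-trained model can then generalize beyond the literal patterns.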
Will Bowers:
Awesome.
So now that we’ve taken a look at some example projects or dishes that we’ve made, let’s take a step back and understand the life cycle of an NLP project.
So how do we transform those ingredients through a recipe into a delicious dish or usable output?
And so it really starts with the project getting funded, at which point David and I, or both of us, are brought onto a team, and the team explains what we're hoping to accomplish with NLP.
The first step that David and I take on any project is assembling our corpus.
Again, the corpus is just the collection of notes from electronic medical records.
Natural language processing relies on natural language, and so we need that to be able to do any sort of extraction of information.
The next step is going to be developing the algorithm.
This part is where we spend the most effort on any project, and it's going to vary widely based on what recipe we are using.
So if this is a rule based project, this is going to be creating the rules, refining them with whoever we have on our team.
If this is a machine learning or deep learning project, this is going to look like tokenization, creating those unigrams and bigrams, or any other sort of structured numerical representation.
So again, this is going to vary a lot based on what sort of recipe we're using.
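For instance, one simple way to turn a note into the unigram and bigram counts just mentioned, sketched with a hypothetical helper name:

```python
import re
from collections import Counter

def ngram_counts(text: str, n_values=(1, 2)) -> Counter:
    """Tokenize a note and count unigrams and bigrams.

    This is one basic way to convert free text into a structured
    numerical representation for a machine learning model.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

print(ngram_counts("possible glaucoma, history of glaucoma"))
```

In practice these counts would be mapped onto a fixed vocabulary so every note yields a vector of the same length.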
So let's take both the corpus and the algorithm, run the algorithm on the corpus, and with that it's going to produce an output. As much as I'd like to think that on the very first iteration we're going to get this right, we usually do not.
And so we need to evaluate that output and think is this addressing what we’re hoping it addresses?
Is this extracting the information we want it to?
Usually, again, it doesn't happen on the first try, so we take that output and change either the corpus or the algorithm. In a lot of cases we don't need to assemble a new corpus, but we almost certainly are going to be tweaking the algorithm.
With the rule-based approach, this might be altering the rules.
With the machine learning approach, this could be using different settings for the algorithm, or encoding the data in a different manner.
So this process is going to happen a number of times.
For example, with projects two and three with ACT, we iterated six times before we had an output from pyTAKES that we thought was satisfactory.
So once we’ve iterated enough times and we look at the output and we think, OK, this is what we’re hoping to extract, then we prepare it for handoff.
This could be transforming the data into a usable format, answering any questions that our stakeholders might have, and just ensuring a smooth transition with this information, making sure they are ready to go with it.
So now I want to talk about some strengths and challenges of doing NLP at KPWHRI.
I think I have a unique perspective on this because prior to coming into research at the institute I worked at both a health insurer as well as a health tech startup doing NLP in both places.
And so I kind of have this understanding of how NLP might be done at different shops.
I think the best or one of the greatest strengths of doing NLP here is our direct access to ECHO or the electronic health record.
I remember at a startup we paid tens of thousands of dollars to access claims information.
That's a lot of money for a glance at the data, as opposed to getting the note directly; claims are not the truest form of this data.
So being able to just query or assemble a corpus whenever I want, once approved of course, and to look at that information and iterate directly without any steps in between is such a great strength.
The image down below is our team of chefs.
I think that there’s a wealth of information here at the Institute.
As alluded to earlier, I know natural language processing and machine learning, but I don't know a lot of the health domains that we're deploying solutions for. So it's awesome to be able to work with investigators, research specialists, programmers, biostatisticians, you name it, to get their understanding of our approach and to adjust in real time as we're moving forward.
And that leads directly into our collaborative environment.
So I don’t need to work in isolation for months and just hope that I’m working towards the right solution.
We have regular check-ins with our full team in an interdisciplinary environment where we can all help each other and check whether the things we're pushing forward are actually advancing our understanding of what we're hoping to look at.
And then finally, as kind of represented by Remy and Linguine’s hair, David and I have the independence to experiment with innovative approaches.
We have a lot of latitude to develop these solutions in the way we see fit.
We don't have a manager or director telling us exactly what we need to do, and this gives us the flexibility to look at the current research in the natural language processing space and incorporate it, to make sure that our solutions are as good as possible. Another strength is that we reuse these tools across projects; again, pyTAKES has been used countless times by this point, and our tools only improve as we move forward.
So, I can't mention strengths without some challenges.
I think the biggest challenge here at KPWHRI is a limited computing environment.
David and I don't have access to graphical processing units, computing clusters, or cloud infrastructure.
These are very common at other shops, and essentially their absence limits the number of iterations we can do; although we do have fairly strong virtual machines, they're certainly not on the level of a GPU or a computing cluster.
So again, that limits the number of iterations, and we might not get as accurate or as efficient a result as somewhere else.
Another challenge is that David or I own all parts of the NLP life cycle, which means we're not able to divide and conquer if we have multiple things to work on in a given project.
Say for example I need to both make the algorithm more accurate as well as deploy the tool.
I need to prioritize and choose which one I work on.
I can’t work on both at the same time.
A great example of this is a recent project we're wrapping up, where we're hoping to deploy a natural language processing, specifically large language model, powered chart abstraction tool.
I needed to deploy the tool in our computing environment, which was hard because we couldn't containerize our solution and the infrastructure wasn't what it was at our partner site.
And because I was focused on deploying, I couldn't divide and conquer and also work on making the large language model as accurate as possible on our data.
The final challenge is that notes are complicated.
This is not something that’s gonna change anytime soon, so it’s something we have to work with.
Note structure and style vary widely by author, whether the provider is a nurse or a doctor, and even the same author's style will change over time.
Notes also vary widely based on specialty or location.
So an ophthalmology note is going to look a lot different than an oncology note, et cetera.
And then another challenge with complicated notes is the hypothetical language, or qualified or hedging language.
David talked about this in the initial example, so let’s look at glaucoma.
Again, it's not commonly just yes or no with absolute certainty, glaucoma or no glaucoma.
A lot of times it's possible glaucoma, or risk of glaucoma, or history of glaucoma.
And so there's a lot of nuance there, and it's really important for us to capture that nuance accurately, because history of glaucoma often means, no, they don't currently have glaucoma.
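A toy sketch of how such qualifiers might be detected around a condition mention. The patterns and labels are hypothetical and far simpler than real context algorithms, which handle scope, word order, and many more cues:

```python
import re

# Illustrative qualifier patterns (hypothetical), checked in order
# before accepting a condition mention as affirmed.
QUALIFIERS = [
    ("possible", re.compile(r"\b(?:possible|probable|risk of|rule out)\b", re.IGNORECASE)),
    ("historical", re.compile(r"\b(?:history of|h/o|past)\b", re.IGNORECASE)),
    ("negated", re.compile(r"\b(?:no|denies|without)\b", re.IGNORECASE)),
]

def qualify_mention(context: str) -> str:
    """Classify the certainty of a condition mention from nearby text."""
    for label, pat in QUALIFIERS:
        if pat.search(context):
            return label
    return "affirmed"

print(qualify_mention("history of glaucoma"))  # -> "historical"
print(qualify_mention("glaucoma, both eyes"))  # -> "affirmed"
```

Capturing this distinction matters downstream: a "historical" mention should usually not be counted as evidence that the patient currently has the condition.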
So looking forward, David and I have many ongoing projects, a lot of which we talked about today, but there's also some possible future work we'd like to mention.
Last Friday, actually, we just submitted a small grant proposal concerned with extracting and classifying discussions of self-harm.
These are often encoded in a note but not captured as structured data, so first we want to extract that information and then classify each discussion of self-harm: is this a novel or a follow-up discussion?
Another example is a supplement that Yates Coley is submitting, looking at identifying expressions of sexual orientation and gender identity, and this is going to be used to improve suicide risk prediction.
And finally, David has work looking at exploring the FDA’s ability to have sites in its Sentinel initiative include natural language processing.
NLP data is not as easy to encode into a structured format, and so there's a lot of work on how to standardize this process across sites.
So thank you so much.
If you have any questions about potential NLP use cases, we're happy to chat now in our remaining time; please also feel free to reach out via email or schedule a call.
These questions can be about recipes, about example projects, or about future dishes you're hoping to make, any way you think you can leverage NLP in your research.
So thank you very much for attending and we’re happy to answer questions now.
Chloe A Krakauer:
Let me send a massive thank you to both of you for taking this very complicated concept and distilling it into something that's really understandable. I know I'm going to be using this as reference material and suggesting other people use it too.
This is awesome and so great to learn about this awesome resource that you two provide.
Starting with questions: if people want to start raising their hands, I can call on people in order.
I did see a question from someone in the chat: have you published pipelines that were abstract enough to be reproduced at other places, KP or not?
David Cronkite:
Yeah, this was one thing early in my time here. If I understand the question correctly, it's about deploying, basically making the algorithms portable. One of the early frustrations with a lot of NLP solutions is that even ones that are published are very siloed, or they don't work when you try to demo them, and that's still common, especially in the clinical space.
We've released a number, and most of the ones we design are built such that they would work at a different site and allow some sort of implementation.
The social support one, for example, has been deployed at Northern Cal. We've also had a number of mental health NLP systems developed either here or using both sites, so that we can get training data from both sites, generalize, and build a shared algorithm that can be deployed to both sites. And there's also some current work on a machine learning solution using both sites with tokenized data.
Chloe A Krakauer:
Thanks again. And apologies if people can hear background noise when I take myself off mute; there's construction going on near me. I can hand over to Nicole.
Nicole M Gatto:
OK.
Ohh thanks David and well for such a great presentation.
I was curious, and this might seem like kind of an odd question, but to address your last point about the complexity of writing: has there been any talk about taking learnings back to clinicians, to retrain the way they talk and write so the records are easier for NLP to read?
David Cronkite:
I have not heard of it for the specific case of NLP, although that is a common joke if you get a whole bunch of clinical NLP people together: essentially, wouldn't it be great to have structured data for everything.
I think there is a movement towards increased standardization, but that's not necessarily for the purpose of making the information easier to extract.
Not that I'm aware of, at least.
Chloe A Krakauer:
Thank you.
And over to Brian with question.
Brian D Williamson:
Thanks, Chloe, and thanks, David.
Well, this is fantastic and super cool work. I was wondering if you could speak a little bit about in your view what some of the tradeoffs are between person time and computer time.
And you know, maybe that’s setting specific, but like how much effort should we invest in, say, a rule based approach versus because that requires a lot of manual person time versus a deep learning or machine learning approach that requires much less but may require a lot of different fine tuning.
Anyway, I'm interested in your thoughts on that.
David Cronkite:
Do you want to take that one, Will?
Will I Bowers:
Uh, yeah, I think I have a good answer.
The way I think about choosing a rule-based or machine learning approach is often by thinking about what trade-offs we're looking for.
A great example of this is interpretability.
If we need to say exactly why we produced a particular output, that's a great use case for a rule-based solution.
Selfishly, I'm always trying to push towards machine learning solutions and just apply the cutting edge, but I think frequently the high computing power required and the loss of interpretability are big trade-offs there.
David, do you have more to say on this?
David Cronkite:
Yeah. I think there are some obvious cases where you can bump one over the other. One is if you have some downstream purpose or goal for how you want to use it: regular-expression or rule-based solutions are much more portable in general, because they're not relying on your feature set.
Within the machine learning space, each site will likely have to retrain the model, or you can't even move the feature set out because it may have PHI baked into it.
And there are other cases where you already have labeled data somehow.
I think the trade-off might have more to do with whether you think you can get enough data labeled in time to support a machine learning based approach.
But I don't think there's really any easy or solid answer on it; it may come down to just what the trade-offs are.
The one advantage to a pattern-based solution, though, is that you can stop at any point. With a machine learning one, if you don't get enough data to begin with, it can be harder to call it good enough.
Thanks.
Chloe A Krakauer:
Uh, I don't see anything else in the chat. A few things popped up, and I noticed, Roy, was your comment meant for the group at large or for David and Will in particular?
Roy Pardee:
I guess I was, uh, offering David and Will the temptation to indulge me.
But like I say, this might not be the right venue for it.
So, you know, feel free; we can follow up offline.
Will Bowers:
I can speak to that quickly. The specific problem is that I was trying to deploy a containerized solution on a virtual machine, and I was not able to enable the virtualization needed to actually deploy it.
If I had a physical desktop I would have been able to do that, but because I was trying to virtualize inside a virtualized machine, I was not able to, even working with our IT.
Roy Pardee:
Well, thank you.

David Cronkite:
And on the containerization, I can take some of that bait: it would enable sharing significantly more. We've also, in the past and not just on this project, been handed containerized solutions from other projects and then basically had to unpack them and deploy all the elements separately, which makes it a bit more challenging, especially if you want to iterate or update or fix things.
Will I Bowers:
And that’s exactly what happened in my case, and the tool was actively being worked on.
So whenever they made changes, it wasn't as simple as just pulling down the new container; it was updating all the constituent parts and the wiring and all the work that went into that as well.
David Cronkite:
And I see Al has a question: yes, the containers were Docker and its evil twins. How do you overcome the data volume requirements in machine learning, or are there ways to overcome that? That is actually one of the things we're working on.
There's a lot of published literature on trying to speed up or improve things, because there are two ways to approach this.
One is trying to improve the accuracy of the data that's being abstracted, which is to help research specialists along and allow them to be more efficient.
There are various ways to do that, either reducing what somebody has to look at or how difficult it is to do the labeling.
The other is what I presented in the social support case, where we're experimenting with using the rule-based system's output as training data for a deep learning solution, which is, I guess, machine learning.
Yeah.
And I think there are also catches: we're subject to whatever the needs of the grants are.
That also means if there's more we'd like to do with the data after we're done with it and we don't have time, it doesn't happen.
So it's great that for at least that particular study there was a lot of interest in saying, hey, can we try something? Now that we have a regular-expression, pattern-based solution, can you grab some bleeding-edge thing and do that too? Because that sounds fun.
And when we have an investigator say something like that, it's exciting.
Chloe A Krakauer:
But look at the time. I want to thank Will and David so much again for your awesome presentation today.
Thanks to everyone who was able to attend; I hope everyone has a good rest of the day and is able to get out and enjoy the sunshine a little bit.
David Cronkite:
Thank you.
Will I Bowers:
Thanks everybody.
