So the procedure is that you take a set of real images of melanomas, feed them to the artificial intelligence, it generates a set of artificial melanoma images matching the given parameters, and then you have real doctors evaluate them, thereby further improving both the data and the system...?
Exactly! There is even more variability in images of skin findings than in lung findings, and it's not just about size or shape. The single most important factor is skin tone, and I noticed that when Google Health released their DermAssist app last year, they didn't pay much attention to it. While their system does work with a large number of samples from people with fair skin, the darker the skin tone, the lower the representation of those samples. That surprised me, because to reduce the error rate of these systems the data needs to be robust, balanced, and not favour one population, even if that population has a higher incidence of melanoma.
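To make that imbalance concrete, here is a minimal sketch, not the team's actual tooling, of the kind of audit such a dataset calls for: counting images per Fitzpatrick skin phototype (I-VI) and flagging under-represented groups before any training or generation begins. The metadata layout and the threshold are illustrative assumptions, not details from the interview.

```python
from collections import Counter

# Hypothetical metadata: one (image_id, fitzpatrick_type) pair per photograph.
dataset = [
    ("img_0001", "II"), ("img_0002", "I"), ("img_0003", "II"),
    ("img_0004", "V"),  ("img_0005", "III"), ("img_0006", "II"),
]

def audit_skin_tone_balance(records, min_share=1 / 6):
    """Report each skin type's share of the dataset and flag groups below min_share."""
    counts = Counter(tone for _, tone in records)
    total = sum(counts.values())
    report = {}
    for tone in ["I", "II", "III", "IV", "V", "VI"]:
        share = counts.get(tone, 0) / total
        report[tone] = {"share": round(share, 3), "underrepresented": share < min_share}
    return report

print(audit_skin_tone_balance(dataset))
```

A report like this only tells you where the gaps are; deciding how to fill them, whether by collecting more photographs or generating synthetic ones, is the harder part discussed below.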
If you populated your system with roughly 7,000 real photographs in the initial run, and then generated more data based on that, and more data based on the generated data, isn't there some dilution in quality? I'm reminded of when Meta launched its chatbot, which learned from conversations with users who asked it what it thought of the company's founder, Mark Zuckerberg. It didn't paint a very flattering picture of him, drawing on the available sources and the subsequent discussions of its answers, and Zuckerberg never came out of those conversations looking good...
We trained our own classifier on those first seven thousand images. We then spent about two weeks working with the DALL-E model, which was publicly available at the time. We wrote a set of text prompts, generated the first data from them, and had the results evaluated both by Dr. Březina (Eva Březina, M.D., Ph.D. from the First Dermatovenerology Clinic of the Faculty of Medicine and the St. Anne's University Hospital in Brno) and by our own classifier, which told us whether the images contained typical features of melanoma. In the second phase, we are now working with so-called outpainting, where we try to place specific findings onto images of darker skin tones. Earlier models based on so-called generative networks, which work with a known distribution of data, did tend to do what you mentioned: they worked with datasets that they tried to modify in some way to create, say, unique versions, but they were always locked into one universal distribution and couldn't work with anything outside it. DALL-E, however, introduced a model trained on data obtained by so-called web scraping, that is, downloaded publicly from the Internet. Admittedly that is not very ethical, but it allows the model to work with context and generate more accurate outputs.
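As a rough illustration of the generate-and-filter loop described here, the sketch below uses hypothetical stand-ins (generate_from_prompt, melanoma_score, queue_for_review) rather than the team's real code or the DALL-E API; the prompts, the acceptance threshold and the stub implementations are assumptions for demonstration only.

```python
import random

# Hypothetical text prompts of the kind used to steer generation.
PROMPTS = [
    "dermatoscopic photograph of an asymmetric pigmented lesion with an irregular border on fair skin",
    "close-up photo of a pigmented lesion with colour variegation on darker skin",
]

def generate_from_prompt(prompt):
    """Stub for a text-to-image call (e.g. DALL-E); returns a placeholder record."""
    return {"prompt": prompt, "image_id": f"gen_{random.randrange(10**6):06d}"}

def melanoma_score(image):
    """Stub for the in-house classifier trained on the ~7,000 real photographs."""
    return random.random()  # a real system would return a calibrated probability

def queue_for_review(image):
    """Stub: accepted images still go to a dermatologist for final evaluation."""
    print("queued for expert review:", image["image_id"])

def generate_and_filter(prompts, per_prompt=5, threshold=0.8):
    """Generate candidates per prompt and keep only those the classifier scores highly."""
    accepted = []
    for prompt in prompts:
        for _ in range(per_prompt):
            candidate = generate_from_prompt(prompt)
            if melanoma_score(candidate) >= threshold:
                accepted.append(candidate)
    return accepted

for image in generate_and_filter(PROMPTS):
    queue_for_review(image)
```

The important design point is that the automatic filter only pre-screens candidates; the expert review step is what keeps generated data anchored to clinical reality.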
However, your colleague Matěj Misař, in an interview for E15 magazine last year, was sceptical about public data...
It's worth mentioning here that while there are all sorts of self-diagnosis apps available on Google Store and elsewhere, we are creating a certified medical device that goes through a complex regulatory process, clinical evaluation and other necessary phases in which we verify how we arrived at our data. The descriptions that various institutions publish alongside their publicly available datasets may not always be accurate. Companies like Google collect datasets of different lesions from different sites around the world and have a team of doctors who agree on those findings - by visual appearance, by touch, by histopathological analysis and so on. Within these processes we try to complement such datasets so that they are balanced and representative. Obviously, we cannot do histological analysis on generated images, we cannot touch them, and we cannot examine their size, but we are able to create a dataset that has certain melanoma-specific features. In doing so, we have to start from clinically validated data and then look for ways to 'help' it along and work with it further.
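One way to keep that distinction explicit, sketched below purely as an assumption about how such a curated dataset might be organised (none of these field names come from the interview), is to tag every record with its provenance, so that clinically validated photographs and generated images are never confused during training or evaluation.

```python
from dataclasses import dataclass

@dataclass
class LesionRecord:
    image_id: str
    skin_type: str            # Fitzpatrick phototype I-VI
    provenance: str           # "clinical" (validated) or "synthetic" (generated)
    histology_available: bool

def merge_datasets(clinical, synthetic):
    """Combine validated and generated records while preserving provenance tags."""
    merged = []
    for record in clinical:
        merged.append(LesionRecord(record["id"], record["skin_type"], "clinical", True))
    for record in synthetic:
        # Generated images can never carry histology, touch or size information.
        merged.append(LesionRecord(record["id"], record["skin_type"], "synthetic", False))
    return merged

dataset = merge_datasets(
    clinical=[{"id": "clin_0001", "skin_type": "II"}],
    synthetic=[{"id": "gen_0001", "skin_type": "V"}],
)
print(dataset)
```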
It is said that AI is only as smart as its creators, but listening to you, models like DALL-E, which gather contextual data from the internet, go beyond that claim, don't they?
To a certain extent, yes. It's probably too early to say definitively, but I would say we're moving towards that. I follow all the generative models and notice, for example, that people are already starting to use ChatGPT instead of Google... Within medicine specifically, context is extremely important to us, and from the beginning we have approached it on the premise that AI needs to be a tool, much like a stethoscope or a ruler is a tool. It simply has to provide another point of view, and at the same time the doctor has to get to grips with it and learn to respond to it. Only when this happens, and when it is backed by clinical trials that are robust, multicentre and include a diverse representation of patients, will we be able to say that artificial intelligence, capable of operating even in clinical settings whose data it has never seen before, is perhaps beginning to go beyond that in clinical practice as well.