Science-Intensive Apps: Handle with Care

A visualization of two fluids mixing. Public domain.

A certain type of science-intensive application only has a few (or even no) control parameters. It is usually driven by a “big data” analysis. The algorithms inside may or may not be very complex. 

There are many applications like this: automatic classifiers, visualizations, recommendation engines, credit analyzers, and many more.

I’ve come to believe these applications should be marked “Handle with care”. I’ll identify a number of risks in this approach and offer a couple of suggestions for improvement, but I don’t have “the answer”. 

Multiple Perspectives

Creating science-intensive applications is research-driven. At the start, we may not even be clear on what we’re looking for. 

The User

In effect, the user wants technological magic: “Help me solve this hard problem, which nobody actually knows how to solve. I might recognize a solution if you showed it to me, but I might not.”

In some cases, the tool is controlled by the user, e.g., some new and interesting visualization. In other cases, it’s more of a back-end process, e.g., a system making recommendations (where the user affects it only indirectly). 

Scientist

A scientist may have some idea of the problem they’re solving, but it may morph as the research proceeds. The groups I’ve seen assigned a solo scientist to the research phase.

One manager of a team creating multiple projects like this told me (approximate quote): “We hire PhDs to do a new analysis. They do the equivalent of another PhD to make it happen.”

A scientist will typically analyze a large data set, and devise a model parameterized by its characteristics. They may run many simulations, or apply the model to a variety of systems to test it. 

They often create a prototype, typically in R, FORTRAN, or Python. 
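
As a minimal sketch (the data file, column names, and model are all invented for illustration), such a prototype in Python might look like this:

```python
# prototype.py -- hypothetical research prototype (illustrative only)
import pandas as pd

# Load the "big data" set; the file and column names are made up for this sketch.
data = pd.read_csv("observations.csv")

# Derive the model's parameters from characteristics of the data.
mu = data["signal"].mean()
sigma = data["signal"].std()

def score(value):
    """Toy model: how far an observation sits from the historical norm."""
    return (value - mu) / sigma

# "Run it and see if it looks right."
data["score"] = data["signal"].map(score)
print(data["score"].describe())
```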

Programmers

Once the research has proven out, the programmers get involved. 

Their goal is to “productize” the research. This typically involves translating the code to a more performant or compatible language. They often have to create a user interface. They usually need to turn the code into a plugin or service, and integrate it into an existing system. 
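
For instance, “turning the code into a service” might amount to something like the sketch below (Flask is just one option; the endpoint, payload shape, and parameter values are assumptions):

```python
# service.py -- minimal sketch of wrapping the model as a service.
# Flask is only one option; the route, payload, and parameters are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Parameters handed over from the research phase (placeholder values).
MU, SIGMA = 4.2, 1.3

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    value = float(payload["signal"])
    return jsonify({"score": (value - MU) / SIGMA})

if __name__ == "__main__":
    app.run(port=8080)
```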

At some point, they bring back the scientist to validate that it works as intended. 

Then they do user acceptance testing, get customer feedback, and release it. 

What are the Risks?

The key challenge is that a system like this is hard to validate: there’s often no real spec, the system depends on big data, it may be non-deterministic, and validation often comes down to human intuition.

There are opportunities for mistakes everywhere along the way. 

Research

  • We’re working from an indistinct problem – formulating the problem and possible solutions is the goal of the research project. How do we know we’re tackling the right problem?
  • Questionable input data – How was the data chosen? Is it truly representative? Is that a good thing?
  • Processed through various tools – It’s common for the data to have passed through multiple formats and tools. Has it ever been held in a spreadsheet? Anybody named April, May, or June can tell you stories of wrongly typed data coming out of a spreadsheet. See “How do you know your spreadsheet is right?” in References.
  • How is the analysis validated?
  • How robust is the model? Has there been sensitivity testing, to understand whether slightly different inputs would give radically different outputs? (A rough check is sketched at the end of this list.)
  • How do we know the model is suitable? Where do its parameters come from?
  • Scientists are often not professional programmers. Not to over-generalize, but many have limited training in software development. Their prototype or algorithm may be unduly complicated, have bugs, etc. They rarely have tests other than “Run it and see if it looks right.”
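
The sensitivity check mentioned above can start out very cheap: perturb the inputs slightly and see whether the outputs move proportionally or wildly. A rough sketch, where the model is a stand-in for the real prototype:

```python
# sensitivity_check.py -- crude sensitivity probe; the model is a stand-in.
import numpy as np

def model(inputs, a=2.0, b=0.5):
    """Replace with a call into the actual prototype."""
    return a * inputs + b * inputs ** 2

rng = np.random.default_rng(42)
baseline = rng.normal(loc=10.0, scale=2.0, size=1000)
base_out = model(baseline)

# Nudge the inputs by ~1% noise and compare the outputs.
perturbed = baseline * (1 + rng.normal(scale=0.01, size=baseline.shape))
pert_out = model(perturbed)

rel_change = np.abs(pert_out - base_out) / (np.abs(base_out) + 1e-9)
print(f"median relative change: {np.median(rel_change):.3%}")
print(f"worst relative change:  {rel_change.max():.3%}")
```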

Implementation

  • Developers may run into suspicious or even obviously wrong code. When asked about it, the scientist may answer, “I don’t know about that – all I can tell you is that it works the way it is.”
  • Programmers may add “pindown” tests (characterization tests) to lock down code at the unit or system level, but consistent behavior doesn’t guarantee correctness (see the sketch after this list). 
  • The model and original code are deeply understood by at most one person, and perhaps at a lesser level by the development team. It’s not typically peer-reviewed, as it’s proprietary technology.
  • There’s a risk of semantic shift when a prototype is re-implemented in a different language. It’s very hard to guarantee identical semantics and behavior.
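
A “pindown” test for a re-implementation might look like the sketch below; the fixture file, module name, and tolerance are assumptions, and a passing test only shows that the new code matches the old, not that either is correct:

```python
# test_pindown.py -- characterization ("pindown") test sketch.
# The fixture file, module, and tolerance are hypothetical.
import json
import math

from new_implementation import score  # the productized port under test

def test_matches_prototype_outputs():
    # Input/output pairs previously recorded from the scientist's prototype.
    with open("prototype_fixtures.json") as f:
        cases = json.load(f)
    for case in cases:
        got = score(case["input"])
        # A tolerance is unavoidable: a port to another language rarely
        # reproduces floating-point results bit-for-bit.
        assert math.isclose(got, case["expected"], rel_tol=1e-9), case
```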

Bias

  • If the input is based on historical data, the data reflects existing biases. Previous decisions may be wrong or unfair, but our system will be driven to behave the same way. This has certainly been in the news lately with respect to facial recognition, medical equipment, automatic faucets, credit checking, and more. 

    (See “The Role Of Bias In Artificial Intelligence” in the References.)

    Even if you can identify the bias, it’s hard to know how to address it. 
  • If “I’ll know it when I see it” determines correctness, we’re relying on a human interpretation. This is also fraught. Humans are not good at noticing small changes or detecting biases, and human time is costly.

What to Do? 

Again, I don’t have great answers, just a couple ideas that might help. 

With regard to biases, you really need to understand the existing biases and whether you can overcome them, even theoretically. I’m not convinced it’s possible in general.

For example, some AI systems take in big data and derive parameters, but have a relatively shallow model. When they “work”, we don’t really understand why. A system designed to match human answers can find sneaky correlations that reflect biases, but in a hidden way. For example, we would notice a system that used “race” as an explicit factor, but might not notice “length of street name” as a proxy. 
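One modest defense, sketched here with invented column names, is to check whether seemingly innocuous features vary systematically with a protected attribute before the model ever sees them:

```python
# proxy_check.py -- look for features that quietly proxy a protected attribute.
# The data file and column names are invented for this sketch.
import pandas as pd

data = pd.read_csv("applicants.csv")
data["street_name_length"] = data["street_name"].str.len()

protected = "race"
candidates = ["street_name_length", "distance_to_branch_km", "application_hour"]

for feature in candidates:
    # A grouped mean is the simplest possible smell test; mutual information
    # or Cramér's V would be more rigorous for categorical features.
    print(f"{feature} by {protected}:")
    print(data.groupby(protected)[feature].mean(), "\n")
```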

There is a thread in AI around developing “Explainable Artificial Intelligence” (see References). I hope that these approaches yield systems that are both effective and socially responsible. If your system affects people in risky ways, you should be very careful before deploying a system that can’t explain its decisions. 
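Short of full explainability, even a crude probe helps: permutation importance (scramble one input and watch how much the predictions move) at least reveals which features a black-box model actually leans on. A minimal sketch with a stand-in model:

```python
# permutation_importance.py -- crude "which features matter?" probe.
# The model and data are stand-ins; swap in the real ones.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # three hypothetical features
true_weights = np.array([2.0, 0.0, -1.0])

def predict(features):
    """Stand-in for an opaque model."""
    return features @ true_weights

baseline = predict(X)
for j in range(X.shape[1]):
    scrambled = X.copy()
    scrambled[:, j] = rng.permutation(scrambled[:, j])  # break feature j's link
    shift = np.mean(np.abs(predict(scrambled) - baseline))
    print(f"feature {j}: mean prediction shift = {shift:.3f}")
```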

On the project and programming side, I usually see the scientist as the scarce resource. In the spirit of relieving the bottleneck, I suspect dedicating a programmer to support the scientist would yield faster results and a more maintainable system when it’s time to productize. There’s an art to prototyping quickly, but there’s also a need for testing and restructuring. 

Conclusion

For science-based applications, one approach is to have a scientist do the research and a programming team come through and “productize” the result. I’ve identified a number of risks in this approach and offered a couple of suggestions for mitigating them. 

References

“Explainable Artificial Intelligence”, Wikipedia. Retrieved 2021-05-31.

“How do you know your spreadsheet is right?”, by Philip L. Bewig. Retrieved 2021-05-31.

“The Role of Bias in Artificial Intelligence”, by Steve Nouri, Forbes, 2021-02-04. Retrieved 2021-05-31.