Quantum computing in machine learning (part 1)

Updated: March 24, 2026

I’ve been following quantum computing with a stifled yawn over the years. Yes, I understand it has revolutionary potential, but we’ve been far away from anything real and useful. Move on.

I’m an artificial intelligence guy. Recently a headline caught my attention: “Chinese AI meets quantum power and gets smarter, faster”.

from the quoted article, © scmp.com 2025

Hm. Has the goal post sneaked appreciably closer?

LoRA (Low-Rank Adaptation) is a learning strategy that exists because fully fine-tuning a large model is often too computationally expensive. You typically want to tailor a large language model like Falcon or Llama to a specific problem. So instead of updating the full set of weights in the model, which would be very expensive, you approximate the weight updates themselves using two much smaller matrices. This cuts the computational cost while often retaining most of the quality.
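To make the idea concrete, here is a minimal NumPy sketch of the low-rank trick. The dimensions and initialization are illustrative, not taken from any particular model or from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 512, 512, 8  # layer dimensions, and a small rank r much less than d and k

W = rng.standard_normal((d, k))  # the frozen, pretrained weight matrix

# Instead of learning a full d x k update, LoRA learns two thin matrices
# (small random values here purely for illustration).
A = 0.01 * rng.standard_normal((d, r))
B = 0.01 * rng.standard_normal((r, k))

delta_W = A @ B            # a rank-r approximation of the weight update
W_adapted = W + delta_W    # effective weights after fine-tuning

full_params = d * k          # parameters in a full update
lora_params = r * (d + k)    # parameters LoRA actually trains
print(f"full update: {full_params:,} params, LoRA: {lora_params:,} params")
```

With these numbers the trainable parameter count drops from 262,144 to 8,192, which is the whole point: you train the thin matrices and leave `W` untouched.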

Key here is that we’re working with an approximation. Another approximation might do just as well. One alternative comes through a trick: some variants of LoRA approximate the weight updates using element-wise multiplication. It turns out that if you choose that particular approximation, nature has a built-in computer that can do something very similar. That built-in computer is how quanta, as in quantum physics, behave. That is the trick.
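As a sketch of the element-wise idea (my own simplified illustration, not the paper’s construction), you can adapt a frozen weight matrix by multiplying it element-wise with the outer product of two learned vectors, so that only d + k numbers are trained instead of d × k:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 512, 512

W = rng.standard_normal((d, k))  # frozen pretrained weights

# Element-wise (Hadamard) variant: learn two vectors u and v, initialized
# near 1 so the adapted weights start out close to the originals.
u = 1 + 0.01 * rng.standard_normal(d)
v = 1 + 0.01 * rng.standard_normal(k)

# Scale each weight W[i, j] by u[i] * v[j] instead of adding a low-rank update.
W_adapted = W * np.outer(u, v)

print(d + k, "learned parameters instead of", d * k)
```

It is exactly this kind of element-wise product over large arrays of numbers that, per the trick above, maps naturally onto how quantum systems evolve.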

So, the promise here is that you can leave the heavy lifting in fine-tuning a large language model (say) to nature. Nature does it really fast and really cheap. In fact, it does it trillions of times over, everywhere you care to look.

Part of the problem is that, until now, quantum computers have been just a promise: something for the future. They are still mostly in the future, but we’re starting to see actual working implementations of this and similar ideas. They do not yet pay off in a practical architecture for most problems, but with actual realizations we’re a quantum (duh!) leap closer to turning computing upside down.

Let’s dig a bit into how this works. The paper and authors behind the headline (Kong X. et al., “Quantum-Enhanced LLM Efficient Fine Tuning”, https://arxiv.org/pdf/2503.12790) use a physical technology that is fairly close to what you might rent from IBM and, to some extent, Google. If you were to build one, the chip itself (a so-called Josephson circuit) contains a set of qubits, the quantum equivalent of classical bits. The chip measures millimeters, but you need to freeze it to within a few hundredths of a degree above absolute zero (-273 degrees Celsius) to get it working. And you need a fairly large microwave pulse generator. And helium. And cables. And control equipment. In short, you’d need a small building. And you’d get a substantial electricity bill.

The IBM Eagle (left, © nytimes.com) and the Google Willow at Google Next 2025 (right, © google.com)

What about the time it would take? The trick I mentioned works like this: you prepare the numbers you need for fine-tuning in a very peculiar way, punch them into that large microwave pulse generator, trigger the generator, let nature, i.e. physics, do the calculation, and read out the result. The catch is that to read the result of a single multiplication (a vector product) you need to repeat the measurement many times, perhaps a thousand or thereabouts. Each round takes some tens of microseconds, an eternity in modern computing, so with a thousand rounds we’re spending some tens of milliseconds per product. Fine-tuning a large language model means updating hundreds of millions of parameters or thereabouts, and you need to do it over many batches. The paper isn’t precise about its exact setup here, but you’re probably looking at hours to weeks for a single batch. That is orders of magnitude slower than doing the training on a CPU, which nobody does (you use GPUs, which are much faster), but it gives a sense of scale.
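To see where those hours come from, here is a back-of-the-envelope calculation. The shot count, shot duration, and the number of vector products per batch are my own illustrative assumptions, not figures from the paper:

```python
# Rough timing estimate; all numbers are illustrative assumptions.
shots_per_product = 1_000   # repeated measurements to read out one vector product
shot_time_s = 50e-6         # "some tens of microseconds" per measurement round

time_per_product_s = shots_per_product * shot_time_s  # about 50 ms per product

# Hypothetical workload: suppose LoRA-style compression leaves on the order
# of a million vector products for the quantum step in each batch.
products_per_batch = 1_000_000

batch_time_s = products_per_batch * time_per_product_s
print(f"~{time_per_product_s * 1e3:.0f} ms per product, "
      f"~{batch_time_s / 3600:.0f} hours per batch")
```

Under these assumptions a single batch lands around 14 hours, and the estimate swings easily into days or weeks if you need more products or more shots, which is consistent with the rough range above.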

So that was physical realization, cost and time.

What about quality? In LoRA you make a smaller set of numbers (a low-rank weight matrix) behave like a bigger set of numbers. Often the loss of quality is negligible, and the reduction in computational footprint is substantial. But not always. If the problem domain shows great variety, or there is subtlety in the data, the approach may fail: for example, if you want a language model to be good at legal texts, poetry and slang at the same time, or if you translate between very different languages. The trick I mentioned has the advantage of being able to capture many independent features, and non-linear behavior between them, which LoRA struggles with. The trick where we use nature’s quantum engine to approximate vector multiplication is not encumbered in this way.

So there are quality gains. But the practical outlook is challenging. If it is big, expensive and ridiculously slow, why am I writing about it?

Because now it is possible. You could rent the hardware and do this today. A few years back it was a pipe dream; it simply wasn’t something you could do. Fast forward another five years?

That is why it caught my attention. It is all coming together. We’re not yet at the point where everybody uses it, but we are at the point where a normal corporation could. That, I’m thinking, is a crucial inflection point.

If you’ve read this far, you can stop here; at least the vast majority of you can. I hope it was worth your time.

Read Quantum computing in machine learning part 2 here!


Quantum computing in machine learning (part 1) was originally published in Compendium on Medium, where people are continuing the conversation by highlighting and responding to this story.