yarn @<movq https://www.uninformativ.de/twtxt.txt> "(#t6wt7ja) @prologic@twtxt.net Ahh, I see. Okay, I’m with you there. On this high level, I can understand how the thing works. Maybe my wordi ..."

www.uninformativ.de

@prologic@twtxt.net Ahh, I see. Okay, I’m with you there. On this high level, I can understand how the thing works.

Maybe my wording isn’t good. 🤔 Let’s take a real life example from what we do at work.

There’s this AI chatbot. It gets support requests from users, so the user says something like “I need access to a particular system”. This triggers the bot to “run” the instructions stored in a large Markdown file, like “check if the user is authorized to do this, then issue the following API requests”, and so on. This is essentially like running a little script, except it’s written in natural language (German) and there’s no “script interpreter” but just the AI.

Now, suppose that the AI doesn’t quite do what was intended. There’s some subtle bug. How do you debug this? How do you find out how the AI came to the “conclusion” to run step A instead of step B? And how do you find out how exactly you have to change your prompt so this doesn’t happen again next time?

If this were an actual script/program instead of AI, you could repeat the request and attach a debugger or throw in some printf() or whatever. How do you do that kind of thing with AI? How do you pinpoint exactly what the problem was?

(Or is this just a stupid idea? Do we have to give up that way of thinking when using AI? Is the era of debuggability over?)


prologic

twtxt.net

11:59AM (1h ago)

On the subject of debugging these so-called AI(s) / Black Boxes… the model is a black box, sure, but that’s not really the problem. Everything around it (the inputs, the outputs, the decisions it makes) can and should be fully logged, traced and replayed. The “program” isn’t the model, it’s the full context you feed it. That’s what you debug. It’s not so different from any other system really; if you’re running something in production with no logs, no structured outputs and no tests, you’d have the same problem. The model doesn’t change that discipline, it just makes it more important.
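To make the "log, trace and replay" idea a bit more concrete, here is a rough sketch in Python. It is not how any particular bot does it: `fake_model`, `traced_call`, `replay` and the shape of the context dict are all made up for illustration, and the stand-in model just matches a keyword so the example runs without any API. The point is only that each call stores the full context plus a structured output, so a bad decision can be re-run later with exactly the same input.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Callable

# Hypothetical model interface (an assumption): takes the full context,
# returns a structured decision as a dict.
Model = Callable[[dict], dict]


@dataclass
class TraceRecord:
    context_hash: str  # fingerprint of everything the model saw
    context: dict      # instructions, user message, tool results, ...
    output: dict       # structured decision the model returned


def fingerprint(context: dict) -> str:
    """Stable hash of the context, so identical inputs are recognisable."""
    canonical = json.dumps(context, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def traced_call(model: Model, context: dict, log: list) -> dict:
    """Call the model and append one JSON line describing the call to the log."""
    output = model(context)
    record = TraceRecord(fingerprint(context), context, output)
    log.append(json.dumps(asdict(record), sort_keys=True, ensure_ascii=False))
    return output


def replay(model: Model, log_line: str) -> tuple:
    """Re-run a logged call with the same context; report whether the output matches."""
    record = json.loads(log_line)
    new_output = model(record["context"])
    return new_output == record["output"], new_output


def fake_model(context: dict) -> dict:
    """Stand-in for the real model (an assumption): picks a step by keyword."""
    step = "A" if "access" in context["user_message"] else "B"
    return {"step": step, "reason": "keyword match on 'access'"}


# Example: trace one request, then replay it from the log.
log = []
ctx = {
    "instructions": "check if the user is authorized, then issue the API requests",
    "user_message": "I need access to a particular system",
}
traced_call(fake_model, ctx, log)
same, out = replay(fake_model, log[0])
```

With a real model the replayed output may differ from the logged one because of sampling, so a mismatch is a signal to look at, not proof of a bug. The logged context is still what you would edit and re-run when checking whether a prompt change fixes the "step A instead of step B" case.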

