Alex Strick van Linschoten

I really would have loved it if you had included how much the API costs were for this test case (and for all the other ones you do). Also, how much time did it take to run? Using these agent-based coding tools isn't free in either time or token cost, so that's def a factor I often consider.

Eleanor Berger

I used my ChatGPT Plus subscription and didn't run into limits during this test (and in other tests I managed to go for quite a bit longer). And it's fast: this test took about 20 minutes, and I'd guess about a third of that time was spent waiting for me to act. There are links to the "arrangements" OpenAI offers with its subscriptions. As I mentioned, I'm not that happy with that and would prefer clear consumption metering. But at the very least it's worth mentioning that, in their current state, they seem quite generous.

Alex Strick van Linschoten

Yeah, I'm less worried about limits; I'm just interested in the raw numbers. You can learn a lot by seeing exactly how it works, but it's very hard to get an intuition for how confident it was while doing that task, etc. You can get proxies by comparing how efficiently it searched and read the code to grok how something works, or perhaps just by comparing how it went about an implementation with how you'd have implemented it.

Anyway, 20 mins seems reasonable for this kind of task!

Eleanor Berger

Note also that with GPT-5 the reasoning budget matters a lot too. In this test I used medium reasoning, but more complex tasks need high reasoning, which means more tokens and a longer wait.
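To make the reasoning-budget point concrete, here is a rough sketch of per-run cost arithmetic. It assumes reasoning tokens are billed at the output-token rate; the helper name, prices, and token counts are hypothetical placeholders, not figures measured in this test.

```python
# Hypothetical helper: estimate the dollar cost of one agent run from token counts.
# Prices are placeholders in USD per million tokens; check current pricing yourself.
def estimate_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    # Assumption: reasoning tokens are billed at the same rate as visible output tokens.
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m + billed_output * output_price_per_m) / 1_000_000


# Illustrative comparison with made-up token counts: same task, more reasoning tokens.
medium = estimate_cost(200_000, 20_000, 30_000, input_price_per_m=1.0, output_price_per_m=10.0)
high = estimate_cost(200_000, 20_000, 120_000, input_price_per_m=1.0, output_price_per_m=10.0)
print(f"medium: ${medium:.2f}, high: ${high:.2f}")
```

Because reasoning tokens land on the (usually pricier) output side, raising the effort level can grow the bill faster than the visible output alone suggests.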
