
Subtleties of AI Productivity Gains

Warning

Trigger Warnings:

  1. Technology Trigger: For those of you who think LLMs generally suck, whilst I get it, I'm afraid I'm not in that camp; I'm one of those AI evangelists who likes LLMs, doesn't mind vibe-coding (within certain contexts, e.g. when it's throwaway or low impact), and generally enjoys watching the rapid pace of development in the AI domain!
  2. Economics Trigger: At the risk of coming across as rather thick, I'm going to be discussing economics, which is not my background.

In 2024, a panel of speakers at the World Economic Forum discussed Generative AI's impact on labour markets, with the following panel themes, which I'll abbreviate. (From skimming the speaker list, I think it would have been prudent to include someone sitting in a technical role, to provide a more rounded view on the merits of applying novel technology.)

Generative AI has catapulted AI technology to [one of the] most impactful innovations of the Fourth Industrial Revolution. [Some] predict its effect on global value chains is analogous to the steam engines of the Industrial Revolution.

With [speed and accessibility], what is the [impact] for industry worldwide and how do leaders manage its risks?

Stepping clear of the AI risk aspect and focussing instead on the impact, the clear consensus among this group of economists and CEOs was that Generative AI would deliver outsized returns. But did that turn out to be the case in 2025?

Terminology 101

Info

Feel free to skip this section if you are already familiar with ideas like productivity and LLMs.

If you are like me (a programmer and not an economist), then here is a quick refresher on what some of the terms that are thrown around in this post mean:

  • Productivity: Productivity is a measure of performance that compares the output produced with the input, or resources, required to produce it. The input may be labour, equipment, or money. For a company, productivity can be a measure of the efficiency of its production process, calculated by measuring the number of units produced relative to labour hours, or by measuring sales relative to labour hours.
  • Labour Productivity: Labour productivity measures how efficiently labour (as an input) is used to create outputs; a simple ratio form is sketched below. Labour productivity can be analysed profession by profession, to identify trends in job growth, wages, and technological advances.
  • Labour Market Outcomes: Labour market outcomes refer to the overall performance and characteristics of the labour market, including factors like salary, employment levels, etc.
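
For concreteness, the simplest textbook form of labour productivity (a standard definition, not something specific to the studies discussed later) is just output over labour input:

\[
\text{Labour productivity} = \frac{\text{Output (units produced, or value added)}}{\text{Labour hours worked}}
\]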

If you are not like me and are more an economist than a programmer, then here is a quick refresher on what some of the technical jargon thrown around in this post means:

  • Generative AI: Generative AI is a type of learnable program that synthesises new content like text, images, music, and video. It learns from existing data to generate novel outputs that reflect the training data's characteristics but are not merely repetitions of it. In corp-speak, in my experience, Generative AI is often falsely assumed to map 1:1 onto LLMs.
  • LLM: Large Language Models are a type of Generative AI trained on ginormous corpora of text data. They are typically autoregressive transformer architectures, but in principle could be all sorts of models. The models delivered by OpenAI, Anthropic, and Google are often LLMs; ChatGPT, GPT-4, Claude, and Gemini are all examples.
  • Vibe Coding: using LLMs to write code for you, with little input from you, the user. You specify what you want in natural English (or Chinese, or whatever), and the LLM generates the code for you. The whole point is that it's low effort, and it should be used for low-risk things, not important stuff, e.g. authentication of customer-facing technology. This is a nice introductory video on the topic.

Hype vs. Reality (2023-2025)

Soaring Expectations

Early analyses in 2023 projected enormous gains from generative AI - for example, Goldman Sachs forecast a $7 trillion boost to global GDP and a 1.5 percentage-point rise in productivity growth over a decade. Tech investment surged (private Generative AI funding hit $33.9B in 2024, +18.7% YoY) amid widespread adoption. By early 2024, 72% of companies reported using AI in at least one function (up from 55% in 2023), see exhibit 2. Generative AI became the buzzword, fuelling hopes of rapid efficiency gains in fields from software development to pharma R&D.

Indeed, my own experience at AstraZeneca during the peak ChatGPT hype was that everyone, from the CEO, to VPs, to middle managers, to me, a front-line manager, was racing around trying to encourage widespread adoption. It was just the game at the time. I saw that:

  1. Most middle-level (and above) management were quickly trying to understand what an LLM was, and what it could be utilised for within a business context. Actually, many front-line managers, myself included, were still technical and had been working with transformers and autoregressive NNs for years!
  2. Most of management were quickly (re)allocating budget to shoe-horn LLMs under the guise of "Generative AI" into every possible project.
  3. Company-wide, AstraZeneca was creating vertical and organisational polls, brainstorming sessions and other ideation mechanisms to find new ways to roll out LLMs. (Organising AstraZeneca-wide initiatives is no mean feat, incidentally; the org chart was around ~90k people when I was there!)

Emerging Reality Check

As 2024 unfolded, evidence of a productivity boom was scant. Many firms struggled to realise ROI: typical AI initiatives yielded <10% cost savings and <5% revenue uplifts. This disconnect between massive capability advances and meagre measurable returns reflects a "productivity paradox", as one economist noted. Even Goldman Sachs tempered its optimism by mid-2024, warning of "too much spend, too little benefit" as on-the-ground results failed to meet the hype. In short, AI was everywhere except in the productivity stats, echoing Solow's classic quip from the 1980s (more on this later).

A Key Question

So, fast-forwarding to May 2025, now that the dust has settled:

Has the value-add (in terms of productivity gains) from Generative AI finally been realised?

Large Language Models, Small Labour Market Effects

In May 2025, the study "Large Language Models, Small Labor Market Effects" from economists Anders Humlum (University of Chicago) and Emilie Vestergaard (University of Copenhagen) argued that the value-add of Generative AI has not been realised. They estimate that users shaved 2.8% off their weekly hours (≈68 minutes) and recorded zero earnings effect (see Figure 7: Log Earnings Around the Launch of AI Chatbots).
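
As a quick sanity check on that headline figure (my own back-of-the-envelope arithmetic, assuming a standard 40-hour week rather than any figure from the paper):

\[
0.028 \times 40\,\text{h} \approx 1.1\,\text{h} \approx 67\ \text{minutes per week,}
\]

which is in the same ballpark as the ≈68 minutes quoted.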

In the manuscript, the authors argue that despite the narrative of Generative AI transforming labour markets, the reality may be distinctly more benign.

Key Contributions

Skipping straight to the conclusion, the manuscript states (emphasis my own) that AI has minimal impact on productivity:

Despite rapid adoption and substantial investments by both workers and firms, our key finding is that AI chatbots have had minimal impact on productivity and labour market outcomes to date.

Then the authors add a key challenge to any account of AI-driven transformational change:

Any account of transformational change must contend with a simple fact: Two years after the fastest technology adoption ever, labour market outcomes - whether at the individual or firm level - remain untouched.

Pretty damning words for technology evangelists!

Background Information

Study Details

The authors reached the above conclusion after examining 25,000 workers across occupations deemed particularly vulnerable to automation because of their exposure to AI chatbots. The occupations were:

  • Accountants.
  • Customer support specialists.
  • Financial advisors.
  • HR professionals.
  • IT support specialists.
  • Journalists.
  • Legal professionals.
  • Marketing professionals.
  • Office clerks.
  • Software developers.
  • Teachers.

Workers in their respective professions were asked how their company's investment in AI:

  1. Affected their adoption of AI tools.
  2. Changed processes in their workplace.

The "Large Language Models, Small Labour Market Effects" study uncovered a subtle rebound effect, when they revealed that AI tooling created new job tasks for 8.4% of workers, including some who did not use the tools themselves, offsetting potential time savings. For example, teachers now spend time detecting whether students use ChatGPT for homework, while other workers review AI output quality or attempt to craft effective prompts. Indeed, what I've seen in my own experience is via "Vibe Coding", I can quickly generate a lot of code that gets me 80% of the way there, but I can spend hours debugging the vibed code, and debugging someone else's code is arguably the worst aspect of programming. More on this later though. Such new tasks cancel out potential time savings.

Even where time was saved, the study estimated that only 3-7% of those productivity gains translated into higher earnings for workers. I find this somewhat contentious; I've been fortunate enough to get promotions and have new roles offered to me, and if I'm honest, I'll use all sorts of technologies and tools to improve my output, which absolutely includes LLMs - they can be excellent as a personal critic, making my work more rounded by highlighting personal bias.

Conflicting Evidence

The above outcomes contradict a randomised controlled trial from Erik Brynjolfsson, Danielle Li, and Lindsey Raymond, published in February 2025, which found that generative AI increased worker productivity by 15 percent on average.

The experiment randomised 5,172 customer-support agents. Access to AI assistance increased worker productivity, as measured by issues resolved per hour, by 15% on average, with substantial heterogeneity across workers. Less experienced and lower-skilled workers improved both the speed and quality of their output, while the most experienced and highest-skilled workers saw small gains in speed and small declines in quality.

In reference to the 15% improvements, Anders Humlum stated to The Register that:

While there are gains and time savings to be had, there's definitely a question of who [benefits. Most] tasks do not fall into that category where ChatGPT can just automate everything.

My read here is that Humlum and Vestergaard surveyed occupations vulnerable to automation broadly, perhaps missing critical sector-specific nuances. Brynjolfsson, Li, and Raymond, conversely, conducted a controlled experiment specifically within call centres, an environment ideally suited to AI augmentation, thus demonstrating clearer productivity gains.

Comparing the two approaches underscores how much productivity outcomes can vary depending on how well suited a task is to AI.

Explanation Via Paradoxes

As an optimistic techno-evangelist, it makes me sad (and therefore I find it hard to accept) that AI isn't returning outsized productivity gains by automating issues away at the business level. (Tongue-in-cheek, my objective in life is to automate my own job away.) Anders Humlum is quoted as saying:

My general conclusion is that any story that you want to tell about these tools being very transformative, needs to contend with the fact that at least two years after [the introduction of AI chatbots], they've not made a difference for economic outcomes.

I am admittedly coming from a position of privilege, and this post is somewhat an emotional response to rationalise and explain the findings, but the key finding, that AI is not transformative, has not matched my personal experience. I've found these sorts of tools very useful. So, reading around this topic, it seemed that other explanations may be available; a cursory search turns up a few paradoxical situations that fit the fact that AI automation has seemingly minimal impact on productivity. The question I'm now asking myself is:

Can Humlum and Vestergaard's findings be explained by, or be aligned to, well studied paradoxes?

The Productivity Paradox

It is said that sometimes ideas are so stupid, only academics could believe in them...

Only joking! So what does the "productivity paradox" mean?

The "productivity paradox" refers to the disconnect between powerful computer technology and conversely, weak productivity growth. In the 1970s-1980s, there was a slowdown of productivity growth despite rapid development in the field of IT over the same period. The "productivity paradox" disappeared in the 1990s, but issues raised by research efforts became important again, when productivity growth slowed around the world again from the 2000s to the present day.

The term "productivity paradox" was coined by Erik Brynjolfsson Yes, the same Erik who also wrote the paper above on 15% productivity improvements in February 2025  in a 1993 paper "The Productivity Paradox of IT" inspired by a quip by Nobel Laureate Robert Solow "You can see the computer age everywhere but in the productivity statistics."

We are quite likely in the "installation" phase: see the plots here and imagine, instead, an LLM diffusion curve, where visible investment outstrips realised returns until complementary assets - skills, processes, data architecture, model expressivity - catch up. I also think there is likely overlap between some of the other ideas behind the productivity paradox and LLMs as a technology:

Below, the canonical explanations from the 1980s-90s IT literature are lined up against their direct parallels in Humlum and Vestergaard (2025), element by element:

  • Rapid, highly visible adoption of an eye-catching technology versus flat productivity statistics.
    Canonical explanation: Solow's quip, "You can see the computer age everywhere but in the productivity statistics." Firms bought PCs, networks etc., yet measured output per worker barely budged for a decade.
    Parallel in Humlum and Vestergaard: Denmark exhibits the fastest recorded uptake of a new digital tool (64-90% of exposed workers using chatbots within two years), but earnings, hours and value added per worker remain indistinguishable from the pre-AI baseline (confidence intervals exclude even 1% gains).

  • Time lags created by organisational and skills adjustment costs.
    Canonical explanation: Early IT required re-engineering workflows, building complementary databases, rewriting job descriptions and retraining staff, tasks that transiently absorbed the very labour the computers were meant to liberate.
    Parallel in Humlum and Vestergaard: 8.4% of workers (teachers hunting AI-written essays, clerks quality-checking hallucinations, "prompt engineers" tinkering with system messages) now perform entirely new chores that soak up the hour-per-week time saving the chatbot was meant to deliver.

  • Mismeasurement of output and quality.
    Canonical explanation: Service quality, design speed and informational richness, often improved by IT, rarely showed up in 20th-century GDP. Ergo, the paradox could be partly statistical.
    Parallel in Humlum and Vestergaard: Chatbots do shave roughly 2.8% of task time, but national accounts still record little change in value added or wages, suggesting either mis-categorised benefits (e.g. higher customer satisfaction) or benefits captured in consumer surplus rather than payroll.

  • Benefit capture drifting away from frontline labour.
    Canonical explanation: Brynjolfsson (1993) showed returns to IT capital flowed disproportionately to skilled managers and shareholders; median wages stagnated.
    Parallel in Humlum and Vestergaard: Only 3-7% of measured efficiency gains reach workers' pay packets; the remainder appears to accrue to the firm (lower labour share), echoing the earlier pattern of capital-biased technological change.

  • "Computers do not substitute for people; they complement them."
    Canonical explanation: The 1990s consensus was that IT raised the ceiling on what could be done, spawning entirely new tasks (web design, spreadsheet modelling) rather than eliminating labour outright.
    Parallel in Humlum and Vestergaard: LLMs arguably enable individuals to do more and to raise their sights; LLMs catalyse complementary high-cognition work (curation, oversight, policy) and enable workers to generalise more in terms of what they can do, rather than eliminating labour outright.

A note on measurement: spill-overs such as higher customer satisfaction or faster experimentation often register as consumer surplus, which GDP and most payroll datasets ignore. Some "missing productivity" may simply be mis-categorised value.

As you can see from the comparison above, there are a lot of common themes and situations shared between "Large Language Models, Small Labour Market Effects" and the "Productivity Paradox", which, to me, suggests a link.

Jevons Paradox

The second lens that came up in my research, and which helps to frame the results from "Large Language Models, Small Labour Market Effects", is the Jevons Paradox. This is when a technology makes a resource, any of { time, energy, money }, cheaper to use, and the total demand for that resource then often rises rather than falls.

We'll see that the efficiencies introduced do not necessarily lead to increased measured productivity; efficiencies via Generative AI (LLMs) may boost output per task, but those gains tend to be absorbed by increased expectations, or by newer, tougher work.

Let's look at a legal example: Prior to digital documents, a fairly common junior lawyer task might be:

  • There is an upcoming court case and we need relevant case law.
  • Go to the physical archive, and find past cases relevant to the current case.
  • Check for conditions \(X\), \(Y\), and \(Z\).

This task might have been assigned to a junior team of anywhere between 2 and 10 people. But now, one junior with a laptop can do the same job. Further, the automation of processes like contract reviewing, enforcement of negotiations (smart contracts) and client intake (expert systems) allows law firms to streamline their procedures and improve efficiency, at the cost of junior roles.

In a similar vein, the Economist wrote:

Legal surveys are cheaper and more accurate using data analysis technology than using paralegals or junior lawyers.

Now, as a result, the law firm can also manage more cases!

Let's imagine that AI tooling makes legal work 50% faster. Rather than reducing work, law firms can now accept twice as many cases, expecting higher productivity.

Lawyers, however, find themselves tackling increasingly complex or nuanced cases that were previously not economically viable. Imagine that partners at the law firm now race to accept two additional briefs per month. The volume of legal work (and ultimately aggregate junior hours) rebounds, but the nature of the tasks shifts toward higher-complexity research, red-flag verification and, crucially, draft "debugging" when the model hallucinates case law that never existed. So, rather than reducing work, efficiency increases overall demand, and productivity gains are offset by the new modes of "junior" work required, which is Jevons Paradox in action. Another way of saying this is that productivity per legal case rises while the total labour input to the firm does not fall. Note that Humlum and Vestergaard see no meaningful drop in hours or surge in wages. The Jevons Paradox tells us that when technology becomes more efficient, we tend to use it more, which can offset the gains in productivity.
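
To make the rebound arithmetic concrete, here is a minimal sketch with illustrative numbers of my own (not figures from the study or the Economist piece), showing how a 50% per-case efficiency gain can leave total labour input unchanged once the caseload expands:

```python
# Illustrative Jevons-style rebound for the legal example.
# All numbers are made up for demonstration purposes.

hours_per_case_before = 40       # junior hours needed per case, pre-LLM
cases_per_month_before = 10      # cases the firm takes on per month

# AI tooling makes legal work 50% faster per case...
hours_per_case_after = hours_per_case_before * 0.5

# ...so the firm accepts twice as many cases (the rebound).
cases_per_month_after = cases_per_month_before * 2

total_before = hours_per_case_before * cases_per_month_before
total_after = hours_per_case_after * cases_per_month_after

print(f"Junior hours per case: {hours_per_case_before}h -> {hours_per_case_after:.0f}h")
print(f"Total junior hours per month: {total_before}h -> {total_after:.0f}h")
# Per-case productivity doubles, but total labour input is unchanged (400h -> 400h),
# consistent with Humlum and Vestergaard seeing no meaningful drop in hours.
```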

Example 2: Call-Centre

Another example that came up when looking into the Jevons Paradox is that Mark Zuckerberg mused that a human-staffed help-desk for every Facebook user would cost ~$15 billion a year. He then posited that LLMs could deflect 90% of "I've forgotten my password" calls (or insert any query here simple enough to be handled by AI). The remaining 10% of calls would need to be handled by humans.

An automated customer support practitioner on HackerNews noted that chatbot roll-outs typically automate the first 40% of tickets handily; the next 40% are solvable but messy (clients tell life stories rather than order numbers); the final 20% defeat robot and human alike.

Whatever the correct numbers are (my uneducated guess is that it's closer to the 40-40-20 split above), automation pushes human agents up the difficulty gradient instead of out of employment; time saved is re-spent on tougher edge cases, bespoke empathy, and chasing down the model's occasional non sequitur. The net result is that new efficiencies lead to tougher work, meaning sustained labour demand.
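
A minimal sketch of that intuition, again with made-up numbers of my own (neither Zuckerberg's nor the HackerNews commenter's): if the easy tickets are deflected but the surviving tickets take longer per contact, total human hours fall by far less than the deflection rate suggests.

```python
# Illustrative call-centre maths. All figures are assumptions for demonstration.

tickets_per_day = 10_000

# Rough 40/40/20 difficulty split from the HackerNews anecdote.
easy, messy, hard = 0.40, 0.40, 0.20

# Assumed human handle time per ticket (minutes), before automation.
t_easy, t_messy, t_hard = 4, 10, 25

hours_before = tickets_per_day * (easy * t_easy + messy * t_messy + hard * t_hard) / 60

# After roll-out: the bot fully absorbs the easy tickets, halves human effort on
# the messy ones, and leaves the hard ones untouched.
hours_after = tickets_per_day * (messy * (t_messy / 2) + hard * t_hard) / 60

print(f"Human hours per day before: {hours_before:.0f}")
print(f"Human hours per day after:  {hours_after:.0f}")
# The bot touches 80% of tickets (removes 40% outright, halves effort on another 40%),
# yet human hours only fall by about a third, because the remaining work sits
# higher up the difficulty gradient.
```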

Tying the Paradoxes Together

In this section we've explored a couple of different paradoxes that might explain why productivity does not drastically improve on the introduction of novel LLM technology.

  1. The Productivity Paradox explains why recorded output and wages have not surged despite visible AI uptake: early organisational friction, mis‑measurement and capture of gains by capital overshadow headline efficiencies.
  2. Similarly, Jevons Paradox explains how the little efficiency that does appear can boomerang into more aggregate work - new cases, new calls, new verification chores - neutralising any reduction in total labour demand.

I suspect that the Productivity Paradox is the better fit for the findings in Humlum and Vestergaard's work, but it's also likely a little bit of (1) and a little bit of (2). Taken together, the two paradoxes make Humlum and Vestergaard's "minimal impact" on productivity look almost orthodox!

My Experience of Generative AI in the Workplace

Throughout my time both writing code and developing, training and using generative AI technologies, especially since LLMs became "good" (post-InstructGPT), I've encountered varying (and increasing) degrees of utility, productivity and success. Regardless, I've often tried these tools because I thought they might make me more productive:

On a sliding scale of personal productivity boost, I've experienced everything from wasting my time with LLMs to seeing genuine gains.

Modest To Good Productivity Gains, Sometimes

In general, my experience using Generative AI in my programming workflow is that I align more with the 15% productivity improvements seen in Brynjolfsson et al.'s work.

What really matters is the LLM being used, and how well suited the LLM is to the type of data {LaTeX, code, natural language, etc.} I'm working with. What's been highlighted to me from researching this post is that, because:

  1. a lot of my work is well suited to LLM use (code writing, documentation), and;
  2. I understand how to adapt other domains to the text-based and in-context learning paradigms often required for success,

I've often found it relatively easy to benefit from this technology where others haven't seen the value in it.

In software development, generative AI tools such as Copilot, ChatGPT and, more recently in 2024, Cursor have been particularly helpful in certain contexts, especially tasks involving repetitive or boilerplate code. For example, minor adjustments to web page styles or generating initial code templates can often be completed rapidly and effectively using these AI tools, with low risk. When working on this blog and iterating on the interface, Cursor and friends can quickly suggest accurate and stylistically consistent CSS or HTML, reducing the time spent manually writing routine code. Such small-scale efficiency gains are tangible and frequent, leading to smoother workflows.

However, my productivity uplift rarely translates into substantial value creation, because the complexity and critical importance of the tasks generally remain the same. Instead of drastically shortening project timelines, I now just spend more time reading and researching for blog posts; Jevons Paradox in action!

Be Lazy, Think Bigger, and Debug More

There have been times when I've had zero motivation to write code, because I'm tired or otherwise, and I've found LLMs are a great way to just get something rolling that otherwise wouldn't have happened.

Additionally, my frontend skills are somewhat lacking (please don't ask me to centre a <div> in an interview), so it's been a boon to iterate on this static blog's styling with Claude, Gemini and GPT, which is low risk and therefore, in my opinion, a perfectly acceptable target for such efforts.

However, I think it is fair to say that whilst getting "80% there" with vibe coding can be initially appealing, it usually leads to new debugging tasks, which are arguably the least fun aspect of programming. I can see that, armed with LLMs, developers may initially appear more productive, but the subsequent debugging and fixing of subtle issues can often outweigh any gains. One empirical study noted that many devs "avoid LLM-based code generation due to the time spent aligning with and debugging outputs, which negates any efficiency gains." Additionally, whilst I do think that AI models can help us mere-mortal programmers do more solo, vibing serious code that will need to be maintained into the future can have serious ramifications.

Emerging Dangerous Productivity Sinks

Alongside new debugging chores and productivity sinks, generative AI introduces unique risks to professional environments, which threaten even more unexpected time expenditure. A prominent example is the increased risk of misinformation or "hallucinations" produced by generative models, which can bubble up into all sorts of interesting issues, especially when the underlying LLM making the hallucination is part of a wider feature or user-facing application.

And frankly, the list could go on and on. Ultimately, engineers will either need to debug these issues, do additional work to design better prompts or try other models, or police and monitor the output, effectively leading to productivity loss through a productivity "enhancing" technology.

Additionally, organisations face risks related to data privacy and confidentiality. At AstraZeneca, at the peak of LLM hype, IT was racing to establish at a contractual level that the APIs and underlying models were ephemeral. As a practitioner and technologist, it was fairly clear to me that 1) the LLMs were a pure function and 2) accounts could be configured not to retain data; this was more an exercise in risk reduction and backside saving. One issue from a productivity angle is that you now have all of these governance folks running around, trying to establish what the risks are. Furthermore, there is a tangible danger that employees input sensitive data into AI models, potentially compromising company information or unintentionally breaching data governance policies, which is critical to avoid in highly regulated industries, e.g. with patient information from clinical trials, as at AstraZeneca.

Another risk to highlight is where software developers increasingly rely on an LLM for writing initial code drafts. While LLMs can quickly generate functional boilerplate, the risk is that deep-seated issues are obscured, e.g. security vulnerabilities, and surface only after extensive deployment (or bragging; see the now meme-level example below).

Above is a clear motivation for why it's important to understand the technology you are using, and if you are vibing something into existence, a) definitely don't brag about it, and b) sort out a re-write or review as soon as it has gone beyond a throwaway idea validator and is customer facing! And in case you are curious, this is the site, which is still available at the time of writing!

One thing to note, however, is that I've watched in amazement as non-technical folks I know pick up these LLMs and build programs, websites etc. to solve their problems, or build out throwaway demonstrators for ideas they have. With a small bit of encouragement, explaining what the tabs are in Cursor, and a prod to get a feeling for the structure of some of the code files (and how that structure might relate to the output of the program), it's blown me away how accessible programming has become, and how much easier it will likely continue to get. On the flip side, if we assume this code is important and will need maintaining, then I suppose professional programmers will be in a job for the next decade at least, fixing, organising and maintaining spaghetti codebases created with LLMs by non-technical folks.

These dangers effectively mean that more labour is now required to sort out all of the vibed-up issues in the codebase, which, in some sense, is the "Productivity Paradox" in action.

Fractal Boundaries of Labour

Another nice paradigm I came across in my research for understanding generative AI's impact on work is the "fractalisation" of labour tasks. Rather than simply automating existing roles, AI technologies often lead to the subdivision and specialisation of tasks or labour. Because generative AI handles certain sub-tasks well, and others not so well, human workers end up focusing on the gaps and oversight around those tasks.

One analysis of digital labour notes that modern automation (LLMs) tends to decompose and standardise work into many small units. With AI, some of those units are done by the machine, but the human may have to orchestrate and verify each unit, creating a management task at a micro level. This is the fractal effect: each piece of work generates new sub-tasks to ensure the whole is correct. Productivity gains might stall as work "fractalises", and as AI handles the mundane bits, humans focus on elevated work: higher-level synthesis and judgment. For instance, Zuck's AI-driven call centre might automate routine queries, but the complexity of the remaining interactions increases, creating highly specialised human roles to manage the hardest cases.

This fractalisation not only highlights the complementarity of AI and human labour, but also indicates that AI-driven productivity gains do not straightforwardly translate into labour market simplifications or reductions. Instead, these technologies reshape roles, creating increasingly specialised positions requiring different, often more complex skillsets.

I'm fortunate to have a fairly broad technical role at Literal Labs today. Or, put another way, each week I can slow down and annoy a new team! Sometimes I'm thinking about cloud and ops, sometimes full-stack development, sometimes CUDA kernels, sometimes C++ for embedded systems, and so on. I would say that some of the simpler boilerplate does get automated away with LLM tools. But typically I'll then spend more time on the specialised, fractalised tickets that need to be handled; LLMs tend to get painful details wrong with things like CUDA kernels, leading to extensive debugging.

Work... Work Never Changes...

The start of this post touched on peak LLM hype, and my experience of working in a large multinational corporation, with instructions to race around trying to realise the potential of "Generative AI", which in turn effectively meant: apply LLMs as fast as possible to everything.

If we review Humlum and Vestergaard's "Large Language Models, Small Labour Market Effects", and if we reduce it to a single sentence, we read the argument that LLMs "had minimal impact on productivity".

So, focusing our attention back to the start of this post, does this study stand up to scrutiny? What are the explanations here? Has the value-add of enhanced productivity from Generative AI been realised for corporations (or indeed myself, the worker) who have invested so heavily in LLMs?

  • At the corporate R&D output level, I suspect it's a mixed bag. I saw some legitimately fantastic projects along the way, which were all about connecting data structures wrapped around medical knowledge and feeding that to LLMs; I'm not sure these could have existed in the same way before Natural Language Interfaces became "good". I also saw some not-so-great projects too, hence the mixed bag.
  • From a worker productivity lens, again, I suspect it's a mixed bag. Whether or not Cursor, OpenAI, Claude, or Gemini are officially allowed (they weren't when I was working there), I have no doubt that in practice many employees are using them. It is likely that general access to these tools will lead to some pretty gnarly re-writes down the line. It is also the case that these tools will speed up more experienced developers writing boilerplate, and broaden the scope of what an individual can achieve.

Humlum and Vestergaard's study underscores an important reality: the impact of generative AI on productivity and labour markets is nuanced and multifaceted. While generative AI unquestionably demonstrates potential for efficiency gains, these benefits often manifest indirectly, accompanied by new tasks, unexpected risks, and more specialised human roles.

I think we can align real-world experiences with established economic paradoxes such as the "Productivity Paradox" and "Jevons Paradox" to gain clearer insight into why immediate measurable benefits are often modest. Understanding these paradoxes helps set realistic expectations about LLMs' potential, and whilst it should not prevent us from adopting new technologies, these paradoxical paradigms help us understand how to better integrate new technology into workplaces, and how to measure its long-term benefit. Furthermore, knowledge of the "Productivity Paradox" history of IT tells us that the picture is likely to improve as the technology matures.

Ultimately, while the LLM revolution has undoubtedly arrived, this paper, and subsequent research, hint that LLMs may have brought less of a productivity boom and more of a productivity shuffle; later, perhaps, there is still hope for a golden age. But I can't hang up my keyboard just yet.
