The Section 41 R&D credit is one of the most powerful and most underutilized tax incentives available to AI automation agencies. For high-margin agencies with material development work, it can reduce federal tax liability by 6-20% of qualifying labor and infrastructure costs. The Qualified Small Business payroll-tax offset can convert non-refundable income tax credits into immediate cash savings - even in pre-profit years. But the credit is also a Tier 1 IRS audit priority, and the rules around what qualifies for AI work are nuanced. This guide walks through the four-part test in AI-agency language, the internal-use software trap, the funded-research exclusion that catches client-paid builds, and the documentation discipline you need to defend the claim.

In This Article

  1. The Four-Part Test in AI-Agency Language
  2. Internal-Use Software vs. Commercial Software (Critical Distinction)
  3. What Qualifies vs. What Doesn't (Activity Table)
  4. Qualified Research Expenditures (QREs) - What Counts
  5. Calculation Methods: RRC vs. ASC
  6. The Funded-Research Exclusion (Client-Paid Build Trap)
  7. Payroll Tax Offset for Qualified Small Businesses
  8. Form 6765 Section G: The New Disclosure Regime
  9. Documentation Framework
  10. NJ R&D Credit
  11. FAQ

The Four-Part Test in AI-Agency Language

Every claimed research activity must simultaneously satisfy all four parts of the IRS test under IRC Section 41(d). The tests apply equally to biotech, manufacturing, and AI/software work.

Part 1 - Elimination of Uncertainty (the Section 174 Test)

Was there genuine uncertainty at project inception about the capability, method, or appropriate design of the AI system? Examples that satisfy this prong for AI agencies:

  • Whether a RAG pipeline can hit a client-specified accuracy threshold on proprietary domain data
  • Whether a multi-agent orchestration architecture can reliably complete a multi-step workflow without hallucination or task failure
  • Whether a fine-tuned model can outperform a prompted base model at acceptable latency and cost
  • Whether two incompatible enterprise systems (legacy ERP + modern LLM API) can be bridged via a custom integration with no documented method

The standard is taxpayer-specific - 'new to the taxpayer,' not new to the world. A technique well-documented in academic literature can still generate qualifying uncertainty for a specific agency applying it in an untested domain or system context.

Part 2 - Technological in Nature

Activities must rely on principles of computer science, engineering, physical sciences, biology, or mathematics. AI development satisfies this almost automatically: applied computer science, mathematics (linear algebra, statistics, optimization), and data engineering. The IRS Software Audit Guidelines explicitly confirm that software development based on computational algorithms is technological in nature.

Part 3 - Permitted Purpose (Business Component)

The activity must aim to develop or improve the function, performance, reliability, or quality of a business component - any product, process, technique, formula, invention, or computer software used in a trade or business. Custom AI agents, domain-specific chatbots, and proprietary integration pipelines built for commercial deployment all qualify as software-based business components. Aesthetic improvements and market research do not.

Part 4 - Process of Experimentation (POE)

Substantially all (80%+ by Treasury Reg.) of the activities for a business component must constitute elements of a process of experimentation: identifying a technical uncertainty, forming hypotheses, designing and running systematic experiments, analyzing results, iterating. For AI work this maps directly to the model-development lifecycle - architecture selection, hyperparameter tuning, prompt strategy evaluation, benchmarked evaluation, iterative refinement.

The 80% rule cuts both ways. The IRS uses it to disallow QREs when non-qualifying production support is commingled with research activity. Track time meticulously at the project level.
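As a rough illustration of the 80% threshold, the sketch below computes the share of logged hours on each business component that were coded as experimentation. The activity labels, the component names, and the use of hours as the measure are assumptions made for illustration; they are not a prescribed IRS method.

```python
from collections import defaultdict

SUBSTANTIALLY_ALL = 0.80  # the 80% threshold discussed above


def experimentation_share(entries):
    """Return {component: fraction of hours tagged as experimentation}.

    `entries` is an iterable of (component, activity, hours) tuples.
    The activity labels are hypothetical project-tracker codes.
    """
    total = defaultdict(float)
    experimental = defaultdict(float)
    for component, activity, hours in entries:
        total[component] += hours
        if activity == "experimentation":
            experimental[component] += hours
    return {c: experimental[c] / total[c] for c in total if total[c] > 0}


# Hypothetical timesheet rows for two business components.
timesheet = [
    ("rag-pipeline", "experimentation", 170.0),
    ("rag-pipeline", "production-support", 30.0),
    ("billing-bot", "experimentation", 40.0),
    ("billing-bot", "production-support", 60.0),
]

shares = experimentation_share(timesheet)
for component, share in sorted(shares.items()):
    status = "meets" if share >= SUBSTANTIALLY_ALL else "falls below"
    print(f"{component}: {share:.0%} experimentation, {status} the 80% threshold")
```

Running a check like this per component during the year, rather than at filing time, makes commingled production support visible while the time records can still be corrected.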

Internal-Use Software vs. Commercial Software (Critical Distinction)

Whether your custom AI software is classified as internal-use software (IUS) or commercial software fundamentally changes the qualification analysis. This is the single most important threshold question for AI agencies and is frequently misunderstood.

Commercial Software (Standard Four-Part Test Only)

If the agency builds a chatbot or AI agent for sale, lease, or license to clients, it is commercial software and only needs to satisfy the standard four-part test. This is the preferred classification for most agency engagements.

Internal-Use Software (Three Additional Hurdles)

If the agency builds AI tools primarily for its own internal operations (an internal billing automation, an internal project management bot, an AI system to run the agency's own services), the IRS applies the High Threshold of Innovation test under Treas. Reg. Section 1.41-4(c)(6)(vii), which requires three additional showings beyond the four-part test:

  1. Innovative - the IUS achieves a substantial and economically significant cost reduction or speed improvement
  2. Significant Economic Risk - the taxpayer committed substantial resources to development and there is substantial uncertainty about whether the technological objective is achievable
  3. Not Commercially Available - the software is not commercially available to the taxpayer without significant modification

Practical structuring point: Where possible, agency-developed AI tools should be documented as products for client deployment (even if later adapted for internal use), rather than characterized from inception as internal-use software. The 2016 final regulations confirm that software developed for commercial sale and later repurposed internally is treated as two separate business components - only the post-repurposing internal-use modification is subject to the IUS standard.

What Qualifies vs. What Doesn't

| Activity | Qualifies? | Why |
| --- | --- | --- |
| Custom LLM fine-tuning on proprietary domain data | Yes | Resolves capability uncertainty; computer science principles; systematic experimentation |
| RAG pipeline architecture and evaluation | Yes | Technical uncertainty about retrieval strategy, embedding choices, re-ranking; iterative POE |
| Custom multi-agent orchestration logic | Yes | Novel multi-step workflow design; uncertainty about reliable task completion |
| Hyperparameter optimization / architecture search | Yes | Classic uncertainty resolution via experimentation |
| Data pipeline development for AI training/fine-tuning | Yes | Novel processing feeding qualified research activities |
| Integration testing to resolve systemic uncertainty | Yes | Where testing actually overcomes uncertainty (not QA of a known solution) |
| Routine prompt writing for known use cases | No | Configuration, not experimentation |
| Basic Zapier or Make.com setup using documented templates | No | Off-the-shelf configuration |
| Production support and bug fixes after launch | No | Post-production excluded under Section 41(d)(4) |
| Deployment / productionization | No | Post-experimentation; commercial production has begun |
| Using existing OpenAI/Anthropic APIs without modification | No | Routine use of off-the-shelf tools |
| Client-requested configuration with no experimentation | No | No technical uncertainty |
| Customer discovery / market research | No | Explicitly excluded under Section 41(d)(4) |
| Copying/adapting another company's existing product | No | Excluded as duplication under Section 41(d)(4)(C) |

The Prompt Engineering Nuance

Routine prompt engineering - writing system prompts, adjusting tone, configuring a chatbot with off-the-shelf behavior - does not meet the four-part test. Systematic experimentation with prompting strategies as part of a broader research effort (few-shot technique evaluation, chain-of-thought architecture testing, benchmarked evaluation of competing prompting paradigms to resolve technically uncertain capability outcomes) may qualify if documented as part of a process of experimentation. The dividing line: was the activity driven by uncertainty about whether a technological objective was achievable, or was it configuration work?

Qualified Research Expenditures (QREs) - What Counts

QREs under IRC Section 41(b) equal in-house research expenses plus contract research expenses. The statute is exhaustive - if a cost type isn't listed, it isn't a QRE.

Wages (the Largest Category)

Section 41(b)(2)(A)(i) covers W-2 wages paid for qualified services: direct research and experimentation, technical supervision of research teams, and direct support of persons conducting qualified research (e.g., data engineers building pipelines used in experimentation). General administrative, HR, accounting, or non-technical project management functions do not qualify. Time allocation drives the calculation: a developer spending 60% of her time on qualified AI experimentation generates 60% × annual W-2 wages as a QRE.
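The time-allocation arithmetic can be sketched as follows. The staff list, wage figures, and percentages are hypothetical, and the function covers only the simple proportional case described above.

```python
from decimal import Decimal


def wage_qre(annual_w2_wages, qualified_fraction):
    """Wage QRE = W-2 wages x share of time spent on qualified services."""
    fraction = Decimal(str(qualified_fraction))
    if not Decimal("0") <= fraction <= Decimal("1"):
        raise ValueError("qualified_fraction must be between 0 and 1")
    return Decimal(str(annual_w2_wages)) * fraction


# Hypothetical staff: (role, annual W-2 wages, share of time on qualified work)
staff = [
    ("ML engineer", 150_000, "0.60"),
    ("Data engineer", 120_000, "0.40"),
    ("Office manager", 70_000, "0"),  # administrative work generates no QRE
]

total_wage_qre = sum(
    (wage_qre(wages, share) for _, wages, share in staff), Decimal("0")
)
print(f"Total wage QREs: ${total_wage_qre:,.2f}")
```

`Decimal` is used instead of floats so that dollar amounts stay exact; the percentages should come from the contemporaneous time records, not from year-end estimates.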

Officer wages must now be separately disclosed on Form 6765 Section E. Founder/officer time allocations need to be well-documented and defensible.

Cloud Computing Costs

Cloud infrastructure qualifies under Section 41(b)(2)(A)(iii) and Treas. Reg. Section 1.41-2(b)(4), which permits QRE treatment for amounts paid 'for the right to use computers in the conduct of qualified research.' AWS, Azure, and GCP typically satisfy the three regulatory requirements: (1) servers owned and operated by the cloud provider, not the taxpayer; (2) servers located off the taxpayer's premises; (3) the taxpayer is not the primary user of any specific server.

Critical limitation: cloud costs must be allocated between development/testing environments (qualifies) and production environments (does not qualify). GPU compute used for model training during experimentation qualifies; the same infrastructure serving live production traffic after launch does not.
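One way to make that allocation mechanical is to tag every billing line item by environment and total the tags. This is a minimal sketch: the tag names and the input shape are assumptions, not the export format of any particular cloud provider.

```python
from decimal import Decimal

# Environment tags treated as research use in this sketch (hypothetical names).
RESEARCH_ENVIRONMENTS = {"dev", "test", "training"}


def split_cloud_costs(line_items):
    """Split billing line items into research and non-research totals.

    `line_items` is an iterable of (environment_tag, amount) pairs, for
    example rows taken from a tagged cost export. Untagged or unknown
    environments are counted as non-qualifying, which keeps the research
    total conservative.
    """
    research = Decimal("0")
    other = Decimal("0")
    for environment, amount in line_items:
        if environment in RESEARCH_ENVIRONMENTS:
            research += Decimal(amount)
        else:
            other += Decimal(amount)
    return research, other


# Hypothetical monthly bill.
bill = [
    ("training", "4200.00"),
    ("dev", "1300.50"),
    ("prod", "6100.25"),
    ("", "250.00"),  # untagged spend is excluded from the research total
]

research_total, other_total = split_cloud_costs(bill)
print(f"Research environments: ${research_total:,.2f}")
print(f"Production and untagged: ${other_total:,.2f}")
```

Treating untagged spend as non-qualifying is a deliberate design choice here: it gives the team an incentive to tag resources at creation and avoids claiming costs that cannot be traced to a project.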

LLM API Costs (OpenAI, Anthropic, etc.)

LLM API calls present novel classification challenges with no controlling IRS guidance. Two paths exist: classify as cloud/computer-use under Reg. 1.41-2(b)(4) (token-based API calls are functionally equivalent to renting compute) or as supplies for prototype development under Section 41(b)(2)(A)(ii) (consumed in building and testing a prototype). CPA recommendation: classify LLM API costs as cloud QREs where possible, document the development-vs-production allocation, and note this as an area without controlling authority.

U.S. Contractor Expenses

Section 41(b)(3) allows 65% of amounts paid to third-party U.S.-based contractors for qualified research. Foreign contractor payments do NOT qualify as QREs under Section 41 - this has important structuring implications for agencies using offshore development teams.
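The 65% rule and the foreign-contractor exclusion can be combined in a short filter. The invoice fields below are hypothetical names chosen for this sketch.

```python
from decimal import Decimal

CONTRACT_RESEARCH_RATE = Decimal("0.65")  # share of eligible payments counted as QRE


def contract_research_qre(invoices):
    """Return 65% of payments to U.S.-based contractors for qualified research.

    Each invoice is a dict with hypothetical keys:
    'amount', 'us_based', and 'qualified_research'.
    """
    eligible = sum(
        (
            Decimal(inv["amount"])
            for inv in invoices
            if inv["us_based"] and inv["qualified_research"]
        ),
        Decimal("0"),
    )
    return eligible * CONTRACT_RESEARCH_RATE


# Hypothetical contractor invoices for one tax year.
invoices = [
    {"amount": "40000", "us_based": True, "qualified_research": True},
    {"amount": "25000", "us_based": False, "qualified_research": True},  # offshore team
    {"amount": "10000", "us_based": True, "qualified_research": False},  # routine setup
]

contract_qre = contract_research_qre(invoices)
print(f"Contract research QREs: ${contract_qre:,.2f}")
```

Only the first invoice survives both filters, so the offshore and routine-work payments contribute nothing to the QRE total.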

What Does NOT Qualify

  • SaaS subscriptions (no-code AI platforms not used as compute)
  • Depreciable equipment (laptops, servers purchased)
  • General overhead, rent, facilities
  • Sales and marketing
  • Patent attorney fees (not a QRE, though deductible separately under Section 174A)

Calculation Methods: RRC vs. ASC

Two methods. Most AI agencies should use the Alternative Simplified Credit (ASC).

Regular Research Credit (RRC) - 20% of Excess Over Base

RRC = 20% × (Current QREs - Base Amount). The Base Amount is the fixed-base percentage (FBP, the historical ratio of QREs to gross receipts) multiplied by average annual gross receipts for the prior 4 years. The FBP is capped at 16%, and the Base Amount can never be less than 50% of current-year QREs.

Most agencies have no 1984-1988 history, so the start-up rules set the FBP instead: 3% for each of the first five tax years with QREs, after which the percentage phases in over years 6 through 10 based on the agency's own ratio of QREs to gross receipts.
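The RRC formula can be sketched in a few lines. The fixed-base percentage is passed in as a parameter rather than derived, since the right value depends on the agency's history; the dollar figures in the usage lines are hypothetical.

```python
from decimal import Decimal

RRC_RATE = Decimal("0.20")
FBP_CAP = Decimal("0.16")            # fixed-base percentage cap
MIN_BASE_SHARE = Decimal("0.50")     # base amount floor: 50% of current QREs


def rrc_credit(current_qres, avg_gross_receipts_4yr, fixed_base_pct):
    """Regular Research Credit = 20% x (current QREs - base amount)."""
    current = Decimal(current_qres)
    fbp = min(Decimal(fixed_base_pct), FBP_CAP)
    base = max(fbp * Decimal(avg_gross_receipts_4yr), MIN_BASE_SHARE * current)
    # A base amount above current QREs produces no credit, not a negative one.
    return max(RRC_RATE * (current - base), Decimal("0"))


# Hypothetical agency: $400K current QREs, 3% fixed-base percentage.
low_receipts = rrc_credit("400000", "2000000", "0.03")    # 50% floor binds
high_receipts = rrc_credit("400000", "10000000", "0.03")  # FBP x receipts binds
print(f"${low_receipts:,.2f} vs ${high_receipts:,.2f}")
```

In the first case the computed base ($60K) is below the 50% floor, so the base is $200K and the credit is $40,000. In the second, the base is $300K and the credit drops to $20,000, which shows how fast-growing gross receipts erode the RRC.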

ASC = 14% × (Current QREs - 50% × average prior-3-year QREs). If you have no prior-year QRE history, ASC simplifies to a flat 6% of current QREs. The ASC is generally preferred for younger AI agencies because it avoids the startup FBP ramp and produces predictable calculations from only 3 years of records.

Example. 2023 QREs $150K, 2024 QREs $250K, 2025 QREs $200K. Average $200K. 2026 current QREs $400K. ASC = 14% × ($400K - $100K) = $42,000.
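The same calculation as a short function, using the figures from the example. Applying the 6% fallback whenever fewer than three prior years are supplied or any prior year has zero QREs is this sketch's assumption about how "no prior-year QRE history" is tested; confirm the rule for your own facts.

```python
from decimal import Decimal

ASC_RATE = Decimal("0.14")
ASC_NO_HISTORY_RATE = Decimal("0.06")


def asc_credit(current_qres, prior_three_years):
    """Alternative Simplified Credit for one tax year."""
    current = Decimal(current_qres)
    priors = [Decimal(q) for q in prior_three_years]
    # Assumption: a missing or zero prior year triggers the flat 6% rate.
    if len(priors) < 3 or any(q == 0 for q in priors):
        return ASC_NO_HISTORY_RATE * current
    average = sum(priors, Decimal("0")) / Decimal(3)
    excess = current - Decimal("0.5") * average
    return max(ASC_RATE * excess, Decimal("0"))


# Figures from the example above: 2023-2025 history, 2026 current year.
credit_2026 = asc_credit("400000", ["150000", "250000", "200000"])
first_year = asc_credit("300000", [])
print(f"2026 ASC: ${credit_2026:,.2f}")
print(f"First-year ASC on $300K of QREs: ${first_year:,.2f}")
```

The first call reproduces the $42,000 result from the example; the second shows the flat 6% case for an agency with no QRE history.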

Taxpayers can switch between RRC and ASC annually to maximize the credit.

Section 280C Election (and OBBBA Interaction)

Under OBBBA-modified Section 280C, taxpayers who take Section 174A immediate deductions for domestic R&E must reduce those deductions by the Section 41 credit. The Section 280C(c) election instead trades a reduced credit for no expense reduction: the credit is cut by the maximum corporate tax rate (currently 21%), leaving 79% of the standard rates (15.8% RRC / 11.06% ASC). Run both paths each year - the reduced-credit election is often the cleaner result for higher-bracket agencies.

The Funded-Research Exclusion (Client-Paid Build Trap)

This is the largest structural risk specific to AI agencies and is frequently overlooked. Under Section 41(d)(4)(H), research is 'funded' - and excluded from the credit - unless the performing taxpayer can demonstrate both:

  1. Substantial Rights to IP - the agency retains meaningful ownership or rights to the resulting intellectual property, not merely the right to use the deliverable
  2. Financial (Economic) Risk - payment to the agency is contingent upon successful completion of the research, not guaranteed regardless of outcome

If a client pays for all development work, owns all resulting IP, and pays time-and-materials regardless of outcome, the research is funded and the agency cannot claim the credit.

Contract Type vs. R&D Credit Exposure

| Contract type | R&D credit exposure |
| --- | --- |
| T&M, client owns all IP | Funded research - agency cannot claim |
| Fixed-price, client owns IP | Partially funded - depends on risk allocation; complex |
| Fixed-price, agency retains IP, licenses to client | Agency bears risk + retains IP - agency can claim |
| Fixed-price, both parties share IP | Partial exclusion - allocate QREs |
| Internal development, no client payment | Fully self-funded - agency can claim if POE satisfied |

Practical takeaway: Agencies building reusable AI infrastructure (proprietary agent framework, fine-tuned base model, internal automation tooling) navigate this easily because the agency genuinely retains IP and bears development risk. Agencies doing pure client-specific build-to-spec under work-for-hire agreements may need to restructure agreements before the credit can be claimed.
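For a first pass over a contract portfolio, the two prongs can be turned into a simple triage. This is a screening sketch only, not a legal determination: the three-way outcome (including a "review" bucket for mixed facts) and the names used are assumptions made for illustration.

```python
from enum import Enum


class FundedStatus(Enum):
    NOT_FUNDED = "agency can claim if the four-part test is met"
    REVIEW = "mixed facts - needs contract-level analysis"
    FUNDED = "funded research - agency cannot claim"


def triage_contract(retains_substantial_rights, payment_contingent_on_success):
    """Screen one engagement against the two funded-research prongs."""
    if retains_substantial_rights and payment_contingent_on_success:
        return FundedStatus.NOT_FUNDED
    if not retains_substantial_rights and not payment_contingent_on_success:
        return FundedStatus.FUNDED
    return FundedStatus.REVIEW


# Hypothetical engagements: (label, retains rights?, payment contingent?)
engagements = [
    ("T&M, client owns all IP", False, False),
    ("Fixed-price, client owns IP", False, True),
    ("Fixed-price, agency retains IP", True, True),
]

for label, rights, contingent in engagements:
    print(f"{label}: {triage_contract(rights, contingent).value}")
```

Anything landing in the review bucket should go to whoever reads the actual contract language, since IP and payment clauses rarely reduce to a clean yes or no.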

Payroll Tax Offset for Qualified Small Businesses

For agencies that qualify as Qualified Small Businesses (QSBs), the payroll tax offset election under IRC Section 41(h) and Section 3111(f) converts non-refundable income tax credits into immediate cash savings via offset against payroll taxes - even in pre-profit or minimal-tax years.

Eligibility (Both Required)

  1. Gross receipts under $5 million in the credit year (entity-level, controlled-group aggregated under Section 52)
  2. No gross receipts in any tax year preceding the 5-tax-year period ending with the credit year (i.e., no revenue in 2021 or earlier for a 2026 credit)

Limits and Mechanics

  • $500,000/year offset ceiling ($250K against employer 6.2% Social Security + $250K against employer 1.45% Medicare, the latter added by the Inflation Reduction Act for tax years after 12/31/2022)
  • $2.5 million maximum lifetime benefit (5 years)
  • Election made on timely-filed Form 6765 attached to the income tax return - cannot be made on amended return
  • Applied via quarterly Form 941 in the first quarter beginning after filing the income return
  • Unused credits carry forward

This structure fits early-stage AI agencies well. A pre-profit agency with $300K of QREs in year one and no prior QRE history earns roughly $18K of credit under the flat 6% ASC, which the election converts into immediate payroll-tax cash savings. Run the analysis every year through the 5-year eligibility window.
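The eligibility tests and the annual ceiling can be sketched as below. The input shapes are hypothetical, and the sketch ignores the fact that the offset in any quarter cannot exceed the payroll tax actually owed; it only applies the gross-receipts tests and the two $250K caps.

```python
from decimal import Decimal

GROSS_RECEIPTS_LIMIT = Decimal("5000000")
SOCIAL_SECURITY_CAP = Decimal("250000")
MEDICARE_CAP = Decimal("250000")


def is_qsb(credit_year, gross_receipts_by_year):
    """Apply the two gross-receipts tests to a {tax_year: receipts} mapping."""
    first_window_year = credit_year - 4  # 5-tax-year period ending with credit year
    under_limit = (
        Decimal(gross_receipts_by_year.get(credit_year, "0")) < GROSS_RECEIPTS_LIMIT
    )
    no_early_receipts = all(
        Decimal(amount) == 0
        for year, amount in gross_receipts_by_year.items()
        if year < first_window_year
    )
    return under_limit and no_early_receipts


def split_offset(credit):
    """Return (Social Security offset, Medicare offset, amount above the ceiling)."""
    credit = Decimal(credit)
    social_security = min(credit, SOCIAL_SECURITY_CAP)
    medicare = min(credit - social_security, MEDICARE_CAP)
    remainder = credit - social_security - medicare
    return social_security, medicare, remainder


# Hypothetical agency that first had revenue in 2023.
receipts = {2021: "0", 2022: "0", 2023: "120000", 2026: "900000"}
eligible = is_qsb(2026, receipts)
small_claim = split_offset("18000")
large_claim = split_offset("600000")
print(eligible, small_claim, large_claim)
```

For the $18K example above, the whole credit fits under the Social Security cap; a $600K credit would fill both caps and leave $100K outside the election for that year.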

Form 6765 Section G: The New Disclosure Regime

The IRS overhauled Form 6765 starting with tax year 2024. The phased rollout matters for 2025 returns processed in 2026:

| Section | Content | Effective |
| --- | --- | --- |
| Section E - Other Information | Officer wages in QREs; number of business components; new expense categories | Required: tax year 2024 |
| Section F - QRE Summary | Aggregate QREs by type | Required: tax year 2024 |
| Section G - Business Component Information | Project-level detail: component name, type, software classification, wage QREs by activity type, supplies/contract research/cloud per component | Optional 2024 (transition year); required 2025+ |

Section G shifts R&D credit defense from post-audit documentation to at-filing transparency. The IRS will receive component-by-component breakdowns of every claimed activity, wage allocation, and expense category before any examination begins. Inconsistencies between Section G and underlying records will be high-priority examination triggers.
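Because component-level figures will be compared against aggregate totals, it can help to reconcile the two before filing. The sketch below is one possible check; the component names, expense-type keys, and amounts are hypothetical and do not mirror the actual form layout.

```python
from decimal import Decimal


def reconcile(component_qres, reported_totals):
    """Compare per-component QREs with aggregate totals by expense type.

    Returns {expense_type: component_sum - reported_total} for every
    expense type that does not tie out.
    """
    summed = {}
    for costs in component_qres.values():
        for expense_type, amount in costs.items():
            summed[expense_type] = summed.get(expense_type, Decimal("0")) + Decimal(amount)

    mismatches = {}
    for expense_type in set(summed) | set(reported_totals):
        component_sum = summed.get(expense_type, Decimal("0"))
        reported = Decimal(reported_totals.get(expense_type, "0"))
        if component_sum != reported:
            mismatches[expense_type] = component_sum - reported
    return mismatches


# Hypothetical component detail and aggregate summary.
components = {
    "rag-chatbot": {"wages": "90000", "cloud": "5500.50"},
    "agent-framework": {"wages": "48000"},
}
summary = {"wages": "138000", "cloud": "5000.00"}

differences = reconcile(components, summary)
print(differences)
```

Here wages tie out but cloud costs differ by $500.50, which is exactly the kind of gap worth resolving before the return is filed.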

Documentation Framework

George v. Commissioner, T.C. Memo. 2026-10, held that a credit study cannot manufacture qualified research where the contemporaneous record is thin. The standard for surviving a Tier 1 audit:

Tier 1: Project Inception (Created at Project Start)

  • Technical uncertainty memo (1-3 pages per business component) documenting what capability, method, or design was unknown at the outset, and why. The single most important document in an audit.
  • Project scope and technical objectives with measurable success criteria
  • Hypothesis statement identifying alternative approaches considered and the experimental strategy chosen

Tier 2: Contemporaneous Activity Records

  • Time tracking by employee by project (Jira, Asana, Linear, dedicated time tracker)
  • Technical development records - commit histories, model training logs, benchmark evaluation results, LLM evals, architecture decision records (ADRs)
  • Experiment logs showing what was tested, what failed, what was adjusted, why - the hallmark of a genuine POE
  • Meeting notes from technical reviews (Slack, email, Notion/Confluence)
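An experiment log does not need special tooling. The sketch below shows one possible record shape serialized as a JSON line; every field name and value is a hypothetical choice for illustration, not a required format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ExperimentEntry:
    component: str       # business component the run belongs to
    run_date: str        # ISO date the experiment was run
    hypothesis: str      # what the team expected and why
    change_tested: str   # the single variable that was changed
    metric: str          # how the result was measured
    result: float
    outcome: str         # e.g. "failed", "partial", "met target"
    next_step: str       # what was adjusted as a consequence


def to_json_line(entry):
    """Serialize one entry as a single line, suitable for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = ExperimentEntry(
    component="rag-chatbot",
    run_date="2026-03-14",
    hypothesis="Re-ranking retrieved passages will raise answer accuracy",
    change_tested="Added a re-ranking step after vector retrieval",
    metric="accuracy on held-out question set",
    result=0.81,
    outcome="failed",
    next_step="Test a smaller chunk size before re-ranking",
)

line = to_json_line(entry)
print(line)
```

Recording failed runs with the same care as successful ones matters here, because the sequence of failures and adjustments is what demonstrates a process of experimentation.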

Tier 3: Financial Records Linking Costs to Research

  • Cloud cost allocation reports - AWS Cost Explorer / Azure Cost Management exports segregating dev/training environments from production, tagged by project
  • API usage logs from provider dashboards allocated to specific projects, distinguishing experimental from production use
  • Contractor invoices and SOWs confirming work performed was qualified research, tied to specific projects
  • Payroll records with activity codes - W-2 wage data plus percentage-time allocations signed off by leads

Sample: RAG Chatbot Engagement

Retain records 5-7 years minimum (longer if credits are carried forward or applied against payroll taxes; the payroll-tax offset election stretches over 5 years of quarterly filings).

NJ R&D Credit

New Jersey offers a stand-alone R&D credit (N.J.S.A. 54:10A-5.24) that generally conforms to the federal four-part test structure with NJ-specific rate and modification differences. NJ's credit can be stacked with the federal credit. For a NJ-based agency, evaluate state credit alongside the federal claim each year - the additional documentation lift is small once federal records are in place.

FAQ

Does using OpenAI, Claude, or other LLM APIs disqualify our work from the R&D credit?

No. Routine use of off-the-shelf APIs without modification doesn't qualify on its own, but building on top of those APIs to resolve technical uncertainty does. Fine-tuning, RAG architecture, multi-agent orchestration, and custom integrations using LLM APIs as components can absolutely qualify if the four-part test is satisfied. Document the experimentation, not the API call.

Can my AI agency claim the R&D credit if our work was 100% client-funded?

Probably not. The funded-research exclusion under IRC Section 41(d)(4)(H) bars the credit when the client pays for all development, owns all IP, and pays regardless of outcome. To preserve credit eligibility, structure agreements to either (1) retain meaningful IP rights and license to the client, or (2) bear genuine financial risk via fixed-price contracts contingent on successful completion.

What's the payroll tax offset and can a small AI agency use it?

Yes - it's the most underused part of the credit for early-stage agencies. Qualified Small Businesses (under $5M gross receipts in the credit year, with no gross receipts in any year before the 5-year window) can elect to apply up to $500K of R&D credit per year against employer payroll taxes via Form 941. This converts a non-refundable income tax credit into immediate cash savings even when the agency has no income tax liability.

How is Section 174A different from Section 41?

Section 174A controls deductibility of R&E costs (immediate expensing for domestic, 15-year amortization for foreign). Section 41 is the research credit computed on top of those costs. They interact via Section 280C, which requires reducing the 174A deduction by the credit unless you elect a reduced credit rate. Most CPAs run both calculations to find the cleaner net result each year.

What is the IRS Section G disclosure?

Form 6765 Section G requires business-component-level breakdown of every claim: component name, type, software classification (commercial vs. internal-use), wage QREs by activity type, supplies/contract research/cloud allocated per component. Required for tax year 2025 and later. The IRS will cross-reference Section G against payroll, prior-year returns, and W-2 data. Inconsistencies trigger examination.

Can routine prompt engineering be claimed as R&D?

Routine prompt writing - configuring a chatbot with off-the-shelf behavior, adjusting tone, writing system prompts for known use cases - does not qualify. Systematic experimentation with prompting strategies as part of a broader research effort (few-shot evaluation, chain-of-thought architecture testing, benchmarked evaluation of competing paradigms to resolve capability uncertainty) can qualify if documented as part of a POE. Don't oversell routine work.

Do internal AI tools we built for our own agency qualify?

Only if they meet the High Threshold of Innovation test under Treas. Reg. Section 1.41-4(c)(6)(vii) - innovative, significant economic risk, not commercially available. Where possible, structure agency-developed AI tools as products for client deployment first; later internal use is then a separate business component subject to the IUS standard only for the post-repurposing modification.

Is foreign contractor work eligible?

No. Foreign contractor payments do not count as QREs under Section 41(b)(3). The 65% contract research expense rule applies only to U.S.-based contractors. Track domestic vs. foreign development separately - it matters both for QRE eligibility and for Section 174 vs. 174A treatment.

Practical 2026 Readiness Checklist

At project initiation: Draft a technical uncertainty memo per business component. Define measurable research objectives and alternative approaches. Open project-tagged cost centers in cloud accounts.

During active development: Maintain time-tracking entries linking developer hours to specific business components. Log experiment iterations - what was tested, what failed, what changed. Archive model training logs, benchmark results, ADRs. Tag and segregate development/testing API and cloud usage from production.

At project completion: Document experiment outcomes (failures are valuable POE evidence). Export and archive cloud cost allocation reports before invoices age. Prepare activity narratives per business component.

At tax year-end: Compute the credit under both RRC and ASC; pick the higher. Analyze Section 280C election under post-OBBBA rules. For QSBs (<$5M GR): evaluate payroll offset election on Form 6765. Complete Form 6765 Sections E, F, and G. Retain documentation 5-7 years minimum.

Ready to Evaluate Your AI Agency's R&D Credit?

The R&D credit is one of those tax incentives where the difference between a defensible six-figure claim and an audit-bait wishlist is documentation discipline. If your agency is doing real experimental work - RAG, agents, fine-tuning, custom integrations - and you're not yet claiming the credit, you're probably leaving money on the table. If you've been claiming casually without contemporaneous records, the George v. Commissioner signal is loud and clear.

I'm Greg Monaco, a NJ-licensed CPA (License #20CC04711400). I help AI automation agencies evaluate R&D credit eligibility, structure agreements to clear the funded-research exclusion, set up the documentation infrastructure that survives Tier 1 audits, and run the payroll tax offset election. Schedule a free 30-minute consultation.

Circular 230 Disclosure: This post provides general tax information and is not a substitute for personalized tax advice. Consult a qualified tax professional for advice specific to your situation.
