Course 25 | Module 11 of 12

AI-Assisted Engineering Workflows and Governance

Use AI for structured assistance while preserving source trace, verification, human responsibility, and auditability.

MAP

Module map

Learning outcomes

  • Design bounded AI workflows for requirements and traceability review.
  • Verify AI-assisted evidence extraction, summarization, and log review.
  • Use AI for coding, hypotheses, and design-space support without delegating engineering judgment.
  • Create prompts, audit records, governance controls, and monitoring proportional to risk.

Evidence standard

Complete all four lessons, reproduce the worked checks, run the lab, and correct the weekly quiz. Treat AI output as candidate evidence until independently verified.

11.1

AI for requirements review and traceability suggestions

Why this lesson matters

Requirements and trace sets are large enough to benefit from machine assistance, but language similarity can create confident false links and invented intent.

Learning objectives

  • Define and distinguish Candidate link and Abstention.
  • Apply the lesson method to the worked case on AI for requirements review and traceability suggestions.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

AI may flag quality defects, propose questions, and rank candidate trace links. Controlled requirements and artifacts remain authoritative, links remain unapproved until reviewed, and every suggestion needs source identifiers and rationale.

Key concepts

Candidate link: A proposed relationship awaiting semantic and configuration review.
Abstention: An explicit decision not to answer when evidence or confidence is insufficient.
False positive: A suggested issue or link that reviewers determine is not valid.
False negative: A real issue or link the system fails to suggest.

Step-by-step explanation

  1. Define allowed inputs, output schema, link types, and prohibited actions.
  2. Provide controlled artifact text and identifiers with minimal necessary context.
  3. Require defect or link rationale tied to exact source spans.
  4. Route candidates to responsible reviewers and record accept, reject, modify, or abstain.
  5. Measure precision, recall on sampled ground truth, reviewer effort, and failure patterns.
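
The steps above can be sketched as a small data structure. This is a minimal illustration, not a prescribed tool: the class names, field names, and the values in `ALLOWED_LINK_TYPES` are assumptions introduced here, and the example IDs are borrowed from this lesson's worked example. It shows a candidate that stays unapproved until a named reviewer records a disposition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    """Reviewer outcomes named in step 4."""
    ACCEPT = "accept"
    REJECT = "reject"
    MODIFY = "modify"
    ABSTAIN = "abstain"


# Hypothetical link types; a real project would define its own controlled list.
ALLOWED_LINK_TYPES = {"verifies", "refines", "derives_from"}


@dataclass
class CandidateLink:
    source_id: str      # controlled requirement identifier
    target_id: str      # controlled evidence identifier
    link_type: str      # must be one of ALLOWED_LINK_TYPES
    source_span: str    # exact quoted text from the source artifact
    target_span: str    # exact quoted text from the target artifact
    rationale: str
    reviewer: Optional[str] = None
    disposition: Optional[Disposition] = None  # None means not yet reviewed

    @property
    def approved(self) -> bool:
        # Approved only when a named reviewer has recorded ACCEPT.
        return self.reviewer is not None and self.disposition is Disposition.ACCEPT


def validate(link: CandidateLink) -> list:
    """Return schema problems; an empty list means the candidate can go to review."""
    problems = []
    if link.link_type not in ALLOWED_LINK_TYPES:
        problems.append(f"link type not allowed: {link.link_type}")
    for name in ("source_span", "target_span", "rationale"):
        if not getattr(link, name).strip():
            problems.append(f"missing {name}")
    return problems


link = CandidateLink("R-14", "E-9", "verifies",
                     source_span="Maintain coolant below 60 °C.",
                     target_span="normal operation",
                     rationale="Shared phrase 'normal operation'.")
print(validate(link))   # schema problems, if any
print(link.approved)    # unreviewed candidates are never approved
link.reviewer, link.disposition = "reviewer-1", Disposition.REJECT
print(link.approved)
```

Passing `validate` only means the candidate is well formed. Whether the link is semantically correct is still the reviewer's decision.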

Worked example

Requirement R-14 says 'Maintain coolant below 60 °C.' Evidence E-9 is a pump vibration test. The full text of both artifacts contains the phrase 'normal operation,' and an AI suggests that E-9 verifies R-14.

  1. Inspect the target quantity and method: coolant temperature versus pump vibration.
  2. Shared operational language is not semantic verification evidence.
  3. Reject the link and record false-positive cause as lexical overlap without quantity alignment.
  4. Use the example to improve review prioritization and schema fields, not to assume a prompt alone eliminates recurrence.

Result. Disposition: rejected candidate. A valid link would require temperature evidence under the defined operating condition and configuration.

Independent check. Every accepted link aligns quantity, condition, configuration, method, and link semantics.

Common misconceptions

Misconception | Correction
Semantic similarity proves traceability | Similar words may refer to different quantities, states, configurations, or evidence roles.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

What should happen to low-confidence links?

The assistant should abstain on them or route them for prioritized review. They are never silently accepted.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Review ten candidate links and record accept or reject with one-sentence rationale.

Intermediate

Design an output schema that forces source spans and abstention.

Advanced

Plan a precision-recall and reviewer-effort evaluation for deployment.

AI-assisted engineering task

Run a candidate-link task with stable IDs, typed relationships, quoted source spans, rationale, and an abstain option.

How to prove the AI output yourself

  1. Open source and target.
  2. Check quantity, condition, configuration, and direction.
  3. Record reviewer and disposition.
  4. Measure missed and spurious links on a reviewed sample.
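
Step 4 can be sketched as a set comparison between what the assistant suggested and what reviewers established on a sample. This is a minimal sketch: the function name and all link IDs below are hypothetical and chosen only for illustration.

```python
def precision_recall(suggested, gold):
    """Compare suggested links with a reviewed gold set of (requirement, evidence) pairs."""
    suggested, gold = set(suggested), set(gold)
    true_positives = len(suggested & gold)
    precision = true_positives / len(suggested) if suggested else None
    recall = true_positives / len(gold) if gold else None
    spurious = sorted(suggested - gold)   # suggested but not valid
    missed = sorted(gold - suggested)     # valid but never suggested
    return precision, recall, spurious, missed


# Hypothetical reviewed sample.
suggested = [("R-1", "E-1"), ("R-2", "E-2"), ("R-3", "E-9")]
gold = [("R-1", "E-1"), ("R-2", "E-2"), ("R-3", "E-3"), ("R-4", "E-4")]

precision, recall, spurious, missed = precision_recall(suggested, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
print("spurious:", spurious)
print("missed:", missed)
```

Recall is only as trustworthy as the gold set, which has to be built by reviewers searching for links independently of the assistant's suggestions.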

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Candidate link.

A proposed relationship awaiting semantic and configuration review.

What role does Abstention play here?

An explicit decision not to answer when evidence or confidence is insufficient.

What must a reviewer be able to reconstruct?

How every accepted link aligns quantity, condition, configuration, method, and link semantics.

End-of-lesson summary

AI may flag quality defects, propose questions, and rank candidate trace links. Controlled requirements and artifacts remain authoritative, links remain unapproved until reviewed, and every suggestion needs source identifiers and rationale.

Student notes

Never paste a suggested link directly into the authoritative graph. Create a candidate state and a reviewed disposition.

Instructor notes

Include a deliberately tempting lexical match. Require students to explain the false-positive mechanism.

11.2

AI for summarization, evidence extraction, and simulation-log review

Why this lesson matters

Summaries can omit caveats, merge configurations, and turn absence of evidence into a positive claim.

Learning objectives

  • Define and distinguish Evidence extraction and Grounded summary.
  • Apply the lesson method to the worked case on AI for summarization, evidence extraction, and simulation-log review.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Use extraction before synthesis: preserve source identifiers, exact fields, units, uncertainty, status, and explicit unknowns. Summaries should be generated from verified structured facts and linked back to evidence.

Key concepts

Evidence extraction: Identification of specified facts and metadata from source artifacts without adding unsupported meaning.
Grounded summary: A synthesis whose claims map to accessible source evidence.
Unsupported inference: A conclusion not established by the supplied source.
Log anomaly: A warning, failure, divergence, override, or unusual condition requiring review.

Step-by-step explanation

  1. Define a schema with required fields, null behavior, units, and source locations.
  2. Extract facts separately from interpretation.
  3. Validate fields against originals and reconcile conflicting artifacts.
  4. Generate summaries that carry citations and uncertainty.
  5. Sample outputs continuously and preserve logs, corrections, and model or prompt versions.

Worked example

A solver log contains 'converged residual 9e-5,' 'two distorted elements,' and 'mass scaling enabled after step 40.' An AI summary says 'The simulation converged successfully with no issues.'

  1. Reject the summary because it omits two material credibility conditions.
  2. Extract convergence threshold, distorted-element warning, and mass-scaling activation with line references.
  3. Determine whether each affects the quantity of interest through engineering review and reruns.
  4. Revise the summary to separate solver termination from model-quality concerns.

Result. A defensible summary states that the solver met its residual criterion while element distortion and late mass scaling require disposition.

Independent check. Every claim has a source span, omissions are tested, and 'converged' is not equated with physically credible.
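
The extraction step in this example can be sketched with simple keyword rules that keep a line reference for every fact. This is an illustration under stated assumptions: the rule patterns, the category assigned to each message, and the function names are hypothetical, and a real solver's message catalogue would replace them. The three log lines and the AI summary are the ones quoted in the example.

```python
import re

# Hypothetical keyword rules mapping log text to extraction categories.
RULES = [
    ("termination", re.compile(r"\bconverged\b", re.IGNORECASE)),
    ("warning", re.compile(r"\bdistorted\b|\bwarning\b", re.IGNORECASE)),
    ("override", re.compile(r"\bmass scaling\b", re.IGNORECASE)),
]


def extract(log_text):
    """Return one fact per matching rule, each tied to its source line number."""
    facts = []
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        for category, pattern in RULES:
            if pattern.search(line):
                facts.append({"category": category, "line": line_no, "text": line.strip()})
    return facts


def summary_is_supported(summary, facts):
    """Reject a 'no issues' claim when any warning or override was extracted."""
    claims_clean = "no issues" in summary.lower()
    has_open_items = any(f["category"] in {"warning", "override"} for f in facts)
    return not (claims_clean and has_open_items)


LOG = """converged residual 9e-5
two distorted elements
mass scaling enabled after step 40"""
AI_SUMMARY = "The simulation converged successfully with no issues."

facts = extract(LOG)
for fact in facts:
    print(fact["line"], fact["category"], fact["text"])
print("summary supported:", summary_is_supported(AI_SUMMARY, facts))
```

Keyword matching is deliberately crude. It does not handle negation such as 'not converged', so it supplements a reviewer's reading of the original log rather than replacing it.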

Common misconceptions

Misconception | Correction
A concise summary is a faithful summary | Compression can remove the exact caveat that controls the decision.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

Does solver convergence prove credibility?

No. It addresses a numerical termination condition, not model setup, discretization, physics, uncertainty, or validation.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Extract facts, warnings, and unknowns from a ten-line solver log.

Intermediate

Compare source-linked and free-form summaries for omission risk.

Advanced

Design a review policy that samples both accepted outputs and abstentions for hidden false negatives.

AI-assisted engineering task

Ask AI for a structured log extraction with categories: termination, warning, override, quality metric, missing information, and source line.

How to prove the AI output yourself

  1. Search the original for warnings and negations.
  2. Verify numeric fields and units.
  3. Re-run critical checks.
  4. Have the simulation owner approve interpretation.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Evidence extraction.

Identification of specified facts and metadata from source artifacts without adding unsupported meaning.

What role does Grounded summary play here?

A synthesis whose claims map to accessible source evidence.

What must a reviewer be able to reconstruct?

The source span behind every claim, the tests made for omissions, and the distinction between 'converged' and physically credible.

End-of-lesson summary

Use extraction before synthesis: preserve source identifiers, exact fields, units, uncertainty, status, and explicit unknowns. Summaries should be generated from verified structured facts and linked back to evidence.

Student notes

Keep extracted facts, engineering interpretation, and decision summary as separate linked artifacts.

Instructor notes

Use logs where 'success' coexists with warnings. This mirrors real software behavior.

11.3

AI for coding, hypotheses, and design-space exploration

Why this lesson matters

Generated code may run while implementing the wrong equation, units, boundary condition, or optimization constraint.

Learning objectives

  • Define and distinguish Executable specification and Property-based test.
  • Apply the lesson method to the worked case on AI for coding, hypotheses, and design-space exploration.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Use AI to accelerate boilerplate, tests, visualization, and candidate hypotheses. Engineering acceptance requires specification, unit tests, analytical cases, data provenance, code review, and physical interpretation.

Key concepts

Executable specification: Tests and examples that make intended behavior checkable.
Property-based test: A test of general invariants such as conservation, monotonicity, symmetry, or bounds.
Hypothesis: A testable proposed explanation, not a conclusion.
Design-space assistant: A tool that organizes alternatives and evidence without owning selection.

Step-by-step explanation

  1. Write equations, units, assumptions, and acceptance tests before requesting code.
  2. Ask for small functions with typed inputs and explicit failure behavior.
  3. Test against analytical values, dimensional properties, limits, and independent implementations.
  4. Use AI-generated hypotheses to design discriminating tests.
  5. Record accepted modifications, reviewer, environment, and result evidence.

Worked example

AI generates beam inertia as I = b h²/12 instead of b h³/12. The code runs and optimization returns a lightweight design.

  1. Dimensional analysis shows b h² has units m³, not the required m⁴.
  2. An analytical unit test for b = 0.04 m and h = 0.03 m expects I = 9.00e-8 m⁴.
  3. Reject all derived optimization results, correct the function, rerun tests, and assess whether any decision used the defect.
  4. Record the incident and add a dimensional or symbolic check to prevent recurrence.

Result. Running code is not verified code. A one-line dimensional check catches a high-consequence implementation error before design release.

Independent check. Equation source, dimensions, known case, limits, constraints, and independent calculation all agree.
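
The two checks in this example can be sketched as executable tests. This is a minimal sketch with assumed function names: `rect_inertia` implements I = b h³/12, `defective_inertia` reproduces the error described above, and `passes_checks` combines the analytical case with a scaling property that follows from I having dimension length to the fourth power.

```python
import math


def rect_inertia(b: float, h: float) -> float:
    """Second moment of area b*h^3/12, in m^4 when b and h are in metres."""
    if b <= 0 or h <= 0:
        raise ValueError("b and h must be positive")
    return b * h**3 / 12.0


def defective_inertia(b: float, h: float) -> float:
    """The generated error from the worked example: h squared instead of cubed."""
    return b * h**2 / 12.0


def passes_checks(fn) -> bool:
    # Analytical case from the worked example: b = 0.04 m, h = 0.03 m.
    known_case = math.isclose(fn(0.04, 0.03), 9.00e-8, rel_tol=1e-9)
    # Scaling property: multiplying both lengths by k must multiply I by k^4.
    k = 2.0
    scaling = math.isclose(fn(k * 0.04, k * 0.03), k**4 * fn(0.04, 0.03), rel_tol=1e-9)
    return known_case and scaling


print("correct function passes:", passes_checks(rect_inertia))
print("defective function passes:", passes_checks(defective_inertia))
```

The scaling check needs no reference value, so it still catches a wrong exponent when no analytical case is available.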

Common misconceptions

Misconception | Correction
Passing tests prove the code correct | Tests cover specified cases and properties; missing or wrong specifications can still pass.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

What makes a useful hypothesis?

It is physically plausible, distinct from alternatives, and testable with evidence that could refute it.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Write three unit tests for rectangular-section inertia.

Intermediate

Design property tests for monotonic beam deflection.

Advanced

Use residual trends to generate three hypotheses and one discriminating experiment for each.

AI-assisted engineering task

Ask AI to implement only after supplying equations, units, domain, examples, and tests. Request explanation of assumptions and failure cases.

How to prove the AI output yourself

  1. Run static and unit checks.
  2. Perform dimensional analysis.
  3. Compare analytical cases.
  4. Review code and generated dependencies.
  5. Reproduce the final run in a controlled environment.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Executable specification.

Tests and examples that make intended behavior checkable.

What role does Property-based test play here?

A test of general invariants such as conservation, monotonicity, symmetry, or bounds.

What must a reviewer be able to reconstruct?

The equation source, dimensions, known case, limits, constraints, and independent calculation, and that all of them agree.

End-of-lesson summary

Use AI to accelerate boilerplate, tests, visualization, and candidate hypotheses. Engineering acceptance requires specification, unit tests, analytical cases, data provenance, code review, and physical interpretation.

Student notes

No AI-generated engineering function enters a decision workflow without equation source, unit contract, tests, review, and version record.

Instructor notes

Include an error that produces plausible values. Students should not rely on absurd-output detection.

11.4

Human oversight, prompting, audit trails, and AI governance

Why this lesson matters

A vague human-in-the-loop claim is not a control. Effective oversight requires authority, competence, evidence access, time, and monitoring.

Learning objectives

  • Define and distinguish AI risk and Human oversight.
  • Apply the lesson method to the worked case on human oversight, prompting, audit trails, and AI governance.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Govern AI use through task classification, approved sources, roles, output status, verification, incident handling, monitoring, and change control. Prompt design helps shape behavior but does not replace technical and organizational controls.

Key concepts

AI risk: Potential harm or loss arising from AI behavior, use context, interaction, or organizational process.
Human oversight: Competent, empowered review and intervention across AI use.
Audit record: Trace of task, inputs, model or tool, output, checks, edits, reviewer, disposition, and downstream use.
Monitoring: Ongoing measurement of performance, drift, incidents, overrides, and changing context.

Step-by-step explanation

  1. Classify task, users, affected decisions, consequence, and prohibited uses.
  2. Map data authority, confidentiality, intellectual property, and retention constraints.
  3. Specify output schema, abstention, source trace, and verification workflow.
  4. Assign accountable owner, qualified reviewer, escalation, and incident process.
  5. Monitor error rates, reviewer overrides, model changes, and downstream outcomes.

Worked example

An AI trace assistant has 92% precision in a pilot. A model update is deployed, and precision on safety requirements falls to 71% while overall acceptance rate remains high.

  1. Treat model or workflow version change as a controlled change requiring re-evaluation.
  2. Segment metrics by requirement criticality; overall acceptance can hide safety degradation.
  3. Pause automatic prioritization for affected classes, notify owners, and review recent accepted links.
  4. Document incident, root cause, corrective action, and criteria for safe resumption.

Result. Monitoring detects a consequential regression that a one-time pilot and broad average would miss.

Independent check. Version, test set, segmentation, thresholds, rollback authority, affected decisions, and corrective actions are recorded.
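
The segmentation step can be sketched as follows. The review counts are invented for illustration and chosen only so that the safety segment reproduces the 71% figure above. The segment names, the 0.85 threshold, and the function names are likewise assumptions, not values from any standard.

```python
from collections import defaultdict


def precision_by_segment(records):
    """records: iterable of (segment, reviewer_confirmed) pairs for suggested links."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [confirmed, reviewed]
    for segment, confirmed in records:
        totals[segment][0] += int(confirmed)
        totals[segment][1] += 1
    return {segment: c / n for segment, (c, n) in totals.items()}


def segments_below(precisions, threshold):
    """Segments whose precision falls below the threshold and need escalation."""
    return sorted(seg for seg, p in precisions.items() if p < threshold)


# Hypothetical post-update review sample: 100 safety links, 900 other links.
records = ([("safety", True)] * 71 + [("safety", False)] * 29
           + [("other", True)] * 846 + [("other", False)] * 54)

overall = sum(confirmed for _, confirmed in records) / len(records)
by_segment = precision_by_segment(records)
paused = segments_below(by_segment, threshold=0.85)  # hypothetical threshold

print(f"overall precision={overall:.3f}")
for segment, value in sorted(by_segment.items()):
    print(f"{segment}: {value:.3f}")
print("pause automatic prioritization for:", paused)
```

With these counts the overall figure stays above 0.9 while the safety segment sits at 0.71, which is why the threshold is applied per segment rather than to the average.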

Common misconceptions

Misconception | Correction
A good prompt is the safety system | Prompts are one control layer; grounding, permissions, tests, review, monitoring, and governance remain necessary.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

What makes oversight meaningful?

The reviewer has competence, access, time, responsibility, and authority to reject or stop use.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

List the minimum fields in an AI audit record.

Intermediate

Design role and escalation controls for an evidence-extraction assistant.

Advanced

Create monitoring thresholds segmented by task, consequence, and source type, including a rollback rule.

AI-assisted engineering task

Write prompts with role, bounded task, authoritative inputs, output schema, forbidden inference, abstention, and verification request. Store prompt version with the audit record.
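
Storing the prompt version with the audit record can be sketched as below. The field names follow the audit-record definition in this lesson, but the class, the `open_items` helper, and every example value are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AuditRecord:
    task: str
    input_ids: list
    model_version: str
    prompt_version: str
    output: str
    checks: list = field(default_factory=list)  # verification steps performed
    edits: str = ""
    reviewer: str = ""
    disposition: str = ""
    downstream_use: str = ""


def open_items(record):
    """Names of fields that must be filled before the output is used downstream."""
    required = ("checks", "reviewer", "disposition")
    return [name for name in required if not getattr(record, name)]


# Hypothetical record created when the assistant returns its output.
record = AuditRecord(
    task="trace-link suggestion",
    input_ids=["R-14", "E-9"],
    model_version="model-A",
    prompt_version="prompt-v3",
    output="E-9 verifies R-14",
)
print("open before review:", open_items(record))

# The reviewer completes the record.
record.checks = ["quantity alignment", "configuration match"]
record.reviewer = "reviewer-1"
record.disposition = "reject"
print("open after review:", open_items(record))

# Serialize with stable key order so stored records compare cleanly.
print(json.dumps(asdict(record), sort_keys=True))
```

Keeping `prompt_version` and `model_version` as required constructor arguments means a record cannot be created without them.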

How to prove the AI output yourself

  1. Evaluate on representative and adversarial cases.
  2. Inspect citations and omissions.
  3. Measure reviewer overrides.
  4. Reassess after model, prompt, source, or task changes.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define AI risk.

Potential harm or loss arising from AI behavior, use context, interaction, or organizational process.

What role does Human oversight play here?

Competent, empowered review and intervention across AI use.

What must a reviewer be able to reconstruct?

From the record alone: the version, test set, segmentation, thresholds, rollback authority, affected decisions, and corrective actions.

End-of-lesson summary

Govern AI use through task classification, approved sources, roles, output status, verification, incident handling, monitoring, and change control. Prompt design helps shape behavior but does not replace technical and organizational controls.

Student notes

Before AI use, answer: who is accountable, what may it access, what may it output, how is it checked, when must it abstain, and how is drift detected?

Instructor notes

Treat ISO/IEC 42001 and the EU AI Act as governance and legal awareness, not as a classroom certification claim or legal advice.

LAB 11

Lab 11: Use AI to suggest trace links, then verify manually

Lab objective

Evaluate candidate links produced by an AI-style similarity assistant and record reviewer disposition, rationale, and performance.

Engineering context

No API is required. A supplied candidate list simulates model output so the lab focuses on engineering verification.

Input data

  • Four requirements
  • Five evidence titles
  • Six candidate links with scores

Step-by-step task

  1. Review quantity and condition semantics
  2. Accept or reject each link
  3. Compute precision against the reviewed set
  4. Document false-positive mechanism and missed-link search

Python code

requirements = {
    "R-TEMP": "Coolant outlet temperature shall not exceed 60 degC at 500 W.",
    "R-VIB": "Pump housing RMS vibration shall not exceed 2.5 mm/s at rated speed.",
    "R-MASS": "Cooling assembly mass shall not exceed 4.0 kg.",
    "R-LEAK": "No visible leakage is permitted during the 8 bar proof test.",
}
evidence = {
    "E-TEMP": "500 W thermal balance test, outlet temperature",
    "E-VIB": "Rated-speed pump vibration test",
    "E-MASS": "Released assembly mass inspection",
    "E-LEAK": "8 bar pressure proof and visual inspection",
    "E-NOISE": "Acoustic test during normal operation",
}
candidates = [
    ("R-TEMP", "E-TEMP", 0.93), ("R-TEMP", "E-VIB", 0.62),
    ("R-VIB", "E-VIB", 0.95), ("R-VIB", "E-NOISE", 0.76),
    ("R-MASS", "E-MASS", 0.97), ("R-LEAK", "E-LEAK", 0.94),
]
# Manual engineering review after inspecting source and target.
# Scores are reported but never used to decide acceptance.
accepted = {("R-TEMP", "E-TEMP"), ("R-VIB", "E-VIB"),
            ("R-MASS", "E-MASS"), ("R-LEAK", "E-LEAK")}
# True positives: candidates that the reviewer accepted.
tp = sum((r, e) in accepted for r, e, _score in candidates)
# Precision over the supplied candidates only; it says nothing about missed links.
precision = tp / len(candidates)
for r, e, score in candidates:
    disposition = "accept" if (r, e) in accepted else "reject"
    print(r, e, score, disposition)
print(f"reviewed precision={precision:.3f}")

Explanation of code

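The requirements and evidence dictionaries hold the artifact text by identifier, and candidates lists each suggested link with its similarity score. The accepted set records the links a reviewer approved after inspecting source and target. The script counts the candidates that appear in accepted as true positives, divides by the number of candidates to obtain precision, and prints each candidate with its score and disposition. The score is reported but never used to decide acceptance.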

Expected output

Four accepted links, two rejected lexical or contextual matches, and reviewed precision 0.667 for the supplied candidates.

Interpretation

Precision alone does not measure missed links. Reviewers must also search for false negatives and segment performance by consequence.

Common errors

  • Accepting by score threshold alone
  • Reviewing titles without procedures and results
  • Ignoring candidate links the model never produced

Extension tasks

  • Add recall using a reviewed gold set
  • Require source spans
  • Evaluate a changed prompt or model version
  • Store reviewer disagreement

Reflection questions

  • Why were the candidate links from R-TEMP to E-VIB and from R-VIB to E-NOISE rejected?
  • What evidence is needed to measure recall?
  • How should safety-critical links change the workflow?

PROJECT

Mini-project 4: Auditable AI-assisted evidence review

Deliverable

A bounded AI task, controlled source packet, output schema, candidate outputs, manual verification record, performance analysis, governance controls, and incident scenario.

Required checks

Source IDs, abstention, accepted and rejected examples, false-positive and false-negative analysis, accountable reviewer, and model-change monitoring plan.

WEEK 11

Weekly quiz and concept check

Closed notes. Answer each item, then use the key to correct in a different color.

  1. What is the initial status of an AI trace link?
  2. Why extract before summarize?
  3. Does solver convergence establish credibility?
  4. What must precede AI-generated engineering code?
  5. What makes human oversight effective?
  6. What changes trigger AI re-evaluation?

Answer key

  1. Candidate and unapproved.
  2. Structured extraction preserves source fields and exposes missing information before synthesis.
  3. No. It is one numerical condition and does not establish physical adequacy or uncertainty.
  4. Equations, units, domain, assumptions, tests, and acceptance criteria.
  5. Competence, evidence access, time, accountability, and authority to reject or stop.
  6. Model, prompt, source corpus, schema, task, user population, or consequence changes.

SOURCES

Module source map

Source | How it is used
NIST AI Risk Management Framework 1.0 | Govern, Map, Measure, and Manage functions for trustworthy and responsible AI risk management.
ISO/IEC 42001:2023, AI Management Systems | Organizational AI governance, accountability, policy, controls, and continual improvement awareness.
European Commission overview of the EU AI Act | Risk-based legal awareness for AI developers and deployers in the European Union.
DoDI 5000.97, Digital Engineering | Operational definitions of digital engineering, digital models, digital artifacts, authoritative data, test, and sustainment.
NASA-STD-7009, Standard for Models and Simulations | Model and simulation lifecycle, credibility products, acceptance criteria, and reporting. NASA-STD-7009B supersedes 7009A.

Access labels and full-course source notes are on the course home page. Paywalled standards are not paraphrased as if their full text were accessed.