11.1
AI for requirements review and traceability suggestions
Why this lesson matters
Requirements and trace sets are large enough to benefit from machine assistance, but language similarity can create confident false links and invented intent.
Learning objectives
- Define and distinguish a candidate link and an abstention.
- Apply the lesson method to the worked example on AI-suggested trace links.
- Evaluate evidence, uncertainty, and AI-assisted output before making a claim.
Readiness check
Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.
Check your response
A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.
Core idea
AI may flag quality defects, propose questions, and rank candidate trace links. Controlled requirements and artifacts remain authoritative, links remain unapproved until reviewed, and every suggestion needs source identifiers and rationale.
Key concepts
| Term | Meaning |
|---|---|
| Candidate link | A proposed relationship awaiting semantic and configuration review. |
| Abstention | An explicit decision not to answer when evidence or confidence is insufficient. |
| False positive | A suggested issue or link that reviewers determine is not valid. |
| False negative | A real issue or link the system fails to suggest. |
Step-by-step explanation
- Define allowed inputs, output schema, link types, and prohibited actions.
- Provide controlled artifact text and identifiers with minimal necessary context.
- Require defect or link rationale tied to exact source spans.
- Route candidates to responsible reviewers and record accept, reject, modify, or abstain.
- Measure precision, recall on sampled ground truth, reviewer effort, and failure patterns.
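The workflow above can be sketched as a small record type. This is a minimal Python sketch under stated assumptions: the names `CandidateLink`, `Disposition`, and `record_review`, and the link-type vocabulary, are hypothetical. It enforces two rules from the text: a suggestion without a source span and rationale is refused, and only a named reviewer can move a link out of the candidate state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    CANDIDATE = "candidate"  # proposed by the assistant, not yet reviewed
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MODIFIED = "modified"    # reviewer changed the target, span, or link type
    ABSTAINED = "abstained"  # evidence or confidence is insufficient to decide


# Hypothetical controlled vocabulary of link types; use the project's own list.
ALLOWED_LINK_TYPES = {"verifies", "satisfies", "refines"}


@dataclass
class CandidateLink:
    source_id: str
    target_id: str
    link_type: str
    source_span: str  # exact quoted text from the controlled source artifact
    rationale: str
    disposition: Disposition = Disposition.CANDIDATE
    reviewer: Optional[str] = None
    review_note: str = ""

    def __post_init__(self) -> None:
        if self.link_type not in ALLOWED_LINK_TYPES:
            raise ValueError(f"link type {self.link_type!r} is not allowed")
        if not self.source_span or not self.rationale:
            raise ValueError("a suggestion needs a source span and a rationale")


def record_review(link: CandidateLink, reviewer: str, decision: Disposition, note: str) -> None:
    """Move a link out of the candidate state; only a named reviewer may do this."""
    if decision is Disposition.CANDIDATE:
        raise ValueError("a review must end in accept, reject, modify, or abstain")
    if not reviewer:
        raise ValueError("a disposition requires a named reviewer")
    link.disposition = decision
    link.reviewer = reviewer
    link.review_note = note


# Hypothetical reviewer name; the link mirrors this lesson's worked example.
link = CandidateLink("R-14", "E-9", "verifies", "normal operation",
                     "both artifacts mention normal operation")
record_review(link, "reviewer-a", Disposition.REJECTED,
              "lexical overlap without quantity alignment")
print(link.disposition.value)  # rejected
```

Keeping the disposition on the record itself means the authoritative graph can be filtered to `ACCEPTED` links only, while rejected and abstained candidates remain available for error analysis.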
Worked example
Requirement R-14 says 'Maintain coolant below 60 °C.' Evidence E-9 is a pump vibration test. Both contain the phrase 'normal operation,' and an AI suggests E-9 verifies R-14.
1. Inspect the target quantity and method: coolant temperature versus pump vibration.
2. Recognize that shared operational language is not semantic verification evidence.
3. Reject the link and record the false-positive cause as lexical overlap without quantity alignment.
4. Use the example to improve review prioritization and schema fields; do not assume that a prompt change alone prevents recurrence.
Result. Disposition: rejected candidate. A valid link would require temperature evidence under the defined operating condition and configuration.
Independent check. Every accepted link aligns quantity, condition, configuration, method, and link semantics.
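The alignment test in the worked example can be made explicit. A minimal sketch, assuming reviewer-confirmed metadata fields (`quantity`, `condition`, `configuration`) that are hypothetical and not part of the original example; exact string comparison is a simplification of engineering judgment. An empty result does not accept the link: method and link semantics still need a reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactFacts:
    """Reviewer-confirmed facts about one artifact (hypothetical fields)."""
    artifact_id: str
    quantity: str       # physical quantity constrained or measured
    condition: str      # operating condition
    configuration: str  # baseline or build identifier


def alignment_gaps(requirement: ArtifactFacts, evidence: ArtifactFacts) -> list[str]:
    """List reasons a 'verifies' link cannot be accepted; empty means none were found."""
    gaps = []
    for name in ("quantity", "condition", "configuration"):
        required, offered = getattr(requirement, name), getattr(evidence, name)
        if required != offered:
            gaps.append(f"{name} mismatch: {required!r} vs {offered!r}")
    return gaps


# The configuration label "baseline-A" is invented for illustration.
r14 = ArtifactFacts("R-14", "coolant temperature", "normal operation", "baseline-A")
e9 = ArtifactFacts("E-9", "pump vibration", "normal operation", "baseline-A")
print(alignment_gaps(r14, e9))
# ["quantity mismatch: 'coolant temperature' vs 'pump vibration'"]
```

The shared condition string is exactly what made the lexical match tempting; only the quantity comparison exposes the false positive.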
Common misconceptions
| Misconception | Correction |
|---|---|
| Semantic similarity proves traceability | Similar words may refer to different quantities, states, configurations, or evidence roles. |
| A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked. |
Diagnostic questions
What should happen to low-confidence links?
They should be abstained from or prioritized for review, never silently accepted.
What would make this work reproducible?
Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.
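For the first diagnostic question, one possible routing rule is sketched below. The function name and both thresholds are illustrative assumptions, not recommended values. The key design choice is that no branch returns an acceptance: the score only decides where a candidate enters review.

```python
def route_candidate(score: float, abstain_below: float = 0.5, priority_below: float = 0.8) -> str:
    """Route a candidate link by model score (hypothetical thresholds).

    No branch returns 'accept': every route ends with a human disposition.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < abstain_below:
        return "abstain"          # too weak to propose; record the abstention
    if score < priority_below:
        return "priority_review"  # uncertain: review first, with full source and target
    return "standard_review"      # confident, but still unapproved until reviewed


for score in (0.93, 0.62, 0.30):
    print(score, route_candidate(score))
```

Scores are not correctness: in Lab 11 a candidate scored 0.76 and was still rejected on review.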
Practice ladder
- Basic: Review ten candidate links and record accept or reject with a one-sentence rationale.
- Intermediate: Design an output schema that forces source spans and abstention.
- Advanced: Plan a precision, recall, and reviewer-effort evaluation for deployment.
AI-assisted engineering task
Run a candidate-link task with stable IDs, typed relationships, quoted source spans, rationale, and an abstain option.
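Part of this task can be checked mechanically before a human reads the rationale. A minimal sketch, assuming suggestions arrive as Python dictionaries with hypothetical field names; it verifies required fields, a hypothetical relationship vocabulary that includes `abstain`, known IDs, and that the quoted span appears verbatim in the controlled text. Passing these checks only makes a suggestion eligible for review.

```python
# Hypothetical relationship vocabulary; "abstain" is a first-class answer.
ALLOWED_RELATIONSHIPS = {"verifies", "satisfies", "refines", "abstain"}
REQUIRED_FIELDS = {"source_id", "target_id", "relationship", "source_span", "rationale"}


def check_suggestion(suggestion: dict, controlled_text: dict[str, str]) -> list[str]:
    """Return schema and grounding problems; an empty list means it may go to review."""
    missing = REQUIRED_FIELDS - set(suggestion)
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    relationship = suggestion["relationship"]
    if relationship not in ALLOWED_RELATIONSHIPS:
        problems.append(f"unknown relationship {relationship!r}")
    source = controlled_text.get(suggestion["source_id"])
    if source is None:
        problems.append(f"unknown source ID {suggestion['source_id']!r}")
    elif suggestion["source_span"] not in source:
        problems.append("quoted span is not verbatim in the controlled source")
    if relationship != "abstain" and suggestion["target_id"] not in controlled_text:
        problems.append(f"unknown target ID {suggestion['target_id']!r}")
    if not suggestion["rationale"].strip():
        problems.append("empty rationale")
    return problems


controlled_text = {
    "R-TEMP": "Coolant outlet temperature shall not exceed 60 degC at 500 W.",
    "E-TEMP": "500 W thermal balance test, outlet temperature",
}
grounded = {"source_id": "R-TEMP", "target_id": "E-TEMP", "relationship": "verifies",
            "source_span": "shall not exceed 60 degC at 500 W",
            "rationale": "same quantity and power condition"}
paraphrased = dict(grounded, source_span="must stay below 60 degC")
print(check_suggestion(grounded, controlled_text))     # []
print(check_suggestion(paraphrased, controlled_text))  # ['quoted span is not verbatim in the controlled source']
```

Requiring verbatim spans catches paraphrased or invented quotations, which are a common way for unsupported intent to enter a rationale.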
How to prove the AI output yourself
- Open source and target.
- Check quantity, condition, configuration, and direction.
- Record reviewer and disposition.
- Measure missed and spurious links on a reviewed sample.
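The last check can be computed from a reviewed sample. A minimal sketch with hypothetical IDs and a hypothetical function name `link_metrics`. The recall figure is only as good as the reviewed truth set, which must come from an independent search rather than from the suggestions themselves.

```python
def link_metrics(suggested: set, reviewed_truth: set) -> dict:
    """Compare suggested links with a reviewed ground-truth sample.

    reviewed_truth must come from a search that did not start from the suggestions;
    otherwise missed links stay invisible and recall is overstated.
    """
    true_pos = suggested & reviewed_truth
    return {
        "precision": len(true_pos) / len(suggested) if suggested else float("nan"),
        "recall": len(true_pos) / len(reviewed_truth) if reviewed_truth else float("nan"),
        "spurious": sorted(suggested - reviewed_truth),
        "missed": sorted(reviewed_truth - suggested),
    }


# Hypothetical IDs for illustration only.
suggested = {("R-1", "E-1"), ("R-1", "E-2"), ("R-2", "E-3")}
reviewed_truth = {("R-1", "E-1"), ("R-2", "E-3"), ("R-3", "E-4"), ("R-4", "E-5")}
m = link_metrics(suggested, reviewed_truth)
print(f"precision={m['precision']:.3f} recall={m['recall']:.3f}")  # precision=0.667 recall=0.500
print("missed:", m["missed"])      # missed: [('R-3', 'E-4'), ('R-4', 'E-5')]
print("spurious:", m["spurious"])  # spurious: [('R-1', 'E-2')]
```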
Retrieval and spaced review
Answer closed-notes today, then again after 1, 3, 7, and 30 days.
Define Candidate link.
A proposed relationship awaiting semantic and configuration review.
What role does abstention play here?
It gives the assistant and reviewers an explicit way not to answer when evidence or confidence is insufficient, instead of forcing a link.
What must a reviewer be able to reconstruct?
That every accepted link aligns quantity, condition, configuration, method, and link semantics.
End-of-lesson summary
AI may flag quality defects, propose questions, and rank candidate trace links. Controlled requirements and artifacts remain authoritative, links remain unapproved until reviewed, and every suggestion needs source identifiers and rationale.
Student notes
Never paste a suggested link directly into the authoritative graph. Create a candidate state and a reviewed disposition.
Instructor notes
Include a deliberately tempting lexical match. Require students to explain the false-positive mechanism.
11.2
AI for summarization, evidence extraction, and simulation-log review
Why this lesson matters
Summaries can omit caveats, merge configurations, and turn absence of evidence into a positive claim.
Learning objectives
- Define and distinguish evidence extraction and a grounded summary.
- Apply the lesson method to the worked example on summarizing a simulation log.
- Evaluate evidence, uncertainty, and AI-assisted output before making a claim.
Readiness check
Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.
Check your response
A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.
Core idea
Use extraction before synthesis: preserve source identifiers, exact fields, units, uncertainty, status, and explicit unknowns. Summaries should be generated from verified structured facts and linked back to evidence.
Key concepts
| Term | Meaning |
|---|---|
| Evidence extraction | Identification of specified facts and metadata from source artifacts without adding unsupported meaning. |
| Grounded summary | A synthesis whose claims map to accessible source evidence. |
| Unsupported inference | A conclusion not established by the supplied source. |
| Log anomaly | A warning, failure, divergence, override, or unusual condition requiring review. |
Step-by-step explanation
- Define a schema with required fields, null behavior, units, and source locations.
- Extract facts separately from interpretation.
- Validate fields against originals and reconcile conflicting artifacts.
- Generate summaries that carry citations and uncertainty.
- Sample outputs continuously and preserve logs, corrections, and model or prompt versions.
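The schema step can be sketched as follows. A minimal sketch, assuming a hypothetical two-field schema and a hypothetical test-report excerpt; `None` records an explicit unknown instead of a guess. The validator checks units against the schema and confirms that each value appears on the cited source line.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: each required field and the unit it must carry.
SCHEMA = {"peak_outlet_temperature": "degC", "coolant_flow_rate": "L/min"}


@dataclass(frozen=True)
class ExtractedFact:
    field: str
    value: Optional[str]        # None means "not found in the source", never a guess
    unit: Optional[str]
    source_line: Optional[int]  # 1-based line number in the source artifact


def validate_extraction(facts: list[ExtractedFact], source_lines: list[str]) -> list[str]:
    """Check extracted facts against the schema and the original text."""
    problems = []
    by_field = {fact.field: fact for fact in facts}
    for name, unit in SCHEMA.items():
        fact = by_field.get(name)
        if fact is None:
            problems.append(f"{name}: missing; report it explicitly as not found")
            continue
        if fact.value is None:
            continue  # explicit unknown: allowed, and visible to the summary step
        if fact.unit != unit:
            problems.append(f"{name}: unit {fact.unit!r}, schema requires {unit!r}")
        if fact.source_line is None or not 1 <= fact.source_line <= len(source_lines):
            problems.append(f"{name}: no valid source line")
        elif fact.value not in source_lines[fact.source_line - 1]:
            problems.append(f"{name}: value {fact.value!r} not on cited line {fact.source_line}")
    return problems


# Hypothetical report excerpt for illustration.
report = [
    "Test TB-01 thermal balance at 500 W",
    "Peak outlet temperature 58.2 degC",
]
facts = [
    ExtractedFact("peak_outlet_temperature", "58.2", "degC", 2),
    ExtractedFact("coolant_flow_rate", None, None, None),  # not stated in this report
]
print(validate_extraction(facts, report))  # []
```

Distinguishing "missing from the extraction" from "explicitly not found in the source" is what prevents an absent fact from quietly becoming a positive claim.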
Worked example
A solver log contains 'converged residual 9e-5,' 'two distorted elements,' and 'mass scaling enabled after step 40.' An AI summary says 'The simulation converged successfully with no issues.'
1. Reject the summary because it omits two material credibility conditions.
2. Extract the convergence threshold, the distorted-element warning, and the mass-scaling activation, each with a line reference.
3. Determine through engineering review and reruns whether each affects the quantity of interest.
4. Revise the summary to separate solver termination from model-quality concerns.
Result. A defensible summary states that the solver met its residual criterion while element distortion and late mass scaling require disposition.
Independent check. Every claim has a source span, omissions are tested, and 'converged' is not equated with physically credible.
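Extraction before synthesis for this log can be sketched as below. The log lines are hypothetical, assembled from the three phrases in the worked example, and the keyword patterns are assumptions that would need adapting to a real solver's message formats. The summary is built only from extracted facts and keeps solver termination separate from model-quality concerns.

```python
import re

# Hypothetical log lines built from the phrases in the worked example.
LOG_LINES = [
    "increment 1 started",
    "mass scaling enabled after step 40",
    "WARNING: two distorted elements",
    "converged residual 9e-5",
]

# Hypothetical patterns; adapt them to the solver's documented messages.
PATTERNS = {
    "termination": re.compile(r"converged residual", re.IGNORECASE),
    "warning": re.compile(r"warning|distorted element", re.IGNORECASE),
    "override": re.compile(r"mass scaling enabled", re.IGNORECASE),
}


def extract(lines: list[str]) -> list[dict]:
    """Extraction step: copy matching lines verbatim with category and 1-based line number."""
    facts = []
    for number, line in enumerate(lines, start=1):
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                facts.append({"category": category, "line": number, "text": line.strip()})
    return facts


def summarize(facts: list[dict]) -> str:
    """Synthesis step: built only from extracted facts, with termination kept separate."""
    termination = [f for f in facts if f["category"] == "termination"]
    concerns = [f for f in facts if f["category"] != "termination"]
    if termination:
        parts = [f"Solver reports '{termination[0]['text']}' (line {termination[0]['line']})."]
    else:
        parts = ["No termination record found: convergence is unknown."]
    for fact in concerns:
        parts.append(f"Requires disposition [{fact['category']}]: '{fact['text']}' (line {fact['line']}).")
    if concerns:
        parts.append("Convergence alone does not establish physical credibility.")
    return " ".join(parts)


print(summarize(extract(LOG_LINES)))
```

Because every sentence in the summary carries a line reference, a reviewer can check each claim against the original log, and a missing termination record becomes an explicit unknown rather than an implied success.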
Common misconceptions
| Misconception | Correction |
|---|---|
| A concise summary is a faithful summary | Compression can remove the exact caveat that controls the decision. |
| A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked. |
Diagnostic questions
Does solver convergence prove credibility?
No. It addresses a numerical termination condition, not model setup, discretization, physics, uncertainty, or validation.
What would make this work reproducible?
Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.
Practice ladder
- Basic: Extract facts, warnings, and unknowns from a ten-line solver log.
- Intermediate: Compare source-linked and free-form summaries for omission risk.
- Advanced: Design a review policy that samples both accepted outputs and abstentions for hidden false negatives.
AI-assisted engineering task
Ask AI for a structured log extraction with categories: termination, warning, override, quality metric, missing information, and source line.
How to prove the AI output yourself
- Search the original for warnings and negations.
- Verify numeric fields and units.
- Re-run critical checks.
- Have the simulation owner approve interpretation.
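The first check, searching the original for warnings and negations, can be partly automated. A minimal sketch, assuming a hypothetical keyword vocabulary and log excerpt: it flags concern lines that the summary does not cite and absolute claims such as "no issues" that the log contradicts. A keyword scan can miss unfamiliar messages, so it supports, but does not replace, the simulation owner's review.

```python
import re

# Hypothetical concern vocabulary; derive the real list from the solver's message documentation.
CONCERN = re.compile(r"warning|error|distorted|mass scaling|not converged|diverg", re.IGNORECASE)
ABSOLUTE_CLAIM = re.compile(r"\bno (issues|warnings|errors|problems)\b", re.IGNORECASE)


def omission_findings(summary: str, log_lines: list[str], cited_lines: set[int]) -> list[str]:
    """Flag concern lines the summary does not cite, and absolute claims the log contradicts."""
    concern_lines = [n for n, line in enumerate(log_lines, start=1) if CONCERN.search(line)]
    findings = [f"line {n} not cited: {log_lines[n - 1]}" for n in concern_lines if n not in cited_lines]
    if concern_lines and ABSOLUTE_CLAIM.search(summary):
        findings.append("summary claims 'no issues' although the log contains concern lines")
    return findings


# Hypothetical log excerpt using the worked example's phrases.
log_lines = [
    "increment 1 started",
    "mass scaling enabled after step 40",
    "WARNING: two distorted elements",
    "converged residual 9e-5",
]
ai_summary = "The simulation converged successfully with no issues."
for finding in omission_findings(ai_summary, log_lines, cited_lines={4}):
    print(finding)
```

Run against the AI summary from the worked example, the scan reports both uncited concern lines and the contradicted "no issues" claim.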
Retrieval and spaced review
Answer closed-notes today, then again after 1, 3, 7, and 30 days.
Define Evidence extraction.
Identification of specified facts and metadata from source artifacts without adding unsupported meaning.
What role does a grounded summary play here?
Every claim in it maps to accessible source evidence, so a reviewer can check each statement against the original log.
What must a reviewer be able to reconstruct?
That every claim has a source span, that omissions were tested, and that 'converged' was not equated with 'physically credible.'
End-of-lesson summary
Use extraction before synthesis: preserve source identifiers, exact fields, units, uncertainty, status, and explicit unknowns. Summaries should be generated from verified structured facts and linked back to evidence.
Student notes
Keep extracted facts, engineering interpretation, and decision summary as separate linked artifacts.
Instructor notes
Use logs where 'success' coexists with warnings. This mirrors real software behavior.
11.3
AI for coding, hypotheses, and design-space exploration
Why this lesson matters
Generated code may run while implementing the wrong equation, units, boundary condition, or optimization constraint.
Learning objectives
- Define and distinguish an executable specification and a property-based test.
- Apply the lesson method to the worked example on AI-generated engineering code.
- Evaluate evidence, uncertainty, and AI-assisted output before making a claim.
Readiness check
Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.
Check your response
A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.
Core idea
Use AI to accelerate boilerplate, tests, visualization, and candidate hypotheses. Engineering acceptance requires specification, unit tests, analytical cases, data provenance, code review, and physical interpretation.
Key concepts
| Term | Meaning |
|---|---|
| Executable specification | Tests and examples that make intended behavior checkable. |
| Property-based test | A test of general invariants such as conservation, monotonicity, symmetry, or bounds. |
| Hypothesis | A testable proposed explanation, not a conclusion. |
| Design-space assistant | A tool that organizes alternatives and evidence without owning selection. |
Step-by-step explanation
- Write equations, units, assumptions, and acceptance tests before requesting code.
- Ask for small functions with typed inputs and explicit failure behavior.
- Test against analytical values, dimensional properties, limits, and independent implementations.
- Use AI-generated hypotheses to design discriminating tests.
- Record accepted modifications, reviewer, environment, and result evidence.
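The first steps can be sketched in Python: the equation, units, and acceptance test come first, and the function is small, typed, and fails explicitly. The function name `rect_second_moment` is hypothetical; the formula is the standard rectangular-section result I = b h³/12, and the test uses the analytical case from this lesson's worked example.

```python
import math


def rect_second_moment(b_m: float, h_m: float) -> float:
    """Second moment of area of a solid rectangle about its centroidal bending axis.

    Equation: I = b * h**3 / 12, with width b and depth h in metres; result in m^4.
    Raises ValueError instead of returning a number for invalid input.
    """
    for name, value in (("b_m", b_m), ("h_m", h_m)):
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be a finite positive length, got {value!r}")
    return b_m * h_m**3 / 12.0


# Acceptance test written before requesting code: analytical case b = 0.04 m, h = 0.03 m.
assert math.isclose(rect_second_moment(0.04, 0.03), 9.00e-8, rel_tol=1e-9)
print(f"{rect_second_moment(0.04, 0.03):.2e}")  # 9.00e-08
```

Putting the units in the parameter names and the equation in the docstring gives the reviewer the unit contract and equation source without a separate document.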
Worked example
AI-generated code computes the rectangular-section moment of inertia as I = b h²/12 instead of I = b h³/12. The code runs, and the optimization returns a lightweight design.
1. Dimensional analysis shows that b h² has units of m³, not the required m⁴.
2. An analytical unit test for b = 0.04 m and h = 0.03 m expects I = 9.00e-8 m⁴; the defective function returns 3.00e-6 (in m³), so the test fails.
3. Reject all derived optimization results, correct the function, rerun the tests, and assess whether any decision used the defective output.
4. Record the incident and add a dimensional or symbolic check to prevent recurrence.
Result. Running code is not verified code. A one-line dimensional check catches a high-consequence implementation error before design release.
Independent check. Equation source, dimensions, known case, limits, constraints, and independent calculation all agree.
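A dimensional check can also be written as a property-based test: if every length is scaled by k, a quantity in m⁴ must scale by k⁴. A minimal sketch with hypothetical function names; it uses a few fixed scale factors rather than randomized inputs (a property-testing library such as Hypothesis could generate them). The defective formula fails because it scales as k³.

```python
import math


def inertia_correct(b: float, h: float) -> float:
    return b * h**3 / 12.0


def inertia_defective(b: float, h: float) -> float:
    return b * h**2 / 12.0  # the AI-generated defect from the worked example


def scales_as_length_to_the_fourth(func, b=0.04, h=0.03, factors=(0.5, 2.0, 10.0)) -> bool:
    """Property check: a quantity with units m^4 must scale by k**4 when all lengths scale by k."""
    base = func(b, h)
    return all(math.isclose(func(k * b, k * h), k**4 * base, rel_tol=1e-9) for k in factors)


print(scales_as_length_to_the_fourth(inertia_correct))    # True
print(scales_as_length_to_the_fourth(inertia_defective))  # False
```

Unlike a range check on the output, this property does not depend on the defect producing absurd values, which matches the instructor note for this lesson.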
Common misconceptions
| Misconception | Correction |
|---|---|
| Passing tests prove the code correct | Tests cover specified cases and properties; missing or wrong specifications can still pass. |
| A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked. |
Diagnostic questions
What makes a useful hypothesis?
It is physically plausible, distinct from alternatives, and testable with evidence that could refute it.
What would make this work reproducible?
Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.
Practice ladder
- Basic: Write three unit tests for rectangular-section inertia.
- Intermediate: Design property tests for monotonic beam deflection.
- Advanced: Use residual trends to generate three hypotheses and one discriminating experiment for each.
AI-assisted engineering task
Ask AI to implement only after supplying equations, units, domain, examples, and tests. Request explanation of assumptions and failure cases.
How to prove the AI output yourself
- Run static and unit checks.
- Perform dimensional analysis.
- Compare analytical cases.
- Review code and generated dependencies.
- Reproduce the final run in a controlled environment.
Retrieval and spaced review
Answer closed-notes today, then again after 1, 3, 7, and 30 days.
Define Executable specification.
Tests and examples that make intended behavior checkable.
What role does a property-based test play here?
It checks general invariants such as conservation, monotonicity, symmetry, or bounds, which can catch errors that a single example case misses.
What must a reviewer be able to reconstruct?
That the equation source, dimensions, known case, limits, constraints, and independent calculation all agree.
End-of-lesson summary
Use AI to accelerate boilerplate, tests, visualization, and candidate hypotheses. Engineering acceptance requires specification, unit tests, analytical cases, data provenance, code review, and physical interpretation.
Student notes
No AI-generated engineering function enters a decision workflow without equation source, unit contract, tests, review, and version record.
Instructor notes
Include an error that produces plausible values. Students should not rely on absurd-output detection.
11.4
Human oversight, prompting, audit trails, and AI governance
Why this lesson matters
A vague human-in-the-loop claim is not a control. Effective oversight requires authority, competence, evidence access, time, and monitoring.
Learning objectives
- Define and distinguish AI risk and human oversight.
- Apply the lesson method to the worked example on monitoring an AI trace assistant after a model update.
- Evaluate evidence, uncertainty, and AI-assisted output before making a claim.
Readiness check
Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.
Check your response
A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.
Core idea
Govern AI use through task classification, approved sources, roles, output status, verification, incident handling, monitoring, and change control. Prompt design helps shape behavior but does not replace technical and organizational controls.
Key concepts
| Term | Meaning |
|---|---|
| AI risk | Potential harm or loss arising from AI behavior, use context, interaction, or organizational process. |
| Human oversight | Competent, empowered review and intervention across AI use. |
| Audit record | Trace of task, inputs, model or tool, output, checks, edits, reviewer, disposition, and downstream use. |
| Monitoring | Ongoing measurement of performance, drift, incidents, overrides, and changing context. |
Step-by-step explanation
- Classify task, users, affected decisions, consequence, and prohibited uses.
- Map data authority, confidentiality, intellectual property, and retention constraints.
- Specify output schema, abstention, source trace, and verification workflow.
- Assign accountable owner, qualified reviewer, escalation, and incident process.
- Monitor error rates, reviewer overrides, model changes, and downstream outcomes.
Worked example
An AI trace assistant has 92% precision in a pilot. A model update is deployed, and precision on safety requirements falls to 71% while overall acceptance rate remains high.
1. Treat the model or workflow version change as a controlled change that requires re-evaluation.
2. Segment metrics by requirement criticality; a high overall acceptance rate can hide safety degradation.
3. Pause automatic prioritization for the affected classes, notify owners, and review recently accepted links.
4. Document the incident, root cause, corrective action, and criteria for safe resumption.
Result. Monitoring detects a consequential regression that a one-time pilot and broad average would miss.
Independent check. Version, test set, segmentation, thresholds, rollback authority, affected decisions, and corrective actions are recorded.
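The segmentation step can be sketched as follows. All counts and thresholds are hypothetical, chosen only so the safety segment reproduces the 71% precision in the example. Precision is taken from an independent audit rather than from reviewer acceptance, because acceptance stayed high while quality fell.

```python
from collections import Counter

# Hypothetical audit sample after the model update: (segment, link judged valid by audit).
# Counts are illustrative, chosen to reproduce the 71% safety precision in the example.
audit = ([("safety", True)] * 71 + [("safety", False)] * 29
         + [("non-safety", True)] * 95 + [("non-safety", False)] * 5)

# Hypothetical minimum precision per segment, set by the accountable owner.
THRESHOLDS = {"safety": 0.90, "non-safety": 0.85}


def segment_precision(records):
    reviewed = Counter(segment for segment, _ in records)
    valid = Counter(segment for segment, ok in records if ok)
    return {segment: valid[segment] / reviewed[segment] for segment in reviewed}


def segments_to_pause(precision, thresholds):
    """Segments below threshold; unknown segments default to 1.0, so they pause until one is set."""
    return sorted(s for s, p in precision.items() if p < thresholds.get(s, 1.0))


overall = sum(ok for _, ok in audit) / len(audit)
by_segment = segment_precision(audit)
print(f"overall={overall:.2f}", {s: round(p, 2) for s, p in by_segment.items()})
print("pause:", segments_to_pause(by_segment, THRESHOLDS))  # pause: ['safety']
```

The overall figure of 0.83 could look acceptable on its own; only the segmented view triggers the pause for safety links.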
Common misconceptions
| Misconception | Correction |
|---|---|
| A good prompt is the safety system | Prompts are one control layer; grounding, permissions, tests, review, monitoring, and governance remain necessary. |
| A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked. |
Diagnostic questions
What makes oversight meaningful?
The reviewer has competence, access, time, responsibility, and authority to reject or stop use.
What would make this work reproducible?
Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.
Practice ladder
- Basic: List the minimum fields in an AI audit record.
- Intermediate: Design role and escalation controls for an evidence-extraction assistant.
- Advanced: Create monitoring thresholds segmented by task, consequence, and source type, including a rollback rule.
AI-assisted engineering task
Write prompts with role, bounded task, authoritative inputs, output schema, forbidden inference, abstention, and verification request. Store prompt version with the audit record.
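An audit record that stores the prompt version might look like the sketch below. The field names follow the audit-record definition in this lesson, but the class, its required-field rule, and every example value (tool name, prompt version, paths, reviewer) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """One AI use; field names follow the lesson's definition, the class itself is hypothetical."""
    task: str
    input_ids: list[str]
    model_or_tool: str   # name and version identifier
    prompt_version: str
    output_ref: str      # where the raw output is stored
    checks: list[str]
    edits: str           # reviewer changes; "none" if unchanged
    reviewer: str
    disposition: str     # e.g. accepted, rejected, modified, abstained
    downstream_use: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        for name in ("task", "input_ids", "model_or_tool", "prompt_version",
                     "output_ref", "reviewer", "disposition"):
            if not getattr(self, name):
                raise ValueError(f"audit record field {name!r} is required")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# All values below are hypothetical examples.
record = AuditRecord(
    task="candidate trace links", input_ids=["R-TEMP", "E-TEMP"],
    model_or_tool="trace-assistant 2.1", prompt_version="trace-prompt-v3",
    output_ref="outputs/run-0001.json", checks=["span verified", "quantity aligned"],
    edits="none", reviewer="reviewer-a", disposition="accepted",
    downstream_use="trace matrix, candidate state only",
)
print(record.to_json())
```

Refusing to create a record without a prompt version or reviewer makes the "reassess after model, prompt, source, or task changes" check possible later, because every output can be grouped by the version that produced it.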
How to prove the AI output yourself
- Evaluate on representative and adversarial cases.
- Inspect citations and omissions.
- Measure reviewer overrides.
- Reassess after model, prompt, source, or task changes.
Retrieval and spaced review
Answer closed-notes today, then again after 1, 3, 7, and 30 days.
Define AI risk.
Potential harm or loss arising from AI behavior, use context, interaction, or organizational process.
What role does human oversight play here?
It provides competent, empowered review and intervention across AI use, including the authority to reject output or stop use.
What must a reviewer be able to reconstruct?
The version, test set, segmentation, thresholds, rollback authority, affected decisions, and corrective actions, all from the recorded evidence.
End-of-lesson summary
Govern AI use through task classification, approved sources, roles, output status, verification, incident handling, monitoring, and change control. Prompt design helps shape behavior but does not replace technical and organizational controls.
Student notes
Before AI use, answer: who is accountable, what may it access, what may it output, how is it checked, when must it abstain, and how is drift detected?
Instructor notes
Treat ISO/IEC 42001 and the EU AI Act as governance and legal awareness, not as a classroom certification claim or legal advice.
LAB 11
Lab 11: Use AI to suggest trace links, then verify manually
Lab objective
Evaluate candidate links produced by an AI-style similarity assistant and record reviewer disposition, rationale, and performance.
Engineering context
No API is required. A supplied candidate list simulates model output so the lab focuses on engineering verification.
Input data
- Four requirements
- Five evidence titles
- Six candidate links with scores
Step-by-step task
- Review quantity and condition semantics
- Accept or reject each link
- Compute precision against the reviewed set
- Document false-positive mechanism and missed-link search
Python code
```python
# Controlled source text: four requirements and five evidence titles.
requirements = {
    "R-TEMP": "Coolant outlet temperature shall not exceed 60 degC at 500 W.",
    "R-VIB": "Pump housing RMS vibration shall not exceed 2.5 mm/s at rated speed.",
    "R-MASS": "Cooling assembly mass shall not exceed 4.0 kg.",
    "R-LEAK": "No visible leakage is permitted during the 8 bar proof test.",
}
evidence = {
    "E-TEMP": "500 W thermal balance test, outlet temperature",
    "E-VIB": "Rated-speed pump vibration test",
    "E-MASS": "Released assembly mass inspection",
    "E-LEAK": "8 bar pressure proof and visual inspection",
    "E-NOISE": "Acoustic test during normal operation",
}

# Simulated assistant output: (requirement ID, evidence ID, similarity score).
candidates = [
    ("R-TEMP", "E-TEMP", 0.93), ("R-TEMP", "E-VIB", 0.62),
    ("R-VIB", "E-VIB", 0.95), ("R-VIB", "E-NOISE", 0.76),
    ("R-MASS", "E-MASS", 0.97), ("R-LEAK", "E-LEAK", 0.94),
]

# Manual engineering review after inspecting source and target:
# quantity, condition, configuration, and method must all align.
accepted = {
    ("R-TEMP", "E-TEMP"), ("R-VIB", "E-VIB"),
    ("R-MASS", "E-MASS"), ("R-LEAK", "E-LEAK"),
}

# Guard against typos: every candidate must reference a controlled ID.
for r, e, _ in candidates:
    assert r in requirements and e in evidence, f"unknown ID in candidate {(r, e)}"

# True positives are candidates the reviewer accepted; scores play no role in the count.
tp = sum((r, e) in accepted for r, e, _ in candidates)
precision = tp / len(candidates)

for r, e, score in candidates:
    disposition = "accept" if (r, e) in accepted else "reject"
    print(r, e, score, disposition)
print(f"reviewed precision={precision:.3f}")
```
Explanation of code
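- Step 1 (review quantity and condition semantics) happens before the script runs; its outcome is recorded in the `accepted` set.
- Step 2 (accept or reject each link): the final loop prints `accept` or `reject` for every candidate, based only on the reviewed set, never on the score.
- Step 3 (compute precision against the reviewed set): `tp` counts the candidates the reviewer accepted, and precision is `tp / len(candidates)`.
- Step 4 (document the false-positive mechanism and missed-link search) is not automated here; record the rationale for each rejection and search for links the assistant never proposed.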
Expected output
Four accepted links, two rejected lexical or contextual matches (R-TEMP to E-VIB and R-VIB to E-NOISE), and reviewed precision 0.667 (4 of 6) for the supplied candidates.
Interpretation
Precision alone does not measure missed links. Reviewers must also search for false negatives and segment performance by consequence.
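One way to seed the missed-link search is to list requirements without accepted evidence and evidence without an accepted requirement. A minimal sketch reusing the lab's IDs; the function name `coverage_gaps` is hypothetical. An empty gap list does not prove completeness, because a requirement may need more than one evidence item.

```python
requirement_ids = {"R-TEMP", "R-VIB", "R-MASS", "R-LEAK"}
evidence_ids = {"E-TEMP", "E-VIB", "E-MASS", "E-LEAK", "E-NOISE"}
accepted = {("R-TEMP", "E-TEMP"), ("R-VIB", "E-VIB"),
            ("R-MASS", "E-MASS"), ("R-LEAK", "E-LEAK")}


def coverage_gaps(requirement_ids: set, evidence_ids: set, accepted: set) -> tuple[list, list]:
    """Return (requirements with no accepted evidence, evidence linked to no requirement)."""
    linked_requirements = {r for r, _ in accepted}
    linked_evidence = {e for _, e in accepted}
    return sorted(requirement_ids - linked_requirements), sorted(evidence_ids - linked_evidence)


uncovered, orphaned = coverage_gaps(requirement_ids, evidence_ids, accepted)
print("requirements without accepted evidence:", uncovered)  # requirements without accepted evidence: []
print("evidence not linked to any requirement:", orphaned)   # evidence not linked to any requirement: ['E-NOISE']
```

Here E-NOISE is orphaned; a reviewer should confirm whether it supports a requirement the assistant never proposed or is simply outside this requirement set.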
Common errors
- Accepting by score threshold alone
- Reviewing titles without procedures and results
- Ignoring candidate links the model never produced
Extension tasks
- Add recall using a reviewed gold set
- Require source spans
- Evaluate a changed prompt or model version
- Store reviewer disagreement
Reflection questions
- Why were E-VIB and E-NOISE rejected for their proposed targets?
- What evidence is needed to measure recall?
- How should safety-critical links change the workflow?
PROJECT
Mini-project 4: Auditable AI-assisted evidence review
Deliverable
A bounded AI task, controlled source packet, output schema, candidate outputs, manual verification record, performance analysis, governance controls, and incident scenario.
Required checks
Source IDs, abstention, accepted and rejected examples, false-positive and false-negative analysis, accountable reviewer, and model-change monitoring plan.