Course 25 | Module 8 of 12

Measurement Data and Simulation Evidence

Turn calibrated observations and experiments into uncertainty-aware evidence that can enter a digital thread.

MAP

Module map

Learning outcomes

  • Trace a measurand through sensor, conditioning, acquisition, processing, and reported result.
  • Estimate and report a simple measurement uncertainty budget.
  • Plan experiments and data-quality controls for model comparison.
  • Compare simulation and measurement without ignoring alignment, uncertainty, or intended use.

Evidence standard

Complete all four lessons, reproduce the worked checks, run the lab, and correct the weekly quiz. Treat AI output as candidate evidence until independently verified.

8.1

Measurement chains, calibration, and sensor uncertainty

Why this lesson matters

A recorded number is the end of a measurement chain, not direct access to the true physical quantity.

Learning objectives

  • Define and distinguish measurand and calibration.
  • Apply the lesson method to the worked thermocouple uncertainty-budget case.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Define the measurand, measurement model, calibration, environmental influences, sampling, processing, and uncertainty contributors. Traceability means a documented calibration chain to references with stated uncertainty, not merely a serial number sticker.

Key concepts

Measurand: The quantity intended to be measured, defined with enough specificity to avoid ambiguity.
Calibration: Establishing the relation between instrument indication and reference values under stated conditions.
Measurement model: The relation used to obtain the result from indications, corrections, and influence quantities.
Combined standard uncertainty: A standard-deviation-like combination of quantified uncertainty components under stated assumptions.

Step-by-step explanation

  1. Define quantity, location, time, state, and operating conditions.
  2. Map sensor, mounting, conditioning, acquisition, timing, and processing.
  3. Apply calibration coefficients and corrections with units and validity dates.
  4. Quantify repeatability, calibration, resolution, drift, environment, and processing effects.
  5. Report result, standard or expanded uncertainty, coverage basis, and limitations.
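Step 3 can be sketched in Python as below. This is a minimal illustration that assumes a linear calibration (reference = offset + gain × indication) with a validity window and a calibrated range. The `Calibration` class, `apply_calibration` function, coefficients, and dates are all hypothetical, not taken from any real certificate.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Calibration:
    """Hypothetical linear calibration: reference = offset + gain * indication."""
    offset_c: float      # °C
    gain: float          # dimensionless
    valid_from: date
    valid_until: date
    range_c: tuple       # (low, high) calibrated range in °C


def apply_calibration(indication_c: float, cal: Calibration, measured_on: date) -> float:
    """Return the corrected value, refusing to extrapolate in time or range."""
    if not (cal.valid_from <= measured_on <= cal.valid_until):
        raise ValueError(f"calibration not valid on {measured_on}")
    low, high = cal.range_c
    if not (low <= indication_c <= high):
        raise ValueError(f"indication {indication_c} °C outside calibrated range {cal.range_c}")
    return cal.offset_c + cal.gain * indication_c


# Hypothetical coefficients for illustration only.
cal = Calibration(offset_c=-0.15, gain=1.002, valid_from=date(2024, 1, 10),
                  valid_until=date(2025, 1, 10), range_c=(0.0, 150.0))
print(round(apply_calibration(80.0, cal, date(2024, 6, 1)), 2))
```

The design choice is to fail loudly rather than silently apply an expired or out-of-range calibration. The correction still carries its own uncertainty, which belongs in the budget.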

Worked example

A thermocouple reports 80.0 °C. The standard uncertainty components are calibration 0.30 °C and repeatability 0.20 °C. The 0.10 °C display resolution is treated as a rectangular distribution of full width 0.10 °C, giving 0.10/sqrt(12) = 0.0289 °C.

  1. Assuming independent components with unit sensitivity, u_c = sqrt(0.30² + 0.20² + 0.0289²) = 0.3617 °C.
  2. Using k = 2 as a teaching coverage factor, the expanded uncertainty is U = k × u_c = 0.723 °C.
  3. Report 80.0 °C ± 0.7 °C, stating k = 2 and the assumptions.
  4. Do not omit junction placement, thermal contact, response lag, cold-junction compensation, drift, or acquisition effects if they are significant.

Result. The numerical budget gives 0.362 °C standard uncertainty and about 0.72 °C expanded uncertainty under the simplified independent-component model.

Independent check. Components use a common standard-uncertainty basis, units match, correlations are considered, and rounding does not imply false precision.
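The budget arithmetic can be reproduced with a short Python sketch. Like the worked example, it assumes independent components with unit sensitivity coefficients, combines them by root-sum-of-squares, and uses k = 2 as a teaching coverage factor.

```python
import math

# Standard uncertainties in °C from the worked example.
u_calibration = 0.30
u_repeatability = 0.20
resolution = 0.10                            # full display step, °C
u_resolution = resolution / math.sqrt(12)    # rectangular: half-width / sqrt(3)

components = {
    "calibration": u_calibration,
    "repeatability": u_repeatability,
    "resolution": u_resolution,
}

# Independent components, unit sensitivity: root-sum-of-squares.
u_c = math.sqrt(sum(u**2 for u in components.values()))
k = 2.0                                      # teaching coverage factor, not a 95 % guarantee
U = k * u_c

for name, u in components.items():
    print(f"{name:>13}: {u:.4f} °C")
print(f"u_c = {u_c:.4f} °C, U (k={k:g}) = {U:.3f} °C")
print(f"Report: 80.0 °C ± {U:.1f} °C (k = {k:g})")
```

It prints u_c = 0.3617 °C and U = 0.723 °C, which match the hand calculation. Correlated components would need covariance terms that this sketch omits.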

Common misconceptions

Misconception | Correction
Calibration removes uncertainty | Calibration characterizes and corrects indication while contributing uncertainty; it does not reveal exact truth.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

What does a repeatability estimate leave out?

Systematic effects, calibration, environment, drift, mounting, resolution, and model assumptions may remain.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Recompute the uncertainty budget and identify the dominant listed contributor.

Intermediate

Add a 0.25 °C mounting-effect standard uncertainty and update the result.

Advanced

Design an experiment to quantify sensor response lag during a 500 W transient.

AI-assisted engineering task

Ask AI to organize a supplied uncertainty budget into source, distribution, divisor, sensitivity coefficient, and standard contribution. It may not invent values.

How to prove the AI output yourself

  1. Recompute every conversion.
  2. Check calibration certificate scope and date.
  3. Inspect raw repeats and environmental records.
  4. Review neglected contributors.
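Parts of steps 1 and 4 can be automated. The sketch below assumes the AI returned rows with `source`, `value`, `divisor`, `sensitivity`, and `standard_contribution` fields. This is a hypothetical layout matching the task above. The code recomputes each contribution from the supplied inputs and flags any row whose source was not supplied, since the AI may not invent values. The audited table is made up and contains one deliberate divisor error and one invented row.

```python
import math

# Inputs the engineer supplied (°C), with the divisor implied by each stated distribution.
supplied = {
    "calibration":   {"value": 0.30, "divisor": 1.0},            # already a standard uncertainty
    "repeatability": {"value": 0.20, "divisor": 1.0},
    "resolution":    {"value": 0.10, "divisor": math.sqrt(12)},  # full-width rectangular
}

# Hypothetical AI-organized table to audit.
ai_rows = [
    {"source": "calibration", "value": 0.30, "divisor": 1.0, "sensitivity": 1.0, "standard_contribution": 0.30},
    {"source": "repeatability", "value": 0.20, "divisor": 1.0, "sensitivity": 1.0, "standard_contribution": 0.20},
    # Error: sqrt(3) applied to the full width instead of the half-width.
    {"source": "resolution", "value": 0.10, "divisor": math.sqrt(3), "sensitivity": 1.0, "standard_contribution": 0.0577},
    # Error: a contributor and value nobody supplied.
    {"source": "drift", "value": 0.05, "divisor": 1.0, "sensitivity": 1.0, "standard_contribution": 0.05},
]


def audit(rows, supplied, tol=1e-3):
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for row in rows:
        src = row["source"]
        if src not in supplied:
            problems.append(f"{src}: not in supplied inputs (possible invented value)")
            continue
        ref = supplied[src]
        if abs(row["value"] - ref["value"]) > tol or abs(row["divisor"] - ref["divisor"]) > tol:
            problems.append(f"{src}: value or divisor differs from supplied input")
        recomputed = abs(row["sensitivity"]) * row["value"] / row["divisor"]
        if abs(recomputed - row["standard_contribution"]) > tol:
            problems.append(f"{src}: contribution {row['standard_contribution']} != recomputed {recomputed:.4f}")
    missing = set(supplied) - {row["source"] for row in rows}
    problems += [f"{src}: supplied but omitted" for src in sorted(missing)]
    return problems


for p in audit(ai_rows, supplied):
    print(p)
```

A clean audit shows only that the table is internally consistent. Certificate scope, raw repeats, and neglected contributors still require the manual checks listed above.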

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Measurand.

The quantity intended to be measured, defined with enough specificity to avoid ambiguity.

What role does Calibration play here?

Establishing the relation between instrument indication and reference values under stated conditions.

What must a reviewer be able to reconstruct?

Components use a common standard-uncertainty basis, units match, correlations are considered, and rounding does not imply false precision.

End-of-lesson summary

Define the measurand, measurement model, calibration, environmental influences, sampling, processing, and uncertainty contributors. Traceability means a documented calibration chain to references with stated uncertainty, not merely a serial number sticker.

Student notes

Write the measurand before writing the sensor model. If location or time is vague, the uncertainty number is premature.

Recommended readings

Instructor notes

State clearly that k = 2 is a teaching approximation, not a universal 95% guarantee without distribution and degrees-of-freedom analysis.
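One way to show why k = 2 needs justification is the Welch-Satterthwaite estimate of effective degrees of freedom, ν_eff = u_c⁴ / Σ(u_i⁴/ν_i). The sketch below makes two hypothetical assumptions: the repeatability component came from n = 10 readings (ν = 9), and the calibration and resolution components have effectively infinite degrees of freedom. Real values must come from the data and the calibration certificate. The coverage factor for roughly 95 % coverage would then be the Student t quantile at ν_eff, for example from a t-table or `scipy.stats.t.ppf(0.975, nu_eff)`.

```python
import math

# (standard uncertainty in °C, degrees of freedom); math.inf = treated as exactly characterized.
components = {
    "calibration":   (0.30, math.inf),              # assumption
    "repeatability": (0.20, 10 - 1),                # assumption: estimated from n = 10 readings
    "resolution":    (0.10 / math.sqrt(12), math.inf),
}

u_c2 = sum(u**2 for u, _ in components.values())
# Welch-Satterthwaite: nu_eff = u_c^4 / sum(u_i^4 / nu_i); infinite-dof terms contribute zero.
denominator = sum(u**4 / nu for u, nu in components.values() if math.isfinite(nu))
nu_eff = u_c2**2 / denominator if denominator > 0 else math.inf
print(f"u_c = {math.sqrt(u_c2):.4f} °C, nu_eff = {nu_eff:.1f}")
```

Under these assumptions ν_eff is about 96, so the t quantile is close to 2. If a repeatability term from only a few readings dominated the budget, the required coverage factor would be noticeably larger.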

8.2

Experimental design, data cleaning, and data quality

Why this lesson matters

Cleaning can quietly remove the physics a model must explain. Experimental design and transparent transformations protect evidential value.

Learning objectives

  • Define and distinguish experimental design and replication.
  • Apply the lesson method to the worked pressure-transducer saturation case.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Plan tests around quantities of interest, factors, ranges, confounding, randomization, replication, controls, and uncertainty. Preserve raw data and implement cleaning as versioned, reviewable transformations with reasons and sensitivity checks.

Key concepts

Experimental design: A planned arrangement of factors, levels, runs, controls, and measurements to answer defined questions.
Replication: Independent repetition used to estimate variability.
Randomization: Run-order or assignment strategy used to reduce systematic confounding.
Data-quality flag: A non-destructive annotation describing validity, anomaly, saturation, dropout, or processing status.

Step-by-step explanation

  1. Define question, response quantity, factors, ranges, and nuisance variables.
  2. Choose run matrix, replication, randomization, controls, and calibration checks.
  3. Preserve immutable raw observations with timestamps and configuration.
  4. Apply cleaning through code that records rules, flags, and excluded points.
  5. Test conclusion sensitivity to plausible cleaning choices and report exclusions.
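Steps 3 to 5 can be sketched with NumPy. Raw observations stay read-only, each cleaning rule is a named predicate with a recorded reason, and the output is a set of flags plus an explicit record of which points a given analysis excludes. The rule names, the 10 bar full-scale value, and the sample values are illustrative assumptions.

```python
import numpy as np

# Raw observations (bar); illustrative values, never modified in place.
raw = np.array([2.1, 5.6, np.nan, 9.8, 10.0, 10.0, 7.4])
raw.setflags(write=False)   # guard against accidental in-place edits

FULL_SCALE_BAR = 10.0       # assumed transducer range

# Each rule: name -> (reason, predicate over the raw array).
RULES = {
    "missing": ("no sample recorded", lambda x: np.isnan(x)),
    "saturated": ("at or above sensor full scale; true value may be higher",
                  lambda x: np.nan_to_num(x, nan=-np.inf) >= FULL_SCALE_BAR),
}


def apply_rules(x, rules):
    """Return {rule_name: boolean flag array}; x itself is not altered."""
    return {name: pred(x) for name, (_, pred) in rules.items()}


flags = apply_rules(raw, RULES)

# An analysis may exclude some flagged categories; the choice is recorded, not hidden.
excluded_categories = ["missing"]
excluded = np.zeros(raw.shape, dtype=bool)
for name in excluded_categories:
    excluded |= flags[name]

for name, f in flags.items():
    print(f"{name:>9}: indices {np.flatnonzero(f).tolist()} ({RULES[name][0]})")
print("excluded for this analysis:", np.flatnonzero(excluded).tolist())
```

For the sensitivity check in step 5, you could rerun the downstream comparison with different `excluded_categories` lists and report how the conclusion changes.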

Worked example

A pressure transducer saturates at 10 bar during two pump-start transients. A cleaning script deletes all values equal to 10.0 before model comparison.

  1. Treat 10.0 bar readings as censored (saturated), not as proven errors.
  2. Preserve the raw points and flag saturation, recording the sensor range and timestamps.
  3. Determine whether the validation quantity depends on the peak; if it does, the experiment may be inadequate for that comparison.
  4. Repeat the test with an appropriately ranged sensor, or add a second sensor, rather than interpolating an unsupported peak.

Result. Transparent flags reveal an observability failure. Silent deletion would bias peak comparison and hide a test-design problem.

Independent check. Every altered or excluded point is reproducible from a rule, linked to raw data, and assessed for decision impact.
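The bias caused by silent deletion can be demonstrated in a few lines. In this sketch the samples through one pump-start transient are illustrative, not measured. Deleting every 10.0 bar reading makes the recorded peak look lower than the sensor limit. Flagging keeps the accurate statement that the peak was at least 10 bar.

```python
import numpy as np

FULL_SCALE_BAR = 10.0
# Illustrative samples (bar) through one transient; the sensor clipped at full scale.
raw = np.array([3.2, 7.9, 9.6, 10.0, 10.0, 10.0, 9.1, 6.0])

# Silent deletion: drops the clipped readings and reports a misleading peak.
deleted = raw[raw != FULL_SCALE_BAR]
print(f"peak after deletion: {deleted.max():.1f} bar")

# Non-destructive flagging: the peak is right-censored at the sensor limit.
saturated = raw >= FULL_SCALE_BAR
if saturated.any():
    print(f"peak: >= {FULL_SCALE_BAR:.1f} bar (censored, {saturated.sum()} saturated samples)")
else:
    print(f"peak: {raw.max():.1f} bar")
```

A lower bound cannot validate a predicted peak above 10 bar. This is the observability failure the example describes, and the remedy is a better-ranged measurement, not a cleverer cleaning rule.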

Common misconceptions

Misconception | Correction
Outliers should be removed | Unusual observations may be errors, rare physics, transients, or model failures. Investigate and preserve before disposition.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

Why randomize?

To reduce correlation between treatment settings and time-varying nuisance factors.
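A randomized, replicated run order can be generated and recorded as in the sketch below. The factor levels (pump speeds), the number of replicates, and the seed are hypothetical. Recording the seed lets a reviewer regenerate the same order.

```python
import numpy as np

pump_speeds_rpm = [1000, 1500, 2000]   # hypothetical factor levels
replicates = 3                         # independent repeats per level
seed = 20240611                        # recorded with the test plan

runs = np.repeat(pump_speeds_rpm, replicates)   # each level listed `replicates` times
rng = np.random.default_rng(seed)
order = rng.permutation(runs)                   # randomized run order (a shuffled copy)

for i, speed in enumerate(order, start=1):
    print(f"run {i:2d}: {speed} rpm")
```

Randomization reduces, but does not guarantee removal of, correlation with slow drift. Logging ambient conditions alongside each run keeps drift checkable afterwards.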

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Classify missing, saturated, duplicated, and out-of-range points without deleting them.

Intermediate

Design a run order that separates ambient drift from pump-speed effects.

Advanced

Compare model-validation conclusions under three defensible cleaning policies.

AI-assisted engineering task

Ask AI to propose anomaly categories and questions for a flagged dataset, while preserving original rows and abstaining from automatic deletion.

How to prove the AI output yourself

  1. Plot raw time histories.
  2. Inspect sensor limits and logs.
  3. Reproduce flags with code.
  4. Compare conclusions with and without disputed points.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Experimental design.

A planned arrangement of factors, levels, runs, controls, and measurements to answer defined questions.

What role does Replication play here?

Independent repetition used to estimate variability.

What must a reviewer be able to reconstruct?

Every altered or excluded point is reproducible from a rule, linked to raw data, and assessed for decision impact.

End-of-lesson summary

Plan tests around quantities of interest, factors, ranges, confounding, randomization, replication, controls, and uncertainty. Preserve raw data and implement cleaning as versioned, reviewable transformations with reasons and sensitivity checks.

Student notes

Keep raw, flagged, cleaned, and analysis-ready datasets as linked but distinct artifacts.

Recommended readings

Instructor notes

Include a physically real transient that looks like an outlier. Students must ask the instrument and experiment before asking the algorithm.

8.3

Comparing simulation with experiment

Why this lesson matters

Simulation and experiment can disagree because they answer subtly different questions, not only because one is inaccurate.

Learning objectives

  • Define and distinguish alignment and residual.
  • Apply the lesson method to the worked bracket-deflection comparison case.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Align quantities, locations, time bases, boundary conditions, configurations, and uncertainty before computing comparison metrics. Use residual patterns and physical reasoning, not a single global score, to diagnose adequacy.

Key concepts

Alignment: Making model and measurement quantities comparable in definition, condition, location, time, and configuration.
Residual: Observed minus predicted value under a declared sign convention.
Validation metric: A quantitative comparison designed for the quantity, uncertainty, and intended inference.
Validation domain: The conditions and quantities over which validation evidence has been collected.

Step-by-step explanation

  1. Confirm configuration, boundary, initial condition, and quantity definitions.
  2. Transform coordinates, units, sample rates, and filtering using controlled methods.
  3. Propagate measurement and input uncertainty and estimate numerical uncertainty.
  4. Plot measured and predicted values plus residuals across conditions.
  5. Interpret magnitude and structure relative to context of use, then state the supported domain.
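Step 2 often means putting both series on a common time base before computing residuals. The sketch makes these hypothetical assumptions:

- The simulation output is sampled at 100 Hz.
- The measurement is logged at 30 Hz.
- The measurement timestamps carry a known 0.05 s acquisition delay.

The code corrects the timestamps and linearly interpolates the simulation onto them with `np.interp`. It compares only samples inside the simulated span. All signals and the delay value are made up for illustration.

```python
import numpy as np

# Hypothetical simulation output: 100 Hz from 0 to 2 s, displacement in mm.
t_sim = np.linspace(0.0, 2.0, 201)
y_sim = 1.5 * np.sin(2 * np.pi * t_sim)

# Hypothetical measurement: 30 Hz logger whose timestamps lag the event by 0.05 s.
acquisition_delay_s = 0.05
t_logged = np.arange(60) / 30.0
y_meas = 1.5 * np.sin(2 * np.pi * (t_logged - acquisition_delay_s))

# Align: correct the timestamps, then keep only samples inside the simulated span.
t_meas = t_logged - acquisition_delay_s
inside = (t_meas >= t_sim[0]) & (t_meas <= t_sim[-1])

# Linearly interpolate the simulation onto measurement times (no extrapolation).
y_sim_at_meas = np.interp(t_meas[inside], t_sim, y_sim)
residual = y_meas[inside] - y_sim_at_meas   # observed minus predicted

# Without the delay correction, the same data show a spurious discrepancy.
naive = y_meas - np.interp(t_logged, t_sim, y_sim)

print(f"aligned: {inside.sum()} of {t_logged.size} samples, "
      f"max |residual| = {np.abs(residual).max():.4f} mm")
print(f"unaligned: max |residual| = {np.abs(naive).max():.3f} mm")
```

After alignment, the remaining residual here is only linear-interpolation error. Skipping the delay correction produces a residual of several tenths of a millimetre that no model change could legitimately explain.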

Worked example

At loads [0.5, 1.0, 1.5, 2.0] kN, measured bracket deflections are [0.51, 1.03, 1.58, 2.12] mm and simulation gives [0.49, 0.99, 1.49, 1.98] mm.

  1. The residual (measured minus simulated) is [0.02, 0.04, 0.09, 0.14] mm.
  2. RMSE = sqrt(mean(residual²)) = sqrt(0.007425) = 0.0862 mm.
  3. The residual is positive at every load and grows with load, which suggests a load-dependent discrepancy rather than random scatter alone.
  4. Check contact, joint slip, geometric nonlinearity, material response, fixture compliance, and measurement uncertainty.

Result. RMSE is about 0.086 mm, but the residual trend is the more informative clue for model-form or boundary-condition investigation.

Independent check. Sign convention, arithmetic, uncertainty, load alignment, fixture effects, and validation-domain statement are explicit.

Common misconceptions

Misconception | Correction
A high R² validates a model | Correlation can be high despite systematic bias, wrong scale, or insufficient uncertainty and domain evidence.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.
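The first misconception can be checked numerically on the bracket data from the worked example. This Python sketch computes three quantities:

- the squared Pearson correlation between measured and simulated deflection
- the mean residual
- a least-squares slope of residual against load

These are illustrative diagnostics, not a validation metric.

```python
import numpy as np

load = np.array([0.5, 1.0, 1.5, 2.0])            # kN
measured = np.array([0.51, 1.03, 1.58, 2.12])    # mm
simulated = np.array([0.49, 0.99, 1.49, 1.98])   # mm

r = np.corrcoef(measured, simulated)[0, 1]        # Pearson correlation
residual = measured - simulated                   # observed minus predicted
slope, intercept = np.polyfit(load, residual, 1)  # linear trend of residual vs load

print(f"r^2 = {r**2:.5f}")
print(f"mean residual = {residual.mean():.4f} mm (all positive: {bool((residual > 0).all())})")
print(f"residual trend = {slope:.3f} mm/kN")
```

The squared correlation exceeds 0.999. Even so, every residual is positive and the residual grows by about 0.08 mm per kN, which is the systematic, load-dependent bias a single correlation score hides.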

Diagnostic questions

What does residual structure tell you?

Patterns can indicate missing physics, condition-dependent bias, timing errors, or correlated measurement effects.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Compute residuals and RMSE independently.

Intermediate

Plot residual versus load and propose three physics-based hypotheses.

Advanced

Design tests that distinguish fixture compliance from nonlinear bracket behavior.

AI-assisted engineering task

Ask AI to describe residual patterns and propose testable hypotheses, with no authority to choose the final model correction.

How to prove the AI output yourself

  1. Recalculate metrics.
  2. Inspect plots and raw data.
  3. Test hypotheses with targeted experiments or alternative models.
  4. Review relevance to the context of use.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Alignment.

Making model and measurement quantities comparable in definition, condition, location, time, and configuration.

What role does Residual play here?

Observed minus predicted value under a declared sign convention.

What must a reviewer be able to reconstruct?

Sign convention, arithmetic, uncertainty, load alignment, fixture effects, and validation-domain statement are explicit.

End-of-lesson summary

Align quantities, locations, time bases, boundary conditions, configurations, and uncertainty before computing comparison metrics. Use residual patterns and physical reasoning, not a single global score, to diagnose adequacy.

Student notes

Always report comparison metric, residual pattern, uncertainty, domain, and decision implication together.

Recommended readings

Instructor notes

Do not let RMSE end the discussion. Require a residual plot and one discriminating follow-up experiment.

8.4

Evidence grading and measurement traceability in the digital thread

Why this lesson matters

A test can be carefully executed yet weak for a decision because its configuration, range, uncertainty, independence, or traceability is inadequate.

Learning objectives

  • Define and distinguish measurement traceability and configuration fidelity.
  • Apply the lesson method to the worked two-test bracket stiffness case.
  • Evaluate evidence, uncertainty, and AI-assisted output before making a claim.

Readiness check

Before continuing, explain what decision this topic supports and name one upstream source that must be controlled.

Check your response

A sound answer names a specific engineering decision, its configuration, and a controlled requirement, model, dataset, interface, or standard that constrains the work.

Core idea

Grade measurement evidence on relevance, measurement quality, configuration fidelity, coverage, uncertainty, independence, and reproducibility. Link raw data, calibration, processing, result, comparison, review, and decision without compressing them into one unexplained score.

Key concepts

Measurement traceability: A documented chain of calibrations to references, each contributing uncertainty.
Configuration fidelity: How closely the tested article and conditions represent the decision target.
Evidence independence: Degree to which evidence provides a genuinely separate challenge rather than reusing the same assumptions or data.
Evidence grade: A transparent multidimensional assessment for a stated decision.

Step-by-step explanation

  1. Identify the exact claim and target configuration.
  2. Trace measured values through instrument, calibration, acquisition, processing, and result.
  3. Assess range, resolution, uncertainty, repeatability, environment, and data completeness.
  4. Evaluate configuration fidelity, independence, and coverage of intended use.
  5. Record strengths, limitations, conflicts, and decision disposition by dimension.
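Step 2 depends on every result being linkable back to its raw data and calibration. The sketch below walks those links and reports what is missing. It assumes a simple in-memory registry of artifacts with hypothetical IDs and kinds. It does not model any particular PLM or digital-thread tool.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical digital-thread record: one controlled item with links to its inputs."""
    artifact_id: str
    kind: str                                    # e.g. "raw", "calibration", "processing", "result"
    inputs: list = field(default_factory=list)   # ids this artifact was derived from


REQUIRED_KINDS = {"raw", "calibration", "processing"}   # assumed minimum chain behind a result


def trace(result_id, registry):
    """Walk input links back from a result; return (kinds found, unresolved ids)."""
    seen, kinds, unresolved = set(), set(), []
    stack = [result_id]
    while stack:
        aid = stack.pop()
        if aid in seen:
            continue
        seen.add(aid)
        art = registry.get(aid)
        if art is None:
            unresolved.append(aid)
            continue
        kinds.add(art.kind)
        stack.extend(art.inputs)
    return kinds, unresolved


registry = {a.artifact_id: a for a in [
    Artifact("RAW-001", "raw"),
    Artifact("CAL-017", "calibration"),
    Artifact("PROC-v3", "processing", ["RAW-001", "CAL-017"]),
    Artifact("RES-042", "result", ["PROC-v3"]),
    Artifact("RES-043", "result", ["PROC-v4"]),   # links to a processing record that does not exist
]}

for rid in ["RES-042", "RES-043"]:
    kinds, unresolved = trace(rid, registry)
    missing = sorted(REQUIRED_KINDS - kinds)
    print(f"{rid}: missing kinds {missing}, unresolved links {unresolved}")
```

A complete chain is necessary but not sufficient. Step 3 still has to judge whether the linked calibration and processing are adequate for the claim.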

Worked example

Two stiffness tests exist. Test A uses the production bracket but an uncalibrated displacement sensor. Test B uses a geometrically similar coupon with calibrated metrology and independent laboratory review.

  1. A has high configuration relevance but weak measurement traceability.
  2. B has strong measurement quality and independence but limited product-level fidelity.
  3. Do not average the grades or select one test universally. Use them as complementary evidence and identify the missing test: a calibrated test on the production part.
  4. State which claims each test can and cannot support.

Result. A multidimensional grade reveals complementary strengths and the decisive evidence gap better than a single confidence score.

Independent check. Every grade has documented rationale and the final claim does not exceed evidence scope.
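The worked example can be encoded so that grades stay per-dimension, each with its rationale, and are never averaged. Several choices in this sketch are hypothetical:

- the three-level scale
- the dimension names
- the levels required for the claim

A real program would take these from its own grading criteria.

```python
# Hypothetical ordinal scale; grades are compared per dimension, never averaged.
LEVELS = {"weak": 0, "moderate": 1, "strong": 2}

tests = {
    "A (production bracket)": {
        "configuration_fidelity": ("strong", "production part"),
        "measurement_quality": ("weak", "displacement sensor not calibrated"),
    },
    "B (coupon)": {
        "configuration_fidelity": ("moderate", "geometrically similar coupon, not the product"),
        "measurement_quality": ("strong", "calibrated metrology"),
        "independence": ("strong", "independent laboratory review"),
    },
}


def supports(test_grades, required):
    """True only if this single test meets every required level on its own."""
    return all(d in test_grades and LEVELS[test_grades[d][0]] >= LEVELS[lvl]
               for d, lvl in required.items())


# Claim: production-bracket stiffness, needing fidelity and measurement quality together.
required = {"configuration_fidelity": "strong", "measurement_quality": "strong"}

for name, grades in tests.items():
    print(name)
    for dim, (grade, why) in grades.items():
        print(f"  {dim}: {grade} ({why})")
    print(f"  supports claim alone: {supports(grades, required)}")

if not any(supports(g, required) for g in tests.values()):
    print("gap: no single test is strong on", ", ".join(required))
```

Neither test supports the production-level claim alone. That is the evidence gap the worked example identifies: a calibrated test on the production part.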

Common misconceptions

Misconception | Correction
The most realistic test is automatically strongest | Poor calibration, procedure, or uncertainty can undermine relevance; evidence has multiple dimensions.
A tool output closes the question | A result remains a candidate until its inputs, method, configuration, uncertainty, and relevance have been checked.

Diagnostic questions

Why keep raw data linked?

It enables reprocessing, audit, anomaly investigation, and verification of reported results.

What would make this work reproducible?

Controlled inputs, method or code, versions, assumptions, outputs, and a stated interpretation tied to the decision.

Practice ladder

Basic

Grade a test across relevance, quality, coverage, uncertainty, and independence.

Intermediate

Design a minimum evidence package that combines A and B responsibly.

Advanced

Resolve a case where independent high-quality evidence conflicts with a configuration-matched internal test.

AI-assisted engineering task

Ask AI to assemble an evidence inventory with candidate grades and quoted rationale, leaving final grading to reviewers.

How to prove the AI output yourself

  1. Open calibration and raw-data records.
  2. Reproduce processing.
  3. Check configuration and range.
  4. Use independent technical review for consequential claims.

Retrieval and spaced review

Answer closed-notes today, then again after 1, 3, 7, and 30 days.

Define Measurement traceability.

A documented chain of calibrations to references, each contributing uncertainty.

What role does Configuration fidelity play here?

How closely the tested article and conditions represent the decision target.

What must a reviewer be able to reconstruct?

Every grade has documented rationale and the final claim does not exceed evidence scope.

End-of-lesson summary

Grade measurement evidence on relevance, measurement quality, configuration fidelity, coverage, uncertainty, independence, and reproducibility. Link raw data, calibration, processing, result, comparison, review, and decision without compressing them into one unexplained score.

Student notes

Write one evidence paragraph per dimension, then a bounded decision implication.

Recommended readings

Instructor notes

Use radar-style dimensions only if each axis has criteria and rationale. Avoid decorative scoring.

LAB 8

Lab 8: Compare simulation and measurement data

Lab objective

Align datasets, compute residuals and RMSE, inspect trend, and produce an uncertainty-aware comparison table.

Engineering context

Use the four-load bracket dataset from Lesson 8.3 with 0.03 mm standard measurement uncertainty and 0.02 mm numerical uncertainty.

Input data

  • Load, measured deflection, simulated deflection
  • Standard measurement and numerical uncertainty

Step-by-step task

  1. Compute residuals
  2. Compute RMSE
  3. Combine independent standard uncertainties as a teaching assumption
  4. Flag residuals exceeding two combined standard uncertainties

Python code

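import numpy as np

# Lesson 8.3 bracket data: load in kN, deflections in mm
load = np.array([0.5, 1.0, 1.5, 2.0])
measured = np.array([0.51, 1.03, 1.58, 2.12])
simulated = np.array([0.49, 0.99, 1.49, 1.98])

# Step 1: residual = measured minus simulated (declared sign convention)
residual = measured - simulated

# Step 2: root-mean-square error over all loads
rmse = np.sqrt(np.mean(residual**2))

# Step 3: combine standard uncertainties in quadrature, assuming independence (teaching assumption)
u_measurement, u_numerical = 0.03, 0.02  # mm
u_combined = np.sqrt(u_measurement**2 + u_numerical**2)

# Step 4: screening flag where |residual| exceeds two combined standard uncertainties
flags = np.abs(residual) > 2.0 * u_combined

for f_kn, m, s, r, flag in zip(load, measured, simulated, residual, flags):
    print(f"load={f_kn:.1f} kN measured={m:.2f} sim={s:.2f} "
          f"residual={r:+.2f} flag={flag}")
print(f"RMSE={rmse:.4f} mm, combined u={u_combined:.4f} mm")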

Explanation of code

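  1. Residuals are computed as measured minus simulated, the sign convention declared in Lesson 8.3.
  2. RMSE is the square root of the mean squared residual.
  3. The measurement and numerical standard uncertainties are combined in quadrature, assuming independence as a teaching simplification.
  4. Residuals whose magnitude exceeds two combined standard uncertainties are flagged for investigation, not discarded.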

Expected output

Residuals [0.02, 0.04, 0.09, 0.14] mm, RMSE about 0.0862 mm, combined standard uncertainty about 0.0361 mm (screening threshold 2u ≈ 0.072 mm), and flags at the two highest loads.

Interpretation

The two-sigma rule is a screening heuristic under simplified assumptions, not a universal validation acceptance criterion.

Common errors

  • Ignoring uncertainty correlation
  • Using RMSE without residual plots
  • Treating flagged points as bad data

Extension tasks

  • Plot residuals with uncertainty bars
  • Fit a physically motivated compliance correction
  • Use bootstrap or repeated-test data

Reflection questions

  • What pattern appears?
  • Which assumptions underlie the combined uncertainty?
  • What experiment would distinguish competing explanations?

WEEK 8

Weekly quiz and concept check

Closed notes. Answer each item, then use the key to correct in a different color.

  1. What is the measurand?
  2. Does calibration reveal exact truth?
  3. Why preserve raw data?
  4. What must be aligned before simulation-test comparison?
  5. Why inspect residual patterns?
  6. What dimensions can grade evidence?

Answer key
  1. The specifically defined quantity intended to be measured.
  2. No. It establishes a relation to references with stated conditions and uncertainty.
  3. To reproduce transformations, investigate anomalies, and audit conclusions.
  4. Quantity, location, time, condition, configuration, units, and processing.
  5. They can reveal condition-dependent discrepancy that a global score hides.
  6. Relevance, quality, configuration fidelity, coverage, uncertainty, independence, and reproducibility.

SOURCES

Module source map

Source | How it is used
NASA Systems Engineering Handbook, NASA/SP-2016-6105 Rev. 2 | Lifecycle processes, requirements, interfaces, technical decisions, reviews, verification, and validation.
NASA-STD-7009, Standard for Models and Simulations | Model and simulation lifecycle, credibility products, acceptance criteria, and reporting. NASA-STD-7009B supersedes 7009A.
Oberkampf and Roy, Verification and Validation in Scientific Computing | Verification, validation, numerical error, uncertainty, prediction, and simulation credibility.
FDA Guidance on Computational Modeling and Simulation Credibility | Risk-informed credibility assessment and transparent reporting of computational evidence.

Access labels and full-course source notes are on the course home page. Paywalled standards are not paraphrased as if their full text were accessed.