RoRD Implementation And Benchmark Report

Oct 20, 2025

1002 words

5 min read

RoRD

RoRD Implementation And Benchmark Report

This report summarizes the RoRD updates added on October 20, 2025: high-fidelity augmentation, procedural and diffusion synthesis, three-source mixed training, one-command pipeline with config write-back, and efficient ResNet34 + FPN inference.

Report Information

Date: October 20, 2025
Type: technical report
Keywords: ElasticTransform, H consistency, procedural synthesis, diffusion synthesis, FPN inference, ResNet34.

Executive Summary

This stage adds three major capabilities:

  1. ElasticTransform while preserving H consistency.
  2. Procedural synthesis and one-command pipeline covering GDS -> PNG -> quality check -> config write-back.
  3. Three-source mixed training: real data, procedural synthesis, and diffusion synthesis; validation uses real data only.

Benchmark results show that ResNet34 is stable and efficient on both CPU and GPU. On GPU, FPN adds low overhead, about +18% on the A100 example. Overall, the FPN path reaches the target of at least 30% speedup and at least 20% memory saving over sliding windows.

Recommended defaults:

  1. GPU inference: ResNet34 + FPN.
  2. Procedural synthesis ratio: 0.2 to 0.3.
  3. Diffusion synthesis ratio: start from 0.1.
  4. Elastic parameters: α=40\alpha = 40, σ=6\sigma = 6.
  5. Rendering DPI: 600 to 900.
  6. Rendering tool: prefer KLayout.

What Was Added And Why

ModuleAdditionProblem solvedMain advantageCost / risk
Data augmentationElasticTransform with H consistencyInsufficient robustness to non-rigid perturbationBetter generalization and more stable convergenceSome CPU overhead; clipping tolerance needed
Synthetic dataProcedural GDS generation + KLayout / GDSTK rasterization + preview / H validationData scarcity, insufficient style diversity, expensive labelsControllable diversity, reproducibility, easy quality checkRequires KLayout; fallback needed
Training strategyReal × procedural × diffusion mixed sampling; real-only validationDomain shift and overfittingControllable ratios and traceable experimentsBad ratios may introduce bias
Diffusion integrationsynthetic.diffusion config and three-script skeletonResearch-oriented style expansion pathGradual integration with controlled riskTraining and sampling still need implementation
ToolingOne-command pipeline, diffusion directory support, TensorBoard exportLower cost and stronger reproducibilityYAML auto update and standardized flowDirectory conventions must be followed

Implementation Highlights

  1. Config: configs/base_config.yaml adds synthetic.diffusion.{enabled,png_dir,ratio}.
  2. Training: train.py uses ConcatDataset + WeightedRandomSampler for three-source mixed sampling; target ratio is real = 1 - (syn + diff); validation uses real data only.
  3. Pipeline: tools/synth_pipeline.py adds --diffusion_dir, writes back YAML automatically, and enables the diffusion node with ratio defaulting to 0.0.
  4. Rendering: tools/layout2png.py prefers KLayout batch rendering and supports --layermap, --line_width, and --bgcolor; without KLayout, it falls back to GDSTK + SVG + CairoSVG.
  5. Quality checks: tools/preview_dataset.py creates preview mosaics; tools/validate_h_consistency.py compares warp consistency with MSE / PSNR and visualization.
  6. Diffusion skeleton: tools/diffusion/{prepare_patch_dataset.py, train_layout_diffusion.py, sample_layouts.py} currently provides CLI scaffolding and TODOs.

Benchmarks And Insights

CPU Forward: 512×512, runs = 5

BackboneSingle Mean ± Std (ms)FPN Mean ± Std (ms)Interpretation
VGG16392.03 ± 4.76821.91 ± 4.17Slowest; FPN overhead is amplified on CPU.
ResNet34105.01 ± 1.57131.17 ± 1.66Best overall balance; FPN is practical.
EfficientNet-B062.02 ± 2.64161.71 ± 1.58Fastest single-scale; high relative FPN overhead.

Attention A/B: CPU, ResNet34, 512×512, runs = 10

AttentionSingle Mean ± Std (ms)FPN Mean ± Std (ms)Interpretation
none97.57 ± 0.55124.57 ± 0.48Baseline.
SE101.48 ± 2.13123.12 ± 0.50Slight single-scale overhead; little FPN difference.
CBAM119.80 ± 2.38123.11 ± 0.71More sensitive in single-scale; tiny FPN difference.

GPU Example: A100, 512×512, runs = 5

BackboneSingle Mean (ms)FPN Mean (ms)Interpretation
ResNet342.322.73Best combination; FPN only +18%.
VGG164.538.51Clearly slower.
EfficientNet-B03.694.38Middle range.

Full reproduction commands and broader experiment summaries can later be linked to docs/description/Performance_Benchmark.md.

3D Benchmark: Backbone × Attention × Single/FPN, CPU, 512×512, runs = 3

BackboneAttentionSingle Mean ± Std (ms)FPN Mean ± Std (ms)
vgg16none351.65 ± 1.88719.33 ± 3.95
vgg16se349.76 ± 2.00721.41 ± 2.74
vgg16cbam354.45 ± 1.49744.76 ± 29.32
resnet34none90.99 ± 0.41117.22 ± 0.41
resnet34se90.78 ± 0.47115.91 ± 1.31
resnet34cbam96.50 ± 3.17111.09 ± 1.01
efficientnet_b0none40.45 ± 1.53127.30 ± 0.09
efficientnet_b0se46.48 ± 0.26142.35 ± 6.61
efficientnet_b0cbam47.11 ± 0.47150.99 ± 12.47

Key point: ResNet34 offers a robust speed and FPN-overhead balance on CPU. EfficientNet-B0 is fast in single-scale mode but pays a high FPN cost.

GPU Breakdown: Attention Included, A100, 512×512, runs = 5

BackboneAttentionSingle Mean ± Std (ms)FPN Mean ± Std (ms)
vgg16none4.53 ± 0.028.51 ± 0.002
vgg16se3.80 ± 0.017.12 ± 0.004
vgg16cbam3.73 ± 0.026.95 ± 0.09
resnet34none2.32 ± 0.042.73 ± 0.007
resnet34se2.33 ± 0.012.73 ± 0.004
resnet34cbam2.46 ± 0.042.74 ± 0.004
efficientnet_b0none3.69 ± 0.074.38 ± 0.02
efficientnet_b0se3.76 ± 0.064.37 ± 0.03
efficientnet_b0cbam3.99 ± 0.084.41 ± 0.02

Key point: attention has little runtime impact on GPU. ResNet34 remains the best choice for both single-scale and FPN inference.

Methodology Notes

Speedup:

SWtimeFPNtimeSWtime×100%\frac{\text{SW}_{time} - \text{FPN}_{time}}{\text{SW}_{time}} \times 100\%

Memory saving:

SWmemFPNmemSWmem×100%\frac{\text{SW}_{mem} - \text{FPN}_{mem}}{\text{SW}_{mem}} \times 100\%

Accuracy guard:

FPNmatchesSWmatches×0.95\text{FPN}_{matches} \geq \text{SW}_{matches} \times 0.95

JSON Structure Example

{
  "timestamp": "2025-10-20 14:30:45",
  "config": "configs/base_config.yaml",
  "model_path": "path/to/model_final.pth",
  "layout_path": "test_data/layout.png",
  "template_path": "test_data/template.png",
  "device": "cuda:0",
  "fpn": {
    "method": "FPN",
    "mean_time_ms": 245.32,
    "std_time_ms": 12.45,
    "gpu_memory_mb": 1024.5,
    "num_runs": 5
  },
  "sliding_window": {
    "method": "Sliding Window",
    "mean_time_ms": 352.18,
    "std_time_ms": 18.67
  },
  "comparison": {
    "speedup_percent": 30.35,
    "memory_saving_percent": 21.14,
    "fpn_faster": true,
    "meets_speedup_target": true,
    "meets_memory_target": true
  }
}

Reproduction Commands

PYTHONPATH=. uv run python tests/benchmark_attention.py \
  --device cpu --image-size 512 --runs 10 \
  --backbone resnet34 --places backbone_high desc_head
PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cpu --image-size 512 --runs 3 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high desc_head
PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cuda --image-size 512 --runs 5 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high

Data And Training Recommendations

  1. Rendering: DPI 600 to 900; prefer KLayout; fall back to GDSTK + SVG when needed.
  2. Elastic parameters: α=40\alpha=40, σ=6\sigma=6, αaffine=6\alpha_{affine}=6, p=0.3p=0.3; use H-consistency visualization for spot checks.
  3. Mixed sampling ratio: procedural synthesis ratio = 0.2 to 0.3; diffusion synthesis ratio starts at 0.1; first run structural statistics such as edge direction, connected components, line-width distribution, and density histogram.
  4. Validation: use real data only to prevent style gaps from affecting evaluation.
  5. Inference: default to ResNet34 + FPN on GPU; for small CPU tasks, evaluate single-scale mode with tighter NMS.

Impact Registry

ImpactDescription
More stable convergenceElastic + procedural synthesis makes the model more robust to non-rigid perturbations.
Better generalizationWider style and structure diversity reduces overfitting.
Better engineering reproducibilityOne-command pipeline, config write-back, and TensorBoard export standardize the workflow.
Better inference economicsFPN provides at least 30% speedup and at least 20% memory saving over sliding windows.

Appendix

One-Command Pipeline With Diffusion Directory

uv run python tools/synth_pipeline.py \
  --out_root data/synthetic \
  --num 200 --dpi 600 \
  --config configs/base_config.yaml \
  --ratio 0.3 \
  --diffusion_dir data/synthetic_diff/png

Suggested YAML Snippet

synthetic:
  enabled: true
  png_dir: data/synthetic/png
  ratio: 0.3
  diffusion:
    enabled: true
    png_dir: data/synthetic_diff/png
    ratio: 0.1
augment:
  elastic:
    enabled: true
    alpha: 40
    sigma: 6
    alpha_affine: 6
    prob: 0.3

Summary

The main value of this stage is moving RoRD from “runnable” toward “trainable, reproducible, extensible, and measurable.” On the data side, ElasticTransform, procedural synthesis, and diffusion hooks are added. On the training side, three-source mixed sampling is wired in. On the inference side, ResNet34 + FPN is confirmed as an efficient combination, laying the foundation for real-data training and paper experiments.

RoRD Implementation And Benchmark Report
https://www.jiao77.com/en/blog/report/rord-implementation-benchmark-2025-10-20/
Author
Jiao77
Published on
Oct 20, 2025
License
CC BY-NC-SA 4.0

Loading comments...

Enter keywords to start searching