RoRD Implementation And Benchmark Report

Oct 20, 2025

This report summarizes the RoRD updates added on October 20, 2025: high-fidelity augmentation, procedural and diffusion synthesis, three-source mixed training, a one-command pipeline with config write-back, and efficient ResNet34 + FPN inference.

Report Information

Date: October 20, 2025
Type: technical report
Keywords: ElasticTransform, H consistency, procedural synthesis, diffusion synthesis, FPN inference, ResNet34.

Executive Summary

This stage adds three major capabilities:

ElasticTransform while preserving H consistency.
Procedural synthesis and a one-command pipeline covering GDS -> PNG -> quality check -> config write-back.
Three-source mixed training: real data, procedural synthesis, and diffusion synthesis; validation uses real data only.

Benchmark results show that ResNet34 is stable and efficient on both CPU and GPU. On GPU, FPN adds little overhead: about +18% for ResNet34 in the A100 example (2.32 ms to 2.73 ms). Overall, the FPN path meets the targets of at least 30% speedup and at least 20% memory saving over sliding-window inference.

Recommended defaults:

GPU inference: ResNet34 + FPN.
Procedural synthesis ratio: 0.2 to 0.3.
Diffusion synthesis ratio: start from 0.1.
Elastic parameters: $\alpha = 40$, $\sigma = 6$.
Rendering DPI: 600 to 900.
Rendering tool: prefer KLayout.

What Was Added And Why

Module	Addition	Problem solved	Main advantage	Cost / risk
Data augmentation	ElasticTransform with H consistency	Insufficient robustness to non-rigid perturbation	Better generalization and more stable convergence	Some CPU overhead; clipping tolerance needed
Synthetic data	Procedural GDS generation + KLayout / GDSTK rasterization + preview / H validation	Data scarcity, insufficient style diversity, expensive labels	Controllable diversity, reproducibility, easy quality check	Requires KLayout; fallback needed
Training strategy	Real × procedural × diffusion mixed sampling; real-only validation	Domain shift and overfitting	Controllable ratios and traceable experiments	Bad ratios may introduce bias
Diffusion integration	`synthetic.diffusion` config and three-script skeleton	No integration path for research-oriented style expansion	Gradual integration with controlled risk	Training and sampling still need implementation
Tooling	One-command pipeline, diffusion directory support, TensorBoard export	High manual cost and weak reproducibility	YAML auto-update and standardized flow	Directory conventions must be followed

Implementation Highlights

Config: configs/base_config.yaml adds synthetic.diffusion.{enabled,png_dir,ratio}.
Training: train.py uses ConcatDataset + WeightedRandomSampler for three-source mixed sampling; target ratio is real = 1 - (syn + diff); validation uses real data only.
Pipeline: tools/synth_pipeline.py adds --diffusion_dir, writes back YAML automatically, and enables the diffusion node with ratio defaulting to 0.0.
Rendering: tools/layout2png.py prefers KLayout batch rendering and supports --layermap, --line_width, and --bgcolor; without KLayout, it falls back to GDSTK + SVG + CairoSVG.
Quality checks: tools/preview_dataset.py creates preview mosaics; tools/validate_h_consistency.py compares warp consistency with MSE / PSNR and visualization.
Diffusion skeleton: tools/diffusion/{prepare_patch_dataset.py, train_layout_diffusion.py, sample_layouts.py} currently provides CLI scaffolding and TODOs.
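
The three-source mixing rule above (real = 1 - (syn + diff)) can be illustrated with per-sample weights. This is a minimal sketch, not the code in train.py: the function name `mixed_sampling_weights` and the dataset sizes are hypothetical, and it assumes the datasets are concatenated in the order real, procedural, diffusion.

```python
from typing import List


def mixed_sampling_weights(
    n_real: int, n_syn: int, n_diff: int,
    syn_ratio: float, diff_ratio: float,
) -> List[float]:
    """Per-sample weights for a dataset concatenated as real + syn + diff.

    Each source's weights sum to its target ratio, so drawing samples with
    probability proportional to weight yields the requested source mix
    regardless of how many images each source contains.
    """
    real_ratio = 1.0 - (syn_ratio + diff_ratio)  # rule quoted in the report
    if real_ratio <= 0.0:
        raise ValueError("syn_ratio + diff_ratio must be below 1.0")
    weights: List[float] = []
    for count, ratio in ((n_real, real_ratio), (n_syn, syn_ratio), (n_diff, diff_ratio)):
        if count > 0:
            # Spread the source's share evenly over its samples.
            weights.extend([ratio / count] * count)
    return weights


# Hypothetical dataset sizes, chosen only for illustration.
weights = mixed_sampling_weights(n_real=1000, n_syn=200, n_diff=50,
                                 syn_ratio=0.3, diff_ratio=0.1)
print(len(weights), round(sum(weights), 6))
```

A list like this could be passed as the `weights` argument of `torch.utils.data.WeightedRandomSampler` with `replacement=True`. The sampler normalizes the weights internally, so an empty source simply redistributes its share to the others.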

Benchmarks And Insights

CPU Forward: 512×512, runs = 5

Backbone	Single Mean ± Std (ms)	FPN Mean ± Std (ms)	Interpretation
VGG16	392.03 ± 4.76	821.91 ± 4.17	Slowest; FPN overhead is amplified on CPU.
ResNet34	105.01 ± 1.57	131.17 ± 1.66	Best overall balance; FPN is practical.
EfficientNet-B0	62.02 ± 2.64	161.71 ± 1.58	Fastest single-scale; high relative FPN overhead.

Attention A/B: CPU, ResNet34, 512×512, runs = 10

Attention	Single Mean ± Std (ms)	FPN Mean ± Std (ms)	Interpretation
none	97.57 ± 0.55	124.57 ± 0.48	Baseline.
SE	101.48 ± 2.13	123.12 ± 0.50	Slight single-scale overhead; little FPN difference.
CBAM	119.80 ± 2.38	123.11 ± 0.71	More sensitive in single-scale; tiny FPN difference.

GPU Example: A100, 512×512, runs = 5

Backbone	Single Mean (ms)	FPN Mean (ms)	Interpretation
ResNet34	2.32	2.73	Best combination; FPN only +18%.
VGG16	4.53	8.51	Clearly slower.
EfficientNet-B0	3.69	4.38	Middle range.
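
The "+18%" figure in the table above is the relative latency added by FPN. As a small sketch, the same ratio can be computed for every backbone from the mean latencies in the table; the helper name `fpn_overhead_percent` is illustrative and not part of the repository.

```python
def fpn_overhead_percent(single_ms: float, fpn_ms: float) -> float:
    """Relative latency added by the FPN path over single-scale inference."""
    return (fpn_ms - single_ms) / single_ms * 100.0


# Mean latencies (ms) copied from the A100 table above: (single, FPN).
gpu_means = {
    "ResNet34": (2.32, 2.73),
    "VGG16": (4.53, 8.51),
    "EfficientNet-B0": (3.69, 4.38),
}
overheads = {name: fpn_overhead_percent(s, f) for name, (s, f) in gpu_means.items()}
for name, pct in overheads.items():
    print(f"{name}: +{pct:.1f}%")
```

With these inputs ResNet34 comes out at roughly +17.7%, which the table rounds to +18%. VGG16 pays close to +88%, and EfficientNet-B0 about +18.7%.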

Full reproduction commands and broader experiment summaries can later be linked to docs/description/Performance_Benchmark.md.

3D Benchmark: Backbone × Attention × Single/FPN, CPU, 512×512, runs = 3

Backbone	Attention	Single Mean ± Std (ms)	FPN Mean ± Std (ms)
vgg16	none	351.65 ± 1.88	719.33 ± 3.95
vgg16	se	349.76 ± 2.00	721.41 ± 2.74
vgg16	cbam	354.45 ± 1.49	744.76 ± 29.32
resnet34	none	90.99 ± 0.41	117.22 ± 0.41
resnet34	se	90.78 ± 0.47	115.91 ± 1.31
resnet34	cbam	96.50 ± 3.17	111.09 ± 1.01
efficientnet_b0	none	40.45 ± 1.53	127.30 ± 0.09
efficientnet_b0	se	46.48 ± 0.26	142.35 ± 6.61
efficientnet_b0	cbam	47.11 ± 0.47	150.99 ± 12.47

Key point: on CPU, ResNet34 offers the best balance between speed and FPN overhead (about +29% without attention). EfficientNet-B0 is the fastest in single-scale mode, but FPN roughly triples its latency.

GPU Breakdown: Attention Included, A100, 512×512, runs = 5

Backbone	Attention	Single Mean ± Std (ms)	FPN Mean ± Std (ms)
vgg16	none	4.53 ± 0.02	8.51 ± 0.002
vgg16	se	3.80 ± 0.01	7.12 ± 0.004
vgg16	cbam	3.73 ± 0.02	6.95 ± 0.09
resnet34	none	2.32 ± 0.04	2.73 ± 0.007
resnet34	se	2.33 ± 0.01	2.73 ± 0.004
resnet34	cbam	2.46 ± 0.04	2.74 ± 0.004
efficientnet_b0	none	3.69 ± 0.07	4.38 ± 0.02
efficientnet_b0	se	3.76 ± 0.06	4.37 ± 0.03
efficientnet_b0	cbam	3.99 ± 0.08	4.41 ± 0.02

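Key point: on GPU, attention has little runtime impact for ResNet34 and EfficientNet-B0: at most about 8% in single-scale mode and under 1% with FPN. The VGG16 rows with SE or CBAM are 16 to 18% faster than the VGG16 baseline, which attention alone would not explain, so that gap should be treated with caution. ResNet34 remains the best choice for both single-scale and FPN inference.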

Methodology Notes

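Here SW denotes the sliding-window baseline and FPN the FPN inference path.

Speedup:

$$\frac{\text{SW}_{\text{time}} - \text{FPN}_{\text{time}}}{\text{SW}_{\text{time}}} \times 100\%$$

Memory saving:

$$\frac{\text{SW}_{\text{mem}} - \text{FPN}_{\text{mem}}}{\text{SW}_{\text{mem}}} \times 100\%$$

Accuracy guard:

$$\text{FPN}_{\text{matches}} \geq \text{SW}_{\text{matches}} \times 0.95$$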
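
The three criteria above translate directly into code. The following is a minimal sketch with hypothetical function names, and the input numbers are made up for illustration; they are not measurements from this report.

```python
def speedup_percent(sw_time: float, fpn_time: float) -> float:
    """Time saved by FPN relative to the sliding-window baseline, in percent."""
    return (sw_time - fpn_time) / sw_time * 100.0


def memory_saving_percent(sw_mem: float, fpn_mem: float) -> float:
    """Memory saved by FPN relative to the sliding-window baseline, in percent."""
    return (sw_mem - fpn_mem) / sw_mem * 100.0


def passes_accuracy_guard(fpn_matches: int, sw_matches: int,
                          tolerance: float = 0.95) -> bool:
    """True if FPN keeps at least `tolerance` of the sliding-window matches."""
    return fpn_matches >= sw_matches * tolerance


# Illustrative inputs only (hypothetical timings, memory, and match counts).
speedup = speedup_percent(sw_time=400.0, fpn_time=280.0)
saving = memory_saving_percent(sw_mem=1000.0, fpn_mem=800.0)
guard_ok = passes_accuracy_guard(fpn_matches=192, sw_matches=200)
print(f"speedup={speedup:.1f}% saving={saving:.1f}% guard_ok={guard_ok}")
```

Keeping the guard separate from the two percentages makes it easy to reject a configuration that is faster and leaner but loses too many matches.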

JSON Structure Example

{
  "timestamp": "2025-10-20 14:30:45",
  "config": "configs/base_config.yaml",
  "model_path": "path/to/model_final.pth",
  "layout_path": "test_data/layout.png",
  "template_path": "test_data/template.png",
  "device": "cuda:0",
  "fpn": {
    "method": "FPN",
    "mean_time_ms": 245.32,
    "std_time_ms": 12.45,
    "gpu_memory_mb": 1024.5,
    "num_runs": 5
  },
  "sliding_window": {
    "method": "Sliding Window",
    "mean_time_ms": 352.18,
    "std_time_ms": 18.67
  },
  "comparison": {
    "speedup_percent": 30.35,
    "memory_saving_percent": 21.14,
    "fpn_faster": true,
    "meets_speedup_target": true,
    "meets_memory_target": true
  }
}
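
A report in this shape can be sanity-checked by recomputing the speedup from the two mean times and comparing the boolean flags against the 30% and 20% targets stated earlier. The sketch below is an assumption about how such a check might look, not a tool from the repository; `check_report` is a hypothetical name, and the embedded JSON is a trimmed copy of the example above.

```python
import json

SPEEDUP_TARGET = 30.0  # percent, target stated in this report
MEMORY_TARGET = 20.0   # percent, target stated in this report


def check_report(report: dict) -> dict:
    """Recompute the speedup and verify the comparison flags are consistent."""
    fpn_ms = report["fpn"]["mean_time_ms"]
    sw_ms = report["sliding_window"]["mean_time_ms"]
    comparison = report["comparison"]
    recomputed = (sw_ms - fpn_ms) / sw_ms * 100.0
    return {
        "recomputed_speedup_percent": round(recomputed, 2),
        "fpn_faster_ok": comparison["fpn_faster"] == (fpn_ms < sw_ms),
        "speedup_flag_ok": comparison["meets_speedup_target"]
        == (comparison["speedup_percent"] >= SPEEDUP_TARGET),
        "memory_flag_ok": comparison["meets_memory_target"]
        == (comparison["memory_saving_percent"] >= MEMORY_TARGET),
    }


raw = """
{
  "fpn": {"mean_time_ms": 245.32},
  "sliding_window": {"mean_time_ms": 352.18},
  "comparison": {
    "speedup_percent": 30.35,
    "memory_saving_percent": 21.14,
    "fpn_faster": true,
    "meets_speedup_target": true,
    "meets_memory_target": true
  }
}
"""
result = check_report(json.loads(raw))
print(result)
```

Recomputing from the rounded means gives about 30.34%, slightly below the stored 30.35%. A difference of that size is plausible if the stored value was derived from unrounded timings, so a tolerance-based comparison is safer than exact equality.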

Reproduction Commands

PYTHONPATH=. uv run python tests/benchmark_attention.py \
  --device cpu --image-size 512 --runs 10 \
  --backbone resnet34 --places backbone_high desc_head

PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cpu --image-size 512 --runs 3 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high desc_head

PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cuda --image-size 512 --runs 5 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high
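
The benchmark tables report mean ± std over a fixed number of runs. How tests/benchmark_attention.py and tests/benchmark_grid.py collect those numbers is not shown here, so the following is only a sketch of one common approach. The function name `time_forward`, the single warm-up call, and the use of the sample standard deviation are all assumptions.

```python
import statistics
import time
from typing import Callable, Tuple


def time_forward(fn: Callable[[], object], runs: int = 5,
                 warmup: int = 1) -> Tuple[float, float]:
    """Return (mean_ms, std_ms) of wall-clock time over `runs` calls to `fn`."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the statistics
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # Sample standard deviation needs at least two measurements.
    std_ms = statistics.stdev(samples_ms) if runs > 1 else 0.0
    return statistics.mean(samples_ms), std_ms


# Stand-in workload; a real run would call the model's forward pass instead.
mean_ms, std_ms = time_forward(lambda: sum(i * i for i in range(50_000)), runs=5)
print(f"{mean_ms:.2f} ± {std_ms:.2f} ms")
```

On a CUDA device the timed callable would also need to wait for the GPU to finish, for example by calling `torch.cuda.synchronize()` before the clock is read. Otherwise the measurement mostly captures kernel launch time.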

Data And Training Recommendations

Rendering: DPI 600 to 900; prefer KLayout; fall back to GDSTK + SVG when needed.
Elastic parameters: $\alpha = 40$, $\sigma = 6$, $\alpha_{\text{affine}} = 6$, $p = 0.3$; use H-consistency visualization for spot checks.
Mixed sampling ratio: use a procedural synthesis ratio of 0.2 to 0.3 and start the diffusion synthesis ratio at 0.1. Before mixing, run structural statistics on the synthetic data, such as edge direction, connected components, line-width distribution, and density histogram.
Validation: use real data only to prevent style gaps from affecting evaluation.
Inference: default to ResNet34 + FPN on GPU; for small CPU tasks, evaluate single-scale mode with tighter NMS.
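
For the H-consistency spot checks mentioned above, the report names MSE and PSNR as the comparison metrics. The sketch below shows those two metrics on plain nested lists of 8-bit pixel values. It is not the implementation in tools/validate_h_consistency.py, and the two tiny "images" are made-up values standing in for a reference warp and the augmented result.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def psnr(a: Image, b: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 patches: every pixel differs by 10 grey levels.
reference = [[100, 100], [100, 100]]
augmented = [[110, 90], [110, 90]]
error = mse(reference, augmented)
quality = psnr(reference, augmented)
print(f"MSE={error:.1f} PSNR={quality:.2f} dB")
```

A low MSE (high PSNR) between the two warps suggests the augmentation kept the geometry that H describes. The acceptable threshold depends on the clipping tolerance and is left open here.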

Impact Registry

Impact	Description
More stable convergence	Elastic + procedural synthesis makes the model more robust to non-rigid perturbations.
Better generalization	Wider style and structure diversity reduces overfitting.
Better engineering reproducibility	One-command pipeline, config write-back, and TensorBoard export standardize the workflow.
Better inference economics	FPN provides at least 30% speedup and at least 20% memory saving over sliding windows.

Appendix

One-Command Pipeline With Diffusion Directory

uv run python tools/synth_pipeline.py \
  --out_root data/synthetic \
  --num 200 --dpi 600 \
  --config configs/base_config.yaml \
  --ratio 0.3 \
  --diffusion_dir data/synthetic_diff/png

Suggested YAML Snippet

synthetic:
  enabled: true
  png_dir: data/synthetic/png
  ratio: 0.3
  diffusion:
    enabled: true
    png_dir: data/synthetic_diff/png
    ratio: 0.1
augment:
  elastic:
    enabled: true
    alpha: 40
    sigma: 6
    alpha_affine: 6
    prob: 0.3
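
Before training with a config like the one above, it can help to resolve the effective source ratios and reject invalid combinations. This is a sketch under stated assumptions: `source_ratios` is a hypothetical helper, the config is given as an already parsed dict, and a source with `enabled: false` is assumed to contribute a ratio of zero.

```python
from typing import Any, Dict


def source_ratios(cfg: Dict[str, Any]) -> Dict[str, float]:
    """Resolve real / procedural / diffusion ratios from a parsed config dict."""
    syn_cfg = cfg.get("synthetic", {})
    diff_cfg = syn_cfg.get("diffusion", {})
    # Assumption: a disabled source contributes nothing, whatever its ratio says.
    syn = float(syn_cfg.get("ratio", 0.0)) if syn_cfg.get("enabled") else 0.0
    diff = float(diff_cfg.get("ratio", 0.0)) if diff_cfg.get("enabled") else 0.0
    if syn < 0.0 or diff < 0.0 or syn + diff >= 1.0:
        raise ValueError("ratios must be non-negative and sum to less than 1.0")
    return {"real": 1.0 - (syn + diff), "procedural": syn, "diffusion": diff}


# Dict equivalent of the YAML snippet above (as a YAML parser would return it).
config = {
    "synthetic": {
        "enabled": True,
        "png_dir": "data/synthetic/png",
        "ratio": 0.3,
        "diffusion": {
            "enabled": True,
            "png_dir": "data/synthetic_diff/png",
            "ratio": 0.1,
        },
    }
}
ratios = source_ratios(config)
print(ratios)
```

With the suggested values, real data keeps a share of about 0.6, which follows from real = 1 - (syn + diff).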

Summary

The main value of this stage is moving RoRD from “runnable” toward “trainable, reproducible, extensible, and measurable.” On the data side, ElasticTransform, procedural synthesis, and diffusion hooks are added. On the training side, three-source mixed sampling is wired in. On the inference side, ResNet34 + FPN is confirmed as an efficient combination, laying the foundation for real-data training and paper experiments.

https://www.jiao77.com/en/blog/report/rord-implementation-benchmark-2025-10-20/

Author

Jiao77

Published on

Oct 20, 2025

License

CC BY-NC-SA 4.0
