This report summarizes the RoRD updates added on October 20, 2025: high-fidelity augmentation, procedural and diffusion synthesis, three-source mixed training, one-command pipeline with config write-back, and efficient ResNet34 + FPN inference.
Date: October 20, 2025
Type: technical report
Keywords: ElasticTransform, H consistency, procedural synthesis, diffusion synthesis, FPN inference, ResNet34.
This stage adds three major capabilities:
Benchmark results show that ResNet34 is stable and efficient on both CPU and GPU. On GPU, FPN adds little overhead for ResNet34: about +18% on the A100 example (2.32 ms single-scale vs. 2.73 ms with FPN). Overall, the FPN path reaches the targets of at least 30% speedup and at least 20% memory saving over sliding windows.
Recommended defaults:
| Module | Addition | Problem solved | Main advantage | Cost / risk |
|---|---|---|---|---|
| Data augmentation | ElasticTransform with H consistency | Insufficient robustness to non-rigid perturbation | Better generalization and more stable convergence | Some CPU overhead; clipping tolerance needed |
| Synthetic data | Procedural GDS generation + KLayout / GDSTK rasterization + preview / H validation | Data scarcity, insufficient style diversity, expensive labels | Controllable diversity, reproducibility, easy quality check | Requires KLayout; fallback needed |
| Training strategy | Real × procedural × diffusion mixed sampling; real-only validation | Domain shift and overfitting | Controllable ratios and traceable experiments | Bad ratios may introduce bias |
| Diffusion integration | synthetic.diffusion config and three-script skeleton | No research-oriented path for style expansion | Gradual integration with controlled risk | Training and sampling still need implementation |
| Tooling | One-command pipeline, diffusion directory support, TensorBoard export | High setup cost and weak reproducibility | Automatic YAML update and standardized flow | Directory conventions must be followed |
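The H validation step listed in the table compares two warped images with MSE / PSNR. A minimal sketch of those two metrics follows, assuming 8-bit grayscale images passed as flat pixel sequences; the function names and sample patches are hypothetical and are not taken from tools/validate_h_consistency.py.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images match exactly."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 grayscale patches, flattened row by row.
warped_by_h = [10, 20, 30, 40]
warped_by_pipeline = [10, 22, 30, 38]
print(mse(warped_by_h, warped_by_pipeline), psnr(warped_by_h, warped_by_pipeline))
```

A low MSE (high PSNR) between the image warped by the recorded H and the image produced by the augmentation pipeline suggests that the two stay consistent; the acceptable threshold depends on the clipping tolerance chosen.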

- configs/base_config.yaml adds synthetic.diffusion.{enabled,png_dir,ratio}.
- train.py uses ConcatDataset + WeightedRandomSampler for three-source mixed sampling; target ratio is real = 1 - (syn + diff); validation uses real data only.
- tools/synth_pipeline.py adds --diffusion_dir, writes back YAML automatically, and enables the diffusion node with ratio defaulting to 0.0.
- tools/layout2png.py prefers KLayout batch rendering and supports --layermap, --line_width, and --bgcolor; without KLayout, it falls back to GDSTK + SVG + CairoSVG.
- tools/preview_dataset.py creates preview mosaics; tools/validate_h_consistency.py compares warp consistency with MSE / PSNR and visualization.
- tools/diffusion/{prepare_patch_dataset.py, train_layout_diffusion.py, sample_layouts.py} currently provides CLI scaffolding and TODOs.

| Backbone | Single Mean ± Std (ms) | FPN Mean ± Std (ms) | Interpretation |
|---|---|---|---|
| VGG16 | 392.03 ± 4.76 | 821.91 ± 4.17 | Slowest; FPN overhead is amplified on CPU. |
| ResNet34 | 105.01 ± 1.57 | 131.17 ± 1.66 | Best overall balance; FPN is practical. |
| EfficientNet-B0 | 62.02 ± 2.64 | 161.71 ± 1.58 | Fastest single-scale; high relative FPN overhead. |

| Attention | Single Mean ± Std (ms) | FPN Mean ± Std (ms) | Interpretation |
|---|---|---|---|
| none | 97.57 ± 0.55 | 124.57 ± 0.48 | Baseline. |
| SE | 101.48 ± 2.13 | 123.12 ± 0.50 | Slight single-scale overhead; little FPN difference. |
| CBAM | 119.80 ± 2.38 | 123.11 ± 0.71 | More sensitive in single-scale; tiny FPN difference. |

| Backbone | Single Mean (ms) | FPN Mean (ms) | Interpretation |
|---|---|---|---|
| ResNet34 | 2.32 | 2.73 | Best combination; FPN only +18%. |
| VGG16 | 4.53 | 8.51 | Clearly slower. |
| EfficientNet-B0 | 3.69 | 4.38 | Middle range. |
Full reproduction commands and broader experiment summaries can later be linked to docs/description/Performance_Benchmark.md.
| Backbone | Attention | Single Mean ± Std (ms) | FPN Mean ± Std (ms) |
|---|---|---|---|
| vgg16 | none | 351.65 ± 1.88 | 719.33 ± 3.95 |
| vgg16 | se | 349.76 ± 2.00 | 721.41 ± 2.74 |
| vgg16 | cbam | 354.45 ± 1.49 | 744.76 ± 29.32 |
| resnet34 | none | 90.99 ± 0.41 | 117.22 ± 0.41 |
| resnet34 | se | 90.78 ± 0.47 | 115.91 ± 1.31 |
| resnet34 | cbam | 96.50 ± 3.17 | 111.09 ± 1.01 |
| efficientnet_b0 | none | 40.45 ± 1.53 | 127.30 ± 0.09 |
| efficientnet_b0 | se | 46.48 ± 0.26 | 142.35 ± 6.61 |
| efficientnet_b0 | cbam | 47.11 ± 0.47 | 150.99 ± 12.47 |
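Key point: on CPU, ResNet34 offers the best balance of speed and FPN overhead (about +29% with no attention, 90.99 ms to 117.22 ms). EfficientNet-B0 is the fastest in single-scale mode but pays a high FPN cost, roughly 3x its single-scale latency.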
| Backbone | Attention | Single Mean ± Std (ms) | FPN Mean ± Std (ms) |
|---|---|---|---|
| vgg16 | none | 4.53 ± 0.02 | 8.51 ± 0.002 |
| vgg16 | se | 3.80 ± 0.01 | 7.12 ± 0.004 |
| vgg16 | cbam | 3.73 ± 0.02 | 6.95 ± 0.09 |
| resnet34 | none | 2.32 ± 0.04 | 2.73 ± 0.007 |
| resnet34 | se | 2.33 ± 0.01 | 2.73 ± 0.004 |
| resnet34 | cbam | 2.46 ± 0.04 | 2.74 ± 0.004 |
| efficientnet_b0 | none | 3.69 ± 0.07 | 4.38 ± 0.02 |
| efficientnet_b0 | se | 3.76 ± 0.06 | 4.37 ± 0.03 |
| efficientnet_b0 | cbam | 3.99 ± 0.08 | 4.41 ± 0.02 |
Key point: attention has little runtime impact on GPU. ResNet34 remains the best choice for both single-scale and FPN inference.
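The relative FPN overhead behind these comparisons can be recomputed from the table rows. A minimal sketch, assuming overhead is defined as the extra FPN latency relative to single-scale latency; the function name is hypothetical and the numbers are the GPU rows without attention.

```python
def fpn_overhead_percent(single_ms, fpn_ms):
    """Relative extra latency of FPN over single-scale inference, in percent."""
    if single_ms <= 0:
        raise ValueError("single_ms must be positive")
    return (fpn_ms / single_ms - 1.0) * 100.0


# Mean latencies (ms) copied from the "none" attention rows of the GPU table.
gpu_none = {
    "vgg16": (4.53, 8.51),
    "resnet34": (2.32, 2.73),
    "efficientnet_b0": (3.69, 4.38),
}

overheads = {name: fpn_overhead_percent(s, f) for name, (s, f) in gpu_none.items()}
for name, pct in overheads.items():
    print(f"{name}: {pct:+.1f}%")
```

The same helper can be applied to the CPU rows to compare how strongly each backbone is penalized by FPN on each device.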
Speedup:
Memory saving:
Accuracy guard:
{
  "timestamp": "2025-10-20 14:30:45",
  "config": "configs/base_config.yaml",
  "model_path": "path/to/model_final.pth",
  "layout_path": "test_data/layout.png",
  "template_path": "test_data/template.png",
  "device": "cuda:0",
  "fpn": {
    "method": "FPN",
    "mean_time_ms": 245.32,
    "std_time_ms": 12.45,
    "gpu_memory_mb": 1024.5,
    "num_runs": 5
  },
  "sliding_window": {
    "method": "Sliding Window",
    "mean_time_ms": 352.18,
    "std_time_ms": 18.67
  },
  "comparison": {
    "speedup_percent": 30.35,
    "memory_saving_percent": 21.14,
    "fpn_faster": true,
    "meets_speedup_target": true,
    "meets_memory_target": true
  }
}
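The sample report does not spell out how the comparison fields are derived. A minimal sketch follows, assuming both percentages are relative reductions against the sliding-window baseline and that the 30% / 20% targets act as thresholds. The function name compare and its parameters are hypothetical, and the sliding-window memory value is an assumed input because the sample lists GPU memory only for FPN.

```python
def compare(fpn_ms, sw_ms, fpn_mem_mb, sw_mem_mb,
            speedup_target=30.0, memory_target=20.0):
    """Build a comparison block from mean latencies and peak memory usage."""
    if sw_ms <= 0 or sw_mem_mb <= 0:
        raise ValueError("baseline values must be positive")
    # Assumed definition: reduction relative to the sliding-window baseline.
    speedup = (sw_ms - fpn_ms) / sw_ms * 100.0
    memory_saving = (sw_mem_mb - fpn_mem_mb) / sw_mem_mb * 100.0
    return {
        "speedup_percent": round(speedup, 2),
        "memory_saving_percent": round(memory_saving, 2),
        "fpn_faster": fpn_ms < sw_ms,
        "meets_speedup_target": speedup >= speedup_target,
        "meets_memory_target": memory_saving >= memory_target,
    }


# Hypothetical inputs chosen for round numbers, not measured values.
result = compare(fpn_ms=60.0, sw_ms=100.0, fpn_mem_mb=750.0, sw_mem_mb=1000.0)
print(result)
```

Under this assumed definition, the sample's latencies (245.32 ms vs. 352.18 ms) give a speedup of roughly 30.3%, close to the value shown in the report.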
PYTHONPATH=. uv run python tests/benchmark_attention.py \
  --device cpu --image-size 512 --runs 10 \
  --backbone resnet34 --places backbone_high desc_head

PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cpu --image-size 512 --runs 3 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high desc_head

PYTHONPATH=. uv run python tests/benchmark_grid.py \
  --device cuda --image-size 512 --runs 5 \
  --backbones vgg16 resnet34 efficientnet_b0 \
  --attentions none se cbam \
  --places backbone_high
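The "Mean ± Std (ms)" columns come from repeated timed runs, as the --runs flag suggests. A minimal sketch of such a timing loop follows, assuming a warm-up pass and the sample standard deviation; the benchmark helper is hypothetical and is not the code in tests/benchmark_grid.py.

```python
import statistics
import time


def benchmark(fn, runs=5, warmup=1):
    """Time fn() over several runs and return (mean_ms, std_ms)."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the statistics
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # Sample standard deviation needs at least two measurements.
    std = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), std


# Stand-in workload; a real benchmark would call the model's forward pass here.
mean_ms, std_ms = benchmark(lambda: sum(i * i for i in range(10_000)), runs=10)
print(f"{mean_ms:.2f} ± {std_ms:.2f} ms")
```

On CUDA devices, kernels run asynchronously, so a timing loop like this would also need torch.cuda.synchronize() before each clock read to measure the actual compute time.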
| Impact | Description |
|---|---|
| More stable convergence | Elastic + procedural synthesis makes the model more robust to non-rigid perturbations. |
| Better generalization | Wider style and structure diversity reduces overfitting. |
| Better engineering reproducibility | One-command pipeline, config write-back, and TensorBoard export standardize the workflow. |
| Better inference economics | FPN provides at least 30% speedup and at least 20% memory saving over sliding windows. |
uv run python tools/synth_pipeline.py \
  --out_root data/synthetic \
  --num 200 --dpi 600 \
  --config configs/base_config.yaml \
  --ratio 0.3 \
  --diffusion_dir data/synthetic_diff/png
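The config write-back performed by this command can be pictured as an update of the nested synthetic section. A minimal sketch on plain dictionaries follows; apply_synth_settings is a hypothetical helper, YAML loading and saving are left out, and keeping an already configured diffusion ratio is an assumption (the report only states that the ratio defaults to 0.0).

```python
import copy


def apply_synth_settings(config, png_dir, ratio, diffusion_dir=None):
    """Return a copy of config with the synthetic section updated."""
    updated = copy.deepcopy(config)
    synthetic = updated.setdefault("synthetic", {})
    synthetic.update({"enabled": True, "png_dir": png_dir, "ratio": ratio})
    if diffusion_dir is not None:
        diffusion = synthetic.setdefault("diffusion", {})
        diffusion["enabled"] = True
        diffusion["png_dir"] = diffusion_dir
        # Assumption: keep an existing ratio, otherwise default to 0.0.
        diffusion.setdefault("ratio", 0.0)
    return updated


base = {"training": {"batch_size": 8}}  # hypothetical existing config content
new_cfg = apply_synth_settings(base, "data/synthetic/png", 0.3, "data/synthetic_diff/png")
print(new_cfg["synthetic"])
```

Working on a deep copy keeps the original configuration untouched until the updated version is written to disk, which makes a failed run easier to recover from.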

synthetic:
  enabled: true
  png_dir: data/synthetic/png
  ratio: 0.3
  diffusion:
    enabled: true
    png_dir: data/synthetic_diff/png
    ratio: 0.1
augment:
  elastic:
    enabled: true
    alpha: 40
    sigma: 6
    alpha_affine: 6
    prob: 0.3
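The ratios in this configuration drive the three-source sampling, with real = 1 - (syn + diff). A minimal sketch of how per-sample weights could be derived follows, using only the standard library; source_weights and the dataset sizes are hypothetical, and the sources are assumed to be concatenated in the order real, procedural, diffusion.

```python
import random


def source_weights(n_real, n_syn, n_diff, syn_ratio, diff_ratio):
    """Per-sample weights so that each source is drawn at its target ratio."""
    real_ratio = 1.0 - (syn_ratio + diff_ratio)
    if real_ratio < 0:
        raise ValueError("syn_ratio + diff_ratio must not exceed 1")
    weights = []
    for count, ratio in ((n_real, real_ratio), (n_syn, syn_ratio), (n_diff, diff_ratio)):
        if count == 0:
            continue  # an empty source contributes no samples
        # Every sample in a source shares that source's ratio equally.
        weights.extend([ratio / count] * count)
    return weights


# Hypothetical dataset sizes; ratios follow the configuration above.
weights = source_weights(n_real=100, n_syn=400, n_diff=50, syn_ratio=0.3, diff_ratio=0.1)
drawn = random.choices(range(len(weights)), weights=weights, k=10_000)
real_share = sum(i < 100 for i in drawn) / len(drawn)
print(f"real share in sample: {real_share:.2f}")
```

In PyTorch, a list like this can be passed as the weights argument of torch.utils.data.WeightedRandomSampler (with replacement=True) over a ConcatDataset built in the same source order, so that large synthetic sets do not dominate the real data.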
The main value of this stage is moving RoRD from “runnable” toward “trainable, reproducible, extensible, and measurable.” On the data side, ElasticTransform, procedural synthesis, and diffusion hooks are added. On the training side, three-source mixed sampling is wired in. On the inference side, ResNet34 + FPN is confirmed as an efficient combination, laying the foundation for real-data training and paper experiments.