RoRD-Layout-Recognation Comprehensive Technical Report

Oct 5, 2025

This report summarizes the full development path of RoRD-Layout-Recognation, from topic selection and model adaptation to engineering implementation and publication planning. The project brings rotation-robust local features into IC layout recognition, with the goal of building a high-accuracy, low-latency, zero/few-shot-friendly template matching system.

Report Info

Version: 1.3, complete edition
Date: October 5, 2025
Type: Comprehensive technical report
Keywords: RoRD, IC layout recognition, FPN, geometry-aware loss, self-supervised training, DTCO.

Project Overview

RoRD-Layout-Recognation adapts RoRD, or Rotation-Robust Descriptors, into an IC layout template recognition system. The target is to support design-technology co-optimization (DTCO) with automated, robust, and reusable layout analysis capabilities.

Project Goals

High-accuracy template matching: locate all template instances in complex layouts under all eight orientations (four 90-degree rotations, each with or without mirroring) and under scale variation.
Efficient inference: enable near-real-time matching for large GDSII-derived layout images.
Zero/few-shot generalization: support new templates without relying on large annotated datasets.
Standardized research workflow: build a reproducible pipeline for data preparation, training, tuning, and evaluation.

Main Challenges

Data scarcity: high-quality labeled layout data is expensive and difficult to obtain.
Geometric variation: templates may appear under rotations and mirrors, requiring robust descriptors.
Structural complexity: IC layouts contain Manhattan grids, binary sparse signals, and repeated structures that differ sharply from natural images.
Fast evolution: process nodes and IP libraries evolve continuously, so the model must adapt quickly.

Project Journey And Current State

Phase	Period	Focus	Key Outcomes
Phase 1: Proposal and technology selection	June 2025	Define the DTCO-oriented target and evaluate the RoRD route.	Compared U-Net, YOLO, ViT, SuperPoint, and RoRD; selected the “RoRD + self-supervision + geometric constraints” direction.
Phase 2: Model adaptation and loss design	July 2025	Tailor RoRD to IC layout geometry and design geometry-aware losses.	Removed orthographic view generation, introduced sliding-window and pyramid matching, and designed compound geometry losses.
Phase 3: Architecture modernization and speedup	September 2025	Improve maintainability, experiment speed, and inference speed.	Added FPN and NMS, moved to YAML configuration, modularized the codebase, and integrated TensorBoard tracking.
Current state: technical maturity and extensibility	October 2025	Core components are stable enough for larger experiments and fast application trials.	FPN, NMS, configuration management, modular code, and experiment tracking have all landed.

The project has moved from route validation into a phase focused on continuous optimization and academic output.

Technical Implementation And Innovations

Model Architecture: From VGG To FPN

The model pairs a convolutional backbone with parallel keypoint-detection and descriptor heads. A Feature Pyramid Network (FPN) on top of the backbone yields multi-scale features (P2, P3, and P4) in a single forward pass, so the network no longer has to run once per image-pyramid level.

Key design points:

Lateral connections: map C2 / C3 / C4 backbone features into a unified channel dimension.
Top-down fusion: use upsampling and smoothing to build scale-aligned pyramid features.
Shared heads: reuse detection and descriptor heads across pyramid levels for efficiency and consistency.

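import torch
import torch.nn.functional as F

# Excerpt from the RoRD model class; the backbone, heads, lateral_* and
# smooth_* layers are created in __init__, which is not shown here.
def forward(self, x: torch.Tensor, return_pyramid: bool = False):
  # Single-scale path: one feature map, one detection map, one descriptor map.
  if not return_pyramid:
    features = self.backbone(x)
    detection_map = self.detection_head(features)
    descriptors = self.descriptor_head(features)
    return detection_map, descriptors

  # FPN path: backbone stages C2/C3/C4 are projected to a common channel width.
  c2, c3, c4 = self._extract_c234(x)
  p4 = self.lateral_c4(c4)
  # Top-down fusion: upsample the coarser level and add it to the finer lateral.
  p3 = self.lateral_c3(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
  p2 = self.lateral_c2(c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")

  # Smoothing convolutions reduce upsampling artifacts after fusion.
  p4 = self.smooth_p4(p4)
  p3 = self.smooth_p3(p3)
  p2 = self.smooth_p2(p2)

  # Each level maps to (detection map, descriptor map, stride w.r.t. the input).
  pyramid = {}
  if 4 in self.fpn_levels:
    pyramid["P4"] = (self.det_head_fpn(p4), self.desc_head_fpn(p4), 8)
  if 3 in self.fpn_levels:
    pyramid["P3"] = (self.det_head_fpn(p3), self.desc_head_fpn(p3), 4)
  if 2 in self.fpn_levels:
    pyramid["P2"] = (self.det_head_fpn(p2), self.desc_head_fpn(p2), 2)
  return pyramid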
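To see the lateral / top-down / shared-head pattern run end to end, here is a self-contained toy version. `TinyFPN`, its three stride-2 convolution stages, the channel widths, and the single-channel input are illustrative assumptions rather than the project's backbone; only the P2/P3/P4 strides (2, 4, 8) follow the excerpt above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFPN(nn.Module):
    """Toy FPN: three stride-2 stages (C2/C3/C4), lateral 1x1 convs, top-down
    nearest-neighbour fusion, 3x3 smoothing, and heads shared across levels."""

    def __init__(self, in_ch=1, chans=(32, 64, 128), out_ch=64, desc_dim=128):
        super().__init__()
        stages, prev = [], in_ch
        for c in chans:
            # Each stage halves the spatial resolution.
            stages.append(nn.Sequential(nn.Conv2d(prev, c, 3, stride=2, padding=1), nn.ReLU()))
            prev = c
        self.stages = nn.ModuleList(stages)
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in chans])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in chans])
        # One detection head and one descriptor head, reused on every level.
        self.det_head = nn.Sequential(nn.Conv2d(out_ch, 1, 1), nn.Sigmoid())
        self.desc_head = nn.Conv2d(out_ch, desc_dim, 1)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        c2, c3, c4 = feats  # strides 2, 4, 8
        p4 = self.lateral[2](c4)
        p3 = self.lateral[1](c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        p2 = self.lateral[0](c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")
        pyramid = {}
        for name, p, smooth, stride in (
            ("P2", p2, self.smooth[0], 2),
            ("P3", p3, self.smooth[1], 4),
            ("P4", p4, self.smooth[2], 8),
        ):
            p = smooth(p)
            pyramid[name] = (self.det_head(p), self.desc_head(p), stride)
        return pyramid
```

For a `(2, 1, 64, 64)` input, each level returns a detection map of shape `(2, 1, 64 / stride, 64 / stride)`. Because every level passes through the same heads, descriptors from different levels share one space, which is what allows them to be concatenated before matching.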

Core Innovation: Geometry-Aware Losses

To force the model to learn layout geometry rather than natural-image texture, the project combines detection and descriptor losses:

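$$L_{\text{total}} = L_{\text{det}} + L_{\text{desc}}$$

$$L_{\text{det}} = \operatorname{BCE}\big(D_{\text{o}}, \mathcal{W}(D_{\text{r}}, H^{-1})\big) + 0.1\,\operatorname{SmoothL1}\big(D_{\text{o}}, \mathcal{W}(D_{\text{r}}, H^{-1})\big)$$

$$L_{\text{desc}} = L_{\text{triplet}} + 0.1\,L_{\text{manhattan}} + 0.01\,L_{\text{sparse}} + 0.05\,L_{\text{binary}}$$

Here $D_{\text{o}}$ and $D_{\text{r}}$ are the detection maps of the original and transformed images, $H$ is the homography relating them, and $\mathcal{W}(D_{\text{r}}, H^{-1})$ warps the transformed map back into the original frame so the two maps can be compared pixel by pixel.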

Loss term	Role
$L_{\text{triplet}}$	Uses an L1 geometric Triplet Loss to emphasize rotation consistency.
$L_{\text{manhattan}}$	Enforces descriptor consistency under 90-degree rotations and helps separate repeated structures.
$L_{\text{sparse}}$	Suppresses noise from blank layout regions.
$L_{\text{binary}}$	Strengthens geometric boundary representation through sign-level consistency.
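A minimal PyTorch sketch of $L_{\text{det}}$, reading $D_{\text{o}}$ and $D_{\text{r}}$ as single-channel probability maps of equal size and $\mathcal{W}(D_{\text{r}}, H^{-1})$ as a backward warp: each original-frame pixel $p$ samples $D_{\text{r}}$ at $Hp$, where $H$ maps original pixel coordinates to transformed ones. `warp_to_original` and `detection_loss_sketch` are hypothetical names; detaching the warped target and zero padding outside the frame are assumptions, not details confirmed by the report.

```python
import torch
import torch.nn.functional as F


def warp_to_original(d_r, H):
    """Sample d_r (B, 1, h, w) at H @ (x, y, 1) for every original-frame pixel (x, y)."""
    B, _, h, w = d_r.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=d_r.dtype, device=d_r.device),
        torch.arange(w, dtype=d_r.dtype, device=d_r.device),
        indexing="ij",
    )
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)  # (h*w, 3)
    mapped = pts @ H.transpose(1, 2)                  # (B, h*w, 3), homogeneous
    xy = mapped[..., :2] / mapped[..., 2:3]           # dehomogenize to pixel coords
    # Normalize to [-1, 1] using the align_corners=True convention of grid_sample.
    gx = 2 * xy[..., 0] / (w - 1) - 1
    gy = 2 * xy[..., 1] / (h - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).reshape(B, h, w, 2)
    return F.grid_sample(d_r, grid, mode="bilinear", padding_mode="zeros", align_corners=True)


def detection_loss_sketch(d_o, d_r, H):
    """BCE + 0.1 * SmoothL1 between d_o and the warped d_r (both in (0, 1))."""
    target = warp_to_original(d_r, H).detach()  # assumption: warped map is a fixed target
    return F.binary_cross_entropy(d_o, target) + 0.1 * F.smooth_l1_loss(d_o, target)
```

With `H` set to the identity the warp returns `d_r` unchanged; for a 90-degree `np.rot90`-style rotation of an `n × n` map, `H = [[0, 1, 0], [-1, 0, n - 1], [0, 0, 1]]` maps the rotated map back onto the original. A fuller version would likely also mask pixels that fall outside the transformed image instead of treating them as zeros.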

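import torch

def compute_description_loss(desc_original, desc_rotated, H, margin=1.0):
  # Excerpt: sampling of corresponding points through H (manhattan_coords, anchor,
  # positive) and hard-negative selection (hardest_negative) are omitted here;
  # rotate_coords, triplet_loss and compute_manhattan_alignment are project helpers.

  # Rotation consistency and hard-negative mining: candidate negatives come from
  # rotating the matched coordinates by 90, 180, and 270 degrees.
  negative_list = []
  for angle in [90, 180, 270]:
    rotated_coords = rotate_coords(manhattan_coords, angle)
    negative_list.append(rotated_coords)

  geometric_triplet = triplet_loss(anchor, positive, hardest_negative, margin)
  manhattan_loss = compute_manhattan_alignment(anchor, positive)
  # L1 penalty keeps descriptors small in blank layout regions.
  sparsity_loss = torch.mean(torch.abs(anchor)) + torch.mean(torch.abs(positive))
  # Penalize sign disagreement between corresponding descriptors.
  binary_loss = torch.mean(torch.abs(torch.sign(anchor) - torch.sign(positive)))

  return geometric_triplet + 0.1 * manhattan_loss + 0.01 * sparsity_loss + 0.05 * binary_loss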
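The descriptor-side arithmetic can be made runnable once correspondences have been sampled. The sketch below assumes `anchor` and `positive` are `(N, C)` descriptors at matching points and `negatives` is an `(N, K, C)` stack of candidates (for example, descriptors at the 90/180/270-degree rotated coordinates). `descriptor_loss_sketch` is a hypothetical name; the Manhattan term is passed in as a precomputed value because `compute_manhattan_alignment` is not reproduced in the report.

```python
import torch
import torch.nn.functional as F


def descriptor_loss_sketch(anchor, positive, negatives, manhattan_loss=None, margin=1.0):
    """anchor, positive: (N, C) corresponding descriptors; negatives: (N, K, C) candidates."""
    # Hard-negative mining: pick the candidate closest to the anchor in L1 distance.
    dist = (negatives - anchor.unsqueeze(1)).abs().sum(dim=-1)               # (N, K)
    hardest = negatives[torch.arange(anchor.shape[0]), dist.argmin(dim=1)]   # (N, C)
    # Triplet loss with p=1 (L1 distance) and the given margin.
    triplet = F.triplet_margin_loss(anchor, positive, hardest, margin=margin, p=1)
    # L1 sparsity on both descriptor sets; sign disagreement between matches.
    sparsity = anchor.abs().mean() + positive.abs().mean()
    binary = (torch.sign(anchor) - torch.sign(positive)).abs().mean()
    if manhattan_loss is None:
        # Placeholder: the project's Manhattan alignment term is not reproduced here.
        manhattan_loss = anchor.new_zeros(())
    return triplet + 0.1 * manhattan_loss + 0.01 * sparsity + 0.05 * binary
```

Choosing the hardest of several rotated candidates, rather than a random one, is what makes the triplet term penalize descriptors that cannot tell a structure apart from its own 90-degree rotations, a common failure mode on repetitive Manhattan layouts.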

Training Strategy: Self-Supervision And Stability

Training pairs are generated automatically as (original, rotated, H) through random geometric transformations, avoiding manual annotation. Augmentations include scale jittering, Sobel edge enhancement, brightness and contrast changes, and Gaussian noise. The training loop also uses gradient clipping, early stopping, and ReduceLROnPlateau scheduling.
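A minimal sketch of how one `(original, rotated, H)` pair could be produced, assuming square patches and restricting the random transform to the eight Manhattan orientations (four 90-degree rotations, optionally mirrored). `random_manhattan_pair` is a hypothetical helper; the project's pipeline may sample other transforms and applies the augmentations listed above on top. `H` maps homogeneous pixel coordinates `(x, y, 1)` of the original patch to the transformed patch.

```python
import numpy as np


def random_manhattan_pair(img, rng):
    """Return (original, transformed, H) for a square image.

    H maps homogeneous pixel coordinates (x, y, 1) in `img` to the transformed image.
    """
    n = img.shape[0]
    if img.shape[0] != img.shape[1]:
        raise ValueError("this sketch assumes square patches")
    k = int(rng.integers(4))        # number of 90-degree np.rot90 steps
    mirror = bool(rng.integers(2))  # horizontal flip applied before rotation
    out = img[:, ::-1] if mirror else img
    out = np.rot90(out, k)
    H = np.eye(3)
    if mirror:
        # Horizontal flip: x -> n - 1 - x.
        H = np.array([[-1.0, 0.0, n - 1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]) @ H
    # One np.rot90 step sends pixel (x, y) to (y, n - 1 - x).
    R = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, n - 1], [0.0, 0.0, 1.0]])
    for _ in range(k):
        H = R @ H
    return img, np.ascontiguousarray(out), H
```

Scale jitter or small in-plane perturbations could be composed into `H` in the same way, as extra 3×3 matrices multiplied on the left.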

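import torch

# Assumes model, optimizer, scheduler (torch.optim.lr_scheduler.ReduceLROnPlateau),
# early_stopper, train/val dataloaders, and a validate() helper are built elsewhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(epochs):
  model.train()
  for original, rotated, H in train_dataloader:
    original, rotated, H = original.to(device), rotated.to(device), H.to(device)
    det_o, desc_o = model(original)
    det_r, desc_r = model(rotated)

    det_loss = compute_detection_loss(det_o, det_r, H)
    desc_loss = compute_description_loss(desc_o, desc_r, H)
    loss = det_loss + desc_loss

    optimizer.zero_grad()
    loss.backward()
    # Clip gradients to keep updates stable.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

  # validate() is a hypothetical helper returning this epoch's validation loss.
  validation_metric = validate(model, val_dataloader)
  scheduler.step(validation_metric)  # ReduceLROnPlateau lowers the LR on a plateau.
  # early_stopper.step(metric) -> bool is an assumed interface.
  if early_stopper.step(validation_metric):
    break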
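The early-stopping rule itself is small. A minimal sketch, assuming the monitored metric is a validation loss (lower is better) and defaulting to the patience of 10 quoted in the training-time estimates; `EarlyStopper` and its `step(metric)` method are hypothetical names, not the project's API.

```python
class EarlyStopper:
    """Signal a stop after `patience` epochs without improvement in a monitored metric."""

    def __init__(self, patience=10, min_delta=0.0, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = float("inf") if mode == "min" else float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True when training should stop."""
        if self.mode == "min":
            improved = metric < self.best - self.min_delta
        else:
            improved = metric > self.best + self.min_delta
        if improved:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

`ReduceLROnPlateau` is stepped with the same metric (`torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=...)`). Giving the scheduler a shorter patience than the early stopper lets the learning rate drop a few times before training is abandoned.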

Inference And Matching: Efficient Multi-Instance Detection

Inference combines FPN, multi-scale keypoint extraction, radius NMS, and RANSAC for fast and stable multi-instance matching.

extract_from_pyramid: extracts keypoints and descriptors from every pyramid level in one forward pass.
radius_nms: removes duplicate points by score and spatial distance.
MNN + RANSAC: first matches by mutual nearest neighbor, then estimates a homography and removes outliers.
Multi-instance loop: iteratively matches each template instance and masks detected regions.
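The radius NMS and mutual-nearest-neighbour steps listed above are short enough to sketch directly in PyTorch. Both functions are illustrative re-implementations with hypothetical signatures (the project's own `radius_nms` may differ); they assume keypoints as an `(N, 2)` float tensor in image coordinates and build full pairwise distance matrices, which is fine for thousands of points but would need tiling for very large layouts.

```python
import torch


def radius_nms(kps, scores, radius):
    """Greedy radius NMS: keep the best-scoring point, drop others within `radius` pixels."""
    order = torch.argsort(scores, descending=True)
    pts = kps[order].float()
    dist = torch.cdist(pts, pts)  # (N, N) Euclidean distances in score order
    suppressed = torch.zeros(len(order), dtype=torch.bool)
    keep = []
    for i in range(len(order)):
        if suppressed[i]:
            continue
        keep.append(int(order[i]))
        suppressed |= dist[i] <= radius  # also marks point i itself
    return torch.tensor(keep, dtype=torch.long)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (idx_a, idx_b) where each descriptor is the other's nearest neighbour."""
    dist = torch.cdist(desc_a, desc_b)  # (Na, Nb)
    nn_ab = dist.argmin(dim=1)          # best match in b for each a
    nn_ba = dist.argmin(dim=0)          # best match in a for each b
    idx_a = torch.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a
    return idx_a[mutual], nn_ab[mutual]
```

The mutual check discards one-sided matches, which are common when a template contains repeated cells that all look alike.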

import torch

def extract_from_pyramid(model, image_tensor, kp_thresh, nms_cfg):
  # decode_level and radius_nms are project helpers not shown in this excerpt;
  # decode_level is assumed here to also return per-keypoint scores.
  model.eval()
  with torch.no_grad():
    pyramid = model(image_tensor, return_pyramid=True)
  keypoints, descriptors = [], []
  for _level_name, (det, desc, stride) in pyramid.items():
    # Threshold the detection map and scale coordinates by the level stride.
    kps, descs, scores = decode_level(det, desc, stride, kp_thresh)
    if nms_cfg.get("enabled", False):
      keep = radius_nms(kps, scores, nms_cfg["radius"])
      kps, descs = kps[keep], descs[keep]
    keypoints.append(kps)
    descriptors.append(descs)
  return torch.cat(keypoints, dim=0), torch.cat(descriptors, dim=0)
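The RANSAC half of "MNN + RANSAC" can be illustrated with a short NumPy sketch: fit a homography to random 4-point samples with the direct linear transform, keep the model with the most inliers, then refit on those inliers. This is a generic, unnormalized RANSAC written for clarity, with hypothetical function names; in practice OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold)` provides a more robust version.

```python
import numpy as np


def project(H, pts):
    """Apply a 3x3 homography to (N, 2) points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return homog[:, :2] / homog[:, 2:3]


def fit_homography(src, dst):
    """Direct linear transform (unnormalized) from >= 4 correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # null vector of the DLT system
    with np.errstate(divide="ignore", invalid="ignore"):
        return H / H[2, 2]


def ransac_homography(src, dst, thresh=3.0, iters=500, rng=None):
    """Return (H, inlier_mask); H is refit on all inliers of the best 4-point model."""
    rng = np.random.default_rng() if rng is None else rng
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        if not np.all(np.isfinite(H)):
            continue  # degenerate sample
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        return None, best
    return fit_homography(src[best], dst[best]), best
```

In a multi-instance loop, an instance would be accepted only when the inlier count reaches `min_inliers` (15 in the example configuration); its inliers are then masked out and the search repeats on the remaining keypoints.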

Engineering: Configuration And Experiment Tracking

The implementation focuses on reproducibility, tunability, and traceability:

YAML configuration center: OmegaConf manages configs/base_config.yaml, decoupling parameters from code.
Modular codebase: datasets, losses, models, and utilities are maintained independently.
TensorBoard integration: training, evaluation, and matching scripts all log key metrics for visual comparison.

model:
  backbone: resnet34
  fpn:
    enabled: true
    out_channels: 256
    levels: [2, 3, 4]

matching:
  keypoint_threshold: 0.5
  nms:
    enabled: true
    radius: 4
  min_inliers: 15

logging:
  use_tensorboard: true
  log_dir: runs
  experiment_name: baseline

Outcomes And Expected Effects

Current Capabilities

Multi-instance detection: locates multiple instances of the same template in large layouts.
Rotation robustness: remains stable under 0-to-360-degree rotations and mirror transforms.
Efficient inference: FPN compresses image-pyramid inference into one forward pass.
Visual evaluation: Precision, Recall, F1, and tuning records are tracked through TensorBoard.

Expected Quantitative Targets

Metric	Target
Accuracy	F1 ≥ 95% on trained templates or similar-style validation sets.
Speed	Single-template matching for million-gate layouts within 1 minute on V100 / A100 class GPUs.
Robustness	Stable recognition under mild linewidth changes and metal-fill differences.
Extensibility	F1 ≥ 85% for direct matching of new templates without retraining.

Future Work

Technical Roadmap

Direction	Priority	Plan
Data strategy	High	Add elastic deformation and defect simulation; build a procedural layout generator with `gdstk` for large synthetic datasets.
Training strategy	High	Introduce uncertainty-based automatic loss weighting; strengthen hard-sample mining for descriptor learning.
Model architecture	Medium	Try ResNet / EfficientNet backbones as alternatives to VGG; explore attention modules for key geometric structures.

Publication Plan

Core paper contributions:

Introduce rotation-robust local features, RoRD, into IC layout recognition and validate the idea in industrial-style settings.
Propose a compound layout-geometry loss system for sparse binary and repetitive structures.
Combine FPN and NMS to make matching speed and multi-instance detection practical.

Target venue	Estimated deadline	Estimated notification	Estimated event	Strategy
ICCAD 2026	Mid-to-late May 2026	Early August 2026	Late October 2026	Primary target, with a focus on rotation-robust recognition innovation.
DATE 2027	Mid-September 2026	Mid-December 2026	March-April 2027	Plan A if ICCAD is not accepted; iterate quickly using review feedback.
DAC 2027	Mid-to-late November 2026	Late February 2027	June-July 2027	Plan B with three months for stronger experiments and comparisons.
ASP-DAC 2028	Mid-July 2027	Mid-October 2027	Late January 2028	Backup path after further model polishing and industrial validation.
IEEE TCAD	Rolling	-	-	Final option: consolidate the work into a journal manuscript.

Appendix

Project Structure

RoRD-Layout-Recognation/
├── configs/
│   └── base_config.yaml      # YAML configuration center
├── data/
│   └── ic_dataset.py         # Dataset definition
├── docs/                     # Documentation
├── models/
│   └── rord.py               # RoRD model with FPN
├── utils/                    # Shared utilities
├── losses.py                 # Geometry-aware losses
├── train.py                  # Training script
├── evaluate.py               # Evaluation script
├── match.py                  # Template matching script
├── pyproject.toml            # Dependencies
└── README.md

Quick Start

The recommended workflow uses uv:

# Install dependencies
uv sync

# Training
uv run python train.py --config configs/your_exp_config.yaml

# Layout template matching
uv run python match.py \
  --config configs/your_exp_config.yaml \
  --model_path path/to/model.pth \
  --layout path/to/layout.png \
  --template path/to/template.png

# Start TensorBoard
uv run tensorboard --logdir runs

Dataset Requirements

Training needs only a large collection of unlabeled PNG layout images, because the self-supervised pipeline generates its own training pairs. Validation data should include template images, layout images, and JSON annotations describing template locations:

{
  "boxes": [
    {"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},
    {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}
  ]
}
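Precision, recall, and F1 can be computed against annotations in this format. A minimal sketch, assuming `x` and `y` are the top-left corner in pixels, that predictions use the same box fields, and that a prediction counts as correct when it overlaps an unmatched ground-truth box of the same template with IoU ≥ 0.5; the threshold, greedy matching, and function names are assumptions, not the project's evaluation code.

```python
import json


def iou(a, b):
    """Intersection over union of two boxes given as dicts with x, y, width, height."""
    ix = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    iy = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = ix * iy
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, gts, iou_thresh=0.5):
    """Greedy one-to-one matching of predicted boxes to ground-truth boxes."""
    matched, tp = set(), 0
    for p in preds:
        best_j, best_iou = None, iou_thresh
        for j, g in enumerate(gts):
            if j in matched or g["template"] != p["template"]:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gts = json.loads(
    '{"boxes": [{"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},'
    ' {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}]}'
)["boxes"]
preds = [
    {"template": "template1.png", "x": 105, "y": 200, "width": 50, "height": 50},
    {"template": "template2.png", "x": 500, "y": 400, "width": 60, "height": 60},
]
print(precision_recall_f1(preds, gts))  # (0.5, 0.5, 0.5)
```

The first prediction overlaps its ground-truth box with IoU ≈ 0.82 and counts as a hit; the second misses entirely, giving one true positive, one false positive, and one false negative.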

Resource Guidance

Resource type	Requirement	Notes
Dataset, startup	100-200 images	High-resolution layouts for functional validation.
Dataset, initial usable	1,000-2,000 images	Enough to learn stable geometric descriptors.
Dataset, production-grade	5,000-10,000+ images	Covers multiple processes and design styles.
Entry GPU	RTX 3060 / 4060	Small experiments and functional validation.
Mainstream GPU	RTX 3080 / 4070	Recommended balance of speed and cost.
Professional GPU	RTX 3090 / 4090 / A6000	Large experiments or production deployment.
VRAM	≥ 12 GB	Batch Size = 8, Patch = 256×256.
CPU / memory	8 cores / 32 GB	Keeps preprocessing from becoming the bottleneck.

Training Time Estimate

Stage	Estimate	Notes
Single epoch	15-25 minutes	RTX 3080, 2,000 images.
Total training time	About 16.7 hours	50 epochs at 20 minutes per epoch.
Practical convergence	10-20 hours	With early stopping, patience = 10.
Augmentation tuning	1-2 weeks	Scale, brightness, and noise parameters.
Loss weight tuning	1-2 weeks	Balance BCE / Triplet / Manhattan and related terms.
Hyperparameter search	2-4 weeks	Learning rate, batch size, optimizer.
Architecture tuning	2-4 weeks	Different backbones and attention modules.
Total tuning cycle	1.5-3 months	Toward production-grade model quality.

https://www.jiao77.com/en/blog/report/rord-comprehensive-technical-report-2025-10-05/

Author

Jiao77

Published on

Oct 5, 2025

License

CC BY-NC-SA 4.0
