RoRD-Layout-Recognation Comprehensive Technical Report

Oct 5, 2025


This report summarizes the full development path of RoRD-Layout-Recognation, from topic selection and model adaptation to engineering implementation and publication planning. The project brings rotation-robust local features into IC layout recognition, with the goal of building a high-accuracy, low-latency, zero/few-shot-friendly template matching system.

Report Info

Version: 1.3, complete edition
Date: October 5, 2025
Type: Comprehensive technical report
Keywords: RoRD, IC layout recognition, FPN, geometry-aware loss, self-supervised training, DTCO.

Project Overview

RoRD-Layout-Recognation adapts RoRD, or Rotation-Robust Descriptors, into an IC layout template recognition system. The target is to support design-technology co-optimization (DTCO) with automated, robust, and reusable layout analysis capabilities.

Project Goals

  1. High-accuracy template matching: locate every template instance in complex layouts, across all eight orientations (four 90-degree rotations, each with or without mirroring) and under scale variation.
  2. Efficient inference: enable near-real-time matching for large GDSII-derived layout images.
  3. Zero/few-shot generalization: support new templates without relying on large annotated datasets.
  4. Standardized research workflow: build a reproducible pipeline for data preparation, training, tuning, and evaluation.

Main Challenges

  1. Data scarcity: high-quality labeled layout data is expensive and difficult to obtain.
  2. Geometric variation: templates may appear rotated or mirrored, so descriptors must stay robust to these transforms.
  3. Structural complexity: IC layouts are built from Manhattan (axis-aligned) geometry, sparse binary patterns, and heavily repeated structures, all of which differ sharply from natural images.
  4. Fast evolution: process nodes and IP libraries evolve continuously, so the model must adapt quickly.
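The "eight orientations" in goal 1 and the rotations and mirrors in challenge 2 are the four 90-degree rotations of a pattern, each with and without a mirror flip. A minimal sketch, assuming a layout patch is stored as a list of rows of 0/1 pixels; the function names are illustrative and not part of the project code:

```python
from typing import List

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid: Grid) -> Grid:
    """Flip a grid left to right."""
    return [row[::-1] for row in grid]


def eight_orientations(grid: Grid) -> List[Grid]:
    """Return the 4 rotations of the grid, then the 4 rotations of its mirror image."""
    variants = []
    for base in (grid, mirror(grid)):
        current = base
        for _ in range(4):
            variants.append(current)
            current = rotate90(current)
    return variants


# An L-shaped fragment has no symmetry, so all eight variants are distinct.
l_shape = [
    [1, 0],
    [1, 0],
    [1, 1],
]
variants = eight_orientations(l_shape)
print(len({tuple(map(tuple, v)) for v in variants}))  # 8
```

Symmetric structures (a square via, a straight wire) collapse to fewer distinct variants, which is one reason repeated, highly symmetric layout patterns are hard to tell apart by local appearance alone.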

Project Journey And Current State

| Phase | Period | Focus | Key outcomes |
| --- | --- | --- | --- |
| Phase 1: Proposal and technology selection | June 2025 | Define the DTCO-oriented target and evaluate the RoRD route. | Compared U-Net, YOLO, ViT, SuperPoint, and RoRD; selected the “RoRD + self-supervision + geometric constraints” direction. |
| Phase 2: Model adaptation and loss design | July 2025 | Tailor RoRD to IC layout geometry and design geometry-aware losses. | Removed orthographic view generation, introduced sliding-window and pyramid matching, and designed compound geometry losses. |
| Phase 3: Architecture modernization and speedup | September 2025 | Improve maintainability, experiment speed, and inference speed. | Added FPN and NMS, moved to YAML configuration, modularized the codebase, and integrated TensorBoard tracking. |
| Current state: technical maturity and extensibility | October 2025 | Core components are stable enough for larger experiments and fast application trials. | FPN, NMS, configuration management, modular code, and experiment tracking have all landed. |

The project has moved from route validation into a phase focused on continuous optimization and academic output.

Technical Implementation And Innovations

Model Architecture: From VGG To FPN

The model pairs a CNN backbone with parallel keypoint-detection and descriptor heads. By adding a Feature Pyramid Network (FPN), it obtains multi-scale features (P2 / P3 / P4) in a single forward pass instead of running inference repeatedly over an image pyramid.

Key design points:

  1. Lateral connections: map C2 / C3 / C4 backbone features into a unified channel dimension.
  2. Top-down fusion: use upsampling and smoothing to build scale-aligned pyramid features.
  3. Shared heads: reuse detection and descriptor heads across pyramid levels for efficiency and consistency.
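
import torch
import torch.nn.functional as F


# Method of the RoRD model class (models/rord.py); the backbone, heads, lateral
# and smoothing layers are created in __init__, which is not shown here.
def forward(self, x: torch.Tensor, return_pyramid: bool = False):
    if not return_pyramid:
        # Single-scale path: backbone features -> detection map + dense descriptors.
        features = self.backbone(x)
        detection_map = self.detection_head(features)
        descriptors = self.descriptor_head(features)
        return detection_map, descriptors

    # FPN path: lateral projections plus top-down nearest-neighbor upsampling.
    c2, c3, c4 = self._extract_c234(x)
    p4 = self.lateral_c4(c4)
    p3 = self.lateral_c3(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
    p2 = self.lateral_c2(c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")

    # Smoothing layers on each merged map reduce upsampling artifacts.
    p4 = self.smooth_p4(p4)
    p3 = self.smooth_p3(p3)
    p2 = self.smooth_p2(p2)

    # Shared heads on every requested level; the third tuple element is the
    # level's stride relative to the input image.
    pyramid = {}
    if 4 in self.fpn_levels:
        pyramid["P4"] = (self.det_head_fpn(p4), self.desc_head_fpn(p4), 8)
    if 3 in self.fpn_levels:
        pyramid["P3"] = (self.det_head_fpn(p3), self.desc_head_fpn(p3), 4)
    if 2 in self.fpn_levels:
        pyramid["P2"] = (self.det_head_fpn(p2), self.desc_head_fpn(p2), 2)
    return pyramid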
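With the strides returned by forward() (2 for P2, 4 for P3, 8 for P4), each level's map size and the mapping from a level cell back to image pixels follow from simple arithmetic. A minimal sketch, assuming the input side is divisible by every stride and that a cell (x, y) maps to the pixel (x * stride, y * stride); the project's decoding may use a different convention, such as cell centers:

```python
from typing import Dict, Tuple

# Strides as returned by forward(): P2 -> 2, P3 -> 4, P4 -> 8.
LEVEL_STRIDES = {"P2": 2, "P3": 4, "P4": 8}


def pyramid_shapes(height: int, width: int) -> Dict[str, Tuple[int, int]]:
    """Spatial size (H, W) of each level's detection and descriptor maps."""
    return {name: (height // s, width // s) for name, s in LEVEL_STRIDES.items()}


def level_to_image(x: int, y: int, stride: int) -> Tuple[int, int]:
    """Map a cell (x, y) on a pyramid level to image pixel coordinates (top-left convention)."""
    return x * stride, y * stride


shapes = pyramid_shapes(256, 256)
print(shapes)                    # {'P2': (128, 128), 'P3': (64, 64), 'P4': (32, 32)}
print(level_to_image(10, 5, 8))  # (80, 40)
```

A 256×256 training patch therefore yields 128×128 + 64×64 + 32×32 = 21,504 candidate cells across the three levels, which is why keypoint thresholding and NMS matter at inference time.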

Core Innovation: Geometry-Aware Losses

To force the model to learn layout geometry rather than natural-image texture, the project combines detection and descriptor losses:

$$L_{\text{total}} = L_{\text{det}} + L_{\text{desc}}$$

$$L_{\text{det}} = \operatorname{BCE}\big(D_{\text{o}}, \mathcal{W}(D_{\text{r}}, H^{-1})\big) + 0.1\,\operatorname{SmoothL1}\big(D_{\text{o}}, \mathcal{W}(D_{\text{r}}, H^{-1})\big)$$

$$L_{\text{desc}} = L_{\text{triplet}} + 0.1\,L_{\text{manhattan}} + 0.01\,L_{\text{sparse}} + 0.05\,L_{\text{binary}}$$

Here $D_{\text{o}}$ and $D_{\text{r}}$ are the detection maps of the original and rotated images, $H$ is the homography from the original to the rotated image, and $\mathcal{W}(D_{\text{r}}, H^{-1})$ warps the rotated map back into the original frame so the two maps can be compared pixel by pixel.

| Loss term | Role |
| --- | --- |
| $L_{\text{triplet}}$ | Geometric triplet loss with L1 distance; emphasizes rotation consistency. |
| $L_{\text{manhattan}}$ | Enforces descriptor consistency under 90-degree rotations and helps separate repeated structures. |
| $L_{\text{sparse}}$ | Suppresses noise from blank layout regions. |
| $L_{\text{binary}}$ | Strengthens geometric boundary representation through sign-level consistency. |
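The weighting in these equations can be checked on toy numbers. A minimal sketch on flat lists of probabilities, assuming the warped rotated map $\mathcal{W}(D_{\text{r}}, H^{-1})$ serves as the BCE target and SmoothL1 switches from quadratic to linear at 1 (PyTorch's default `beta`); the real implementation works on tensors, and these function names are illustrative:

```python
import math
from typing import Sequence


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted and target probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1 loss: quadratic for |diff| < beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def detection_loss(d_o: Sequence[float], d_r_warped: Sequence[float]) -> float:
    # L_det = BCE + 0.1 * SmoothL1, both comparing D_o with the warped D_r.
    return bce(d_o, d_r_warped) + 0.1 * smooth_l1(d_o, d_r_warped)


def description_loss(triplet: float, manhattan: float, sparse: float, binary: float) -> float:
    # L_desc = L_triplet + 0.1 L_manhattan + 0.01 L_sparse + 0.05 L_binary
    return triplet + 0.1 * manhattan + 0.01 * sparse + 0.05 * binary


d_o = [0.9, 0.2, 0.7]
d_r_warped = [1.0, 0.0, 1.0]
l_total = detection_loss(d_o, d_r_warped) + description_loss(0.5, 0.2, 1.0, 0.1)
print(round(l_total, 4))  # 0.7657
```

With these toy values the triplet term dominates, while the 0.01 and 0.05 weights keep the sparsity and binary terms as light regularizers.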
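
import torch
import torch.nn.functional as F


def compute_description_loss(desc_original, desc_rotated, H, margin=1.0):
    # Hypothetical helpers, not shown in this report: sample_matched_descriptors,
    # sample_descriptors, rotate_coords and compute_manhattan_alignment.
    # anchor / positive: (N, D) descriptors at points that correspond under H.
    anchor, positive, manhattan_coords = sample_matched_descriptors(desc_original, desc_rotated, H)

    # Hard-negative mining: descriptors at the 90/180/270-degree rotated positions.
    negative_list = []
    for angle in (90, 180, 270):
        rotated_coords = rotate_coords(manhattan_coords, angle)
        negative_list.append(sample_descriptors(desc_rotated, rotated_coords))
    negatives = torch.stack(negative_list, dim=1)                   # (N, 3, D)
    neg_dist = (anchor.unsqueeze(1) - negatives).abs().sum(dim=-1)  # L1 distance, (N, 3)
    hardest_negative = negatives[torch.arange(anchor.shape[0]), neg_dist.argmin(dim=1)]

    # L1 ("geometric") triplet loss with margin.
    geometric_triplet = F.triplet_margin_loss(anchor, positive, hardest_negative, margin=margin, p=1)
    manhattan_loss = compute_manhattan_alignment(anchor, positive)
    sparsity_loss = torch.mean(torch.abs(anchor)) + torch.mean(torch.abs(positive))
    # torch.sign has zero gradient almost everywhere, so this term mostly measures
    # sign agreement unless a smooth surrogate (e.g. tanh) replaces it.
    binary_loss = torch.mean(torch.abs(torch.sign(anchor) - torch.sign(positive)))

    return geometric_triplet + 0.1 * manhattan_loss + 0.01 * sparsity_loss + 0.05 * binary_loss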
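The hard-negative step can be illustrated without tensors. A minimal sketch, assuming descriptors are plain lists of floats, the candidate negatives are the descriptors sampled at the rotated (wrong) positions, and the loss is the standard hinge max(0, d(a, p) - d(a, n) + margin) with L1 distance, which is the form `torch.nn.functional.triplet_margin_loss` computes with `p=1`; the function names are illustrative:

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def hardest_negative(anchor: Sequence[float], negatives: List[Sequence[float]]) -> Sequence[float]:
    """Pick the negative closest to the anchor, i.e. the most confusable one."""
    return min(negatives, key=lambda n: l1_distance(anchor, n))


def l1_triplet(anchor, positive, negatives, margin: float = 1.0) -> float:
    """Hinge triplet loss with L1 distance and hardest-negative mining."""
    neg = hardest_negative(anchor, negatives)
    return max(0.0, l1_distance(anchor, positive) - l1_distance(anchor, neg) + margin)


anchor = [1.0, 0.0, 1.0]
positive = [0.9, 0.1, 1.0]   # the same point seen in the rotated view
negatives = [                # descriptors at rotated (wrong) positions
    [0.0, 1.0, 0.0],
    [1.0, 0.5, 0.5],         # closest to the anchor: the hardest negative
    [0.0, 0.0, 0.0],
]
print(hardest_negative(anchor, negatives))                # [1.0, 0.5, 0.5]
print(round(l1_triplet(anchor, positive, negatives), 4))  # 0.2
```

Mining the closest negative focuses the gradient on exactly the confusions that repeated Manhattan structures cause; a random negative would usually already be far away and contribute zero loss.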

Training Strategy: Self-Supervision And Stability

Training pairs are generated automatically as (original, rotated, H) through random geometric transformations, avoiding manual annotation. Augmentations include scale jittering, Sobel edge enhancement, brightness and contrast changes, and Gaussian noise. The training loop also uses gradient clipping, early stopping, and ReduceLROnPlateau scheduling.
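Each (original, rotated, H) pair needs a homography that maps original pixel coordinates to rotated ones. A minimal sketch, assuming square patches and a transform built from a rotation by a multiple of 90 degrees about the image center, an optional horizontal mirror, and a mild scale jitter; the jitter range and function names are hypothetical, not the project's settings. The actual image could then be warped with a perspective-warp routine such as OpenCV's `cv2.warpPerspective`, and H^{-1} is what the detection loss uses to bring D_r back:

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_homography(width: int, height: int, rng: random.Random) -> Matrix:
    """Rotation by k * 90 degrees, optional mirror, and scale jitter about the center."""
    cx, cy = width / 2.0, height / 2.0
    theta = math.radians(90 * rng.randrange(4))
    scale = rng.uniform(0.9, 1.1)                 # hypothetical jitter range
    flip = -1.0 if rng.random() < 0.5 else 1.0    # mirror the x axis half the time
    c, s = math.cos(theta) * scale, math.sin(theta) * scale
    to_origin = [[1, 0, -cx], [0, 1, -cy], [0, 0, 1]]
    linear = [[c * flip, -s, 0], [s * flip, c, 0], [0, 0, 1]]  # mirror, then rotate/scale
    back = [[1, 0, cx], [0, 1, cy], [0, 0, 1]]
    return matmul(back, matmul(linear, to_origin))


def invert3x3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_homography(hm: Matrix, x: float, y: float) -> Tuple[float, float]:
    u = hm[0][0] * x + hm[0][1] * y + hm[0][2]
    v = hm[1][0] * x + hm[1][1] * y + hm[1][2]
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return u / w, v / w


rng = random.Random(0)
H = random_homography(256, 256, rng)
x, y = apply_homography(invert3x3(H), *apply_homography(H, 40.0, 100.0))
print(round(x, 6), round(y, 6))  # 40.0 100.0
```

Because H is known exactly, every pixel in the original has a known correspondence in the rotated view, which is what lets the losses supervise themselves without manual labels.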

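import torch

# model, optimizer, scheduler (ReduceLROnPlateau), data loaders, epochs and the
# loss functions are set up elsewhere (not shown); evaluate() and early_stopper
# are placeholders for the project's validation and early-stopping logic.
for epoch in range(epochs):
    model.train()
    for original, rotated, H in train_dataloader:
        original, rotated, H = original.cuda(), rotated.cuda(), H.cuda()
        det_o, desc_o = model(original)
        det_r, desc_r = model(rotated)

        det_loss = compute_detection_loss(det_o, det_r, H)
        desc_loss = compute_description_loss(desc_o, desc_r, H)
        loss = det_loss + desc_loss

        optimizer.zero_grad()
        loss.backward()
        # Clip the global gradient norm to keep updates stable.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()

    # ReduceLROnPlateau and early stopping both consume a validation metric
    # (a loss here, so lower is better).
    validation_metric = evaluate(model, val_dataloader)
    scheduler.step(validation_metric)
    early_stopper.update(validation_metric)
    if early_stopper.should_stop():
        break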
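The training loop relies on an early-stopping object that must see the validation metric every epoch. A minimal sketch of one, assuming a validation loss where lower is better and the patience of 10 epochs quoted in the training-time estimate; the class is hypothetical, its `should_stop()` mirrors the call in the loop, and `update()` is an assumed method for feeding it the metric:

```python
import math


class EarlyStopper:
    """Signal a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def update(self, val_loss: float) -> None:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1

    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=3)
history = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75]
stopped_at = None
for epoch, loss in enumerate(history):
    stopper.update(loss)
    if stopper.should_stop():
        stopped_at = epoch
        break
print(stopped_at)  # 5
```

Giving ReduceLROnPlateau a shorter patience than the early stopper lets the learning rate drop at least once before training is abandoned.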

Inference And Matching: Efficient Multi-Instance Detection

Inference combines FPN, multi-scale keypoint extraction, radius NMS, and RANSAC for fast and stable multi-instance matching.

  1. extract_from_pyramid: extracts keypoints and descriptors from all pyramid levels in a single forward pass.
  2. radius_nms: removes duplicate points by score and spatial distance.
  3. MNN + RANSAC: first matches by mutual nearest neighbor, then estimates a homography and removes outliers.
  4. Multi-instance loop: iteratively matches each template instance and masks detected regions.
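
import torch


def extract_from_pyramid(model, image_tensor, kp_thresh, nms_cfg):
    # One forward pass returns {"P2"/"P3"/"P4": (detection_map, descriptors, stride)}.
    with torch.no_grad():
        pyramid = model(image_tensor, return_pyramid=True)

    keypoints, descriptors = [], []
    for det, desc, stride in pyramid.values():
        # decode_level (not shown) is assumed to threshold the detection map,
        # map keypoints to image pixels using `stride`, and return their scores.
        kps, descs, scores = decode_level(det, desc, stride, kp_thresh)
        if nms_cfg.get("enabled", False):
            keep = radius_nms(kps, scores, nms_cfg["radius"])
            kps, descs = kps[keep], descs[keep]
        keypoints.append(kps)
        descriptors.append(descs)

    # Concatenate all levels into one keypoint set and one descriptor matrix.
    return torch.cat(keypoints, dim=0), torch.cat(descriptors, dim=0)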
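Steps 2 and 3 of the matching pipeline can be sketched in plain Python. A minimal sketch, assuming keypoints are (x, y) tuples in image pixels, scores are floats, and descriptors are compared by squared Euclidean distance; this greedy radius NMS and the mutual-nearest-neighbor rule are standard formulations used as illustrative stand-ins (they operate on lists, not the project's tensors), and RANSAC homography fitting is omitted:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def radius_nms(points: List[Point], scores: List[float], radius: float) -> List[int]:
    """Greedy NMS: keep the highest-scoring point, drop any neighbor within `radius`."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    r2 = radius * radius
    for i in order:
        xi, yi = points[i]
        if all((xi - points[j][0]) ** 2 + (yi - points[j][1]) ** 2 > r2 for j in keep):
            keep.append(i)
    return keep


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mutual_nearest_neighbors(desc_a: List[Sequence[float]],
                             desc_b: List[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where b[j] is a[i]'s nearest neighbor and vice versa."""
    nn_ab = [min(range(len(desc_b)), key=lambda j: sq_dist(a, desc_b[j])) for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: sq_dist(b, desc_a[i])) for b in desc_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


pts = [(10.0, 10.0), (12.0, 11.0), (40.0, 40.0)]
scores = [0.9, 0.95, 0.6]
print(radius_nms(pts, scores, radius=4))  # [1, 2]

template = [[1.0, 0.0], [0.0, 1.0]]
layout = [[0.0, 0.9], [0.1, 0.1], [0.95, 0.05]]
print(mutual_nearest_neighbors(template, layout))  # [(0, 2), (1, 0)]
```

The ambiguous layout descriptor [0.1, 0.1] is dropped because its nearest template point prefers a different match; this symmetric check removes many false correspondences before RANSAC has to.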

Engineering: Configuration And Experiment Tracking

The implementation focuses on reproducibility, tunability, and traceability:

  1. Centralized YAML configuration: OmegaConf loads configs/base_config.yaml, decoupling parameters from code.
  2. Modular codebase: datasets, losses, models, and utilities are maintained independently.
  3. TensorBoard integration: training, evaluation, and matching scripts all log key metrics for visual comparison.
model:
  backbone: resnet34
  fpn:
    enabled: true
    out_channels: 256
    levels: [2, 3, 4]

matching:
  keypoint_threshold: 0.5
  nms:
    enabled: true
    radius: 4
  min_inliers: 15

logging:
  use_tensorboard: true
  log_dir: runs
  experiment_name: baseline
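One way to use a base configuration like this is to layer a small experiment file (such as the configs/your_exp_config.yaml used in the Quick Start) over it, so each experiment lists only the keys it changes. OmegaConf's `OmegaConf.merge` performs this kind of recursive merge on OmegaConf objects; the stdlib sketch below mimics it on plain dicts, assuming later values win, nested sections merge recursively, and lists are replaced whole. The experiment values shown are hypothetical:

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "model": {"backbone": "resnet34", "fpn": {"enabled": True, "out_channels": 256, "levels": [2, 3, 4]}},
    "matching": {"keypoint_threshold": 0.5, "nms": {"enabled": True, "radius": 4}, "min_inliers": 15},
    "logging": {"use_tensorboard": True, "log_dir": "runs", "experiment_name": "baseline"},
}
# Hypothetical experiment: widen the NMS radius and log under a new name.
experiment = {"matching": {"nms": {"radius": 6}}, "logging": {"experiment_name": "nms_r6"}}

cfg = deep_merge(base, experiment)
print(cfg["matching"]["nms"])  # {'enabled': True, 'radius': 6}
```

Keeping the experiment name in the merged config also gives each TensorBoard run a distinct directory under runs/, which makes side-by-side comparison straightforward.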

Outcomes And Expected Effects

Current Capabilities

  1. Multi-instance detection: locates multiple instances of the same template in large layouts.
  2. Rotation robustness: remains stable under 0-to-360-degree rotations and mirror transforms.
  3. Efficient inference: FPN compresses image-pyramid inference into one forward pass.
  4. Visual evaluation: Precision, Recall, F1, and tuning records are tracked through TensorBoard.

Expected Quantitative Targets

| Metric | Target |
| --- | --- |
| Accuracy | F1 ≥ 95% on trained templates or on validation sets of similar style. |
| Speed | Single-template matching on million-gate layouts within 1 minute on V100 / A100-class GPUs. |
| Robustness | Stable recognition under mild linewidth changes and metal-fill differences. |
| Extensibility | F1 ≥ 85% when matching new templates directly, without retraining. |

Future Work

Technical Roadmap

| Direction | Priority | Plan |
| --- | --- | --- |
| Data strategy | High | Add elastic deformation and defect simulation; build a procedural layout generator with gdstk for large synthetic datasets. |
| Training strategy | High | Introduce uncertainty-based automatic loss weighting; strengthen hard-sample mining for descriptor learning. |
| Model architecture | Medium | Try ResNet / EfficientNet backbones as alternatives to VGG; explore attention modules for key geometric structures. |
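A commonly used form of uncertainty-based loss weighting (from Kendall, Gal, and Cipolla's multi-task work) learns one log-variance s_i per loss term and minimizes the sum of exp(-s_i) * L_i + s_i. A minimal sketch of that objective, assuming this simplified form is the one adopted (the plan above does not specify it); for a fixed loss value the optimum is s = log(L), so each term's effective weight exp(-s) becomes 1/L and larger losses are down-weighted automatically:

```python
import math
from typing import Sequence


def uncertainty_weighted(losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """Sum of exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learned per term."""
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))


def optimal_log_var(loss: float) -> float:
    """For fixed L, d/ds [exp(-s) * L + s] = 1 - exp(-s) * L vanishes at s = log(L)."""
    return math.log(loss)


losses = [0.5, 2.0]  # toy values, e.g. a detection term and a triplet term
log_vars = [optimal_log_var(l) for l in losses]
weights = [math.exp(-s) for s in log_vars]
print([round(w, 3) for w in weights])  # [2.0, 0.5]
```

In training, the s_i would be learnable parameters updated by the optimizer alongside the model, replacing the hand-tuned 0.1 / 0.01 / 0.05 coefficients.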

Publication Plan

Core paper contributions:

  1. Introduce rotation-robust local features, RoRD, into IC layout recognition and validate the idea in industrial-style settings.
  2. Propose a compound layout-geometry loss system for sparse binary and repetitive structures.
  3. Combine FPN and NMS to make matching speed and multi-instance detection practical.

| Target venue | Estimated deadline | Estimated notification | Estimated event | Strategy |
| --- | --- | --- | --- | --- |
| ICCAD 2026 | Mid-to-late May 2026 | Early August 2026 | Late October 2026 | Primary target, with a focus on rotation-robust recognition innovation. |
| DATE 2027 | Mid-September 2026 | Mid-December 2026 | March-April 2027 | Plan A if ICCAD is not accepted; iterate quickly using review feedback. |
| DAC 2027 | Mid-to-late November 2026 | Late February 2027 | June-July 2027 | Plan B, with three months for stronger experiments and comparisons. |
| ASP-DAC 2028 | Mid-July 2027 | Mid-October 2027 | Late January 2028 | Backup path after further model polishing and industrial validation. |
| IEEE TCAD | Rolling | - | - | Final option: consolidate the work into a journal manuscript. |

Appendix

Project Structure

RoRD-Layout-Recognation/
├── configs/
│   └── base_config.yaml      # Central YAML configuration
├── data/
│   └── ic_dataset.py         # Dataset definition
├── docs/                     # Documentation
├── models/
│   └── rord.py               # RoRD model with FPN
├── utils/                    # Shared utilities
├── losses.py                 # Geometry-aware losses
├── train.py                  # Training script
├── evaluate.py               # Evaluation script
├── match.py                  # Template matching script
├── pyproject.toml            # Dependencies
└── README.md

Quick Start

The recommended workflow uses uv:

# Install dependencies
uv sync

# Training
uv run python train.py --config configs/your_exp_config.yaml

# Layout template matching
uv run python match.py \
  --config configs/your_exp_config.yaml \
  --model_path path/to/model.pth \
  --layout path/to/layout.png \
  --template path/to/template.png

# Start TensorBoard
uv run tensorboard --logdir runs

Dataset Requirements

Training only needs many unlabeled PNG layout images, because the self-supervised pipeline generates training pairs. Validation data should include template images, layout images, and JSON annotations describing template locations.

{
  "boxes": [
    {"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},
    {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}
  ]
}
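Given annotations in this format, Precision, Recall, and F1 can be computed by matching predicted boxes to ground-truth boxes of the same template. A minimal sketch, assuming a prediction counts as correct when its IoU with a not-yet-matched ground-truth box of the same template is at least 0.5, with each prediction taking its best-IoU match in turn; the threshold and matching rule are assumptions, and evaluate.py may use a different criterion:

```python
import json
from typing import Dict, List, Tuple


def iou(a: Dict, b: Dict) -> float:
    """Intersection over union of two {x, y, width, height} boxes."""
    ix = max(0, min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"]))
    iy = max(0, min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"]))
    inter = ix * iy
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(pred: List[Dict], gt: List[Dict], thresh: float = 0.5) -> Tuple[float, float, float]:
    matched = set()  # indices of ground-truth boxes already claimed
    for p in pred:
        best_k, best_iou = None, thresh
        for k, g in enumerate(gt):
            if k in matched or g["template"] != p["template"]:
                continue
            score = iou(p, g)
            if score >= best_iou:
                best_k, best_iou = k, score
        if best_k is not None:
            matched.add(best_k)
    tp = len(matched)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


annotation = json.loads("""{"boxes": [
  {"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},
  {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}
]}""")
predictions = [
    {"template": "template1.png", "x": 104, "y": 198, "width": 50, "height": 50},  # hit
    {"template": "template2.png", "x": 500, "y": 500, "width": 60, "height": 60},  # false positive
]
p, r, f = precision_recall_f1(predictions, annotation["boxes"])
print(p, r, f)  # 0.5 0.5 0.5
```

The first prediction overlaps its ground truth with IoU of about 0.79, so it counts as a hit; the second does not overlap at all, giving one true positive out of two predictions and two annotations.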

Resource Guidance

| Resource type | Requirement | Notes |
| --- | --- | --- |
| Dataset, startup | 100-200 images | High-resolution layouts for functional validation. |
| Dataset, initial usable | 1,000-2,000 images | Enough to learn stable geometric descriptors. |
| Dataset, production-grade | 5,000-10,000+ images | Covers multiple processes and design styles. |
| Entry GPU | RTX 3060 / 4060 | Small experiments and functional validation. |
| Mainstream GPU | RTX 3080 / 4070 | Recommended balance of speed and cost. |
| Professional GPU | RTX 3090 / 4090 / A6000 | Large experiments or production deployment. |
| VRAM | ≥ 12 GB | Batch size 8, patch 256×256. |
| CPU / memory | 8 cores / 32 GB | Keeps preprocessing from becoming the bottleneck. |

Training Time Estimate

| Stage | Estimate | Notes |
| --- | --- | --- |
| Single epoch | 15-25 minutes | RTX 3080, 2,000 images. |
| Total training time | About 16.7 hours | 50 epochs at 20 minutes per epoch. |
| Practical convergence | 10-20 hours | With early stopping (patience = 10). |
| Augmentation tuning | 1-2 weeks | Scale, brightness, and noise parameters. |
| Loss weight tuning | 1-2 weeks | Balance BCE, triplet, Manhattan, and related terms. |
| Hyperparameter search | 2-4 weeks | Learning rate, batch size, optimizer. |
| Architecture tuning | 2-4 weeks | Different backbones and attention modules. |
| Total tuning cycle | 1.5-3 months | Toward production-grade model quality. |
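The total-training-time row is plain arithmetic: 50 epochs × 20 minutes = 1,000 minutes, or about 16.7 hours. A small estimator reproduces it and brackets the 15-25-minute epoch range; the dataset-size scaling assumes epoch time grows linearly with the number of images, which is only a rough approximation since I/O and augmentation costs may not scale that way:

```python
def training_hours(epochs: int, minutes_per_epoch: float) -> float:
    """Wall-clock hours for a fixed number of epochs."""
    return epochs * minutes_per_epoch / 60.0


def scaled_epoch_minutes(ref_minutes: float, ref_images: int, images: int) -> float:
    """Linear-scaling assumption: epoch time proportional to dataset size."""
    return ref_minutes * images / ref_images


print(round(training_hours(50, 20), 1))                      # 16.7
print([round(training_hours(50, m), 1) for m in (15, 25)])   # [12.5, 20.8]
print(scaled_epoch_minutes(20, 2000, 5000))                  # 50.0
```

Under that assumption, moving from 2,000 to a production-grade 5,000 images would stretch an epoch on the same RTX 3080 to roughly 50 minutes.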
Author: Jiao77
Published on: Oct 5, 2025
License: CC BY-NC-SA 4.0
Source: https://www.jiao77.com/en/blog/report/rord-comprehensive-technical-report-2025-10-05/
