This report summarizes the full development path of RoRD-Layout-Recognation, from topic selection and model adaptation to engineering implementation and publication planning. The project brings rotation-robust local features into IC layout recognition, with the goal of building a high-accuracy, low-latency, zero/few-shot-friendly template matching system.
Version: 1.3, complete edition
Date: October 5, 2025
Type: Comprehensive technical report
Keywords: RoRD, IC layout recognition, FPN, geometry-aware loss, self-supervised training, DTCO.
RoRD-Layout-Recognation adapts RoRD, or Rotation-Robust Descriptors, into an IC layout template recognition system. The target is to support design-technology co-optimization (DTCO) with automated, robust, and reusable layout analysis capabilities.
| Phase | Period | Focus | Key Outcomes |
|---|---|---|---|
| Phase 1: Proposal and technology selection | June 2025 | Define the DTCO-oriented target and evaluate the RoRD route. | Compared U-Net, YOLO, ViT, SuperPoint, and RoRD; selected the “RoRD + self-supervision + geometric constraints” direction. |
| Phase 2: Model adaptation and loss design | July 2025 | Tailor RoRD to IC layout geometry and design geometry-aware losses. | Removed orthographic view generation, introduced sliding-window and pyramid matching, and designed compound geometry losses. |
| Phase 3: Architecture modernization and speedup | September 2025 | Improve maintainability, experiment speed, and inference speed. | Added FPN and NMS, moved to YAML configuration, modularized the codebase, and integrated TensorBoard tracking. |
| Current state: technical maturity and extensibility | October 2025 | Core components are stable enough for larger experiments and fast application trials. | FPN, NMS, configuration management, modular code, and experiment tracking have all landed. |
The project has moved from route validation into a phase focused on continuous optimization and academic output.
The model uses a modern backbone with parallel keypoint detection and descriptor heads. By introducing a Feature Pyramid Network (FPN), it obtains multi-scale features, P2 / P3 / P4, in a single forward pass and avoids repeated image-pyramid inference.
Key design points:
```python
def forward(self, x: torch.Tensor, return_pyramid: bool = False):
    if not return_pyramid:
        # Single-scale path: one feature map feeds both heads.
        features = self.backbone(x)
        detection_map = self.detection_head(features)
        descriptors = self.descriptor_head(features)
        return detection_map, descriptors

    # FPN path: lateral 1x1 projections merged top-down (P4 -> P3 -> P2).
    c2, c3, c4 = self._extract_c234(x)
    p4 = self.lateral_c4(c4)
    p3 = self.lateral_c3(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
    p2 = self.lateral_c2(c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")

    # Smooth each merged map before the shared heads.
    p4 = self.smooth_p4(p4)
    p3 = self.smooth_p3(p3)
    p2 = self.smooth_p2(p2)

    # Each entry is (detection map, descriptors, stride relative to the input).
    pyramid = {}
    if 4 in self.fpn_levels:
        pyramid["P4"] = (self.det_head_fpn(p4), self.desc_head_fpn(p4), 8)
    if 3 in self.fpn_levels:
        pyramid["P3"] = (self.det_head_fpn(p3), self.desc_head_fpn(p3), 4)
    if 2 in self.fpn_levels:
        pyramid["P2"] = (self.det_head_fpn(p2), self.desc_head_fpn(p2), 2)
    return pyramid
```
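
Each pyramid entry carries its stride so that detections on a coarse map can be placed back on the input image. The sketch below shows one way that mapping could work. The helper names `level_shape` and `to_image_coords`, the cell-center convention, and the assumption that the input size divides evenly by the stride are illustrative choices, not taken from the project code.

```python
from typing import Dict, Tuple

# Strides as listed in the pyramid branch of forward().
LEVEL_STRIDES: Dict[str, int] = {"P2": 2, "P3": 4, "P4": 8}


def level_shape(height: int, width: int, stride: int) -> Tuple[int, int]:
    """Feature-map size for an input of (height, width).

    Assumes the input size is divisible by the stride.
    """
    return height // stride, width // stride


def to_image_coords(row: int, col: int, stride: int) -> Tuple[float, float]:
    """Map a feature-map cell (row, col) to the (x, y) pixel at the cell center."""
    x = (col + 0.5) * stride
    y = (row + 0.5) * stride
    return x, y


# Example: sizes of each level for a 256x256 patch.
shapes = {name: level_shape(256, 256, s) for name, s in LEVEL_STRIDES.items()}
```

Under these assumptions a 256×256 patch yields 128×128, 64×64, and 32×32 maps for P2, P3, and P4, so keypoints from all three levels can be compared in a single pixel coordinate frame.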
To force the model to learn layout geometry rather than natural-image texture, the project combines detection and descriptor losses:
| Loss term | Role |
|---|---|
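| Geometric triplet loss (`geometric_triplet`) | Uses an L1 geometric Triplet Loss to emphasize rotation consistency. |
| Manhattan consistency loss (`manhattan_loss`) | Enforces descriptor consistency under 90-degree rotations and helps separate repeated structures. |
| Sparsity regularization (`sparsity_loss`) | Suppresses noise from blank layout regions. |
| Binarization consistency loss (`binary_loss`) | Strengthens geometric boundary representation through sign-level consistency. |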
```python
def compute_description_loss(desc_original, desc_rotated, H, margin=1.0):
    # Excerpt: the sampling of `manhattan_coords`, `anchor`, and `positive`,
    # and the selection of `hardest_negative` from `negative_list`, are
    # omitted here.

    # Rotation consistency and hard-negative mining.
    negative_list = []
    for angle in [90, 180, 270]:
        rotated_coords = rotate_coords(manhattan_coords, angle)
        negative_list.append(rotated_coords)

    geometric_triplet = triplet_loss(anchor, positive, hardest_negative)
    manhattan_loss = compute_manhattan_alignment(anchor, positive)
    # L1 penalty that pushes descriptors in blank regions toward zero.
    sparsity_loss = torch.mean(torch.abs(anchor)) + torch.mean(torch.abs(positive))
    # Penalizes sign disagreement between anchor and positive descriptors.
    binary_loss = torch.mean(torch.abs(torch.sign(anchor) - torch.sign(positive)))
    return geometric_triplet + 0.1 * manhattan_loss + 0.01 * sparsity_loss + 0.05 * binary_loss
```
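
To make the triplet term concrete, here is a minimal pure-Python sketch of an L1 triplet loss with hardest-negative selection. It assumes descriptors are plain lists of floats and that the "hardest" negative is the one closest to the anchor in L1 distance. The function names are illustrative, and this is not the project's `triplet_loss`.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hardest_negative(anchor: Sequence[float],
                     negatives: List[Sequence[float]]) -> Sequence[float]:
    """Pick the negative closest to the anchor (assumed definition of 'hardest')."""
    return min(negatives, key=lambda n: l1_distance(anchor, n))


def l1_triplet_loss(anchor: Sequence[float],
                    positive: Sequence[float],
                    negatives: List[Sequence[float]],
                    margin: float = 1.0) -> float:
    """Hinge loss: the positive must be closer than the hardest negative by `margin`."""
    d_pos = l1_distance(anchor, positive)
    d_neg = l1_distance(anchor, hardest_negative(anchor, negatives))
    return max(0.0, d_pos - d_neg + margin)


# Toy descriptors, not real model outputs.
loss_hard = l1_triplet_loss([1.0, 0.0], [1.0, 0.5], [[0.0, 1.0], [1.0, 1.0]])
loss_easy = l1_triplet_loss([1.0, 0.0], [1.0, 0.5], [[5.0, 5.0]])
```

The loss is positive only while some negative sits within `margin` of the positive distance, which is why mining the closest negative gives the strongest training signal for repeated structures.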
Training pairs are generated automatically as (original, rotated, H) through random geometric transformations, avoiding manual annotation. Augmentations include scale jittering, Sobel edge enhancement, brightness and contrast changes, and Gaussian noise. The training loop also uses gradient clipping, early stopping, and ReduceLROnPlateau scheduling.
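
As an illustration of the `H` in each `(original, rotated, H)` triple, the sketch below builds a 3×3 homography for a rotation by a multiple of 90 degrees about the patch center, combined with a uniform scale, and applies it to a point. The function names, the center-based convention, and the restriction to 90-degree steps are assumptions made for the example; the project's actual transform sampling may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def rotation_scale_homography(k: int, scale: float, size: int) -> Matrix:
    """Homography for a k*90-degree rotation plus uniform scale about the patch center.

    `size` is the side length of a square patch; the center pixel maps to itself.
    """
    # Exact cosine and sine for multiples of 90 degrees.
    cos_t = [1, 0, -1, 0][k % 4]
    sin_t = [0, 1, 0, -1][k % 4]
    c = (size - 1) / 2.0
    a, b = scale * cos_t, -scale * sin_t
    d, e = scale * sin_t, scale * cos_t
    # Translation chosen so that (c, c) is a fixed point.
    tx = c - (a * c + b * c)
    ty = c - (d * c + e * c)
    return [[a, b, tx], [d, e, ty], [0.0, 0.0, 1.0]]


def apply_homography(H: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Apply H to an (x, y) point using homogeneous coordinates."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


H90 = rotation_scale_homography(k=1, scale=1.0, size=5)
center = apply_homography(H90, (2, 2))
moved = apply_homography(H90, (4, 2))
```

Because `H` is known exactly for every generated pair, keypoint and descriptor correspondences between the two views come for free, which is what removes the need for manual labels.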
```python
# Self-supervised training loop (excerpt).
for epoch in range(epochs):
    model.train()
    for original, rotated, H in train_dataloader:
        original, rotated, H = original.cuda(), rotated.cuda(), H.cuda()
        det_o, desc_o = model(original)
        det_r, desc_r = model(rotated)
        det_loss = compute_detection_loss(det_o, det_r, H)
        desc_loss = compute_description_loss(desc_o, desc_r, H)
        loss = det_loss + desc_loss

        optimizer.zero_grad()
        loss.backward()
        # Clip the gradient norm to stabilize training.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()

    # `validation_metric` is computed on the validation set once per epoch (not shown).
    scheduler.step(validation_metric)
    if early_stopper.should_stop():
        break
```
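
The loop calls `early_stopper.should_stop()` without showing the stopper itself. A minimal sketch of such a helper follows. The class layout, the separate `update` call, the `min_delta` parameter, and the "lower is better" convention are all assumptions for illustration; the default `patience` of 10 matches the value quoted later in this report.

```python
from typing import Optional


class EarlyStopper:
    """Stops training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def update(self, metric: float) -> None:
        """Record one validation metric; lower is treated as better (e.g. a loss)."""
        if self.best is None or metric < self.best - self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1

    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience


# Toy validation losses, not measured results.
stopper = EarlyStopper(patience=2)
history = []
for value in [1.0, 0.9, 0.95, 0.92]:
    stopper.update(value)
    history.append(stopper.should_stop())
```

With this design the training loop would call `update(validation_metric)` once per epoch before checking `should_stop()`.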
Inference combines FPN, multi-scale keypoint extraction, radius NMS, and RANSAC for fast and stable multi-instance matching.
- `extract_from_pyramid`: extracts keypoints and descriptors at all scales in one pass.
- `radius_nms`: removes duplicate points by score and spatial distance.

```python
def extract_from_pyramid(model, image_tensor, kp_thresh, nms_cfg):
    with torch.no_grad():
        pyramid = model(image_tensor, return_pyramid=True)

    keypoints, descriptors = [], []
    for level_name, (det, desc, stride) in pyramid.items():
        kps, descs = decode_level(det, desc, stride, kp_thresh)
        if nms_cfg.get("enabled", False):
            # NOTE: this indexing assumes `det` exposes per-keypoint scores under
            # "scores"; with a raw detection tensor, gather the scores at `kps` instead.
            keep = radius_nms(kps, det["scores"], nms_cfg["radius"])
            kps, descs = kps[keep], descs[keep]
        keypoints.append(kps)
        descriptors.append(descs)
    # Concatenate keypoints and descriptors from all levels.
    return torch.cat(keypoints, dim=0), torch.cat(descriptors, dim=0)
```
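
The behavior described for `radius_nms` can be sketched in pure Python as a greedy suppression pass. This is an illustrative stand-in, not the project's implementation: the name `radius_nms_sketch`, the list-based inputs, and the choice to suppress points lying exactly at `radius` are assumptions.

```python
from typing import List, Sequence, Tuple


def radius_nms_sketch(points: Sequence[Tuple[float, float]],
                      scores: Sequence[float],
                      radius: float) -> List[int]:
    """Greedy radius NMS.

    Visits points from highest to lowest score and keeps a point only if it
    lies farther than `radius` from every point kept so far. Returns the kept
    indices in descending score order.
    """
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    r2 = radius * radius
    for i in order:
        xi, yi = points[i]
        if all((xi - points[j][0]) ** 2 + (yi - points[j][1]) ** 2 > r2 for j in kept):
            kept.append(i)
    return kept


# Toy keypoints: two tight clusters, radius taken from the sample config value 4.
pts = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (10.0, 12.0)]
scr = [0.9, 0.8, 0.5, 0.7]
kept_indices = radius_nms_sketch(pts, scr, radius=4)
```

This version compares each candidate against all kept points, which is fine for a sketch but scales poorly; a grid or KD-tree lookup would be the usual choice for dense layouts.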
The implementation focuses on reproducibility, tunability, and traceability:
OmegaConf manages `configs/base_config.yaml`, decoupling parameters from code.

```yaml
model:
  backbone: resnet34
  fpn:
    enabled: true
    out_channels: 256
    levels: [2, 3, 4]
matching:
  keypoint_threshold: 0.5
  nms:
    enabled: true
    radius: 4
  min_inliers: 15
logging:
  use_tensorboard: true
  log_dir: runs
  experiment_name: baseline
```
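
The project uses OmegaConf for configuration. The sketch below deliberately uses only plain dictionaries to illustrate the underlying idea of layering an experiment-specific override on top of a base configuration. The `deep_merge` helper, the nesting of keys, and the override values are hypothetical; only the base values are taken from the listing above.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` values replace `base` values, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of the base configuration (nesting assumed).
base_config = {
    "model": {"fpn": {"enabled": True, "out_channels": 256, "levels": [2, 3, 4]}},
    "matching": {
        "keypoint_threshold": 0.5,
        "nms": {"enabled": True, "radius": 4},
        "min_inliers": 15,
    },
}

# Hypothetical experiment override: only the NMS radius changes.
experiment_override = {"matching": {"nms": {"radius": 8}}}

config = deep_merge(base_config, experiment_override)
```

Keeping overrides small like this means each experiment file records only what differs from the baseline, which supports the traceability goal stated above.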
| Metric | Target |
|---|---|
| Accuracy | F1 ≥ 95% on trained templates or similar-style validation sets. |
| Speed | Single-template matching for million-gate layouts within 1 minute on V100 / A100 class GPUs. |
| Robustness | Stable recognition under mild linewidth changes and metal-fill differences. |
| Extensibility | F1 ≥ 85% for direct matching of new templates without retraining. |
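
For reference, the F1 targets in the table can be checked against detection counts as in the sketch below. The function name and the example counts are illustrative only and are not measured results from the project.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 95 correct matches, 5 spurious matches, 5 missed instances.
example_f1 = f1_score(95, 5, 5)
meets_trained_target = example_f1 >= 0.95 - 1e-9  # tolerance for float rounding
```

Because F1 penalizes both spurious and missed matches, a system that reaches the 95% target must keep both error types low at the same time.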
| Direction | Priority | Plan |
|---|---|---|
| Data strategy | High | Add elastic deformation and defect simulation; build a procedural layout generator with gdstk for large synthetic datasets. |
| Training strategy | High | Introduce uncertainty-based automatic loss weighting; strengthen hard-sample mining for descriptor learning. |
| Model architecture | Medium | Try ResNet / EfficientNet backbones as alternatives to VGG; explore attention modules for key geometric structures. |
Core paper contributions:
| Target venue | Estimated deadline | Estimated notification | Estimated event | Strategy |
|---|---|---|---|---|
| ICCAD 2026 | Mid-to-late May 2026 | Early August 2026 | Late October 2026 | Primary target, with a focus on rotation-robust recognition innovation. |
| DATE 2027 | Mid-September 2026 | Mid-December 2026 | March-April 2027 | Plan A if ICCAD is not accepted; iterate quickly using review feedback. |
| DAC 2027 | Mid-to-late November 2026 | Late February 2027 | June-July 2027 | Plan B with three months for stronger experiments and comparisons. |
| ASP-DAC 2028 | Mid-July 2027 | Mid-October 2027 | Late January 2028 | Backup path after further model polishing and industrial validation. |
| IEEE TCAD | Rolling | - | - | Final option: consolidate the work into a journal manuscript. |
```
RoRD-Layout-Recognation/
├── configs/
│   └── base_config.yaml   # YAML configuration center
├── data/
│   └── ic_dataset.py      # Dataset definition
├── docs/                  # Documentation
├── models/
│   └── rord.py            # RoRD model with FPN
├── utils/                 # Shared utilities
├── losses.py              # Geometry-aware losses
├── train.py               # Training script
├── evaluate.py            # Evaluation script
├── match.py               # Template matching script
├── pyproject.toml         # Dependencies
└── README.md
```
The recommended workflow uses uv:
```bash
# Install dependencies
uv sync

# Training
uv run python train.py --config configs/your_exp_config.yaml

# Layout template matching
uv run python match.py \
    --config configs/your_exp_config.yaml \
    --model_path path/to/model.pth \
    --layout path/to/layout.png \
    --template path/to/template.png

# Start TensorBoard
uv run tensorboard --logdir runs
```
Training needs only a large set of unlabeled PNG layout images, because the self-supervised pipeline generates its own training pairs. Validation data should include template images, layout images, and JSON annotations describing where each template occurs in a layout.
```json
{
  "boxes": [
    {"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},
    {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}
  ]
}
```
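
A minimal sketch of how such an annotation could be consumed during evaluation: it groups the ground-truth boxes by template and scores one detection with intersection over union (IoU). It assumes that `x` and `y` denote the top-left corner of each box, which the annotation format does not state, and the helper names and the predicted box are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin assumed

ANNOTATION_JSON = """
{
  "boxes": [
    {"template": "template1.png", "x": 100, "y": 200, "width": 50, "height": 50},
    {"template": "template2.png", "x": 300, "y": 400, "width": 60, "height": 60}
  ]
}
"""


def load_boxes(text: str) -> Dict[str, List[Box]]:
    """Group ground-truth boxes by template file name."""
    grouped: Dict[str, List[Box]] = {}
    for entry in json.loads(text)["boxes"]:
        box = (entry["x"], entry["y"], entry["width"], entry["height"])
        grouped.setdefault(entry["template"], []).append(box)
    return grouped


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


ground_truth = load_boxes(ANNOTATION_JSON)
predicted: Box = (110, 200, 50, 50)  # hypothetical detection of template1.png
overlap = iou(predicted, ground_truth["template1.png"][0])
```

A detection would then count as a true positive when its IoU with an unmatched ground-truth box exceeds a chosen threshold; the threshold itself is an evaluation setting not specified here.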
| Resource type | Requirement | Notes |
|---|---|---|
| Dataset, startup | 100-200 images | High-resolution layouts for functional validation. |
| Dataset, initial usable | 1,000-2,000 images | Enough to learn stable geometric descriptors. |
| Dataset, production-grade | 5,000-10,000+ images | Covers multiple processes and design styles. |
| Entry GPU | RTX 3060 / 4060 | Small experiments and functional validation. |
| Mainstream GPU | RTX 3080 / 4070 | Recommended balance of speed and cost. |
| Professional GPU | RTX 3090 / 4090 / A6000 | Large experiments or production deployment. |
| VRAM | ≥ 12 GB | Batch Size = 8, Patch = 256×256. |
| CPU / memory | 8 cores / 32 GB | Keeps preprocessing from becoming the bottleneck. |
| Stage | Estimate | Notes |
|---|---|---|
| Single epoch | 15-25 minutes | RTX 3080, 2,000 images. |
| Total training time | About 16.7 hours | 50 epochs at 20 minutes per epoch. |
| Practical convergence | 10-20 hours | With early stopping, patience = 10. |
| Augmentation tuning | 1-2 weeks | Scale, brightness, and noise parameters. |
| Loss weight tuning | 1-2 weeks | Balance BCE / Triplet / Manhattan and related terms. |
| Hyperparameter search | 2-4 weeks | Learning rate, batch size, optimizer. |
| Architecture tuning | 2-4 weeks | Different backbones and attention modules. |
| Total tuning cycle | 1.5-3 months | Toward production-grade model quality. |
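
The total-training-time row follows directly from the per-epoch estimate. The short sketch below reproduces that arithmetic and also evaluates the 15 and 25 minute ends of the single-epoch range; the function name is illustrative.

```python
def total_training_hours(epochs: int, minutes_per_epoch: float) -> float:
    """Wall-clock hours for a fixed number of epochs at a constant epoch time."""
    return epochs * minutes_per_epoch / 60.0


nominal_hours = total_training_hours(50, 20)  # 1000 minutes, about 16.7 hours
fastest_hours = total_training_hours(50, 15)  # lower end of the per-epoch range
slowest_hours = total_training_hours(50, 25)  # upper end of the per-epoch range
```

A full 50-epoch run therefore spans roughly 12.5 to 20.8 hours across the quoted per-epoch range, and early stopping can end a run before the full 50 epochs.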