RoRD Midterm Review: Rotation-Robust Descriptors For IC Layout Recognition

Nov 1, 2025

RoRD: Rotation-Robust Descriptors For IC Layout Recognition

This is the midterm review report for a Zhejiang University Chu Kochen Honors College deep research training project. It summarizes the progress of RoRD-Layout-Recognition, including core implementation, performance testing, innovation analysis, and future work.

Report Information

Date: November 2025
Type: midterm review report
Presenter: Tiansheng Jiao

Project Overview

Background And Goals

IC layout recognition is a key technology in semiconductor manufacturing and EDA. As chip designs become increasingly complex, traditional pixel-matching methods struggle with geometric transformations such as rotation and scaling. Traditional geometry-based algorithms also face high computational complexity and long runtime.

Core goals:

Develop rotation-robust descriptors (RoRD) for IC layouts to address sensitivity to geometric transformations.
Achieve high-accuracy layout geometric feature matching for multi-scale and multi-instance retrieval.
Build an end-to-end layout recognition solution for semiconductor design and manufacturing.

Key Problems

Problem	Description
Geometric invariance	IC layouts frequently use rotations of 0°, 90°, 180°, and 270° (often combined with mirroring), while traditional methods struggle to keep features consistent across these orientations.
Dynamic IP library expansion	IP and standard-cell libraries are large and frequently updated, requiring adaptation to new templates without repeated retraining.
Data scarcity	Supervised learning needs large amounts of fine-grained labeled data, but pixel-level and bounding-box labels are expensive in layout domains.
Real-time requirements	Industrial use requires fast processing, batch execution, and concurrent inference.

Limitations Of Traditional Methods

Method	Strength	Limitation
Direct pixel matching	Simple and intuitive	Sensitive to rotation and weak in robustness.
SIFT / SURF features	Scale invariance	Poor fit for IC layout geometry.
Deep learning classification	End-to-end learning	Requires large labeled datasets.
Traditional hash matching	Fast	Limited accuracy and weak geometric-transform support.

Technical Advantages

Manhattan geometry constraints for right-angle and grid-like layout structures.
Diffusion-based data augmentation to synthesize data from real layouts.
Multi-scale feature fusion for different process nodes and hierarchy levels.
End-to-end automated pipeline from raw data to trained model.

Completion Analysis

As of November 2025, the core framework and basic functionality are complete. Overall completion is about 65%. The core model implementation has reached 90%, while more effort is still needed in training validation and performance testing.

Module	Completion	Quality	Key missing item
Core model implementation	90%	Excellent	Training validation
Data processing pipeline	85%	Good	Large-scale testing
Matching algorithm optimization	80%	Good	Real-data validation
Training infrastructure	70%	Medium	Distributed support
Documentation and examples	60%	Medium	Industrial cases
Performance testing	50%	Low	Post-training evaluation

Key Remaining Tasks

Task	Remaining work	Plan
Model training and optimization	Real training, parameter tuning, grid search, and convergence validation.	Main focus of phase one.
Large-scale data testing	Real IC layout dataset validation and tests across process nodes and design complexity.	Gradually complete in phases one and two.
Performance limit exploration	Upper-bound testing, extreme resolution, and complex layout capability validation.	Main research focus of phase two.
Real-world scenario validation	Industrial testing, EDA tool integration, and interface adaptation.	Main focus of phase two.

Features Needing Improvement

Training infrastructure: configuration management, losses, and optimizer framework are complete; distributed training and automatic hyperparameter tuning remain unfinished.
Performance testing: baseline inference testing with untrained models is complete; post-training accuracy and performance evaluation remain unfinished.
Documentation: technical documentation, user guide, and API notes are complete; full tutorials, best practices, and deployment cases are still needed.

Completed Core Features

Multi-Backbone Support

The project supports VGG16, ResNet34, and EfficientNet-B0 to fit different speed and accuracy requirements.

Feature	Problem solved	Value
Multi-backbone support	Different applications need different speed-accuracy tradeoffs.	ResNet34 supports real-time processing, while VGG16 provides a high-accuracy baseline.
Geometry-aware heads	IC layouts have Manhattan geometry.	Integrates geometric constraints into layout recognition.
Feature Pyramid Network	Different hierarchy levels and process nodes have different sizes.	Supports large layouts up to 4096×4096 pixels.

Diffusion-Based Data Augmentation

IC layout data is scarce and expensive to annotate, and traditional augmentation is limited. The project introduces diffusion-based data augmentation to synthesize layouts with realistic geometric relationships.

Diffusion model integration: DDPM is applied to IC layout augmentation.
Geometric transformation augmentation: supports 8 discrete transformations (rotations by 0°, 90°, 180°, and 270°, each with and without mirroring), with H-consistency checking.
Multi-source data mixing: configurable ratios of real and synthetic data.
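The 8 discrete transformations can be written as small integer matrices, which makes a consistency check between an image and its transformed copy easy to express. The sketch below is illustrative only: the function names, the center-based coordinate convention, and the pixel tolerance are assumptions, not the project's actual H-consistency implementation.

```python
def d4_matrices():
    """Return the 8 rotation/mirror matrices of a square as 2x2 integer tuples."""
    rot90 = ((0, -1), (1, 0))    # rotation by 90 degrees
    mirror = ((-1, 0), (0, 1))   # mirror across the vertical axis

    def mul(a, b):
        return tuple(
            tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
            for i in range(2)
        )

    mats = []
    current = ((1, 0), (0, 1))   # identity, i.e. rotation by 0 degrees
    for _ in range(4):
        mats.append(current)               # pure rotation
        mats.append(mul(mirror, current))  # the same rotation followed by a mirror
        current = mul(rot90, current)
    return mats


def warp_point(mat, point, size):
    """Map a pixel coordinate of a size x size image through `mat` about the image center."""
    c = (size - 1) / 2
    x, y = point[0] - c, point[1] - c
    return (mat[0][0] * x + mat[0][1] * y + c,
            mat[1][0] * x + mat[1][1] * y + c)


def h_consistent(mat, kps_src, kps_dst, size, tol=1.0):
    """True if every source keypoint, warped by `mat`, lands within `tol` pixels
    of some keypoint detected in the transformed image."""
    for p in kps_src:
        wx, wy = warp_point(mat, p, size)
        if not any(abs(wx - qx) <= tol and abs(wy - qy) <= tol for qx, qy in kps_dst):
            return False
    return True


mats = d4_matrices()
keypoints = [(10, 20), (100, 200), (255, 0)]
warped = [warp_point(mats[2], p, 256) for p in keypoints]
print(len(set(mats)), h_consistent(mats[2], keypoints, warped, 256))
```

Because every matrix entry is -1, 0, or 1, integer pixel coordinates stay on the pixel grid after warping, which suits Manhattan-style layouts.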

Geometry-Consistency Loss

IC layout training requires special losses and strategies to preserve geometric consistency:

$$L_{geo} = L_{det} + \lambda_1 L_{desc} + \lambda_2 L_{H\text{-}consistency}$$

Here, $L_{det}$ is detection loss, $L_{desc}$ is descriptor loss, and $L_{H\text{-}consistency}$ is H-consistency loss.
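As a minimal sketch of how the three terms combine, the weighted sum can be written as a small helper. The class name, the argument names, and the default weights of 1.0 are placeholders chosen for illustration; the report does not state the values of $\lambda_1$ and $\lambda_2$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoLossWeights:
    """Weights for the descriptor and H-consistency terms (hypothetical defaults)."""
    lambda_desc: float = 1.0   # lambda_1 in the formula
    lambda_h: float = 1.0      # lambda_2 in the formula


def geo_loss(l_det, l_desc, l_h, weights=GeoLossWeights()):
    """Combine detection, descriptor, and H-consistency losses into L_geo."""
    return l_det + weights.lambda_desc * l_desc + weights.lambda_h * l_h


# Example with made-up loss values and weights.
total = geo_loss(1.0, 2.0, 3.0, GeoLossWeights(lambda_desc=0.5, lambda_h=0.1))
print(total)
```

Keeping the weights in one immutable object makes it straightforward to log them alongside each experiment or load them from a configuration file.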

Advantages:

Complete mathematical framework for the loss function.
Feature consistency under rotation and mirroring.
Manhattan geometry constraints integrated into deep learning loss design.
Improved rotation invariance and geometric robustness.

Configuration-Driven Training

The project supports YAML configuration for hyperparameter management and experiment reproduction.

# configs/base_config.yaml
training:
  learning_rate: 5.0e-5
  batch_size: 8
  num_epochs: 50
  patch_size: 256

model:
  backbone:
    name: "resnet34"
    pretrained: false
  fpn:
    enabled: true
    out_channels: 256

data_sources:
  real:
    enabled: true
    ratio: 0.7
  diffusion:
    enabled: true
    png_dir: "data/diffusion_generated"
    ratio: 0.3
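
One way the `data_sources` ratios could translate into batch composition is sketched below. It assumes mixing happens per batch and that ratios are renormalized over the enabled sources; the report does not say how the project applies the ratios, and the function names are hypothetical.

```python
import random

# Mirrors the `data_sources` section of the YAML configuration.
data_sources = {
    "real": {"enabled": True, "ratio": 0.7},
    "diffusion": {"enabled": True, "ratio": 0.3},
}


def split_batch(batch_size, sources):
    """Return how many samples each enabled source contributes to one batch."""
    enabled = {name: cfg["ratio"] for name, cfg in sources.items() if cfg["enabled"]}
    total = sum(enabled.values())
    if total <= 0:
        raise ValueError("at least one data source must be enabled")
    counts = {name: int(batch_size * ratio / total) for name, ratio in enabled.items()}
    # Hand out the samples lost to truncation, largest ratio first.
    leftover = batch_size - sum(counts.values())
    for name in sorted(enabled, key=enabled.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


def sample_batch(pools, batch_size, sources, rng):
    """Draw one shuffled batch from `pools`, a dict of source name -> list of samples."""
    batch = []
    for name, count in split_batch(batch_size, sources).items():
        batch.extend(rng.sample(pools[name], count))
    rng.shuffle(batch)
    return batch


pools = {
    "real": [f"real_{i}" for i in range(20)],
    "diffusion": [f"syn_{i}" for i in range(20)],
}
print(split_batch(8, data_sources))
print(sample_batch(pools, 8, data_sources, random.Random(0)))
```

With `batch_size: 8` and a 0.7 / 0.3 split, this scheme yields 6 real and 2 synthetic samples per batch.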

Multi-Scale Template Matching

Existing matching methods struggle with rotation, scale changes, and multi-instance detection in IC layouts. The project implements multi-scale template matching, multi-instance detection, and RANSAC geometric verification.

# Basic matching
python match.py \
    --layout data/large_layout.png \
    --template data/small_template.png \
    --output results/matching.png \
    --json_output results/matching.json

Module	Role
Multi-scale template matching	Uses pyramid search and multi-resolution feature fusion for cross-node matching.
Multi-instance detection	Finds multiple instances using region masking and iterative detection.
RANSAC geometric verification	Filters mismatches and improves matching accuracy.
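
The interplay of geometric verification and iterative multi-instance detection can be sketched on plain point matches. This is a simplified illustration, not the logic of `match.py`: it assumes a translation-only model (rotation already resolved), and it stands in for region masking by removing the matches that a found instance explains. All function names and thresholds are hypothetical.

```python
import random


def ransac_translation(matches, tol=2.0, iters=200, rng=None):
    """Estimate a translation (dx, dy) from noisy template-to-layout matches.

    matches: list of ((tx, ty), (lx, ly)) pairs. One match is a minimal sample,
    because a single correspondence fixes a pure translation.
    """
    rng = rng or random.Random(0)
    best_offset, best_inliers = None, []
    for _ in range(iters):
        (tx, ty), (lx, ly) = rng.choice(matches)
        dx, dy = lx - tx, ly - ty
        inliers = [
            m for m in matches
            if abs(m[1][0] - m[0][0] - dx) <= tol and abs(m[1][1] - m[0][1] - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_offset, best_inliers = (dx, dy), inliers
    return best_offset, best_inliers


def find_instances(matches, min_inliers=3, max_instances=10, seed=0):
    """Repeatedly fit one instance, then drop its matches and search again."""
    rng = random.Random(seed)
    remaining = list(matches)
    instances = []
    while len(remaining) >= min_inliers and len(instances) < max_instances:
        offset, inliers = ransac_translation(remaining, rng=rng)
        if len(inliers) < min_inliers:
            break   # whatever is left is treated as mismatches
        instances.append(offset)
        remaining = [m for m in remaining if m not in inliers]
    return instances


template = [(0, 0), (10, 0), (0, 10), (10, 10)]
matches = (
    [(p, (p[0] + 100, p[1] + 50)) for p in template]      # first instance
    + [(p, (p[0] + 300, p[1] + 200)) for p in template]   # second instance
    + [((0, 0), (500, 7)), ((10, 10), (20, 400))]         # two mismatches
)
print(sorted(find_instances(matches)))
```

The two mismatches never gather enough inliers to form an instance, so they are discarded rather than reported.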

Innovation Analysis

Geometry-Aware Descriptors

Traditional descriptors such as SIFT and SURF cannot capture Manhattan geometry in IC layouts, including right angles, grids, and sparse regions. The project introduces a geometry-aware feature function:

$$d_{geo} = F_{geo}(I, H)$$

where $I$ is the input layout image, $H$ is the geometric transformation matrix, and $F_{geo}$ is the geometry-aware feature extractor.

Advantages:

Manhattan constraints force descriptors to learn right-angle and grid structures.
Rotation invariance is built around 8 geometric transformations.
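Expected accuracy improvement of 30% to 50% over traditional methods in IC layout matching. This is a target rather than a measured result, since post-training accuracy evaluation is still pending.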

Rotation-Invariant Loss

IC layouts are frequently rotated during design, and traditional methods cannot guarantee feature consistency after rotation. The project proposes:

$$L_{rotation} = \sum_{i=1}^{8} \| d(I) - d(T_i(I)) \|^2$$

where $d(\cdot)$ is the descriptor and $T_i$ is the $i$-th geometric transformation: the 4 rotations (0°, 90°, 180°, 270°) and their 4 mirrored counterparts.
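
To make the loss concrete, the sketch below evaluates it for two toy descriptors on a tiny grid. The descriptors are stand-ins chosen for illustration (the real descriptor is a learned network), and the helper names are hypothetical.

```python
def dihedral_views(img):
    """Return the 8 views of a square grid: 4 rotations, each with and without a mirror."""
    views = []
    current = [list(row) for row in img]
    for _ in range(4):
        views.append(current)
        views.append([row[::-1] for row in current])            # horizontal mirror
        current = [list(row) for row in zip(*current[::-1])]    # rotate 90 degrees clockwise
    return views


def rotation_loss(descriptor, img):
    """Sum of squared distances between d(I) and d(T_i(I)) over the 8 views."""
    base = descriptor(img)
    return sum(
        sum((a - b) ** 2 for a, b in zip(base, descriptor(view)))
        for view in dihedral_views(img)
    )


def flatten_descriptor(img):
    """Raw pixels in reading order: changes whenever the image is rotated or mirrored."""
    return [value for row in img for value in row]


def sorted_descriptor(img):
    """Sorted pixel values: unchanged by any rearrangement of pixels."""
    return sorted(value for row in img for value in row)


grid = [[1, 2], [3, 4]]
print(rotation_loss(sorted_descriptor, grid))    # invariant descriptor, zero loss
print(rotation_loss(flatten_descriptor, grid))   # non-invariant descriptor, positive loss
```

The identity transformation contributes zero to the sum, so only the other seven views drive the loss during training.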

Diffusion Data Augmentation

To address IC layout data scarcity, diffusion models are used for layout augmentation:

$$I_{syn} = D_{\theta}^{-1}(z_T, I_{real})$$

The generated layouts are conditioned on real layout data and aim to preserve geometric constraints.
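
For orientation, the standard DDPM formulation is summarized below. It is given as background under the assumption that the project follows the usual setup, in which $z_T$ is Gaussian noise at the final step $T$ and $D_{\theta}^{-1}$ denotes the learned reverse process. How $I_{real}$ enters as a condition is not specified in this report and is not modeled here.

```latex
% Forward (noising) process with variance schedule \beta_1, \dots, \beta_T:
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, \mathbf{I}\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)

% Learned reverse (denoising) step, applied for t = T, \dots, 1,
% starting from z_T \sim \mathcal{N}(0, \mathbf{I}):
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```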

Modular Architecture

The project adopts a plugin-like modular design, supporting flexible combinations of backbones and attention mechanisms while reducing integration complexity with EDA tools.
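
A plugin-like design of this kind is often built around a registry. The sketch below shows the pattern with plain dictionaries standing in for network modules. The registry names, the `register` helper, and the builder functions are hypothetical and do not reflect the project's actual code.

```python
from itertools import product

BACKBONES = {}
ATTENTION = {}


def register(registry, name):
    """Decorator that stores a builder function under `name` in `registry`."""
    def wrap(builder):
        registry[name] = builder
        return builder
    return wrap


def _make_backbone(name):
    @register(BACKBONES, name)
    def build():
        return {"backbone": name}   # placeholder for a real network module
    return build


def _make_attention(name):
    @register(ATTENTION, name)
    def attach(model):
        return {**model, "attention": name}   # placeholder for wrapping the backbone
    return attach


for backbone_name in ("vgg16", "resnet34", "efficientnet_b0"):
    _make_backbone(backbone_name)
for attention_name in ("none", "se", "cbam"):
    _make_attention(attention_name)


def build_model(backbone, attention="none"):
    """Look up both plugins by name and compose them."""
    if backbone not in BACKBONES:
        raise KeyError(f"unknown backbone: {backbone}")
    if attention not in ATTENTION:
        raise KeyError(f"unknown attention module: {attention}")
    return ATTENTION[attention](BACKBONES[backbone]())


combinations = [build_model(b, a) for b, a in product(BACKBONES, ATTENTION)]
print(len(combinations), build_model("resnet34", "se"))
```

Adding a new backbone or attention module then only requires registering one more builder, and three backbones with three attention options give the nine configurations that a benchmark sweep can iterate over.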

Performance Testing And Analysis

Test Environment

Item	Configuration
Hardware	Intel Xeon 8558P + 1 × NVIDIA A100 + 512 GB memory
Software	PyTorch 2.6+, CUDA 12.8
Test type	Forward inference test with untrained model
Test data	Random 2048×2048 layout-like simulation data

GPU Inference Performance

Rank	Backbone	Attention	Single-scale inference (ms)	FPN inference (ms)	FPS (single-scale)	Rating
1	ResNet34	None	18.10 ± 0.07	21.41 ± 0.07	55.3	Best
2	ResNet34	SE	18.14 ± 0.05	21.53 ± 0.06	55.1	Excellent
3	ResNet34	CBAM	18.23 ± 0.05	21.50 ± 0.07	54.9	Excellent
4	EfficientNet-B0	None	21.40 ± 0.13	33.48 ± 0.42	46.7	Good
5	EfficientNet-B0	CBAM	21.55 ± 0.05	33.33 ± 0.38	46.4	Good
6	EfficientNet-B0	SE	21.67 ± 0.30	33.52 ± 0.33	46.1	Good
7	VGG16	None	49.27 ± 0.23	102.08 ± 0.42	20.3	Average
8	VGG16	SE	49.53 ± 0.14	101.71 ± 1.10	20.2	Average
9	VGG16	CBAM	50.36 ± 0.42	102.47 ± 1.52	19.9	Average

CPU vs GPU Speedup

Backbone	Attention	CPU inference (ms)	GPU inference (ms)	Speedup	Rating
ResNet34	None	171.73	18.10	9.5×	Efficient
ResNet34	CBAM	406.07	18.23	22.3×	Excellent
ResNet34	SE	419.52	18.14	23.1×	Excellent
VGG16	None	514.94	49.27	10.4×	Efficient
VGG16	SE	808.86	49.53	16.3×	Very good
VGG16	CBAM	809.15	50.36	16.1×	Very good
EfficientNet-B0	None	1820.03	21.40	85.1×	Outstanding
EfficientNet-B0	SE	1815.73	21.67	83.8×	Outstanding
EfficientNet-B0	CBAM	1954.59	21.55	90.7×	Outstanding

Performance Conclusion

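GPU acceleration is significant. Across the nine configurations, the average speedup is 39.7×, the maximum is 90.7× (EfficientNet-B0 + CBAM), and the minimum is 9.5× (ResNet34 without attention). The largest speedups reflect EfficientNet-B0's slow CPU inference (about 1.8 to 2.0 s) rather than faster GPU inference: on the GPU, ResNet34 remains the fastest backbone at about 18 ms.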
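
The summary figures can be reproduced from the speedup column of the CPU vs GPU table, as in the short sketch below. The variable and function names are illustrative. Note that ratios recomputed from the rounded latencies in the table can differ from the reported speedups in the last digit.

```python
from statistics import mean

# Reported speedups copied from the CPU vs GPU table: (backbone, attention) -> factor.
reported_speedup = {
    ("ResNet34", "None"): 9.5,
    ("ResNet34", "CBAM"): 22.3,
    ("ResNet34", "SE"): 23.1,
    ("VGG16", "None"): 10.4,
    ("VGG16", "SE"): 16.3,
    ("VGG16", "CBAM"): 16.1,
    ("EfficientNet-B0", "None"): 85.1,
    ("EfficientNet-B0", "SE"): 83.8,
    ("EfficientNet-B0", "CBAM"): 90.7,
}


def fps(latency_ms):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


average = mean(reported_speedup.values())
largest = max(reported_speedup, key=reported_speedup.get)
smallest = min(reported_speedup, key=reported_speedup.get)

print(f"average speedup: {average:.1f}x")
print(f"largest: {largest} at {reported_speedup[largest]}x")
print(f"smallest: {smallest} at {reported_speedup[smallest]}x")
print(f"ResNet34 with FPN: {fps(21.41):.1f} FPS")
```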

Recommendations

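Best default configuration: ResNet34 without attention, with 18.1 ms single-scale inference (about 55.3 FPS).
High-accuracy matching: ResNet34 + SE, which adds only about 0.04 ms of single-scale inference time (18.14 ms vs. 18.10 ms). Its accuracy benefit still has to be confirmed after training.
Multi-scale search: FPN improves multi-scale capability but adds compute overhead, about 3.3 ms for ResNet34 (21.41 ms vs. 18.10 ms) and about 53 ms for VGG16 (102.08 ms vs. 49.27 ms).
Batch processing: a single A100 can support 8 to 16 concurrent inference jobs.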

Future Work Plan

Phase 1: Minimum Deliverable Standard (2025.11 - 2026.01)

Direction	Work
Data preparation	Collect IC layout data from Prof. Zheng’s company, clean data, convert formats, and perform quality control.
Basic training	Train the ResNet34 backbone, validate basic geometry-consistency loss, and tune simple hyperparameters.
Functional validation	Complete end-to-end tests and basic performance benchmarks.

Phase 2: Paper-Level Research (2026.01 - 2026.04)

This phase will use advanced-node data from Prof. Chen to support high-quality research papers and patent applications.

Direction	Work
Advanced-node adaptation	Analyze advanced-process layout features and optimize geometric matching at very small scales.
Algorithmic innovation	Study mathematical modeling for more complex geometric transformations and multimodal layout information fusion.
Academic output	Target ICCAD, DAC, ASP-DAC, DATE, or TCAD.

Risk Assessment

Risk	Probability	Impact	Mitigation
Difficult model convergence	Medium	High	Adjust learning rate and increase data augmentation.
Insufficient training data	Medium	High	Use diffusion data augmentation and synthetic data.
Performance below target	Low	Medium	Compare multiple backbones and optimize architecture.
Overfitting	Medium	Medium	Use early stopping and regularization.

Risk Strategy

Project risk will be reduced through modular design, staged execution, and parallel backup solutions, ensuring that core goals can be completed on time.

Summary And Outlook

Core results:

Technical breakthrough: completed the RoRD model architecture and implemented rotation-robust IC layout descriptors.
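Performance: single-scale inference reaches 55.3 FPS on one A100 (ResNet34, untrained model, 2048×2048 input), with an average GPU-over-CPU speedup of 39.7×.
Application value: supports multi-scale and multi-instance layout matching. Matching accuracy will be quantified once training is complete.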
Innovations: geometry-aware loss, diffusion data augmentation, and modular architecture design.

https://www.jiao77.com/en/blog/report/rord-midterm-review-2025-11/

Author

Jiao77

Published on

Nov 1, 2025

License

CC BY-NC-SA 4.0
