RoRD Graduation Project: Structure-Aware Local Feature Matching for IC Layouts

Apr 29, 2026

This post summarizes my graduation project on applying RoRD-inspired local feature matching to IC layout pattern retrieval. Compared with the earlier progress reports, this version is more complete: it connects the problem definition, method design, experimental protocol, baseline comparison, and engineering limitations in one place.

Short Version

The goal is to retrieve all template-equivalent instances from a large IC layout without explicitly enumerating rotated and mirrored template variants. The final pipeline encodes layout structure into both the input representation and the learned local features, then combines sliding-window layout feature extraction, RANSAC verification, region masking, and iterative multi-instance search.

Motivation

IC layout pattern matching aims to localize structures in a large layout that are identical or structurally equivalent to a given template. It is useful for design reuse validation, similar-structure retrieval, anomaly inspection, IP protection, and layout review.

A direct difficulty is geometric variation. Target instances often appear under rotation, reflection, and translation. If four rotations and mirroring are considered, a conventional pipeline may need to search up to eight transformed versions of the same template (four rotations, each with and without mirroring), which multiplies computation and complicates the retrieval process.
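To make the count of eight concrete, here is a minimal sketch that enumerates the rotated and mirrored variants of a tiny binary template. The helper names (`rotate90`, `mirror`, `template_variants`) and the toy template are illustrative assumptions, not code from the project.

```python
def rotate90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Flip a 2D grid left to right."""
    return [row[::-1] for row in grid]


def template_variants(grid):
    """Return four rotations, each with and without mirroring (eight grids)."""
    variants = []
    current = [list(row) for row in grid]
    for _ in range(4):
        variants.append(current)
        variants.append(mirror(current))
        current = rotate90(current)
    return variants


# Hypothetical asymmetric "L"-shaped template.
template = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
variants = template_variants(template)
distinct = {tuple(map(tuple, v)) for v in variants}

# A fully symmetric template collapses to a single distinct variant.
square = [[1, 1], [1, 1]]
distinct_square = {tuple(map(tuple, v)) for v in template_variants(square)}
```

An asymmetric template yields eight distinct variants, which is the enumeration an explicit-variant pipeline would have to search.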

IC layouts also differ sharply from natural images. They are usually binarized, weakly textured, highly geometric, and densely repetitive. Their semantics are expressed more by line segments, corners, endpoints, repeated cells, and topology than by texture. This makes direct transfer of natural-image local features unreliable.

The central question of this work is:

Can a model learn a stable structural representation under rotation and reflection, and can that representation be used for large-layout, multi-instance retrieval?

Pipeline

The method separates two responsibilities. The model learns which locations are worth matching and how to describe them; the retrieval pipeline decides how to find all instances across a large layout.

Figure: Overall matching pipeline

Given a template image $T$ and a large layout image $L$, the system outputs a set of matched boxes:

$$\mathcal{R} = \{(b_i, c_i, g_i)\}_{i=1}^{N}$$

Here, $b_i$ is the matched box, $c_i$ is the confidence score, and $g_i$ records geometric information such as scale, rotation, or reflection.
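One way to hold this output in code is a small record type per match. This is only a sketch: the class names, the field names, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Geometry:
    """Geometric information g_i (field names are hypothetical)."""
    scale: float = 1.0
    rotation_deg: float = 0.0
    mirrored: bool = False


@dataclass
class MatchResult:
    """One element (b_i, c_i, g_i) of the result set."""
    box: tuple          # assumed convention: (x_min, y_min, x_max, y_max)
    confidence: float
    geometry: Geometry = field(default_factory=Geometry)


# Hypothetical result set with two matches.
results = [
    MatchResult(box=(128, 0, 192, 64), confidence=0.81,
                geometry=Geometry(rotation_deg=90.0, mirrored=True)),
    MatchResult(box=(0, 0, 64, 64), confidence=0.92),
]
results.sort(key=lambda r: r.confidence, reverse=True)
```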

Four-Channel Structural Input

Instead of using only a single binary layout image, the method builds a four-channel representation:

| Channel | Purpose |
| --- | --- |
| Raw binary layout | Preserves the overall shape |
| Skeleton | Encodes connectivity and main structure |
| Corners | Highlights local turning points |
| Endpoints | Highlights structure boundaries |

Figure: Four-channel layout representation

The key idea is that the most discriminative cues in IC layouts are not textures, but corners, endpoints, skeletons, and their combinations. The four-channel input makes these structural priors directly visible to the network.
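The corner and endpoint channels can be illustrated with a simple neighbor-counting rule. The sketch below assumes a one-pixel-wide, Manhattan-style skeleton is already available. It marks a pixel as an endpoint if it has exactly one 4-connected neighbor, and as a corner or junction if it has both a horizontal and a vertical neighbor. The post does not specify the extraction rules, so these are my assumptions, and the skeletonization step itself is not shown.

```python
def structural_channels(skeleton):
    """Derive corner and endpoint maps from a 1-pixel-wide binary skeleton."""
    h, w = len(skeleton), len(skeleton[0])
    corners = [[0] * w for _ in range(h)]
    endpoints = [[0] * w for _ in range(h)]

    def on(y, x):
        return 0 <= y < h and 0 <= x < w and skeleton[y][x] == 1

    for y in range(h):
        for x in range(w):
            if skeleton[y][x] != 1:
                continue
            horizontal = on(y, x - 1) + on(y, x + 1)
            vertical = on(y - 1, x) + on(y + 1, x)
            if horizontal + vertical == 1:
                endpoints[y][x] = 1      # wire terminates here
            elif horizontal >= 1 and vertical >= 1:
                corners[y][x] = 1        # turn or junction

    return corners, endpoints


# Hypothetical L-shaped wire; it is 1 pixel wide, so raw == skeleton here.
raw = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
skeleton = raw
corners, endpoints = structural_channels(skeleton)

# Channel order follows the table: raw, skeleton, corners, endpoints.
four_channel = [raw, skeleton, corners, endpoints]
```

On this toy wire the rule marks the two wire ends as endpoints and the single bend as a corner.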

Joint Keypoint and Descriptor Learning

The model uses a shared backbone with two heads:

  1. A detection head predicts a keypoint response map.
  2. A descriptor head predicts local descriptors for matching.

Training uses self-supervised geometric consistency. Layout patches are transformed by rotation, reflection, scale perturbation, and photometric noise while the transformation relation is kept known. This provides positive and negative pairs without manual keypoint labels.
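The reason no manual labels are needed is that a known transform maps every pixel to a known location. The sketch below covers only quarter-turn rotations and mirroring. Scale perturbation and photometric noise are left out, and all function names are illustrative assumptions.

```python
def rotate90(grid):
    """Rotate a 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def mirror(grid):
    """Flip a 2D grid left to right."""
    return [row[::-1] for row in grid]


def transform_patch(grid, quarter_turns, mirrored):
    """Apply quarter-turn rotations, then an optional mirror."""
    out = [list(row) for row in grid]
    for _ in range(quarter_turns % 4):
        out = rotate90(out)
    return mirror(out) if mirrored else out


def transform_point(y, x, height, width, quarter_turns, mirrored):
    """Map a pixel coordinate through the same transform as transform_patch."""
    for _ in range(quarter_turns % 4):
        # Clockwise quarter turn: (y, x) -> (x, height - 1 - y); shape swaps.
        y, x, height, width = x, height - 1 - y, width, height
    if mirrored:
        x = width - 1 - x
    return y, x


# Patch with unique values so each correspondence can be verified exactly.
patch = [[row * 4 + col for col in range(4)] for row in range(3)]
keypoint = (0, 1)

positive_pairs = []
for turns in range(4):
    for flip in (False, True):
        warped = transform_patch(patch, turns, flip)
        y2, x2 = transform_point(*keypoint, 3, 4, turns, flip)
        positive_pairs.append((patch[keypoint[0]][keypoint[1]], warped[y2][x2]))
```

Each pair holds the same source pixel seen before and after the transform, so it can serve as a positive. Any other location in the warped patch can serve as a negative.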

The total loss is:

$$\mathcal{L} = \mathcal{L}_{det} + \lambda \mathcal{L}_{desc}$$

The detection loss emphasizes both repeatability and structural usefulness:

$$\mathcal{L}_{det} = \mathcal{L}_{consistency} + \alpha \mathcal{L}_{structure}$$

| Term | Role |
| --- | --- |
| $\mathcal{L}_{consistency}$ | Ensures the same structure remains detectable after rotation, reflection, or translation |
| $\mathcal{L}_{structure}$ | Encourages responses around corners, endpoints, junctions, and other useful structural regions |

The descriptor loss pulls corresponding structures together and pushes non-corresponding structures apart:

$$\mathcal{L}_{desc} = \sum_{(p,p^+)} \ell_{pos}(D(p), D'(p^+)) + \sum_{(p,p^-)} \ell_{neg}(D(p), D'(p^-))$$

In layout images, negative samples are often not visually unrelated regions. They may be locally similar but geometrically incorrect repeated structures. This makes descriptor learning especially important for suppressing false correspondences.
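The post does not spell out the per-pair terms $\ell_{pos}$ and $\ell_{neg}$. As a hedged illustration, the sketch below assumes a standard contrastive form: squared distance for positives and a squared hinge with a margin for negatives. The margin, the default weight `lam`, and the toy descriptors are all assumptions.

```python
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def descriptor_loss(positive_pairs, negative_pairs, margin=1.0):
    """Assumed contrastive form: pull positives together, push negatives past a margin."""
    pos_term = sum(l2_distance(d, d_pos) ** 2 for d, d_pos in positive_pairs)
    neg_term = sum(max(0.0, margin - l2_distance(d, d_neg)) ** 2
                   for d, d_neg in negative_pairs)
    return pos_term + neg_term


def total_loss(det_loss, desc_loss, lam=1.0):
    """L = L_det + lambda * L_desc, with a hypothetical default lambda."""
    return det_loss + lam * desc_loss


anchor = [1.0, 0.0]
positive = [1.0, 0.0]
easy_negative = [-1.0, 0.0]   # visually unrelated region, far in descriptor space
hard_negative = [0.9, 0.1]    # locally similar repeated structure, still close

loss_easy = descriptor_loss([(anchor, positive)], [(anchor, easy_negative)])
loss_hard = descriptor_loss([(anchor, positive)], [(anchor, hard_negative)])
```

Under this assumed form the easy negative contributes nothing, while the hard negative, which stands in for a locally similar repeated structure, still produces a penalty. That is the case the paragraph above describes.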

Model and Inference Settings

The main experiment uses a VGG16 + FPN architecture:

| Item | Setting |
| --- | --- |
| Input | Four-channel structural image |
| Backbone | VGG16 with a four-channel first convolution |
| FPN levels | P2 / P3 / P4 |
| FPN channels | 256 |
| Descriptor dimension | 128 |
| Attention module | Disabled in the main experiment |

Training settings:

| Item | Setting |
| --- | --- |
| Patch size | $256 \times 256$ |
| Batch size | 8 |
| Learning rate | $5 \times 10^{-5}$ |
| Epochs | 50 |
| Scale jitter | $[0.8, 1.2]$ |
| Augmentation | Brightness/contrast perturbation and Gaussian noise |
| Training data | Real ICCAD2019 samples |

Inference settings:

| Item | Setting |
| --- | --- |
| Window size | 1024 |
| Stride | 768 |
| Matching scales | $\{0.75, 1.0, 1.5\}$ |
| Keypoint threshold | 0.5 |
| RANSAC reprojection threshold | 5.0 pixels |
| Minimum inlier threshold | 15 |
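With a window size of 1024 and a stride of 768, neighboring windows overlap by 256 pixels. Here is a minimal sketch of how the window grid could be generated. The border handling (shifting the last window back so the edge is covered) and the example layout sizes are my assumptions, not details from the project.

```python
def window_origins(length, window=1024, stride=768):
    """Start offsets along one axis so that windows cover the full length."""
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] + window < length:
        origins.append(length - window)  # assumed: shift last window to the border
    return origins


def sliding_windows(height, width, window=1024, stride=768):
    """Return (y0, x0, y1, x1) boxes, clipped to the layout bounds."""
    return [
        (y, x, min(y + window, height), min(x + window, width))
        for y in window_origins(height, window, stride)
        for x in window_origins(width, window, stride)
    ]


# Hypothetical layout sizes, chosen only to show the tiling behavior.
aligned = window_origins(4096)      # stride divides the remaining length
unaligned = window_origins(3000)    # last window is shifted back to the border
tiles = sliding_windows(4096, 4096)
overlap = 1024 - 768
```

Every window needs its own forward pass, so the number of tiles grows with layout area, and the 256-pixel overlap means some regions are processed more than once.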

Large-Layout Multi-Instance Retrieval

The retrieval pipeline does not try to solve the whole image in one global step. Instead, it verifies one instance, masks it, and continues searching:

1. Extract template features.
2. Extract layout features with sliding windows.
3. Select candidate keypoints from unmasked regions.
4. Match template descriptors against layout descriptors.
5. Run RANSAC for geometric verification.
6. If the inlier count passes the threshold, project and record the template box.
7. Mask the matched region and continue searching.
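The control flow of steps 3 to 7 can be sketched as a loop around a matcher callback. Here `find_best_instance` is a hypothetical stand-in for descriptor matching plus RANSAC. The toy matcher, the center-based masking test, and the `max_instances` cap are assumptions for illustration only.

```python
def center_inside(box, region):
    """True if the center of box lies inside region; boxes are (x0, y0, x1, y1)."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def iterative_search(find_best_instance, min_inliers=15, max_instances=100):
    """Verify one instance, mask it, and search again until nothing passes."""
    masked, results = [], []
    for _ in range(max_instances):
        candidate = find_best_instance(masked)
        if candidate is None:
            break
        box, inliers = candidate
        if inliers < min_inliers:
            break                      # best remaining candidate fails verification
        results.append((box, inliers))
        masked.append(box)             # hide this region from later rounds
    return results


def make_toy_matcher(candidates):
    """Hypothetical matcher: best (box, inlier_count) outside the masked regions."""
    def find_best_instance(masked):
        visible = [c for c in candidates
                   if not any(center_inside(c[0], m) for m in masked)]
        return max(visible, key=lambda c: c[1]) if visible else None
    return find_best_instance


toy_candidates = [
    ((0, 0, 10, 10), 40),
    ((2, 2, 12, 12), 30),    # near-duplicate of the first instance
    ((20, 0, 30, 10), 25),
    ((40, 0, 50, 10), 9),    # too few inliers
]
found = iterative_search(make_toy_matcher(toy_candidates))
```

In this toy run, masking suppresses the near-duplicate, and the search stops once the best remaining candidate falls below the inlier threshold.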

This is well suited to densely repetitive layout scenes. Candidate keypoints are kept with high coverage, while RANSAC handles the main filtering burden later in the pipeline.
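To show how geometric verification filters dense candidates, here is a compact RANSAC sketch. The post does not say which transform model the project fits, so the model here is my assumption: a similarity transform (rotation, scale, translation) with an optional mirror, written with complex numbers as `d = a * z + b`. Only the 5.0 pixel threshold and the minimum of 15 inliers come from the inference settings. The data is synthetic.

```python
import random


def fit_model(s0, s1, d0, d1, mirrored):
    """Solve d = a * z + b from two correspondences (z = conj(s) if mirrored)."""
    if mirrored:
        s0, s1 = s0.conjugate(), s1.conjugate()
    if s0 == s1:
        return None  # degenerate sample
    a = (d1 - d0) / (s1 - s0)
    return a, d0 - a * s0


def project(model, point, mirrored):
    a, b = model
    return a * (point.conjugate() if mirrored else point) + b


def ransac(src, dst, threshold=5.0, min_inliers=15, iterations=200, seed=0):
    """Return the best model if it reaches min_inliers, else None."""
    if len(src) < 2:
        return None
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        i, j = rng.sample(range(len(src)), 2)
        for mirrored in (False, True):
            model = fit_model(src[i], src[j], dst[i], dst[j], mirrored)
            if model is None:
                continue
            inliers = [k for k in range(len(src))
                       if abs(project(model, src[k], mirrored) - dst[k]) <= threshold]
            if best is None or len(inliers) > len(best["inliers"]):
                best = {"model": model, "mirrored": mirrored, "inliers": inliers}
    if best is None or len(best["inliers"]) < min_inliers:
        return None
    return best


# Synthetic data: 20 mirrored, 90-degree-rotated inliers plus 10 random outliers.
data_rng = random.Random(1)
true_a, true_b = 1j, complex(300, 200)
src = [complex(data_rng.uniform(0, 100), data_rng.uniform(0, 100)) for _ in range(30)]
dst = [true_a * s.conjugate() + true_b for s in src[:20]]
dst += [complex(data_rng.uniform(0, 1000), data_rng.uniform(0, 1000)) for _ in range(10)]

result = ransac(src, dst)
rejected = ransac(src[20:], dst[20:])   # outliers only: cannot reach 15 inliers
```

Because the mirror flag is part of the fitted model, a mirrored instance is verified without enumerating a mirrored template, and a candidate set made only of false correspondences is rejected by the inlier threshold.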

Dataset and Metrics

The test layout is an $18 \times 18$ composed layout containing nine ground-truth targets, including the original template state, discrete rotations, and mirrored variants.

Figures: Large reused layout example; Template image

Metrics:

| Metric | Meaning |
| --- | --- |
| Center-hit Precision | Fraction of predicted boxes whose centers fall inside ground-truth boxes |
| Center-hit Recall | Fraction of ground-truth targets hit by predicted centers |
| IoU Recall@0.1 | Fraction of ground-truth targets covered by predictions with IoU above 0.1 |
| Layout Feature Time | Time spent extracting large-layout features |
| Match Pipeline Time | Time spent on matching and geometric verification |

Scope

This dataset is useful for testing whether the method can cover rotated, mirrored, and multi-instance targets without explicit template-state enumeration. It is still a limited and controlled benchmark, so it should not be read as a full generalization proof for all industrial layout scenarios.
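For reference, the three accuracy metrics defined above can be computed from axis-aligned boxes as in the sketch below. The `(x0, y0, x1, y1)` box format, the convention of reporting precision 0.0 when there are no predictions, and the toy boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def center_in(pred, gt):
    cx, cy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    return gt[0] <= cx <= gt[2] and gt[1] <= cy <= gt[3]


def evaluate(predictions, ground_truth, iou_threshold=0.1):
    """Center-hit precision, center-hit recall, and IoU recall at a threshold."""
    n_gt = len(ground_truth)
    hit_preds = sum(any(center_in(p, g) for g in ground_truth) for p in predictions)
    hit_gts = sum(any(center_in(p, g) for p in predictions) for g in ground_truth)
    iou_gts = sum(any(iou(p, g) > iou_threshold for p in predictions)
                  for g in ground_truth)
    return {
        "center_precision": hit_preds / len(predictions) if predictions else 0.0,
        "center_recall": hit_gts / n_gt if n_gt else 0.0,
        "iou_recall": iou_gts / n_gt if n_gt else 0.0,
    }


# Toy case: three targets, one good prediction and one false positive.
ground_truth = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
predictions = [(1, 1, 11, 11), (100, 100, 110, 110)]
scores = evaluate(predictions, ground_truth)
```

In this toy case one of two predictions hits a target and one of three targets is found, so precision is 1/2 and both recall values are 1/3.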

Main Results

On the current benchmark, the proposed method retrieves all nine targets:

| Method | Predictions | Center Precision | Center Recall | IoU Recall@0.1 |
| --- | --- | --- | --- | --- |
| Proposed method | 9 | 1.0 | 1.0 | 1.0 |

The keypoint responses also match the design intention: they concentrate around corners, endpoints, and complex structural junctions instead of spreading uniformly over background regions.

Figures: Keypoint response visualization; Large-layout keypoint distribution

Intermediate matching results:

Figures: Raw matches; RANSAC inlier matches; Iterative matching after masking; Final matching result

Profiling shows that the main bottleneck is large-layout feature extraction:

| Stage | Time |
| --- | --- |
| Layout feature extraction | 25563.531 ms |
| Matching stage | 1256.519 ms |

This suggests that the next major optimization target is not RANSAC or the matching loop, but the first forward pass over the large layout.
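A quick back-of-the-envelope check supports this reading. The sketch assumes the two profiled stages run sequentially and together make up the whole runtime, and it applies Amdahl's law to see what speeding up each stage could buy. The 2x factor is a hypothetical scenario.

```python
layout_feature_ms = 25563.531
match_pipeline_ms = 1256.519

total_ms = layout_feature_ms + match_pipeline_ms
feature_share = layout_feature_ms / total_ms   # about 95% of the total


def overall_speedup(share, stage_speedup):
    """Amdahl's law: end-to-end speedup when one stage gets stage_speedup times faster."""
    return 1.0 / ((1.0 - share) + share / stage_speedup)


# Hypothetical scenario: feature extraction becomes 2x faster.
speedup_if_features_2x = overall_speedup(feature_share, 2.0)

# Upper bound if the matching stage cost nothing at all.
max_speedup_from_matching = 1.0 / feature_share
```

Under these assumptions, even a free matching stage would improve end-to-end time by less than 5%, while halving the feature-extraction cost would cut it nearly in half.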

Baseline Comparison

The experiment compares the method with ORB, SIFT, SuperPoint, and SuperPoint + LightGlue. To make the task comparable, the baselines are wrapped with the same sliding-window, masking, and iterative-search scaffold.

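| Method | Predictions | Center Precision | Center Recall | IoU Recall@0.1 |
| --- | --- | --- | --- | --- |
| Proposed method | 9 | 1.0 | 1.0 | 1.0 |
| ORB | 5 | 1.0 | 0.5556 | 0.5556 |
| SIFT | 6 | 0.8333 | 0.5556 | 0.5556 |
| SuperPoint | 0 | 0.0 | 0.0 | 0.0 |
| SuperPoint + LightGlue | 12 | 0.0 | 0.0 | 0.0 |

Figures: ORB matching result; SIFT matching result; SuperPoint matching result; LightGlue matching result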

Timing comparison:

| Method | Main pipeline time |
| --- | --- |
| Proposed method: layout feature extraction | 25563.531 ms |
| Proposed method: match pipeline | 1256.519 ms |
| ORB: sliding window + iteration | 9185.479 ms |
| SIFT: sliding window + iteration | 56827.718 ms |
| SuperPoint: sliding window + iteration | 2440.836 ms |

Several observations stand out:

  1. ORB remains partially effective on binary Manhattan-style layouts, but it struggles with mirrored targets and dense repeated structures.
  2. SIFT hits part of the targets, but it does not show its usual natural-image advantage in this weak-texture setting and is slower.
  3. SuperPoint, pretrained mainly for natural images, does not adapt well to this layout distribution under the current settings.
  4. LightGlue depends on the quality of the input correspondences; a stronger matcher cannot recover correct geometry if the front-end features are not layout-aware.
  5. The proposed method is not the fastest in every stage, but it gives a stronger overall tradeoff in task completion, target coverage, rotation/reflection adaptation, and reduced template enumeration.

Limitations

This work does not solve every layout retrieval problem. It is best suited for layouts with clear local structures, explicit topology, and reusable cells. The following cases still need more validation:

  1. Real industrial layouts across process nodes, design styles, and resolutions.
  2. Very small templates with too few structural cues.
  3. Extremely repetitive regions with many competing similar candidates.
  4. Scale gaps beyond the current multi-scale search range.
  5. More rigorous ablation studies on rotation and reflection invariance.

From an engineering perspective, the current version is better positioned as an offline layout analysis, batch retrieval, or review-assistance module than as a low-latency interactive tool. Sliding-window feature extraction remains the dominant bottleneck.

Next Steps

The graduation project establishes a complete working loop: structural representation, local feature learning, large-layout retrieval, geometric verification, iterative multi-instance search, and baseline comparison. The next useful directions are:

  1. Accelerate sliding-window layout feature extraction and reduce redundant computation across overlapping windows.
  2. Expand the real test set across more process nodes, design styles, and layout densities.
  3. Add systematic ablations for the four-channel input, structural loss, FPN, and iterative masking.
  4. Compare against more industrially relevant traditional layout retrieval methods.
  5. Explore vector-layout inputs to reduce rasterization and convolution cost while preserving structural tolerance.

Overall, this project shows that RoRD-style layout recognition is not just a matter of moving natural-image matching into IC layouts. The structural priors, geometric constraints, and large-layout retrieval pipeline have to be designed together. That is why this direction still feels worth digging into.

https://www.jiao77.com/en/blog/report/rord-graduation-thesis-2026-04-29/
Author: Jiao77
Published on: Apr 29, 2026
License: CC BY-NC-SA 4.0
