MetaFormer

Compact and Tiny Lens, Metalens

Abstract

Metalenses are an emerging class of optical systems whose key merit is that they can be manufactured in ultra-thin, compact form, which makes them promising for applications such as medical imaging and augmented/virtual reality (AR/VR). Despite this advantage in miniaturization, their practicality is constrained by severe aberrations and distortions, which significantly degrade image quality. Several prior methods address different types of aberrations, but most are designed for traditional bulky lenses and do not adequately remedy the harsh aberrations of a metalens. Aberration correction methods designed specifically for metalenses exist, but their restoration quality still falls short. In this work, we propose MetaFormer, an aberration correction framework for metalens-captured images that harnesses Vision Transformers (ViTs), which have shown remarkable performance in diverse image restoration tasks. Specifically, we devise Multiple Adaptive Filters Guidance (MAFG), in which multiple Wiener filters enrich the degraded input images with various noise-detail balances, enhancing restoration quality. In addition, we introduce a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features from spatial self-attention and transposed self-attention modules to further improve aberration correction. We conduct extensive experiments, including correcting aberrated images and videos and reconstructing clean 3D scenes from degraded images. The proposed method outperforms prior methods by a significant margin. We further fabricate a metalens and verify the practicality of MetaFormer by restoring images captured in the wild with the manufactured metalens.

Metalens Fabrication

We fabricated a metalens with the PSF from Neural Nano-Optics. A metalens with a diameter of 500 µm and a focal length of 1 mm was designed based on the optimization of a polynomial phase equation. The SiN meta-atom library was generated using rigorous coupled-wave analysis simulations for circular pillars with a height of 750 nm. The widths of the selected meta-atoms ranged from 100 to 300 nm, with a lattice period of 350 nm.
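As a quick sanity check on these design values, the f-number, numerical aperture, and pillar aspect ratios can be derived from the stated diameter, focal length, pillar height, and pillar widths. The sketch below assumes an ideal thin lens in air. The derived numbers are not reported in the text, and the variable names are illustrative.

```python
import math

# Design values stated in the text.
diameter_um = 500.0                 # metalens diameter
focal_length_um = 1000.0            # focal length (1 mm)
pillar_height_nm = 750.0            # SiN pillar height
pillar_widths_nm = (100.0, 300.0)   # narrowest and widest meta-atoms

# Assumption: ideal thin lens in air (refractive index 1).
f_number = focal_length_um / diameter_um
numerical_aperture = math.sin(math.atan(diameter_um / (2.0 * focal_length_um)))

# Height-to-width ratio of the narrowest and widest pillars.
aspect_ratios = tuple(pillar_height_nm / w for w in pillar_widths_nm)

print(f"f-number: {f_number:.1f}")
print(f"NA: {numerical_aperture:.3f}")
print(f"aspect ratios: {aspect_ratios[0]:.1f} to {aspect_ratios[1]:.1f}")
```

Under these assumptions the lens is roughly f/2 with an NA of about 0.24, and the narrowest pillars have a height-to-width ratio of 7.5.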
A 750 nm thick SiN layer was deposited onto a SiO₂ substrate using plasma-enhanced chemical vapor deposition to fabricate the designed metalens. A 200 nm thick positive photoresist layer was spin-coated at 4000 RPM. The pattern of circular nano-pillar meta-atoms was then transferred onto the positive photoresist using electron beam lithography (Figure right (a)), with a dose of 3.75 C/m². To prevent charging, 100 µL of ESPACER was spin-coated at 2000 RPM for 30 seconds.
The exposed resist was developed in a 1:3 solution of methyl isobutyl ketone/isopropyl alcohol for 11 minutes. Subsequently, a 40 nm thick chromium layer was deposited as a hard mask using an electron beam evaporator (Figure right (b)). The unexposed photoresist was removed through a lift-off process in acetone at room temperature for 1 hour, leaving the Cr hard mask intact. Patterning was finalized using inductively coupled plasma etching with SF₆ and C₄H₈ gases for 10 minutes. Finally, the Cr hard mask was removed using a chromium etchant for 5 minutes. The fabricated metalens is shown on the left side of the figure.

The above figure illustrates the image capture setup. An optical microscope system was set up to obtain images through the metalens. Images displayed on a 5.5-inch FHD display were captured using a CMOS camera coupled with a magnification system consisting of a 20x objective lens with 0.5 NA and a tube lens. Using a linear motorized stage, the metalens was positioned such that its focal plane coincided with the focal plane of the objective lens. Before recording, the camera exposure time was adjusted using a white image to prevent saturation. The point spread functions (PSFs) were then acquired using the same setup with 450 nm, 532 nm, and 635 nm lasers for calibration and training of the model. The following images show the restoration results of MetaFormer on real images captured with the fabricated metalens under this setup.

Methods

MetaFormer comprises Multiple Adaptive Filters Guidance (MAFG), which produces different representations with various noise-detail balances, and a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features differently in the encoder and decoder.

Multiple Adaptive Filters Guidance (MAFG)

We propose to use multiple Wiener filters to guide aberration correction with several distinct representations. An optimal Wiener filter requires an accurate signal-to-noise ratio (SNR), which cannot be obtained because the noise distribution is unknown in the real world. Instead of estimating the noise distribution, we adopt multiple Wiener filters with different values of the SNR-dependent constant \( K \). We deconvolve the input image with \( M \) Wiener filters to yield \( M \) different representations, some focused on noise removal and others on preserving fine detail. These representations are fed to the restoration model, where they enrich the features in a complementary way, which in turn improves aberration correction.
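The idea of deconvolving one input with several Wiener filters can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the common frequency-domain form of the Wiener filter, conj(H) / (|H|² + K), circular boundary conditions, a single 2D channel, and a toy Gaussian PSF. The function names and the particular values of \( K \) are hypothetical.

```python
import numpy as np

def psf_to_otf(psf, shape):
    """Zero-pad a centered PSF to `shape`, move its center to (0, 0), and FFT it."""
    padded = np.zeros(shape)
    h, w = psf.shape
    padded[:h, :w] = psf
    padded = np.roll(padded, shift=(-(h // 2), -(w // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def wiener_deconvolve(image, otf, K):
    """Wiener deconvolution of one 2D channel with regularization constant K."""
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + K)
    return np.real(np.fft.ifft2(wiener * np.fft.fft2(image)))

def multi_wiener(image, psf, Ks):
    """Return an (M, H, W) stack: one deconvolved representation per K."""
    otf = psf_to_otf(psf, image.shape)
    return np.stack([wiener_deconvolve(image, otf, K) for K in Ks])

# Toy example: blur a random image with a Gaussian PSF and add noise.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
yy, xx = np.mgrid[-3:4, -3:4]
psf = np.exp(-(xx ** 2 + yy ** 2) / (2 * 1.5 ** 2))
psf /= psf.sum()
otf = psf_to_otf(psf, sharp.shape)
blurred = np.real(np.fft.ifft2(otf * np.fft.fft2(sharp)))
noisy = blurred + 0.01 * rng.standard_normal(sharp.shape)

Ks = [1e-3, 1e-2, 1e-1]   # small K keeps detail, large K suppresses noise
representations = multi_wiener(noisy, psf, Ks)
print(representations.shape)
```

In this sketch the \( M \) outputs would be passed together to the restoration network, so the network sees both the detail-preserving and the noise-suppressing versions of the same input.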
We extend multiple Wiener filters to Multiple Adaptive Filters Guidance (MAFG), which determines \( K \) adaptively based on the image intensity. Images with higher intensity tend to have better signal quality, so brighter channels are often less affected by noise. Thus, for bright images we penalize noise less and capture more information by adjusting \( K \) according to the image intensity. We also treat each channel separately to avoid unnecessarily suppressing high-SNR details.
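One way to picture intensity-adaptive, per-channel \( K \) values is the sketch below. The scaling rule is a hypothetical stand-in chosen only to reproduce the stated behavior (brighter channel, smaller \( K \)); it is not the rule used by MAFG, and `adaptive_K`, `base_Ks`, and `strength` are illustrative names.

```python
import numpy as np

def adaptive_K(image, base_Ks, strength=0.5):
    """Return an (M, C) array of K values for an (H, W, C) image in [0, 1].

    Hypothetical rule (not taken from the paper): each base K is scaled by
    (1 - strength * mean intensity of the channel), so brighter channels
    get a smaller K and are denoised less aggressively.
    """
    channel_mean = image.mean(axis=(0, 1))      # shape (C,)
    scale = 1.0 - strength * channel_mean       # shape (C,)
    return np.asarray(base_Ks, dtype=float)[:, None] * scale[None, :]

# Toy image with a dark, a mid-intensity, and a bright channel.
image = np.stack(
    [np.full((4, 4), 0.1), np.full((4, 4), 0.5), np.full((4, 4), 0.9)],
    axis=-1,
)
Ks = adaptive_K(image, base_Ks=[1e-3, 1e-2, 1e-1])
print(Ks)
```

Each row of `Ks` corresponds to one of the \( M \) filters and each column to one color channel, so every channel would be deconvolved with its own constant.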

Spatial and Transposed self-Attention Fusion (STAF)

We propose a Spatial and Transposed self-Attention Fusion (STAF) module to further improve image restoration. By leveraging both Spatial Attention (SA) and Transposed Attention (TA), the STAF module can capture diverse spatial dependencies. To fully realize the potential of SA and TA in image restoration, it is important to consider the distinct roles of the encoder and decoder in Transformers. The encoder focuses on capturing global context, emphasizing the overall structure and relationships within images, which is critical for identifying patterns and features corrupted in degraded images. Meanwhile, the decoder specializes in recovering the fine local details and textures necessary for high-fidelity restoration. Therefore, rather than applying SA and TA alternately, as illustrated in Figure (a), the STAF module applies them separately and fuses their features with different weights in the encoder and decoder stages, as shown in Figure (b).
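The fusion described above can be sketched in NumPy under several simplifying assumptions: TA is taken to be the channel-to-channel (C × C) attention used in Restormer, the learned query/key/value projections are replaced by the identity, and the fusion is a plain weighted sum. The weights 0.3 and 0.7 are placeholders, not values from the paper, and all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    """x: (N, C) tokens. The attention map is N x N (position-to-position)."""
    attn = softmax(x @ x.T / np.sqrt(x.shape[1]), axis=-1)
    return attn @ x

def transposed_attention(x):
    """x: (N, C) tokens. The attention map is C x C (channel-to-channel)."""
    attn = softmax(x.T @ x / np.sqrt(x.shape[0]), axis=-1)
    return x @ attn.T

def staf_fuse(x, w_spatial):
    """Weighted sum of the two attention branches (assumed fusion form)."""
    sa = spatial_attention(x)
    ta = transposed_attention(x)
    return w_spatial * sa + (1.0 - w_spatial) * ta

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))            # N = 16 positions, C = 8 channels
encoder_out = staf_fuse(tokens, w_spatial=0.3)   # placeholder encoder weight
decoder_out = staf_fuse(tokens, w_spatial=0.7)   # placeholder decoder weight
print(encoder_out.shape, decoder_out.shape)
```

Both branches see the same input and are computed separately. Only the fusion weight differs between the encoder and decoder stages.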

Quantitative Results

Image Aberration Correction

	Open Image V7
Method	PSNR↑	SSIM↑	LPIPS↓
Wiener deconvolution	25.08	0.6433	0.5164
Eboli et al.	16.45	0.3602	0.8929
DWDN	25.77	0.7320	0.3333
Tseng et al.	27.56	0.7612	0.3374
Restormer	28.92	0.7719	0.3039
Ours	32.16	0.8159	0.2810

Video Aberration Correction

	DVD
Method	PSNR↑	SSIM↑	LPIPS↓
VRT	16.38	0.4876	0.8058
VRT + MAFG	25.55	0.7795	0.2347
Restormer	27.47	0.7463	0.2997
Ours	27.91	0.7798	0.2892
VRT w/ Ours	28.00	0.8331	0.2408

3D Reconstruction with Aberrated Images

	LLFF			Tanks&Temples			Mip-NeRF360
Method	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓	PSNR↑	SSIM↑	LPIPS↓
3D-GS	15.34	0.3037	0.8706	14.85	0.2885	0.8641	17.09	0.3270	0.8892
3D-GS + Ours	24.37	0.7009	0.2867	22.40	0.6427	0.3113	25.66	0.6395	0.4201

To evaluate aberration correction across the image, video, and 3D reconstruction tasks, we constructed aberrated datasets by applying the point spread function (PSF) to existing datasets. The tables above present the quantitative results of the various methods for each task. In image aberration correction, for example, our model reaches 32.16 dB PSNR compared with 28.92 dB for Restormer, the strongest baseline, and in 3D reconstruction it raises the PSNR of 3D-GS on LLFF from 15.34 dB to 24.37 dB.
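A minimal sketch of this kind of evaluation is given below: a clean image is degraded by convolving it with a PSF, and the result is scored with PSNR. It assumes circular convolution, a single 2D channel scaled to [0, 1], and a placeholder box kernel rather than a measured metalens PSF. The function names are illustrative.

```python
import numpy as np

def apply_psf(image, psf):
    """Circularly convolve a 2D image with a centered PSF via the FFT."""
    padded = np.zeros(image.shape)
    h, w = psf.shape
    padded[:h, :w] = psf
    # Move the PSF center to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, shift=(-(h // 2), -(w // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(padded) * np.fft.fft2(image)))

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
psf = np.ones((5, 5)) / 25.0   # placeholder blur kernel, not a measured PSF
aberrated = apply_psf(clean, psf)
print(f"PSNR of aberrated input: {psnr(clean, aberrated):.2f} dB")
```

A restoration method would then be scored by calling `psnr(clean, restored)` on its output, with higher values indicating a closer match to the clean image.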

BibTeX

@article{lee2024metaformer,
  title={MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers},
  author={Lee, Byeonghyeon and Kim, Youbin and Jo, Yongjae and Kim, Hyunsu and Park, Hyemi and Kim, Yangkyu and Mandal, Debabrata and Chakravarthula, Praneeth and Kim, Inki and Park, Eunbyung},
  journal={arXiv preprint arXiv:2412.04591},
  year={2024}
}