MetaFormer

High-fidelity Metalens Imaging via Aberration Correcting Transformers

Byeonghyeon Lee1, Youbin Kim1, Yongjae Jo1, Hyunsu Kim1, Hyemi Park1, Yangkyu Kim1, Debabrata Mandal2,
Praneeth Chakravarthula2, Inki Kim1, and Eunbyung Park1
1Sungkyunkwan University      2University of North Carolina at Chapel Hill

Teaser results: compact and tiny metalens; image aberration correction; video aberration correction; 3D reconstruction with aberrated images.

Abstract

The metalens is an emerging optical system with the irreplaceable merit that it can be manufactured in ultra-thin, compact sizes, showing great promise for various applications such as medical imaging and augmented/virtual reality (AR/VR). Despite its advantage in miniaturization, its practicality is constrained by severe aberrations and distortions, which significantly degrade image quality. Several previous works have attempted to address different types of aberrations, yet most of them are designed for conventional bulky lenses and are not convincing enough to remedy the harsh aberrations of the metalens. While aberration correction methods designed specifically for the metalens exist, they still fall short in restoration quality. In this work, we propose MetaFormer, an aberration correction framework for metalens-captured images that harnesses Vision Transformers (ViT), which have shown remarkable performance in diverse image restoration tasks. Specifically, we devise a Multiple Adaptive Filters Guidance (MAFG), where multiple Wiener filters enrich the degraded input images with various noise-detail balances, enhancing output restoration quality. In addition, we introduce a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features from spatial self-attention and transposed self-attention modules to further ameliorate aberration correction. We conduct extensive experiments, including correcting aberrated images and videos, and clean 3D reconstruction from degraded images. The proposed method outperforms previous arts by a significant margin. We further fabricate a metalens and verify the practicality of MetaFormer by restoring images captured with the manufactured metalens in the wild.

Metalens Fabrication

Figure: fabricated metalens (left) and fabrication process (right).

We fabricated a metalens with the PSF from Neural Nano-Optics. A metalens with a diameter of 500 µm and a focal length of 1 mm was designed based on the optimization of a polynomial phase equation. The SiN meta-atom library was generated using rigorous coupled-wave analysis simulations for circular pillars with a height of 750 nm. The widths of the selected meta-atoms ranged from 100 to 300 nm, with a lattice period of 350 nm.
To fabricate the designed metalens, a 750 nm thick SiN layer was deposited onto a SiO2 substrate using plasma-enhanced chemical vapor deposition. A 200 nm thick positive photoresist layer was then spin-coated at 4000 RPM. The pattern of circular nano-pillar meta-atoms was transferred onto the photoresist using electron beam lithography (Figure right (a)) with a dose of 3.75 C/m². To prevent charging, 100 µL of ESPACER was spin-coated at 2000 RPM for 30 seconds.
The exposed resist was developed in a 1:3 solution of methyl isobutyl ketone/isopropyl alcohol for 11 minutes. Subsequently, a 40 nm thick chromium layer was deposited as a hard mask using an electron beam evaporator (Figure right (b)). The unexposed photoresist was removed through a lift-off process in acetone at room temperature for 1 hour, leaving the Cr hard mask intact. Patterning was finalized using inductively coupled plasma etching with SF6 and C4H8 gases for 10 minutes. Finally, the Cr hard mask was removed using a chromium etchant for 5 minutes. The fabricated metalens is shown on the left side of the figure.

The figure above illustrates the image capture setup. An optical microscope system was set up to obtain images through the metalens. Images displayed on a 5.5-inch FHD display were captured using a CMOS camera coupled with a magnification system consisting of a 20x objective lens with 0.5 NA and a tube lens. The metalens was positioned, using a linear motorized stage, such that its focal plane coincided with the focal plane of the objective lens. The camera exposure time was adjusted using a white image prior to recording to prevent saturation. The point spread functions (PSFs) were then acquired with the same setup using 450 nm, 532 nm, and 635 nm lasers for calibration and training of the model. The following images show the restoration results of MetaFormer on real images captured with the fabricated metalens under this setup.

Real Image Aberration Correction

Methods

MetaFormer comprises Multiple Adaptive Filters Guidance (MAFG), which produces different representations with various noise-detail balances, and a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features differently in the encoder and decoder.


Multiple Adaptive Filters Guidance (MAFG)

We propose to use multiple Wiener filters to guide aberration correction with several distinct representations. It is not feasible to obtain an optimal Wiener filter with an accurate SNR, since the noise distribution is unknown in the real world. Instead of estimating the noise distribution, we adopt multiple Wiener filters with different regularization parameters \( K \). We use \( M \) Wiener filters to deconvolve the input image, yielding \( M \) different representations, some focused on noise removal and others preserving fine detail. These representations are fed to the restoration model, where they complementarily enrich the features, which in turn improves aberration correction.
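
For reference, the standard frequency-domain Wiener deconvolution with a constant regularizer \( K \) (which stands in for the unknown inverse SNR) is

\[
\hat{X}(f) = \frac{\overline{H}(f)}{|H(f)|^2 + K}\, Y(f),
\]

where \( Y \) is the Fourier transform of the degraded image, \( H \) is the optical transfer function obtained from the PSF, and \( \overline{H} \) is its complex conjugate. A small \( K \) preserves fine detail but amplifies noise, whereas a large \( K \) suppresses noise at the cost of detail; MAFG covers this trade-off by using several values of \( K \) at once.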
We extend multiple Wiener filters to Multiple Adaptive Filters Guidance (MAFG), which determines \( K \) adaptively considering the image intensity. Image with higher intensity tends to have better signal quality, so brighter channels often experience less noise and are less sensitive to noise. Thus, we penalize noise less and capture more information for bright images by adjusting \( K \) with the image intensity. Also, we treat each channel differently to avoid suppressing high-SNR details unnecessarily.
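
The sketch below illustrates this guidance step, assuming an RGB image and per-channel PSFs are available as NumPy arrays. The function names, base \( K \) values, and the specific intensity-adaptive rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def wiener_deconvolve(channel, psf, K):
    """Frequency-domain Wiener deconvolution of a single image channel."""
    H = np.fft.fft2(np.fft.ifftshift(psf), s=channel.shape)
    Y = np.fft.fft2(channel)
    X = np.conj(H) / (np.abs(H) ** 2 + K) * Y
    return np.real(np.fft.ifft2(X))

def mafg_representations(img, psfs, base_Ks=(1e-4, 1e-3, 1e-2)):
    """Produce M = len(base_Ks) deconvolved representations of an (H, W, 3) image.

    K is scaled per channel by the mean intensity so that brighter (higher-SNR)
    channels receive a smaller effective K and keep more fine detail; this exact
    scaling rule is an assumption, not the paper's formula.
    """
    reps = []
    for K in base_Ks:
        channels = []
        for c in range(img.shape[-1]):
            mean_intensity = float(img[..., c].mean()) + 1e-8
            channels.append(wiener_deconvolve(img[..., c], psfs[c], K / mean_intensity))
        reps.append(np.stack(channels, axis=-1))
    # The M representations are concatenated and fed to the restoration network.
    return np.concatenate(reps, axis=-1)
```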


Spatial and Transposed self-Attention Fusion (STAF)

We propose a Spatial and Transposed self-Attention Fusion (STAF) module to further ameliorate image restoration. By leveraging both Spatial Attention (SA) and Transposed Attention (TA), the STAF module can capture diverse spatial dependencies. To fully realize the potential of SA and TA in image restoration, it is important to consider the distinct roles of the encoder and decoder in Transformers. The encoder focuses on capturing global context, emphasizing the overall structure and relationships within images, which is critical for identifying patterns and features corrupted in degraded images. Meanwhile, the decoder specializes in recovering the fine local details and textures necessary for high-fidelity restoration. Therefore, rather than applying SA and TA alternately, as illustrated in Figure (a), the STAF module fuses their features, assigning them different weights in the encoder and decoder stages, as shown in Figure (b).
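
The wiring of such a fusion can be sketched as a weighted combination of the two attention outputs with a stage-dependent weight. The PyTorch snippet below is a minimal illustration under that assumption; the SA and TA sub-modules are passed in as placeholders, and the learnable-scalar blend with its initial values is our simplification, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class STAFBlock(nn.Module):
    """Fuse spatial self-attention (SA) and transposed self-attention (TA) features."""

    def __init__(self, sa: nn.Module, ta: nn.Module, stage: str = "encoder"):
        super().__init__()
        self.sa = sa  # spatial self-attention sub-module
        self.ta = ta  # transposed (channel-wise) self-attention sub-module
        # Stage-dependent initial blend weight (illustrative values only).
        init = 0.3 if stage == "encoder" else 0.7
        self.alpha = nn.Parameter(torch.tensor(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weighted fusion of the two attention outputs, plus a residual connection.
        fused = self.alpha * self.sa(x) + (1.0 - self.alpha) * self.ta(x)
        return x + fused


# Example: identity placeholders stand in for the real attention modules.
block = STAFBlock(sa=nn.Identity(), ta=nn.Identity(), stage="decoder")
out = block(torch.randn(1, 48, 64, 64))
```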

Quantitative Results

Image Aberration Correction

Open Images V7

| Method               | PSNR↑ | SSIM↑  | LPIPS↓ |
|----------------------|-------|--------|--------|
| Wiener deconvolution | 25.08 | 0.6433 | 0.5164 |
| Eboli et al.         | 16.45 | 0.3602 | 0.8929 |
| DWDN                 | 25.77 | 0.7320 | 0.3333 |
| Tseng et al.         | 27.56 | 0.7612 | 0.3374 |
| Restormer            | 28.92 | 0.7719 | 0.3039 |
| Ours                 | 32.16 | 0.8159 | 0.2810 |

Video Aberration Correction

DVD

| Method      | PSNR↑ | SSIM↑  | LPIPS↓ |
|-------------|-------|--------|--------|
| VRT         | 16.38 | 0.4876 | 0.8058 |
| VRT + MAFG  | 25.55 | 0.7795 | 0.2347 |
| Restormer   | 27.47 | 0.7463 | 0.2997 |
| Ours        | 27.91 | 0.7798 | 0.2892 |
| VRT w/ Ours | 28.00 | 0.8331 | 0.2408 |

3D Reconstruction with Aberrated Images

| Method       | LLFF PSNR↑ / SSIM↑ / LPIPS↓ | Tanks&Temples PSNR↑ / SSIM↑ / LPIPS↓ | Mip-NeRF360 PSNR↑ / SSIM↑ / LPIPS↓ |
|--------------|------------------------------|---------------------------------------|-------------------------------------|
| 3D-GS        | 15.34 / 0.3037 / 0.8706      | 14.85 / 0.2885 / 0.8641               | 17.09 / 0.3270 / 0.8892             |
| 3D-GS + Ours | 24.37 / 0.7009 / 0.2867      | 22.40 / 0.6427 / 0.3113               | 25.66 / 0.6395 / 0.4201             |

To evaluate aberration correction across image, video, and 3D reconstruction tasks, we constructed aberrated datasets by applying the metalens point spread function (PSF) to existing datasets. The tables above report the quantitative results of various methods for each task, demonstrating the superior performance of our model.
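
As a rough illustration of this synthesis step, a clean image can be convolved channel-wise with the metalens PSF and corrupted with sensor noise. The snippet below is a schematic under the assumption of per-channel PSFs and additive Gaussian noise; the noise model and level are illustrative, not the authors' exact data pipeline.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_aberration(clean, psfs, noise_std=0.01):
    """Simulate a metalens capture: per-channel PSF convolution plus Gaussian noise.

    `clean` is an (H, W, 3) image in [0, 1]; `psfs` holds one normalized 2D PSF
    per color channel. Noise level and clipping are illustrative assumptions.
    """
    aberrated = np.stack(
        [fftconvolve(clean[..., c], psfs[c], mode="same") for c in range(3)],
        axis=-1,
    )
    aberrated += np.random.normal(0.0, noise_std, size=aberrated.shape)
    return np.clip(aberrated, 0.0, 1.0)
```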

BibTeX

@article{lee2024metaformer,
  title={MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers},
  author={Lee, Byeonghyeon and Kim, Youbin and Jo, Yongjae and Kim, Hyunsu and Park, Hyemi and Kim, Yangkyu and Mandal, Debabrata and Chakravarthula, Praneeth and Kim, Inki and Park, Eunbyung},
  journal={arXiv preprint arXiv:2412.04591},
  year={2024}
}