CVPR'22: 132 New Papers Organized by Research Direction | Covering 28 Directions Including Object Detection, Image Processing, and Medical Imaging


https://mp.weixin.qq.com/s/5h64fbTfVSr4wuuEKL2jeg

This article was first published on the Jishi Platform (極市平台) WeChat account; please obtain authorization and credit the source before reposting. The CVPR 2022 results are out: 2,067 papers were accepted this year, a 24% increase over last year. Ahead of the official CVPR 2022 conference, and to help readers get up to speed on cutting-edge computer vision work more quickly, Jishi is tracking the latest CVPR 2022 papers, including papers and code organized by research direction as well as live technical talks on selected papers. The CVPR 2022 papers-by-direction collection is continuously updated in the Jishi community and currently covers 386 papers; project page: https://bbs.cvmart.net/articles/6124. Below are this week's newly added CVPR 2022 papers, covering object detection, image processing, 3D vision, medical imaging, action recognition, faces, text detection, object tracking, neural network architecture design, and other directions. Click 閱讀原文 (Read the original article) to download them as a bundle.

- Detection
  - 2D Object Detection
  - 3D Object Detection
  - Lane Detection
  - Anomaly Detection
- Segmentation
  - Semantic Segmentation
  - Instance Segmentation
  - Panoptic Segmentation
  - Dense Prediction
- Estimation
  - Pose Estimation
  - Optical Flow Estimation
  - Depth Estimation
  - Human Pose Estimation
- Image/Video Retrieval and Understanding
  - Action Recognition
  - Person Re-identification
  - Image Captioning
- Medical Imaging
- Text Detection and Recognition
- Object Tracking
- Face
  - Face Editing
  - Face Forgery
  - Facial Expression Recognition
- Image Processing
  - Image Restoration / Image Reconstruction
  - Super-Resolution
  - Image Denoising / Deraining
  - Style Transfer
  - Image-to-Image Translation
- 3D Vision
  - Point Clouds
  - 3D Reconstruction
  - Scene Reconstruction / View Synthesis
- Video Processing
  - Video Editing
- Scene Graph Generation
- Transfer Learning / Domain Adaptation
- Adversarial Learning
- Datasets
- Data Processing
  - Image Compression
  - Normalization
- Visual Representation Learning
- Neural Network Architecture Design
  - CNN
  - Transformer
  - Neural Architecture Search
- Model Training / Generalization
  - Noisy Labels
- Few-Shot Learning
- Metric Learning
- Continual Learning
- Federated Learning
- Meta-Learning
- Reinforcement Learning
- Multi-modal Learning
  - Vision-Language
  - Audio-Visual Learning
- Visual Prediction
- Video Counting
- Others

Detection

2D Object Detection

[1] Semantic-aligned Fusion Transformer for One-shot Object Detection
paper:https://arxiv.org/abs/2203.09093
[2] A Dual Weighting Label Assignment Scheme for Object Detection
paper:https://arxiv.org/abs/2203.09730
code:https://github.com/strongwolf/DW
[3] Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
paper:https://arxiv.org/abs/2112.00342
[4] Oriented RepPoints for Aerial Object Detection (small object detection)
paper:https://arxiv.org/abs/2105.11111
code:https://github.com/LiWentomng/OrientedRepPoints
[5] Real-time Object Detection for Streaming Perception
paper:https://arxiv.org/abs/2203.12338
code:https://github.com/yancie-yjr/StreamYOLO
[6] Progressive End-to-End Object Detection in Crowded Scenes
paper:https://arxiv.org/abs/2203.07669
code:https://github.com/megvii-model/Iter-E2EDET
[7] QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection (small object detection)
paper:https://arxiv.org/abs/2103.09136
code:https://github.com/ChenhongyiYang/QueryDet-PyTorch
[8] End-to-End Human-Gaze-Target Detection with Transformers
paper:https://arxiv.org/abs/2203.10433

3D Object Detection

[1] Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
paper:https://arxiv.org/abs/2203.10314
code:https://github.com/skyhehe123/VoxSeT
[2] VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
paper:https://arxiv.org/abs/2203.09704
code:https://github.com/Gorilla-Lab-SCUT/VISTA
[3] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
paper:https://arxiv.org/abs/2203.10981
code:https://github.com/kuanchihhuang/MonoDTR
[4] Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
paper:https://arxiv.org/abs/2203.09780
[5] Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
paper:https://arxiv.org/abs/2203.11139
code:https://github.com/yifanzhang713/IA-SSD
[6] TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
paper:https://arxiv.org/abs/2203.11496
code:https://github.com/XuyangBai/TransFusion

Lane Detection

[1] CLRNet: Cross Layer Refinement Network for Lane Detection
paper:https://arxiv.org/abs/2203.10350

Anomaly Detection

[1] ViM: Out-Of-Distribution with Virtual-logit Matching (OOD detection)
paper:https://arxiv.org/abs/2203.10807
code:https://github.com/haoqiwang/vim
[2] UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
paper:https://arxiv.org/abs/2111.08644
code:https://github.com/lilygeorgescu/UBnormal

Segmentation

Semantic Segmentation

[1] Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
paper:https://arxiv.org/abs/2203.10739
code:https://github.com/megviiresearch/TEL
[2] Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
paper:https://arxiv.org/abs/2203.09653
code:https://github.com/maeve07/RCA.git
[3] Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
paper:https://arxiv.org/abs/2203.09744
code:https://github.com/lslrh/CPSL
[4] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
paper:https://arxiv.org/abs/2111.12903

Instance Segmentation

[1] Discovering Objects that Can Move
paper:https://arxiv.org/abs/2203.10159
code:https://github.com/zpbao/Discovery_Obj_Move/
[2] ContrastMask: Contrastive Learning to Segment Every Thing
paper:https://arxiv.org/abs/2203.09775
[3] Mask Transfiner for High-Quality Instance Segmentation
paper:https://arxiv.org/abs/2111.13673
code:https://github.com/SysCV/transfiner
[4] Sparse Instance Activation for Real-Time Instance Segmentation
paper:https://arxiv.org/abs/2203.12827
code:https://github.com/hustvl/SparseInst

Panoptic Segmentation

[1] Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
paper:https://arxiv.org/abs/2109.03814
code:https://github.com/zhiqi-li/Panoptic-SegFormer

Dense Prediction

[1] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
paper:https://arxiv.org/abs/2112.01518
code:https://github.com/raoyongming/DenseCLIP

Estimation

Pose Estimation

[1] DiffPoseNet: Direct Differentiable Camera Pose Estimation
paper:https://arxiv.org/abs/2203.11174
[2] RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization
paper:https://arxiv.org/abs/2203.12870
code:https://github.com/DecaYale/RNNPose
[3] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
paper:https://arxiv.org/abs/2203.13254

Optical Flow Estimation

[1] Global Matching with Overlapping Attention for Optical Flow Estimation
paper:https://arxiv.org/abs/2203.11335
code:https://github.com/xiaofeng94/GMFlowNet

Depth Estimation

[1] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
paper:https://arxiv.org/abs/2203.11483
project:https://github.com/megvii-research/CREStereo
[2] Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective
paper:https://arxiv.org/abs/2203.10887
[3] Deep Depth from Focus with Differential Focus Volume
paper:https://arxiv.org/abs/2112.01712
[4] RGB-Depth Fusion GAN for Indoor Depth Completion
paper:https://arxiv.org/abs/2203.10856
[5] Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
paper:https://arxiv.org/abs/2203.10493
code:https://github.com/YuhuaXu/MonoStereoFusion

Human Pose Estimation

[1] Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization
paper:https://arxiv.org/abs/2203.11471
code:https://github.com/YxZhxn/Ray3D

Image/Video Retrieval and Understanding

Action Recognition

[1] Self-supervised Video Transformer
paper:https://arxiv.org/abs/2112.01514
code:https://git.io/J1juJ
[2] DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
paper:https://arxiv.org/abs/2203.10233
[3] Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
paper:https://arxiv.org/abs/2203.11637
code:https://github.com/zju-vipa/MEAT-TIL
[4] E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
paper:https://arxiv.org/abs/2112.03596
[5] How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
paper:https://arxiv.org/abs/2203.12344

Person Re-identification

[1] Cascade Transformers for End-to-End Person Search
paper:https://arxiv.org/abs/2203.09642
code:https://github.com/Kitware/COAT

Image Captioning

[1] Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
paper:https://arxiv.org/abs/2112.00061
code:https://s-abdelnabi.github.io/OoC-multi-modal-fc/

Medical Imaging

[1] ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
paper:https://arxiv.org/abs/2111.12918
[2] DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification
paper:https://arxiv.org/abs/2203.12081
code:https://github.com/hrzhang1123/DTFD-MIL

Text Detection and Recognition

[1] Fourier Document Restoration for Robust Document Dewarping and Recognition
paper:https://arxiv.org/abs/2203.09910
code:https://sg-vilab.github.io/event/warpdoc/
[2] SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
paper:https://arxiv.org/abs/2203.10209
code:https://github.com/mxin262/SwinTextSpotter

Object Tracking

[1] MixFormer: End-to-End Tracking with Iterative Mixed Attention
paper:https://arxiv.org/abs/2203.11082
code:https://github.com/MCG-NJU/MixFormer
[2] Unsupervised Domain Adaptation for Nighttime Aerial Tracking
paper:https://arxiv.org/abs/2203.10541
code:https://github.com/vision4robotics/UDAT
[3] Global Tracking Transformers
paper:https://arxiv.org/abs/2203.13250
code:https://github.com/xingyizhou/GTR
[4] Transforming Model Prediction for Tracking
paper:https://arxiv.org/abs/2203.11192
code:https://github.com/visionml/pytracking

Face

[1] HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
paper:https://arxiv.org/abs/2203.10699
[2] Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
paper:https://arxiv.org/abs/2203.10474
code:https://github.com/StoryMY/take-off-eyeglasses
[3] Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
paper:https://arxiv.org/abs/2203.09824
project:https://choyingw.github.io/works/Voice2Mesh/index.html

Face Editing

[1] FENeRF: Face Editing in Neural Radiance Fields
paper:https://arxiv.org/abs/2111.15490
project:https://mrtornado24.github.io/FENeRF/

Face Forgery

[1] Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection
paper:https://arxiv.org/abs/2203.12208
code:https://github.com/liangchen527/SLADD

Facial Expression Recognition

[1] Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin
paper:https://arxiv.org/abs/2203.12341
code:https://github.com/hangyu94/Ada-CM

Image Processing

Image Restoration / Image Reconstruction

[1] Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
paper:https://arxiv.org/abs/2111.07910
code:https://github.com/caiyuanhao1998/MST/
[2] Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
paper:https://arxiv.org/abs/2112.05146
[3] Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
paper:https://arxiv.org/abs/2203.11754

Super-Resolution

[1] Local Texture Estimator for Implicit Representation Function
paper:https://arxiv.org/abs/2111.08918
[2] Deep Constrained Least Squares for Blind Image Super-Resolution
paper:https://arxiv.org/abs/2202.07508
[3] High-Resolution Image Harmonization via Collaborative Dual Transformations
paper:https://arxiv.org/abs/2109.06671
code:https://github.com/bcmi/CDTNet-High-Resolution-Image-Harmonization

Image Denoising / Deraining

[1] IDR: Self-Supervised Image Denoising via Iterative Data Refinement
paper:https://arxiv.org/abs/2111.14358
code:https://github.com/zhangyi-3/IDR
[2] AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
paper:https://arxiv.org/abs/2203.11799
code:https://github.com/wooseoklee4/AP-BSN
[3] CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image
paper:https://arxiv.org/abs/2203.13009
code:https://github.com/Reyhanehne/CVF-SID_PyTorch
[4] Unpaired Deep Image Deraining Using Dual Contrastive Learning
paper:https://arxiv.org/abs/2109.02973

Style Transfer

[1] Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation
paper:https://arxiv.org/abs/2203.12835
project:https://jcyang98.github.io/InST/home.html
code:https://github.com/jcyang98/InST
[2] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
paper:https://arxiv.org/abs/2203.13248
code:https://github.com/williamyang1991/DualStyleGAN
project:https://www.mmlab-ntu.com/project/dualstylegan/

Image-to-Image Translation

[1] Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
paper:https://arxiv.org/abs/2203.12707
code:https://github.com/batmanlab/MSPC
[2] Globetrotter: Connecting Languages by Connecting Images
paper:https://arxiv.org/abs/2012.04631

3D Vision

[1] Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
paper:https://arxiv.org/abs/2104.13450
[2] The Neurally-Guided Shape Parser: Grammar-based Labeling of 3D Shape Regions with Approximate Inference
paper:https://arxiv.org/abs/2106.12026
code:https://github.com/rkjones4/NGSP

Point Clouds

[1] No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
paper:https://arxiv.org/abs/2203.11113
code:https://github.com/jx-zhong-for-academic-purpose/Kinet
[2] IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
paper:https://arxiv.org/abs/2203.11590
code:https://github.com/ZENGYIMING-EAMON/IDEA-Net.git
[3] AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
paper:https://arxiv.org/abs/2203.13090
code:https://github.com/hustvl/AziNorm
[4] WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
paper:https://arxiv.org/abs/2203.12917
code:https://github.com/yztang4/WarpingGAN.git

3D Reconstruction

[1] Input-level Inductive Biases for 3D Reconstruction
paper:https://arxiv.org/abs/2112.03243
[2] ϕ-SfT: Shape-from-Template with a Physics-Based Deformation Model
paper:https://arxiv.org/abs/2203.11938
code:https://4dqv.mpi-inf.mpg.de/phi-SfT/
[3] PLAD: Learning to Infer Shape Programs with Pseudo-Labels and Approximate Distributions
paper:https://arxiv.org/abs/2011.13045
code:https://github.com/rkjones4/PLAD
[4] Neural Reflectance for Shape Recovery with Shadow Handling
paper:https://arxiv.org/abs/2203.12909
code:https://github.com/junxuan-li/Neural-Reflectance-PS

Scene Reconstruction / View Synthesis

[1] GeoNeRF: Generalizing NeRF with Geometry Priors
paper:https://arxiv.org/abs/2111.13539
code:https://www.idiap.ch/paper/geonerf
[2] NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
paper:https://arxiv.org/abs/2203.11283
[3] PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
paper:https://arxiv.org/abs/2203.12082

Video Processing

[1] Unifying Motion Deblurring and Frame Interpolation with Events
paper:https://arxiv.org/abs/2203.12178

Video Editing

[1] M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers
paper:https://arxiv.org/abs/2104.01122

Scene Graph Generation

[1] Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
paper:https://arxiv.org/abs/2203.09811
code:https://github.com/dongxingning/SHA-GCL-for-SGG

Transfer Learning / Domain Adaptation

[1] Learning Affordance Grounding from Exocentric Images
paper:https://arxiv.org/abs/2203.09905
code:http://github.com/lhc1224/Cross-View-AG
[2] Compound Domain Generalization via Meta-Knowledge Encoding
paper:https://arxiv.org/abs/2203.13006

Adversarial Learning

[1] DTA: Physical Camouflage Attacks using Differentiable Transformation Network
paper:https://arxiv.org/abs/2203.09831
code:https://islab-ai.github.io/dta-cvpr2022/
[2] Subspace Adversarial Training
paper:https://arxiv.org/abs/2111.12229
code:https://github.com/nblt/Sub-AT

Datasets

[1] M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining (multi-modal pre-training dataset)
paper:https://arxiv.org/abs/2109.04275
[2] Egocentric Prediction of Action Target in 3D (robotics)
paper:https://arxiv.org/abs/2203.13116
project:https://ai4ce.github.io/EgoPAT3D/
[3] DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
paper:https://arxiv.org/abs/2203.12560
data:https://mediatum.ub.tum.de/1650201
website:https://codalab.lisn.upsaclay.fr/competitions/2882

Data Processing

[1] Dataset Distillation by Matching Training Trajectories (dataset distillation)
paper:https://arxiv.org/abs/2203.11932
code:https://github.com/GeorgeCazenavette/mtt-distillation
project:https://georgecazenavette.github.io/mtt-distillation/

Image Compression

[1] Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
paper:https://arxiv.org/abs/2203.10897
code:https://github.com/xiaosu-zhu/McQuic
[2] ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding
paper:https://arxiv.org/abs/2203.10886

Normalization

[1] Delving into the Estimation Shift of Batch Normalization in a Network
paper:https://arxiv.org/abs/2203.10778
code:https://github.com/huangleiBuaa/XBNBlock

Visual Representation Learning

[1] SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization
paper:https://arxiv.org/abs/2203.10492
[2] Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
paper:https://arxiv.org/abs/2203.12265
code:https://github.com/dongwei156/n2n

Neural Network Architecture Design

[1] DyRep: Bootstrapping Training with Dynamic Re-parameterization
paper:https://arxiv.org/abs/2203.12868
code:https://github.com/hunto/DyRep

CNN

[1] TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing (dynamic convolution)
paper:https://arxiv.org/abs/2203.10489
code:https://github.com/JierunChen/TVConv

Transformer

[1] Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
paper:https://arxiv.org/abs/2112.03552
code:https://github.com/zhfeing/Bootstrapping-ViTs-pytorch

Neural Architecture Search

[1] Training-free Transformer Architecture Search
paper:https://arxiv.org/abs/2203.12217

Model Training / Generalization

[1] Out-of-distribution Generalization with Causal Invariant Transformations
paper:https://arxiv.org/abs/2203.11528

Noisy Labels

[1] Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels
paper:https://arxiv.org/abs/2203.07788
code:https://github.com/Yikai-Wang/SPR-LNL

Few-Shot Learning

[1] Ranking Distance Calibration for Cross-Domain Few-Shot Learning
paper:https://arxiv.org/abs/2112.00260

Metric Learning

[1] Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
paper:https://arxiv.org/abs/2203.10833
code:https://github.com/htdt/hyp_metric

Continual Learning

[1] Learning to Prompt for Continual Learning
paper:https://arxiv.org/abs/2112.08654
code:https://github.com/google-research/l2p
[2] Meta-attention for ViT-backed Continual Learning
paper:https://arxiv.org/abs/2203.11684
code:https://github.com/zju-vipa/MEAT-TIL

Federated Learning

[1] Federated Class-Incremental Learning
paper:https://arxiv.org/abs/2203.11473
code:https://github.com/conditionWang/FCIL
[2] FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction
paper:https://arxiv.org/abs/2203.11751
code:https://github.com/gaoliang13/FedDC
[3] FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning
paper:https://arxiv.org/abs/2103.13822

Meta-Learning

[1] Multidimensional Belief Quantification for Label-Efficient Meta-Learning
paper:https://arxiv.org/abs/2203.12768

Reinforcement Learning

[1] Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
paper:https://arxiv.org/abs/2203.13055
code:https://github.com/lisiyao21/Bailando/

Multi-modal Learning

Vision-Language

[1] An Empirical Study of Training End-to-End Vision-and-Language Transformers
paper:https://arxiv.org/abs/2111.02387
code:https://github.com/zdou0830/METER
[2] VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
paper:https://arxiv.org/abs/2112.06825
code:https://github.com/ylsung/VL_adapter
[3] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
paper:https://arxiv.org/abs/2111.13333
code:https://github.com/zipengxuc/PPE
[4] LAFITE: Towards Language-Free Training for Text-to-Image Generation
paper:https://arxiv.org/abs/2111.13792
code:https://github.com/drboog/Lafite

Audio-Visual Learning

[1] UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
paper:https://arxiv.org/abs/2203.12745
code:https://github.com/TencentARC/UMT
[2] Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
paper:https://arxiv.org/abs/2203.13161
project:https://alvinliu0.github.io/projects/HA2G

Visual Prediction

[1] GaTector: A Unified Framework for Gaze Object Prediction
paper:https://arxiv.org/abs/2112.03549
[2] Remember Intentions: Retrospective-Memory-based Trajectory Prediction
paper:https://arxiv.org/abs/2203.11474
code:https://github.com/MediaBrain-SJTU/MemoNet

Video Counting

[1] DR.VIC: Decomposition and Reasoning for Video Individual Counting
paper:https://arxiv.org/abs/2203.12335
code:https://github.com/taohan10200/DRNet

Others

Robust and Accurate Superquadric Recovery: a Probabilistic Approach
paper:https://arxiv.org/abs/2111.14517
code:http://github.com/bmlklwx/EMS-superquadric_fitting.git
Learning from All Vehicles (autonomous driving)
paper:https://arxiv.org/abs/2203.11934
code:https://github.com/dotchen/LAV
demo:https://dotchen.github.io/LAV/
Mixed Differential Privacy in Computer Vision
paper:https://arxiv.org/abs/2203.11481
Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
paper:https://arxiv.org/abs/2203.12247
TransVPR: Transformer-based place recognition with multi-level attention aggregation (image matching)
paper:https://arxiv.org/abs/2201.02001
Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
paper:https://arxiv.org/abs/2203.12997
code:https://github.com/koulakis/h-nne
Moving Window Regression: A Novel Approach to Ordinal Regression
paper:https://arxiv.org/abs/2203.13122
code:https://github.com/nhshin-mcl/MWR

