Dataset Roundup

Computer Vision

Image Classification

MNIST

MNIST database - Wikipedia
MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges
MNIST Dataset | DeepAI

Fashion-MNIST

GitHub - zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark
fashion_mnist | TensorFlow Datasets
5.3 Fashion MNIST - PyTorch Chinese Handbook
fashion_mnist | TensorFlow Datasets (google.cn)
Fashion MNIST dataset, an alternative to MNIST (keras.io)

CIFAR-10

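CIFAR-10 is a color image dataset of everyday objects, which makes it closer to general object recognition than MNIST. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton as a small dataset for recognizing generic objects. It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
Each image is 32 × 32 pixels and each class has 6,000 images; in total the dataset has 50,000 training images and 10,000 test images.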
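
The figures above can be cross-checked with a few lines of Python. This is only a minimal sketch: the constant and function names are made up for illustration, and the numbers are copied from the description rather than read from the actual dataset files.

```python
# Figures copied from the CIFAR-10 description above (not read from disk).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3  # RGB images of 32 x 32 pixels


def total_images():
    """Total image count implied by the per-class figure."""
    return len(CIFAR10_CLASSES) * IMAGES_PER_CLASS


def values_per_image():
    """Number of pixel values stored for one image."""
    return HEIGHT * WIDTH * CHANNELS


if __name__ == "__main__":
    # The per-class count and the train/test split should agree.
    print(total_images() == TRAIN_IMAGES + TEST_IMAGES)  # True
    print(values_per_image())  # 3072
```

The check confirms that 10 classes of 6,000 images match the 50,000 + 10,000 split, and that each image holds 3,072 values.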

CIFAR-10 and CIFAR-100 datasets (toronto.edu)

CIFAR-10 dataset: introduction, download, and detailed usage guide - CSDN blog
Manually downloading and importing the CIFAR10 dataset - Jianshu (jianshu.com)
Downloading and using the CIFAR10 dataset - Zhihu (zhihu.com)

ImageNet

ImageNet - Wikipedia
ImageNet (image-net.org)
ImageNet - Wikipedia, the free encyclopedia (Chinese edition) (wikipedia.org)

ImageNet's eight years: Fei-Fei Li and the AI world she changed - Zhihu (zhihu.com)

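In the early computer vision community, the PASCAL Visual Object Classes (VOC) challenge (2005 to 2012) was one of the most important competitions. PASCAL VOC covers multiple tasks, including image classification, object detection, semantic segmentation, and action detection.

The VOC dataset is widely used for object detection. The challenge ran annually from 2005; it started with only 4 classes and was expanded to 20 classes in 2007. Two versions are in common use: 2007 and 2012. Researchers usually report results in two settings. The first trains on the VOC2007 train/val set (about 5k images) together with the VOC2012 train/val set (about 11.5k images), roughly 16k images combined, and tests on VOC2007 test. The second trains on VOC2007 train/val plus VOC2007 test (about 10k images) together with the VOC2012 train/val set, and tests on VOC2012 test.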

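The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) pushed generic object detection a big step forward. ILSVRC was held annually from 2010 to 2017 and included a detection challenge using ImageNet images. The ILSVRC detection dataset contains 200 classes of visual objects, and its numbers of images and object instances are two orders of magnitude larger than those of VOC. For example, ILSVRC-14 contains 517k images and 534k annotated objects.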

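MS-COCO is currently the most challenging object detection dataset. A competition based on MS-COCO has been held annually since 2015. It has fewer object categories than ILSVRC but more object instances. For example, MS-COCO-17 contains 164k images and 897k annotated objects from 80 categories. Compared with VOC and ILSVRC, the biggest advance of MS-COCO is that, in addition to bounding-box annotations, each object is also labeled with a per-instance segmentation, which helps with precise localization. MS-COCO also contains more small objects (whose area is less than 1% of the image) and more densely located objects than VOC and ILSVRC. These properties bring its object distribution closer to the real world, and MS-COCO has become the de facto standard benchmark for the object detection community.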
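
As a rough illustration of the "small object" notion used above (area below 1% of the image), the following sketch checks a bounding box against that ratio. The function name and the default threshold are assumptions for illustration; they are not part of any COCO tooling.

```python
def is_small_object(box_w, box_h, img_w, img_h, max_ratio=0.01):
    """Return True if the box covers less than max_ratio of the image area.

    box_w, box_h: bounding-box width and height in pixels.
    img_w, img_h: image width and height in pixels.
    max_ratio: area ratio threshold (1% follows the text above).
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    box_area = box_w * box_h
    img_area = img_w * img_h
    return box_area / img_area < max_ratio


if __name__ == "__main__":
    # A 30 x 30 box in a 640 x 480 image covers about 0.3% of the image.
    print(is_small_object(30, 30, 640, 480))    # True
    # A 100 x 100 box in the same image covers about 3.3%.
    print(is_small_object(100, 100, 640, 480))  # False
```

Note that the official COCO evaluation metrics bucket objects by absolute pixel area rather than by this ratio, so the sketch follows the definition in the text, not the evaluation code.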

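DOTA is a commonly used dataset for object detection in aerial remote-sensing images. It contains 2,806 aerial images, about 4k × 4k pixels in size, with 188,282 instances across 15 categories; there are 14 main categories, since small vehicle and large vehicle are both subcategories of vehicle. Each instance is annotated as a quadrilateral of arbitrary shape and orientation defined by four points. Aerial images differ from conventional datasets in several ways: larger scale variation, dense small objects, and uncertainty about the detection targets. The data is split into 1/2 training, 1/3 test, and 1/6 validation. The training and validation sets have been released so far, and image sizes range from 800 × 800 to 4000 × 4000.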
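
To make the four-point annotation concrete, here is a minimal sketch that treats an annotation as a list of four (x, y) vertices, computes its area with the shoelace formula, and derives the axis-aligned box that encloses it. The helper names are hypothetical, and the sketch assumes the vertices are given in order around the quadrilateral; it does not parse real DOTA label files.

```python
def quad_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: list of (x, y) vertices listed in order around the shape.
    """
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def enclosing_box(points):
    """Axis-aligned (xmin, ymin, xmax, ymax) box around the vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A square rotated by 45 degrees: an oriented object.
    quad = [(2, 0), (4, 2), (2, 4), (0, 2)]
    print(quad_area(quad))      # 8.0
    print(enclosing_box(quad))  # (0, 0, 4, 4)
```

In this example the enclosing axis-aligned box has area 16, twice that of the oriented quadrilateral, which hints at why oriented annotations suit densely packed, rotated objects in aerial images.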

Object Detection

COCO

COCO - Common Objects in Context (cocodataset.org)

Semantic Segmentation

VOC2012

The PASCAL Visual Object Classes Challenge 2012 (VOC2012) (ox.ac.uk)

Cityscapes

Cityscapes Dataset – Semantic Understanding of Urban Street Scenes (cityscapes-dataset.com)

Mapillary

Mapillary

KITTI

The KITTI Vision Benchmark Suite (cvlibs.net)

Datasets for semantic segmentation - Zhihu (zhihu.com)


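Author: Zhihu user
Link: https://www.zhihu.com/question/30626971/answer/1996387512
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, cite the source.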

Reference site: https://awesomeopensource.com/project/jsbroks/awesome-dataset-tools

I think this list is already quite comprehensive.

Awesome Dataset Tools

A curated list of awesome dataset tools

Labeling Tools

Images

  1. CVAT - Online, interactive video and image annotation tool for computer vision
  2. COCO Annotator - Web-based image segmentation tool for object detection, localization and keypoints
  3. VoTT - Visual Object Tagging Tool: An electron app for building end to end object detection models from images and videos.
  4. Scalabel - Versatile and scalable tool that supports various kinds of annotations
  5. EVA - Web-based tool for efficient annotation of videos and image sequences, with additional tracking capabilities
  6. LOST - Design your own smart Image Annotation process in a web-based environment
  7. Boobs - Fast and efficient BBox annotation for your images in YOLO, VOC/COCO formats
  8. MuViLab - Tool to help you label videos for computer vision
  9. Turkey - Web UI on Amazon Mechanical Turk to crowd-source image segmentation
  10. React Image Annotation - An infinitely customizable image tool built on React
  11. Point Cloud Annotation Tool - Annotate 3D boxes in point cloud
  12. ImageTagger - Open source online platform for collaborative image labeling
  13. DeepLabel - A cross-platform image annotation tool for machine learning
  14. Visual Object Tagging Tool - An electron app for building end to end Object Detection Models
  15. VGG Image Annotator - Standalone image annotator application packaged as a single HTML file
  16. SMART - Efficiently build labeled training datasets for supervised machine learning tasks
  17. Pixel Annotation Tool - Uses OpenCV's marker-based watershed algorithm to annotate images in directories
  18. Pixie - GUI annotation tool which provides the bounding box, polygon, and semantic segmentation
  19. Turktool - Modern React app for scalable bounding box annotation of images
  20. LabelD - Simple image annotation tool to streamline the overall process
  21. Comma Coloring - Adult coloring book for image segmentation
  22. LabelImg - Graphical image annotation tool for labeling object bounding boxes in images
  23. LCs Finder - Image annotation and object detection tool written in C
  24. js-segment-annotator - Javascript image annotation tool based on image segmentation
  25. Cytomine - Analysis of multi-gigapixel images
  26. labelme - Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation)
  27. SimpleAnnotate - Open source video and image annotation software, currently only for OSX
  28. Sloth - Labeling image and video data for computer vision research
  29. Fast Annotation Tool - Online platform for collaborative image annotation
  30. Anno-Mage - Helps you annotate images by suggesting annotations for 80 object classes
  31. MedTagger - Collaborative framework for annotating medical datasets using crowdsourcing
  32. OpenLabeling - Labeling in multiple annotation formats
  33. Alturos.ImageAnnotation - Collaborative tool for labeling image data for yolo
  34. Yolo_mark - GUI for marking bounded boxes of objects in images
  35. imglab - Speed up and simplify the image labeling/annotation process with multiple supported formats
  36. OpenLabeler - Open source desktop application for annotating objects
  37. UltimateLabeling - A multi-purpose Video Labeling GUI with integrated SOTA detector and tracker

Closed Source

  1. DataTorch - Platform for creating and sharing datasets.
  2. Labelbox - Platform for data labeling, data management, and data science. Its features include image annotation, bounding boxes, text classification, and more
  3. Supervise.ly - Image annotation and data management tool that you can use to create image and video datasets
  4. Prodigy - Various machine learning models such as image classification, entity recognition and intent detection
  5. RectLabel - Label images for bounding box object detection and segmentation
  6. Lionbridge AI - Quickly annotate thousands of images and videos with relevant tags
  7. TrainingData.io - Medical image annotation tool for data labeling. Supports DICOM image format for radiology AI
  8. Spare5 - Crowdsourcing service for tasks such as data and image annotation, language assessment, and more
  9. Hive - Text and image annotation service that helps you create training datasets
  10. Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks
  11. Dataturks - Image segmentation, named entity recognition (NER) tagging in documents, and POS tagging
  12. Playment - Services offered include bounding boxes, points and lines, polygons, semantic segmentation, and more
  13. Cogito Tech - Image annotation, content moderation, sentiment analysis, chatbot training
  14. OCLAVI - Annotate Bounding Box, Polygon, Circle, Point and Cuboidal annotations with precision
  15. Humans in the Loop - Use cases include face recognition, autonomous vehicles, and figure detection
  16. WorkAround - Host and annotate data, manage projects, and build datasets alongside top companies
  17. TaQadam - On-demand annotation with agents-in-the-loop
  18. Zillin - Image annotation service for classification, object detection and segmentation with API access and georeferenced images support.
  19. IBM Cloud Annotations - Simple and collaborative image annotation tool for teams and individuals inside ibm cloud environment.
  20. MedSeg - Free online medical annotation (segmentation) with AI models.
  21. MVTec Deep Learning Tool - Provides labeling functionalities for HALCON's deep-learning-based object detection and classification.

Audio

  1. Audio Annotator - JavaScript interface for annotating and labeling audio files
  2. Dynitag - Web-based collaborative audio annotator tool
  3. EchoML - Play, visualize, and annotate your audio files for machine learning

Closed Source

  • Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks

Time Series

  1. Curve - An integrated experimental platform for time series data anomaly detection
  2. TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series
  3. time-series-annotator - Implements classification tasks for time series.
  4. WDK - Tools to facilitate the development of activity recognition applications with wearable devices

Text

  1. brat - For all your textual annotation needs
  2. doccano - Open source text annotation tool for machine learning practitioners.
  3. Inception - A semantic annotation platform offering intelligent annotation assistance
  4. NeuroNER - Named-entity recognition using neural networks
  5. YEDDA - For annotating chunk/entity/event on text, symbol and even emoji
  6. TALEN - Web-based tool for annotating word sequences
  7. WebAnno - Web-based annotation tool for a wide range of linguistic annotations
  8. MAE - Lightweight, general-purpose natural language annotation tool
  9. Anafora - Web-based raw text annotation tool
  10. TagEditor - Label dependencies, parts of speech, named entities, and text categories
  11. ML-Annotate - Supports binary, multi-label and multi-class labeling of text

Closed Source

  1. Hive - Text and image annotation service that helps you create training datasets
  2. Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks
  3. LightTag - Text annotation tool for teams.

Libraries

Audio

  • Muda - Python library for augmenting annotated audio data

HSI

https://archive.ics.uci.edu/ml/datasets.php

https://tanxy.club/HSI


Dataset Roundup
https://cosmicdusty.cc/post/AI/Dataset/
Author
Murphy
Published
April 8, 2022
License