Dataset Roundup

Computer Vision

Image Classification

MNIST

MNIST database - Wikipedia
MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges
MNIST Dataset | DeepAI

Fashion-MNIST

GitHub - zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark
fashion_mnist | TensorFlow Datasets
5.3 Fashion MNIST - PyTorch Chinese Handbook
fashion_mnist | TensorFlow Datasets (google.cn)
Fashion MNIST dataset, an alternative to MNIST (keras.io)

CIFAR-10

CIFAR-10 is a small color-image dataset for recognizing general, everyday objects. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
Each image is 32 × 32 pixels and each class has 6,000 images, for a total of 50,000 training images and 10,000 test images.
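The figures above can be checked with a few lines of Python. This is only a sketch of the dataset's bookkeeping, not a loader: the constant and function names are made up for illustration, and it assumes that integer labels 0 to 9 follow the class order listed above.

```python
# Class names in the order listed above; assumed to match labels 0..9.
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)

IMAGE_SHAPE = (32, 32, 3)   # height, width, RGB channels
IMAGES_PER_CLASS = 6000
NUM_TRAIN = 50000
NUM_TEST = 10000


def label_to_name(label: int) -> str:
    """Map an integer label to its class name."""
    return CIFAR10_CLASSES[label]


def values_per_image() -> int:
    """Number of pixel values stored for one image."""
    height, width, channels = IMAGE_SHAPE
    return height * width * channels


total = len(CIFAR10_CLASSES) * IMAGES_PER_CLASS
print(total, NUM_TRAIN + NUM_TEST)  # 60000 60000
print(values_per_image())           # 3072
print(label_to_name(3))             # cat
```

The per-class count times the number of classes matches the sum of the training and test splits, which is a quick sanity check on the numbers quoted above.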

CIFAR-10 and CIFAR-100 datasets (toronto.edu)

Dataset: CIFAR-10 - introduction, download, and usage guide (CSDN blog)
Manually downloading and importing the CIFAR10 dataset - Jianshu (jianshu.com)
Downloading and using the CIFAR10 dataset - Zhihu (zhihu.com)

ImageNet

ImageNet - Wikipedia
ImageNet (image-net.org)
ImageNet - Wikipedia, the free encyclopedia, Chinese edition (wikipedia.org)

ImageNet's Eight Years: Fei-Fei Li and the AI World She Changed - Zhihu (zhihu.com)

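In the early computer vision community, the PASCAL Visual Object Classes (VOC) challenge, held from 2005 to 2012, was one of the most important competitions. PASCAL VOC covers multiple tasks: image classification, object detection, semantic segmentation, and action detection.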

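The VOC dataset is widely used for object detection. The challenge ran every year starting in 2005; it began with only 4 classes and grew to 20 classes by 2007. Two versions are in common use: 2007 and 2012. Papers usually report two sets of results: (1) train on the roughly 5k-image trainval 2007 plus the roughly 11.5k-image trainval 2012 (about 16.5k images combined) and test on test 2007; (2) train on the roughly 10k-image trainval 2007 + test 2007 plus trainval 2012 and test on test 2012.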
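The two train/test protocols described above can be written down as a small configuration. This is a sketch under assumptions: the keys "07+12" and "07++12" are shorthand labels borrowed from common usage in detection papers, the helper `image_set_files` is hypothetical, and the directory layout `<root>/VOC<year>/ImageSets/Main/<image_set>.txt` is the usual VOCdevkit layout, which may differ in a local copy.

```python
from pathlib import PurePosixPath

# (year, image set) pairs for the two protocols described above.
# The protocol names are shorthand labels, not part of the dataset itself.
PROTOCOLS = {
    "07+12": {
        "train": [("2007", "trainval"), ("2012", "trainval")],
        "test": [("2007", "test")],
    },
    "07++12": {
        "train": [("2007", "trainval"), ("2007", "test"), ("2012", "trainval")],
        "test": [("2012", "test")],
    },
}


def image_set_files(root, splits):
    """Return the image-list files for a list of (year, image_set) pairs.

    Assumes the layout <root>/VOC<year>/ImageSets/Main/<image_set>.txt.
    """
    return [
        str(PurePosixPath(root) / f"VOC{year}" / "ImageSets" / "Main" / f"{name}.txt")
        for year, name in splits
    ]


for protocol_name, protocol in PROTOCOLS.items():
    print(protocol_name, "train:", image_set_files("VOCdevkit", protocol["train"]))
    print(protocol_name, "test: ", image_set_files("VOCdevkit", protocol["test"]))
```

Keeping the protocol as data rather than hard-coded paths makes it easy to report both settings from the same training script.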

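The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) pushed generic object detection a big step forward. It was held every year from 2010 to 2017 and includes a detection task on ImageNet images. The ILSVRC detection data covers 200 classes of visual objects, and its numbers of images and object instances are two orders of magnitude larger than VOC's. For example, ILSVRC-14 contains 517k images and 534k annotated objects.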

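MS-COCO is currently the most challenging object detection dataset. A competition based on MS-COCO has been held every year since 2015. It has fewer object categories than ILSVRC but more object instances. For example, MS-COCO-17 contains 164k images and 897k annotated objects from 80 categories. Compared with VOC and ILSVRC, MS-COCO's biggest advance is that, in addition to bounding-box annotations, each object also carries a per-instance segmentation annotation, which helps localize it more precisely. MS-COCO also contains more small objects (with an area below 1% of the image) and more densely located objects than VOC and ILSVRC. These properties bring its object distribution closer to the real world, and MS-COCO has become the de facto standard benchmark for the object detection community.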
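The "small object" criterion mentioned above (area below 1% of the image) is easy to express in code. The sketch below assumes boxes in the COCO annotation layout `[x, y, width, height]` in pixels; the function names are made up for illustration. Note that this follows the relative-area rule from the text, whereas the COCO evaluation tooling uses an absolute pixel-area threshold for its small/medium/large buckets.

```python
def box_area_fraction(bbox, image_width, image_height):
    """Fraction of the image covered by a box given as [x, y, width, height]."""
    _, _, box_width, box_height = bbox
    return (box_width * box_height) / (image_width * image_height)


def is_small_object(bbox, image_width, image_height, threshold=0.01):
    """True if the box covers less than `threshold` of the image area."""
    return box_area_fraction(bbox, image_width, image_height) < threshold


# 40 x 30 box in a 640 x 480 image: 1200 / 307200, well under 1%.
print(is_small_object([10, 20, 40, 30], 640, 480))   # True
# 320 x 240 box in the same image covers a quarter of it.
print(is_small_object([0, 0, 320, 240], 640, 480))   # False
```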

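DOTA is a widely used dataset for object detection in aerial remote-sensing images. It contains 2,806 aerial images of roughly 4k × 4k pixels, with 188,282 instances in 15 categories; 14 of these are main categories, with small vehicle and large vehicle both being subclasses of vehicle. Each object is annotated as a quadrilateral of arbitrary shape and orientation defined by four points. Aerial images differ from conventional datasets in several ways: larger scale variation, dense small objects, and uncertainty about the targets to detect. The data is split into 1/2 training, 1/6 validation, and 1/3 test. The training and validation sets have been released so far; image sizes range from 800 × 800 to 4000 × 4000.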
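The four-point quadrilateral annotation can be illustrated with a small parser. This is a sketch under assumptions: it expects each label line to hold eight coordinates followed by a category name and a difficulty flag (`x1 y1 x2 y2 x3 y3 x4 y4 category difficult`), which should be checked against the label files actually downloaded; the sample line and function names are invented for illustration.

```python
def parse_dota_line(line):
    """Parse one label line: x1 y1 x2 y2 x3 y3 x4 y4 category difficult."""
    parts = line.split()
    coords = [float(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))  # four (x, y) corners
    return {"points": points, "category": parts[8], "difficult": int(parts[9])}


def quad_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


obj = parse_dota_line("100 100 200 120 190 170 90 150 small-vehicle 0")
print(obj["category"], quad_area(obj["points"]))  # small-vehicle 5200.0
```

Because the quadrilateral can have any orientation, its area has to be computed from the corner points rather than from a width and height, which is what the shoelace formula does.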

Object Detection

COCO

COCO - Common Objects in Context (cocodataset.org)

Semantic Segmentation

VOC2012

The PASCAL Visual Object Classes Challenge 2012 (VOC2012) (ox.ac.uk)

Cityscapes

Cityscapes Dataset – Semantic Understanding of Urban Street Scenes (cityscapes-dataset.com)

Mapillary

KITTI

The KITTI Vision Benchmark Suite (cvlibs.net)

Datasets for semantic segmentation - Zhihu (zhihu.com)

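Author: Zhihu user
Link: https://www.zhihu.com/question/30626971/answer/1996387512
Source: Zhihu
Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, credit the source.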

Reference site: https://awesomeopensource.com/project/jsbroks/awesome-dataset-tools

I think this list is already quite comprehensive.

Awesome Dataset Tools

A curated list of awesome dataset tools

Labeling Tools

Images

CVAT - Online, interactive video and image annotation tool for computer vision
COCO Annotator - Web-based image segmentation tool for object detection, localization and keypoints
VoTT - Visual Object Tagging Tool: An electron app for building end to end object detection models from images and videos.
Scalabel - Versatile and scalable tool that supports various kinds of annotations
EVA - Web-based tool for efficient annotation of videos and image sequences, with additional tracking capabilities
LOST - Design your own smart Image Annotation process in a web-based environment
Boobs - Fast and efficient BBox annotation for your images in YOLO, VOC/COCO formats
MuViLab - Tool to help you labelling videos for computer vision
Turkey - Web UI on Amazon Mechanical Turk to crowd-source image segmentation
React Image Annotation - An infinitely customizable image tool built on React
Point Cloud Annotation Tool - Annotate 3D boxes in point cloud
ImageTagger - Open source online platform for collaborative image labeling
DeepLabel - A cross-platform image annotation tool for machine learning
Visual Object Tagging Tool - An electron app for building end to end Object Detection Models
VGG Image Annotator - Standalone image annotator application packaged as a single HTML file
SMART - Efficiently build labeled training datasets for supervised machine learning tasks
Pixel Annotation Tool - Uses OpenCV's marker-based watershed algorithm to annotate images in directories
Pixie - GUI annotation tool which provides the bounding box, polygon, and semantic segmentation
Turktool - Modern React app for scalable bounding box annotation of images
LabelD - Simple image annotation tool to streamline the overall process
Comma Coloring - Adult coloring book for image segmentation
LabelImg - Graphical image annotation tool for labeling object bounding boxes in images
LCs Finder - Image annotation and object detection tool written in C
js-segment-annotator - Javascript image annotation tool based on image segmentation
Cytomine - Analysis of multi-gigapixel images
labelme - Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation)
SimpleAnnotate - Open source video and image annotation software, currently only for OSX
Sloth - Labeling image and video data for computer vision research
Fast Annotation Tool - Online platform for collaborative image annotation
Anno-Mage - Helps you in annotating images by suggesting you annotations for 80 object classes
MedTagger - Collaborative framework for annotating medical datasets using crowdsourcing
OpenLabeling - Labeling in multiple annotation formats
Alturos.ImageAnnotation - Collaborative tool for labeling image data for yolo
Yolo_mark - GUI for marking bounded boxes of objects in images
imglab - Speed up and simplify the image labeling/annotation process with multiple supported formats
OpenLabeler - Open source desktop application for annotating objects
UltimateLabeling - A multi-purpose Video Labeling GUI with integrated SOTA detector and tracker
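Several of the tools above export boxes in YOLO, VOC, or COCO formats, so converting between them comes up often. The sketch below assumes the commonly used conventions: VOC boxes as corner coordinates `(xmin, ymin, xmax, ymax)` in pixels, COCO boxes as `[x, y, width, height]` in pixels, and YOLO boxes as `(center_x, center_y, width, height)` normalized by the image size. The function names are made up for illustration, and off-by-one details of individual tools are ignored.

```python
def voc_to_coco(box):
    """(xmin, ymin, xmax, ymax) -> [x, y, width, height] in pixels."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def voc_to_yolo(box, image_width, image_height):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h), each normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / image_width,
        (ymin + ymax) / 2 / image_height,
        (xmax - xmin) / image_width,
        (ymax - ymin) / image_height,
    )


box = (100, 50, 300, 250)
print(voc_to_coco(box))            # [100, 50, 200, 200]
print(voc_to_yolo(box, 400, 500))  # (0.5, 0.3, 0.5, 0.4)
```

When picking a labeling tool, it is worth checking which of these conventions its export uses before feeding the annotations to a training pipeline.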

Closed Source

DataTorch - Platform for creating and sharing datasets.
Labelbox - Platform for data labeling, data management, and data science. Its features include image annotation, bounding boxes, text classification, and more
Supervise.ly - Image annotation and data management tool that you can use to create image and video datasets
Prodigy - Various machine learning models such as image classification, entity recognition and intent detection
RectLabel - Label images for bounding box object detection and segmentation
Lionbridge AI - Quickly annotate thousands of images and videos with relevant tags
TrainingData.io - Medical image annotation tool for data labeling. Supports DICOM image format for radiology AI
Spare5 - Crowdsourcing service for tasks such as data and image annotation, language assessment, and more
Hive - Text and image annotation service that helps you create training datasets
Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks
Dataturks - Image segmentation, named entity recognition (NER) tagging in documents, and POS tagging
Playment - Services offered include bounding boxes, points and lines, polygons, semantic segmentation, and more
Cogito Tech - Image annotation, content moderation, sentiment analysis, chatbot training
OCLAVI - Annotate Bounding Box, Polygon, Circle, Point and Cuboidal annotations with precision
Humans in the Loop - Use cases include face recognition, autonomous vehicles, and figure detection
WorkAround - Host and annotate data, manage projects, and build datasets alongside top companies
TaQadam - On-demand annotation with agents-in-the-loop
Zillin - Image annotation service for classification, object detection and segmentation with API access and georeferenced images support.
IBM Cloud Annotations - Simple and collaborative image annotation tool for teams and individuals inside ibm cloud environment.
MedSeg - Free online medical annotation (segmentation) with AI models.
MVTec Deep Learning Tool - Provides labeling functionalities for HALCON's deep-learning-based object detection and classification.

Audio

Audio Annotator - JavaScript interface for annotating and labeling audio files
Dynitag - Web-based collaborative audio annotator tool
EchoML - Play, visualize, and annotate your audio files for machine learning

Closed Source

Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks

Time Series

Curve - An integrated experimental platform for time series data anomaly detection
TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series
time-series-annotator - Implements classification tasks for time series.
WDK - Tools to facilitate the development of activity recognition applications with wearable devices

Text

brat - For all your textual annotation needs
doccano - Open source text annotation tool for machine learning practitioners.
Inception - A semantic annotation platform offering intelligent annotation assistance
NeuroNER - Named-entity recognition using neural networks
YEDDA - For annotating chunk/entity/event on text, symbol and even emoji
TALEN - Web-based tool for annotating word sequences
WebAnno - Web-based annotation tool for a wide range of linguistic annotations
MAE - Lightweight, general-purpose natural language annotation tool
Anafora - Web-based raw text annotation tool
TagEditor - Label dependencies, parts of speech, Named entities, and text categories
ML-Annotate - Supports binary, multi-label and multi-class labeling of text

Closed Source

Hive - Text and image annotation service that helps you create training datasets
Figure Eight - Supports audio, computer vision, natural language processing, and other data tasks
LightTag - Text annotation tool for teams.