Important Dates

What are participating organisations in the NAILS task expected to do?
Participating organisations will be required to submit predictions for a test set containing EEG epochs (responses) for 18,000 images. The task is to use the available training dataset to train a machine-learning model that maximises the BA (balanced accuracy) score when predicting search-relevant vs. non-search-relevant images from neural responses on a test set for which the ground-truth labels have been withheld by the NAILS organisers.

How will my results be benchmarked against other participants?
Participating organisations will be evaluated by their balanced accuracy score achieved on the test dataset during the evaluation period. The ground truth for the test set will not be released to participants, ensuring fair competition.

Dataset Overview

EEG data will be made available from 10 experimental participants performing 6 different search tasks using image content available in ImageNet [1], Places 365 [2] and VEDAI [3] to organisations registered to take part in the NAILS workshop competition (see

[1] - Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 3 (2015), 211–252. DOI: 10.1007/s11263-015-0816-y
[2] - Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Antonio Torralba, and Aude Oliva. 2016. Places: An Image Database for Deep Scene Understanding. CoRR abs/1610.02055 (2016). arXiv:1610.02055
[3] - Sebastien Razakarivony and Frederic Jurie. 2016. Vehicle detection in aerial imagery: A small target detection benchmark. Journal of Visual Communication and Image Representation 34 (2016), 187–203.

What is in the provided data?
Data (EEG) for 10 participants completing 6 search tasks is provided. Training/testing data is available in CSV format, computed using a variety of common feature-extraction methods. We have preprocessed this data into a number of convenient formats that are ready to be used with tools such as Python and scikit-learn.

In the .csv files, training examples take the form of A) a unique identifier, B*) a value indicating whether the features relate to a target (search-task relevant) or a standard (non-relevant to the search task) image, and C) a list of the computed features. Other available file formats implement a similar structure and will be detailed in your participation welcome pack.

The testing set will be provided in a similar format, but the ground truth (B*) will be withheld by the organisers to be used in the evaluation stage of the competition.
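Assuming the row layout above (unique identifier, target/standard label, then features), a minimal Python loader using only the standard csv module might look like the following. The function name is illustrative, not part of the task kit:

```python
import csv

def load_examples(path):
    """Parse a NAILS-style CSV: unique_id, label (1/-1), then features."""
    ids, labels, features = [], [], []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            ids.append(row[0])                      # A) unique identifier
            labels.append(int(row[1]))              # B*) 1 = target, -1 = standard
            features.append([float(v) for v in row[2:]])  # C) computed features
    return ids, labels, features
```

The same loader works for testing files if the withheld-label column is simply ignored or absent (adjust the column offsets accordingly).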

How was the data capture experiment structured?
Each of the 10 participants completed 6 search tasks (2 using VEDAI images, 2 using ImageNet images and 2 using Places 365 images). Each task contained 9 search blocks, each lasting approximately 35 seconds for the experimental participant. Each block presented images at 6 Hz and contained 180 images (excluding padding images), 5% of which were targets (i.e. 9 images satisfying the search query). In total, across the 6 search tasks, there are 486/9234 target/standard examples available per participant. Prior to starting each of the 6 search tasks, participants were shown a number of examples of target images satisfying the search query for that task, and a number of standard images which did not satisfy it. During blocks, participants were instructed to remain still and count the number of targets they observed in the high-speed sequence.
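The per-participant counts quoted above follow directly from the block structure; a quick arithmetic sanity check:

```python
# Experiment structure (per participant), as described above.
tasks = 6
blocks_per_task = 9
images_per_block = 180
targets_per_block = 9  # 5% of 180

targets = tasks * blocks_per_task * targets_per_block                       # 486
standards = tasks * blocks_per_task * (images_per_block - targets_per_block)  # 9234

print(targets, standards)  # 486 9234
```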

What does an RSVP search task look like?

How much training / testing data is there?
Per participant, there are 486/9234 target/standard examples available. As contaminant eye-movement-related activity on the EEG can contain useful information, epochs (-1000 ms to 2000 ms) containing such activity were excluded, as they might encourage strategies that exploit these non-neural sources of discriminative information. Epochs were filtered to exclude those with a peak-to-peak amplitude greater than 70 μV on EOG and frontal EEG channels. ICA (Independent Component Analysis) alongside a wavelet-based analysis was used to confirm that the remaining trials did not contain non-neural sources of discriminative information.

For the workshop's collaborative evaluation, this dataset is split into a training set and a testing set, where 15/285 target/non-target trials from each search task for each user are selected to act as a withheld test set. Competing organisations must use the supplied training data (the remaining epochs, from blocks not used to extract test-set data) to build machine-learning models that maximise a BA (balanced accuracy) score on the entire withheld testing set. This means that for an evaluation run, an organisation needs to submit binary predictions for the 18,000 examples in the test set (900/17,100 targets/standards respectively). There are more than 2,500/50,000 target/standard training examples available across all participants.
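Balanced accuracy is the mean of the per-class recalls (sensitivity and specificity in the two-class case), which keeps a trivial all-standard classifier from scoring well on this heavily imbalanced data. A minimal self-contained sketch using the 1/-1 labels from the data format (scikit-learn's balanced_accuracy_score computes the same quantity):

```python
def balanced_accuracy(y_true, y_pred):
    """BA = (sensitivity + specificity) / 2 for labels 1 (target) and -1 (standard)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    return 0.5 * (tp / n_pos + tn / n_neg)
```

For example, predicting -1 for everything gives a BA of 0.5 regardless of the target/standard ratio, whereas plain accuracy would be 95% on this data.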

How is the training / testing split performed?
For each search task (comprising 9 blocks), a subset of trials (from the same blocks) is used for testing. This means participating organisations can choose how best to train their models, i.e. whether to use a subject-specific model, a task-specific model, or alternative combinations.

How is the training/testing dataset arranged?
In each training and testing file, values are comma-separated. Each line corresponds to an EEG response with respect to a presented image (see paper for details).

For example a training file contains the following:

The first column (e.g. 101_target_9770_WIND2) is a unique id for that training example that also encodes some other important information. The second column indicates whether it is a target example (1) or a standard example (-1). All other columns are features (see feature formats for more detail).

Each unique_id contains 4 fields. Using 101_target_9770_WIND2 / A_B_C_D as an example:
A = 101 = experimental participant ID
B = target = image class (target/standard)
C = 9770 = image ID
D = WIND2 = task ID (complete list: UAV1, UAV2, WIND1, WIND2, BIRD1, INSTR1)

The notable difference in the testing file format is that the unique_id's image class field is undefined rather than target/standard,
e.g. 101_undefined_6601_WIND1
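A small helper to split a unique_id into its four fields (a hypothetical function, simply applying the A_B_C_D convention described above):

```python
def parse_unique_id(uid):
    """Split a NAILS unique_id into its four fields.

    Training ids look like 101_target_9770_WIND2; testing ids carry
    'undefined' in the image class field, e.g. 101_undefined_6601_WIND1.
    """
    participant, image_class, image_id, task_id = uid.split("_")
    return {
        "participant": participant,   # A: experimental participant ID
        "image_class": image_class,   # B: 'target', 'standard', or 'undefined'
        "image_id": image_id,         # C: image ID
        "task_id": task_id,           # D: task ID (UAV1 ... INSTR1)
    }
```

This is handy for grouping predictions per participant or per task before submission.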

Training/testing data is available for a number of different types of features.

Although a number of common feature combinations are already present, participating organisations can pick and choose any feature combination they wish. As each training/testing example in each feature type file has a unique id, features can be reformed across files to make unique combinations not already present.

Each filename contains 3 fields. Using 101_testing_mean1.csv / A_B_C.csv as an example:
A = 101 = experimental participant ID
B = testing = whether the file contains labelled training data or unlabelled testing data
C = mean1 = feature set type (these are all documented below)
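Because every feature file keys its rows by the same unique_id, two files can be joined into a feature combination not already provided. A minimal sketch, assuming each row follows the training layout (unique_id, label, features) and both files cover the same examples; the function name is illustrative:

```python
import csv

def merge_feature_files(path_a, path_b):
    """Join two feature CSVs on their unique_id column, concatenating features."""
    def read(path):
        with open(path, newline="") as f:
            return {row[0]: row[1:] for row in csv.reader(f)}  # uid -> label + features

    a, b = read(path_a), read(path_b)
    merged = {}
    for uid, row_a in a.items():
        row_b = b[uid]                     # raises KeyError if the files disagree
        merged[uid] = row_a + row_b[1:]    # keep a single copy of the label
    return merged
```

The same idea extends to any number of feature-set files, or to joining across participants.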

How do I submit predictions?
Prediction submissions for the NTCIR-NAILS task are made by submitting a formatted text file to the following web URL:

This can be accomplished easily using, for instance, the curl command:
curl --upload-file my_predictions.txt

When submitting predictions for evaluation, submit all predictions in one file, i.e. across all experimental participants × tasks. This file should have 18,000 entries.

Submission files should be formatted as follows:
grp_id: username password
101_undefined_6601_WIND1 1
101_undefined_5431_WIND1 -1
101_undefined_6221_WIND1 1
101_undefined_6811_UAV1 -1

One prediction per line, formatted as the unique_id followed by the prediction (1 or -1), separated by a space. Don't forget to include the header line!
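A small helper for producing a correctly formatted submission file (the function name is illustrative; the header line follows the grp_id: username password template shown above):

```python
def write_submission(path, grp_id, username, password, predictions):
    """Write a NAILS submission file.

    predictions: iterable of (unique_id, prediction) pairs, prediction in {1, -1}.
    The first line is the credential header; each following line is
    'unique_id prediction', space-separated.
    """
    with open(path, "w") as f:
        f.write(f"{grp_id}: {username} {password}\n")
        for uid, pred in predictions:
            f.write(f"{uid} {pred}\n")
```

Checking that the file has exactly 18,001 lines (header plus 18,000 predictions) before uploading is a cheap way to catch missing entries.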
Once predictions are deposited they will be visible at