Questions tagged [labeling]

63 questions
12
votes
2 answers

Do more object classes increase or decrease the accuracy of object detection

Assume you have an object detection dataset (e.g., MS COCO or Pascal VOC) with N images where k object classes have been labeled. You train a neural network (e.g., Faster-RCNN or YOLO) and measure the accuracy (e.g., IOU@0.5). Now you introduce x…
8
votes
2 answers

Incorporate new unlabeled data into classifier trained on a small set of labeled data

I have a set of 400 labeled samples (8 numeric features) on which I trained a binary classifier. The problem I am facing is that once the classifier is shipped to the users, I will get additional samples, but those will be unlabeled. I was…
user695652
5
votes
5 answers

How to deal with incorrect labels in classification?

I have a dataset with 2 classes: A and B. The problem is that 20% to 30% of the samples of class B are mislabeled (labeled as B but the right label is A) and I am not able to identify those mistakes. Is there a way/approach/method to enhance the…
5
votes
1 answer

Do ordinal variables require one hot encoding?

For categorical variables, one-hot encoding is a must if the variable is non-binary. But what about ordinals? These variables are ordered but are mutually exclusive. Do they require the same treatment as categoricals other than labelling?
Shiv_90
5
votes
1 answer

Labeling a pool of unlabelled samples iteratively

Problem setting: I'm faced with a problem in which we have a large set of data points (100K), all of which are still unlabelled. These are to be used as input to a binary classifier at a later point in time. Since sampling is very costly, we need to…
ciri
4
votes
2 answers

Regression algorithm on [0,1] with lots of mislabeled data

I have a training set mapping some Likert-scale variables (integers between 1 and 7, rescaled to real numbers between 0 and 1) to predict a continuous variable between 0 and 1. The data set is reasonably large ($10^4$-$10^5$ rows) but very noisy…
user1111929
4
votes
0 answers

Medium Frequency Trading - Better labelling strategy?

The mid-price at time $t$ is denoted by $$p_t = \frac{s_t^{a,1} + s_t^{b,1}}{2}.$$ This mid-price can evolve in minimum increments of half a tick but is almost always observed to move at increments of a tick over time intervals of a…
Jeremie
3
votes
1 answer

Elastic net/LASSO with soft labels

Sometimes you do not have firm Y/N labels, but e.g. an 80% probability of Y as a label. For example, this happens if you train a model on a small amount of labelled data, predict for a large amount of unlabelled data and then want to use the predictions as…
Björn
3
votes
2 answers

Is it scientifically correct to label data with a model built using golden data?

I am trying to find a labeled dataset of users' profile pictures with their personality trait scores. Unfortunately, I did not find any and therefore, I decided to crawl Twitter for public users' profile pictures with their tweets. At that moment,…
Krebto
3
votes
1 answer

Logistic regression - labeling outcome by confidence of classification

We have trained our logistic regression model to classify candidates attending interviews as 'pursue' or 'fail' (two possible outcomes). Now, as a post-prediction step, we are planning to categorise the candidates as strong/mediocre/weak based on the…
2
votes
1 answer

How to make a decision - when there is a tie and no human expert

We have two algorithms (simple rule-based) working on labeling the dataset as "Yes" and "No" for a disease. There is no ML involved in this task. For example: if Algo 1 says subject 1 has the disease (Yes) and Algo 2 also says subject 1 has the disease…
2
votes
1 answer

Supervised learning: setting labels on sliding windows of sensor data

Suppose that I have a set of accelerometer data collected with one sensor and one label for each measured data point. These labels describe different states of my system e.g., $state_A, state_B, state_C$, etc., and I want to use this information to…
2
votes
2 answers

Features and Variables in Data Analysis

I am pretty new to machine learning and data analysis in general. I have been learning about different algorithms as part of my course. Now, I am stuck with a particular problem. I have been given a dataset which has 52 variables (columns) and 500…
Ambarish
2
votes
2 answers

Labels for correlation coefficients

How could we assign labels to correlation coefficients in order to facilitate reading the data, especially for non-technical people or in qualitative analyses? For example: $\rho > 0.9$ - strongly correlated; $\rho > 0.7$ - moderately…
zeferino
1
vote
0 answers

How to do sentiment analysis in financial news?

I already have financial news that I got from financial news sites. Now I want to apply sentiment analysis to classify news as positive, neutral, or negative. I do not know what to do. I know some sentiment analysis models like VADER and…
Eko Putra