4

I've a set of elements that should be grouped in the minimum number of groups, each element belongs to overlapping classes, for eg:

  • a belongs to the classes x , y , z;
  • b belongs to the classes x , y;
  • c belongs to the classes x;
  • d belongs to the classes z;

So the minimum number of groups that could cover the whole data set are 2, (a, b, c) to x and (d) to z.

l'm looking for the algorithm or the statistical method to apply in this problem.

Ferdi
  • 4,882
  • 7
  • 42
  • 62
geogeek
  • 93
  • 6

2 Answers2

3

This seems to be the Set-Cover Problem, which is known to be NP complete. You are therefore very unlikely to find an algorithm that both has reasonable running time and solves the problem optimally.

Fortunately, there are still some things to do:

  1. The greedy algorithm - whereby at each step you choose the set covering the largest number of elements not yet covered by a previously chosen set - runs in polynomial time, and is at most logarithmically times the optimal solution. It is very likely that this is optimal, in some sense.

  2. If each element appears in at most $k$ sets, then there is an efficient algorithm that is at most $k$ times the optimal solution. This is better than the previous solution when $k < \log(n)$, where $n$ is the number of elements.

Ami Tavory
  • 4,410
  • 11
  • 20
  • Can you clarify how the example in the question is a set-cover problem? It appears the universe is $U=\left\{x, y, z\right\}$ and the collection of sets is $S=\left\{\left\{x,y,z\right\}, \left\{x,y\right\}, \left\{x\right\}, \left\{z\right\}\right\}$. The solution to the example is $x$ and $z$. The union of $x$ and $z$ does not equal the universe $U$. – Nat Jun 01 '18 at 14:44
  • I think you've misunderstood the question. Aside from some curious choices about active and passive tenses in the phrasing, the question is exactly set cover, "So the minimum number of groups that could cover the whole data set are 2" shows this, and the union of $x$ and $z$ is indeed $U$, as you can see from the start of the question ("a belongs to the classes..."). – Ami Tavory Jun 01 '18 at 15:17
  • The union of $\left\{a, b, c\right\}$ and $\left\{d\right\}$ is equal to the $\left\{a, b, c, d\right\}$, but the union of $x$ and $z$ is not $\left\{x, y, z\right\}$. Therefore, I was asking that you clarify what is $U$ and what are your sets. – Nat Jun 01 '18 at 15:36
  • @Nat E.g., $x = \{a, b, c\}$ and $U = \{a, b, c, d\}$. – Ami Tavory Jun 01 '18 at 15:39
  • The set cover problem is to identify the smallest sub-collection of $S$. What is the collection of sets $S$ in this example? – Nat Jun 01 '18 at 15:51
  • @Nat Clearly, $x, y, z$ - this is stated in the question. – Ami Tavory Jun 01 '18 at 16:08
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/78314/discussion-between-nat-and-ami-tavory). – Nat Jun 01 '18 at 16:15
-1

You have elements with known class membership. You want the minimum number of classes that includes all elements.

Count how many elements belong to each class (counts X = 3, Y = 2, Z = 2). Class X has the most elements so we will assign all elements that are members of X to class X (elements a, b, c). Remove these elements and count how many of remaining elements belong to each class (counts X = 0, Y = 0, Z = 1). Class Z has the most elements so we will assign all elements that are members of Z to class Z (element d). Repeat until all elements are assigned a class.

However, it is not clear that there is a unique solution to your problem. There may be multiple ways to group the elements that results in the same number of classes.

Breaking ties is also a challenge. One idea would be to randomly break ties and repeat the algorithm several times.

Nat
  • 588
  • 2
  • 9