Knows What It Knows: A Framework For Self-Aware Learning (2008)

Authors

Abstract

We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and list some open problems in this area.

Discussion

William Cohen, 2008/07/06 09:48

Here's another set of open problems: I understand that the definitions of KWIK learning vs MB and PAC are different, but can you find witnesses that separate the definitions - eg - a problem that is KWIK learnable that's not MB learnable? - a problem that is MB learnable that's not KWIK learnable? If nothing else this would help clarify the idea.

BTW, in the deterministic case, what is the probability taken over for delta? isn't everything adversarial?

Michael Littman, 2008/07/07 08:52

Anything that is KWIK learnable is also MB learnable. Specifically, whenever a KWIK learnable says “I don't know” it could just guess, so the number of I don't knows upper bounds the number of mistakes. A problem that separates them is conjunctions, which is MB learnable in polynomial time in the number of variables, but has an exponential lower bound in the KWIK setting. There is just not enough feedback in the negatives examples in conjunctions to get much insight into the hidden information.

Another well known problem that is not KWIK learnable is a linear threshold (the sort of thing a perceptron) cares about. Here, the problem is that an adversary could keep presenting examples closer and closer to the threshold so the learner continues to be unsure about which side the example is on.

So, ultimately, we get a containment chain from KWIK to MB to PAC.

Lihong Li, 2008/07/08 03:42

Regarding the delta parameter in deterministic problems: we still need it in this case as we allow randomized algorithms. Such algorithms may output “I don't know” or valid predictions based on its internal random number generator. Thus probability can be taken over this source or randomness.

Michael Littman, 2008/07/08 08:39

True, but in all the simple deterministic problems we've looked at, the delta parameter is, in fact, not needed. For that matter, neither is epsilon.

John Langford, 2008/07/09 01:00

We had a little bit of discussion at the poster about making the setting work agnostically, which is possible in the active learning setting. A key difficulty is that the adversary is too strong, in the same way that the adversary is too strong in online learning with a VC space against an adversary. The two techniques we know of to deal with this are to (a) switch to an IID adversary or (b) switch to a transductive setting.

Michael Littman, 2008/07/09 09:08

I understand how an IID adversary would help (and would not be helpful in an RL setting). Can you say more about how transductive helps? In particular, one could view the learning of a transition function in RL as transductive in that we have both labeled (state-action pairs that have been tried) and unlabeled (state-action pairs for visited states for which an action has not been tried) examples.

John Langford, 2008/07/12 03:50

I think this paper is the closest to what you want: http://ttic.uchicago.edu/~sham/papers/ml/online2batch.pdf You might also be interested in the transductive IID case, for which the PAC-MDL bound we worked on is of some use.

Enter your comment (wiki syntax is allowed):
BBLHU
 
paper/2008/627.txt · Last modified: 2009/05/24 18:48 (external edit)
 
Driven by DokuWiki