The lecture is very good and very interactive too (!). Prof. Still proposes the term "interactive learning" to distinguish it from active learning. The difference is that the learner provides real feedback to the world: its actions can change the probability distribution that underlies the world's data-generating mechanism (in active learning this distribution does not change). The motivation is to build a model of the world that generates good (optimal) predictions with minimal information. The action policy is chosen not to maximize reward, food, or energy, but prediction. This is a form of learning whose goal is to make the world predictable with the simplest possible policy (i.e., the reward is good prediction), but without long-term action planning. The model's predictive ability is measured by the mutual information that the internal state, together with the action, contains about the future. The goal can be expressed as:
max ( I[{s,a}; z] - lambda * I[s; h] - mu * I[a; h] )
Here s denotes the internal states, a the actions, z the future observations, and h the history. The lambda and mu terms express the need to keep, respectively, the model complexity and the policy complexity low while pursuing the goal.
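To make the terms of this objective concrete, here is a minimal sketch in Python that computes mutual information from a joint probability table and evaluates the trade-off on toy distributions. The joint tables and the lambda value are purely hypothetical illustrations, not from the lecture; only the predictive term I[{s,a}; z] and the complexity penalty I[s; h] are shown.

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information I[X; Y] in bits, from a joint probability table p(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                          # avoid log(0)
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Toy joint distributions (hypothetical): rows index the state/action side,
# columns index future observations z or the history h.
p_sa_z = np.array([[0.4, 0.1],
                   [0.1, 0.4]])   # correlated -> positive predictive power
p_s_h  = np.array([[0.25, 0.25],
                   [0.25, 0.25]]) # independent -> zero model complexity

lam = 0.1  # illustrative weight on the complexity penalty
objective = mutual_information(p_sa_z) - lam * mutual_information(p_s_h)
print(objective)
```

With these toy tables the complexity term vanishes (state and history are independent), so the objective reduces to the predictive information alone; a real learner would adjust its policy to trade the two terms off.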
Juan F Gomez-Molina, Intl Group of Neuroscience