Identifying an object of interest, grasping it, and handing it over are key capabilities of collaborative robots. In this context, we propose a fast, supervised learning framework for associating human hand gestures with intended robotic manipulation actions. This framework enables the robot to learn associations on the fly while performing a task with the user. We consider a domestic scenario of assembling a kids' table, in which the robot's role is to assist the user. To facilitate the collaboration, we incorporate the robot's gaze into the framework. The proposed approach is evaluated both in simulation and in a real environment. We study the effect of gesture detection accuracy on the number of interactions required to complete the task. Moreover, our quantitative analysis shows that purposeful gaze significantly reduces the time required to achieve the goal.