Capturing human motion or objects with vision-based techniques has been studied intensively. Although humans frequently interact with other people or objects, most previous work has focused on capturing a single object or the motion of a single person. In this talk, I will highlight four projects that deal with human-human or human-object interactions. The first project addresses the problem of capturing the skeletal motion and the non-articulated cloth motion of two interacting characters. The second project aims to model spatial hand-object relations during object manipulation. In the third project, an affordance detector is learned from human-object interactions. The fourth project investigates how human motion can be exploited for object discovery from depth video streams.