KETI AI Research Center. Boeun Kim, Senior Researcher.
[Audio] First, I will introduce the members of our center, and then the main research topics will be explained. We are researching the fields of computer vision and natural language processing, as well as multimodal areas that encompass both fields. -- For vision, we mainly focus on research areas related to scenes and humans. -- And we will introduce multimodal conversation and reasoning techniques as our main topics in NLP.
Ⅰ. AIRC Members.
[Audio] Our center has 22 members, including 9 computer vision researchers and 9 NLP researchers.
[Audio] Our institute mainly carries out national projects, developing practical solutions as well as studying state-of-the-art technology.
[Audio] First, we are studying scene understanding and reconstruction. We developed an image captioning algorithm that generates multiple sentences explaining the situation in an input image. The variational autoencoder-based model generates captions related to various regions of the image using a caption attention map. -- The model was trained with the Korean version of the COCO dataset to generate Korean sentences. 3D reconstruction and localization using SLAM and SfM is one of our main research topics. Both RGB-based and LiDAR-based SLAM algorithms are studied. The videos show 3D reconstructions of buildings outdoors. -- The video on the left is of the Hyundai department store in Pangyo.
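As a rough illustration of the diverse-captioning idea (not the exact model), the following minimal PyTorch sketch shows how a conditional VAE-style caption decoder can produce several different sentences for the same image by drawing different latent samples; the layer sizes, greedy decoding loop, and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CaptionVAE(nn.Module):
    """Toy conditional-VAE caption decoder: different latent samples
    yield different captions for the same image feature."""
    def __init__(self, feat_dim=2048, latent_dim=64, vocab_size=10000, hidden=512):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        self.init_h = nn.Linear(feat_dim + latent_dim, hidden)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRUCell(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def sample_caption(self, img_feat, bos_id=1, eos_id=2, max_len=20):
        mu, logvar = self.to_mu(img_feat), self.to_logvar(img_feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        h = torch.tanh(self.init_h(torch.cat([img_feat, z], dim=-1)))
        tok, caption = torch.tensor([bos_id]), []
        for _ in range(max_len):                                # greedy decoding
            h = self.rnn(self.embed(tok), h)
            tok = self.out(h).argmax(dim=-1)
            if tok.item() == eos_id:
                break
            caption.append(tok.item())
        return caption

model = CaptionVAE()
img_feat = torch.randn(1, 2048)                                 # placeholder CNN feature of one image
captions = [model.sample_caption(img_feat) for _ in range(3)]   # three diverse caption samples
```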
[Audio] This technique was also applied to AR services in smart factories. The service supports AR navigation and control based on a reconstructed 3D indoor map and user localization. -- The video on the left demonstrates the AR information service, and the one on the right demonstrates the AR control service.
[Audio] Another major research area concerns humans and human motion. The research starts with 2D/3D pose estimation. We have tried to improve pose estimation accuracy in challenging situations. We improved illumination in low-light videos to estimate accurate joint locations, and introduced a multi-resolution fusion network to make the model robust for distant people. We developed a gesture recognition model based on a random forest that uses estimated joint locations. In addition, we are developing a transformer-based pre-training model that learns from unlabeled motion sequences in a self-supervised manner.
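To make the random-forest gesture classifier concrete, here is a minimal scikit-learn sketch; the feature layout (a short window of flattened joint coordinates) and the number of gesture classes are assumptions for illustration, not the project's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed feature layout: each sample is a short window of estimated poses,
# flattened into one vector (frames x joints x 2 coordinates).
n_samples, n_frames, n_joints = 200, 15, 17
X = np.random.rand(n_samples, n_frames * n_joints * 2)   # placeholder joint locations
y = np.random.randint(0, 5, size=n_samples)              # placeholder gesture labels (5 classes)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)

# Classify a new pose window produced by the pose estimator.
new_window = np.random.rand(1, n_frames * n_joints * 2)
predicted_gesture = clf.predict(new_window)[0]
```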
[Audio] We also studied human motion prediction, which generates future motion sequences from given past motions. The motion trajectories of all joints are transformed into spectral-domain signals, and the signals pass through a graph convolutional network to predict future trajectories. We also developed a motion transfer engine that transfers the motion of an input source video onto a single target image. This sample clip shows the person in the target photo mimicking the motion of a sign language video. The following are examples of visualization and user interaction for commercialization. The videos on the left show AR effects with real-time human pose estimation. We have also created an avatar representing our institute. This avatar is rigged with the standard SMPL body model so that it can be animated.
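A minimal sketch of the spectral-domain step, assuming the DCT is used as the transform (a common choice in this line of work); the adjacency matrix and weights below are random placeholders standing in for learned parameters.

```python
import numpy as np
from scipy.fft import dct, idct

# Toy past motion: T frames, J joints, 3 coordinates.
T, J, C = 25, 17, 3
past_motion = np.random.rand(T, J, C)

# 1) Spectral transform: DCT along the time axis turns each joint's
#    trajectory into a compact set of frequency coefficients.
coeffs = dct(past_motion, axis=0, norm='ortho')                  # (T, J, C)

# 2) One toy graph-convolution step over the joint dimension:
#    mix coefficients of related joints through an adjacency matrix.
A = np.random.rand(J, J); A = A / A.sum(axis=1, keepdims=True)   # placeholder learned adjacency
W = np.random.rand(C, C)                                         # placeholder learned weights
mixed = np.einsum('jk,tkc,cd->tjd', A, coeffs, W)

# 3) Back to the time domain; a trained network would output coefficients
#    of the *future* trajectory rather than reconstructing the past.
predicted = idct(mixed, axis=0, norm='ortho')
```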
[Audio] From here on, the topics are related to NLP tasks. First, we developed a dialogue processing system dealing with multimodal information. The system analyzes visual information, voice, and language to generate utterances. Using TTS, the speech is played along with a suitable motion. In addition, we developed an empathetic dialogue engine. The model aims to understand the user's intention and emotions, and uses these factors to set an appropriate dialogue strategy. This is our prototype.
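The turn-level flow can be sketched as follows; every component name here is a hypothetical placeholder used only to illustrate the pipeline described above, not the actual system's API.

```python
def multimodal_dialogue_turn(frame, audio, asr, vision_enc, emotion_model,
                             dialogue_policy, nlg, tts, motion_db):
    """Hypothetical orchestration of one dialogue turn: all components are
    passed in as callables so the sketch stays self-contained."""
    text = asr(audio)                                        # speech -> text
    visual_ctx = vision_enc(frame)                           # what the camera sees
    intent, emotion = emotion_model(text, audio)             # user intention and emotional state
    strategy = dialogue_policy(intent, emotion, visual_ctx)  # empathetic dialogue strategy
    reply = nlg(strategy, text)                              # generate the utterance text
    return tts(reply), motion_db.lookup(strategy)            # speech audio + matching motion
```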
[Audio] We are also studying back-channeling, which is a main element of attentive listening. Our model analyzes the user's voice and text to determine the appropriate back-channel type and timing. In addition, we are studying knowledge-grounded dialogue. Most knowledge text data is in English, so it is hard to train the model for a non-dominant language. To overcome this issue, we proposed a network that uses multilingual knowledge resources.
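A minimal sketch of how acoustic and text features might be fused to predict back-channel type and timing; the feature dimensions, back-channel classes, and architecture are illustrative assumptions, not the actual model.

```python
import torch
import torch.nn as nn

class BackchannelPredictor(nn.Module):
    """Toy fusion model: combines acoustic and text features of the user's
    current utterance to predict whether to back-channel now and which type."""
    def __init__(self, audio_dim=40, text_dim=768, hidden=128, n_types=4):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.text_proj = nn.Linear(text_dim, hidden)
        self.type_head = nn.Linear(hidden * 2, n_types)   # e.g. none / continuer / empathy / surprise
        self.timing_head = nn.Linear(hidden * 2, 1)       # probability that now is a good moment

    def forward(self, audio_frames, text_emb):
        _, h = self.audio_enc(audio_frames)                # final GRU state over acoustic frames
        fused = torch.cat([h[-1], self.text_proj(text_emb)], dim=-1)
        return self.type_head(fused), torch.sigmoid(self.timing_head(fused))

model = BackchannelPredictor()
audio = torch.randn(2, 100, 40)    # batch of 2, 100 frames of acoustic features (e.g. MFCCs)
text = torch.randn(2, 768)         # sentence embedding of the partial transcript
type_logits, timing_prob = model(audio, text)
```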
[Audio] Reasoning is one of our main research topics. Some questions cannot be solved with a single retrieval. We developed a multi-hop QA system that divides such questions into multiple queries and solves them sequentially. For example, for the question "Where is the hometown of the singer who sang the theme song of the movie 'Titanic'?", the system first retrieves the theme song "My Heart Will Go On" and then goes on to find the hometown of the singer.
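The sequential solving strategy can be sketched as follows; `decompose`, `retrieve`, and `extract_answer` are hypothetical components standing in for the question decomposer, document retriever, and reading-comprehension model.

```python
def multi_hop_answer(question, decompose, retrieve, extract_answer):
    """Answer a multi-hop question by solving one sub-query at a time."""
    sub_queries = decompose(question)   # e.g. ["theme song of the movie 'Titanic'",
                                        #       "hometown of the singer of [ANSWER]"]
    answer = None
    for query in sub_queries:
        if answer is not None:
            query = query.replace("[ANSWER]", answer)   # plug in the previous hop's answer
        docs = retrieve(query)                          # single-hop retrieval
        answer = extract_answer(query, docs)            # read off the intermediate answer
    return answer
```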
[Audio] This is the end of the presentation. The KETI AI Research Center is researching technologies to make AI a companion that understands the world as humans do, and we will continue to conduct related research in the future. Thank you for listening.