SRC presentation for Progress Report for 1st Semester 2022. By Farah Jamal Ansari, under the supervision of Prof. Sumeet Agarwal & Prof. M. Hanmandlu.
Entropy-based Tracker using a Competitive-Cooperative Learning Model.
Motivation. To introduce a new tracker, parallel to the Kalman filter, for road sign tracking. Although developed for road signs, the tracker is general and can be applied to other tracking tasks. Road sign tracking reduces false negatives and leads to more robust detection.
Abstract. The tracker is intended to take cognizance of the road signs installed on roadsides for guiding vehicle drivers. Each of the four coordinates of a bounding box, viz. xmin, ymin, xmax, ymax, enclosing a detected signboard is tracked separately. In this semester, we propose a multi-object, detection-based learning tracker using the competitive-cooperative learning model (CCLM) proposed in [1]. The scenario is that we have some detected bounding boxes in a frame, and our task is to track these road signs through the subsequent frames until the road sign is no longer visible. The tracker is also compared with the Deep Sort algorithm on the Belgium road signs dataset and is found to perform better.
Cont... Fx1 = |x1(Frame-2) − x1(p)|, Fx2 = |x2(Frame-2) − x2(p)|, Fy1 = |y1(Frame-2) − y1(p)|, Fy2 = |y2(Frame-2) − y2(p)|, where x1(Frame-2), x2(Frame-2), y1(Frame-2), y2(Frame-2) are the detected bounding box coordinates in the current frame and x1(p), x2(p), y1(p), y2(p) are the predicted bounding box coordinates from the previous frame. This per-coordinate difference is the objective function that we minimise using the CCLM.
CCLM: abstract.
CCLM: predict and correct. Prediction: Let (x1, y1) and (x2, y2) be the diagonally opposite points of the bounding box of the detected signboard in the first frame. Let (vx1, vy1) and (vx2, vy2) be the velocities such that the predicted values are: x1(p) = x1 + vx1, y1(p) = y1 + vy1, x2(p) = x2 + vx2, y2(p) = y2 + vy2 (eq. 1). Initially we take vx1, vy1, vx2, vy2 as a random matrix representing 4 efforts and 40 problem solvers.
Contd... Correction: In eq. 1 we put Bk1 = vx1(k), Bk2 = vy1(k), Bk3 = vx2(k), Bk4 = vy2(k), so that x1(p) = x1(Frame-1) + Bk1, y1(p) = y1(Frame-1) + Bk2, x2(p) = x2(Frame-1) + Bk3, y2(p) = y2(Frame-1) + Bk4. Let the detected coordinates in the second frame be x1(Frame-2), y1(Frame-2), x2(Frame-2), y2(Frame-2). We take the absolute error for each coordinate as the objective function to be minimised: Fx1 = |x1(Frame-2) − x1(p)|, Fx2 = |x2(Frame-2) − x2(p)|, Fy1 = |y1(Frame-2) − y1(p)|, Fy2 = |y2(Frame-2) − y2(p)|.
Contd... As discussed above, we find values for the efforts and construct the predicted box for the next frame as: x1(p) = x1 + Bk1 (Bk1 is negative if x1(Frame-2) − x1(Frame-1) is negative), y1(p) = y1 + Bk2 (Bk2 is negative if y1(Frame-2) − y1(Frame-1) is negative), x2(p) = x2 + Bk3 (Bk3 is negative if x2(Frame-2) − x2(Frame-1) is negative), y2(p) = y2 + Bk4 (Bk4 is negative if y2(Frame-2) − y2(Frame-1) is negative) (eq. 4). In the next frame x1, y1, x2, y2 are replaced with x1(p), y1(p), x2(p), y2(p), and vx1, vy1, vx2, vy2 are replaced with Bk1, Bk2, Bk3, Bk4. The same procedure is repeated until the track is deleted.
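A minimal Python sketch of this predict-correct-update cycle is given below. It only illustrates the bookkeeping described on these slides: cclm_minimise is a hypothetical placeholder for the actual CCLM optimiser of [1] (here it simply picks, per coordinate, the effort closest to the observed displacement), and the array shapes follow the stated initialisation of 4 efforts and 40 problem solvers.

```python
import numpy as np

def cclm_minimise(displacements, efforts):
    """Hypothetical stand-in for the CCLM optimiser of [1]: given the signed
    per-coordinate displacements and the current effort matrix (4 efforts x
    40 problem solvers), return one effort Bk per coordinate. Here we simply
    pick the effort closest to each observed displacement; the actual model
    uses competitive-cooperative learning over higher-order information sets."""
    idx = np.argmin(np.abs(efforts - displacements[:, None]), axis=1)
    return efforts[np.arange(4), idx]

def track_step(prev_box, det_box, efforts):
    """One predict-correct-update cycle for a single bounding box.
    prev_box, det_box: (x1, y1, x2, y2) in the previous and current frame."""
    prev_box = np.asarray(prev_box, dtype=float)
    det_box = np.asarray(det_box, dtype=float)
    displacements = det_box - prev_box          # their signs decide the signs of Bk1..Bk4
    B = cclm_minimise(displacements, efforts)   # minimise |detected - predicted| per coordinate
    pred_box = prev_box + B                     # eq. 4: predicted box for the next frame
    efforts = np.tile(B[:, None], (1, efforts.shape[1]))  # velocities replaced by Bk1..Bk4
    return pred_box, efforts

# Initial efforts: a random 4 x 40 matrix (4 efforts, 40 problem solvers).
efforts = np.random.randn(4, 40)
pred_box, efforts = track_step((100, 50, 140, 90), (104, 52, 146, 94), efforts)
```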
Flow Chart for CCLM.
Road sign Tracking using CCLM. [Block diagram: list of tracks → track using correlation → create new tracks.]
Framework of the tracker. Unassociated track: arises whenever a track has a missed detection. Age: this variable is incremented whenever there is a missed detection in the new frame; as soon as there is a detection in the next frame, it is reset to 0. If the age variable exceeds 7, the track is deleted. New track: created whenever there is an unassociated detection.
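A minimal sketch of this track bookkeeping, assuming the rules stated above; the class and field names are illustrative, not taken from the report.

```python
class Track:
    """Book-keeping for one tracked road sign (sketch; names are illustrative)."""
    MAX_AGE = 7   # a track is deleted once its age exceeds this value

    def __init__(self, box, track_id):
        self.box = box          # (x1, y1, x2, y2) of the last known bounding box
        self.id = track_id
        self.age = 0            # consecutive missed detections

    def mark_missed(self):
        self.age += 1           # missed detection in the new frame

    def mark_detected(self, box):
        self.box = box
        self.age = 0            # reset as soon as a detection is associated again

    @property
    def should_delete(self):
        return self.age > Track.MAX_AGE
```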
Road Sign Association. IOU matching: the detections are associated with the tracks using IOU matching. If a detection is not matched with any track, we call it an unassociated measurement and create a new track for it. If a track does not find a matching detection in the current frame, it is said to be occluded. In some cases the road sign is not detected due to poor illumination, and we then find a corresponding match for the track using correlation.
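A sketch of the IOU-based association described above. The greedy matching strategy and the 0.3 threshold are assumptions, since the report does not specify how ties or thresholds are handled.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match detections to predicted track boxes by IOU.
    Returns (matches, unmatched_track_indices, unmatched_detection_indices)."""
    matches, used_dets = [], set()
    for t_idx, trk in enumerate(tracks):
        best_iou, best_d = iou_thresh, None
        for d_idx, det in enumerate(detections):
            if d_idx in used_dets:
                continue
            score = iou(trk.box, det)
            if score > best_iou:
                best_iou, best_d = score, d_idx
        if best_d is not None:
            matches.append((t_idx, best_d))
            used_dets.add(best_d)
    matched_tracks = {m[0] for m in matches}
    unmatched_tracks = [t for t in range(len(tracks)) if t not in matched_tracks]
    unmatched_dets = [d for d in range(len(detections)) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections become new tracks, and unmatched (occluded) tracks fall back on the correlation step described next.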
Correlation. [Diagram: the predicted bounding box is enlarged in the next frame to form the search area.]
Correlation. To find the matching window, the normalized correlation coefficient is used. The window having the highest correlation coefficient (r) gives the top-left corner coordinates (x, y) of the matching window. The correlation coefficient is computed from r(x, y) = Σ_{x',y'} (R(x', y') − mean(R)) (S(x + x', y + y') − mean(S_{x,y})) / sqrt[ Σ_{x',y'} (R(x', y') − mean(R))² · Σ_{x',y'} (S(x + x', y + y') − mean(S_{x,y}))² ] (eq. 5). We subtract the mean from the pixel intensities to reduce the effect of illumination. R is the template image, i.e. the predicted box, and S is the search area, i.e. the neighbourhood in which we look for the predicted box by sliding a window across it. x and y are the pixel coordinates of the sliding window in the search image, and x' and y' are the pixel coordinates within the template image and the sliding window that are correlated.
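This computation is available off the shelf: OpenCV's TM_CCOEFF_NORMED template matching subtracts the means exactly as in eq. 5. The sketch below assumes a fixed enlargement margin around the predicted box to form the search area; the exact enlargement used in the report is not specified.

```python
import cv2
import numpy as np

def correlate(frame, prev_frame, pred_box, margin=20):
    """Locate the predicted box in the new frame by normalised cross-correlation.
    pred_box: (x1, y1, x2, y2); margin: extra pixels around the box forming the
    search area (an assumed value)."""
    x1, y1, x2, y2 = map(int, pred_box)
    template = cv2.cvtColor(prev_frame[y1:y2, x1:x2], cv2.COLOR_BGR2GRAY)   # R in eq. 5
    sx1, sy1 = max(0, x1 - margin), max(0, y1 - margin)
    sx2, sy2 = min(frame.shape[1], x2 + margin), min(frame.shape[0], y2 + margin)
    search = cv2.cvtColor(frame[sy1:sy2, sx1:sx2], cv2.COLOR_BGR2GRAY)      # S in eq. 5
    r = cv2.matchTemplate(search, template, cv2.TM_CCOEFF_NORMED)           # mean-subtracted NCC
    _, max_r, _, (mx, my) = cv2.minMaxLoc(r)   # top-left corner of the best window and its score
    nx1, ny1 = sx1 + mx, sy1 + my
    return (nx1, ny1, nx1 + (x2 - x1), ny1 + (y2 - y1)), max_r
```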
Correlation. [Example template windows T1 and T2.]
Recognition. At the end of tracking, a number of false positives are introduced. To reduce the false positives, we take recourse to a recognition phase. A two-stage multi-scale CNN, as described in [3], is preferred for this purpose. TensorFlow 2.0 is used for training and testing. Images are resized to 32×32. As shown in [3], grayscale images in the YUV model give the best accuracy, so the U and V components are discarded and only the Y component is used.
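A hedged TensorFlow 2 sketch of this pipeline follows. The layer widths, kernel sizes and class count are placeholders rather than the exact configuration of [3]; the multi-scale idea is reflected by feeding features from both convolutional stages to the classifier.

```python
import tensorflow as tf

def to_y_channel(rgb_batch):
    """Resize to 32x32 and keep only the luminance (Y) of the YUV representation."""
    yuv = tf.image.rgb_to_yuv(tf.image.resize(rgb_batch, (32, 32)))
    return yuv[..., :1]   # discard U and V

def build_recognition_cnn(num_classes=62):   # placeholder class count
    inp = tf.keras.Input(shape=(32, 32, 1))
    # Stage 1
    s1 = tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu")(inp)
    s1 = tf.keras.layers.MaxPooling2D()(s1)
    # Stage 2
    s2 = tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu")(s1)
    s2 = tf.keras.layers.MaxPooling2D()(s2)
    # Multi-scale: the classifier sees features from both stages, as in [3]
    feats = tf.keras.layers.Concatenate()([
        tf.keras.layers.Flatten()(tf.keras.layers.MaxPooling2D()(s1)),  # extra pooling on stage 1
        tf.keras.layers.Flatten()(s2),
    ])
    x = tf.keras.layers.Dense(100, activation="relu")(feats)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inp, out)
```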
Architecture. [Network diagram: 32×32 input image → convolutional stages → fully connected layers.]
Deep Sort. Wojke et al. [4] developed a multi-object tracking framework that tracks from one frame to the next using a recursive Kalman filter together with data association. They describe tracking on an 8-dimensional state space (u, v, χ, h, u', v', χ', h') representing a bounding box with centre position (u, v), aspect ratio χ, height h, and the respective velocities of each coordinate. The bounding box coordinates (u, v, χ, h) are passed as observations to a Kalman filter based on a constant-velocity linear model. The Kalman filter predicts the new bounding boxes and associates new detections with these predictions.
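The constant-velocity linear model behind this filter can be written down directly. The sketch below shows the 8×8 state-transition and 4×8 measurement matrices for the state (u, v, χ, h, u', v', χ', h'); the noise covariances Q and R are left as inputs, as their tuning is not described here.

```python
import numpy as np

dt = 1.0                       # one frame per step
F = np.eye(8)                  # state transition: x_{k+1} = F @ x_k
F[:4, 4:] = dt * np.eye(4)     # each position component is advanced by its velocity

H = np.zeros((4, 8))           # measurement model: only (u, v, aspect ratio, h) are observed
H[:4, :4] = np.eye(4)

def predict(x, P, Q):
    """Kalman predict step under the constant-velocity model."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, R):
    """Kalman update step with a detected box z = (u, v, aspect ratio, h)."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(8) - K @ H) @ P
    return x_new, P_new
```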
Results. The BelgiumTS [2] dataset contains four video sequences, each captured by 8 cameras, and each camera sequence consists of 3001 frames. Annotations are not provided for all the frames, so I created manual annotations, and the results presented here are based on those annotations. I selected sequence 01 taken by camera 01 for demonstrating the results.
Detection/Tracker results for seq 01.

                  Precision   Recall   F-score   AUC      Avg. time per frame (s)
YOLOv3 Detector   0.70439     0.6764   0.6901    0.6428   1.6
CCLM Tracker      0.665       0.735    0.6986    0.6609   1.64
Deep Sort         0.657       0.6828   0.67      0.5788   1.6001
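As a consistency check, the F-score in each row follows from its precision P and recall R as F = 2PR/(P + R); for example, the YOLOv3 row gives 2 × 0.70439 × 0.6764 / (0.70439 + 0.6764) ≈ 0.690, matching the tabulated 0.6901.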
AUC_PR for the detection. [Precision-Recall curves for the YOLOv3 detection and Deep Sort; recall on the x-axis, precision on the y-axis.]
Detection/Tracker recognition results for seq 01.
False positives during detection. [Montage of false-positive detection crops.]
False negatives during detection. [Montage of missed road sign crops.]
Conclusions. In this report, a learning-based tracker based on the CCLM is proposed. For detection we use the YOLOv3 detector, as it is fast and able to detect most of the road signs with very few false positives. Since our tracker is detection based, it cannot by itself follow an unassociated track; in that case we fall back on normalized cross-correlation to track the missing road signs. The recognition phase assigns each road sign to its correct class; it also reduces the number of false positives and increases the F-score for both detection and tracking. We have also implemented the Deep Sort tracker, which incorporates a Kalman filter, for comparison with the CCLM tracker, and the performance of the latter is found to be better.
Future Work. As the Belgium dataset does not provide full annotations, I created manual annotations. The CURE-TSD dataset [5] uses the Belgium sequences with its own annotations, so in the future we intend to work on the CURE-TSD dataset.
References.
[1] Grover, J. and Hanmandlu, M., 2021. Novel competitive-cooperative learning models (CCLMs) based on higher order information sets. Applied Intelligence, 51(3), pp.1513-1530.
[2] Timofte, Radu, et al. "Combining traffic sign detection with 3D tracking towards better driver assistance." Emerging Topics in Computer Vision and Its Applications, 2012, pp.425-446; Song, S., Li, Y., Huang, Q. and Li, G., 2021. A new real-time detection and tracking method in videos for small target traffic signs. Applied Sciences, 11(7), p.3061.
[3] Sermanet, Pierre, and Yann LeCun. "Traffic sign recognition with multi-scale convolutional networks." The 2011 International Joint Conference on Neural Networks. IEEE, 2011.
[4] Wojke, Nicolai, Alex Bewley, and Dietrich Paulus. "Simple online and realtime tracking with a deep association metric." 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.
[5] Temel, D., Chen, M.H. and AlRegib, G., 2019. Traffic sign detection under challenging conditions: A deeper look into performance variations and spectral characteristics. IEEE Transactions on Intelligent Transportation Systems, 21(9), pp.3663-3673.