https://github.com/jiucai233/DSL13thEnterpriseProject
<aside>
This project implements a YOLOv8 object detection model for identifying objects in video and image data.
Use case: for a food delivery company, a photo of the food taken right before the lid is closed is helpful for customer service. So this model is fine-tuned to capture that food picture and a short video clip.
</aside>
<aside>
As you may have noticed, this project was conducted by the Data Science Lab @Yonsei University. Two teams worked on the same topic, and our team had 6 people.
The topic came from @BlitzDynamics, a company in South Korea. Our team aimed to develop an AI object detection model for the task. They asked us to find a lightweight model that doesn’t need a lot of computing resources.
</aside>
<aside>
I was first in charge of testing the YOLOv8n model for the task. Luckily, this model got the best performance among the models.
Our source is video from a monitoring camera mounted above the kitchen, which can observe the whole food preparation process.
I also did 6 hours of data labeling, dividing the boxes into 4 categories:
data labeling:

capture of the video:

Training details (the actual code can be found on GitHub):

The whole training run was done on a local CPU and took 3 hours. The mAP50-95 was decent.
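
For readers who have not used it, a CPU-only fine-tuning run could look roughly like the sketch below. It assumes the Ultralytics Python package; the dataset file name, epoch count, and image size are hypothetical placeholders, not the values used in this project (those are in the repo).

```python
def build_train_args(data_yaml: str, epochs: int = 50, imgsz: int = 640) -> dict:
    """Collect training arguments; the default values are illustrative assumptions."""
    return {
        "data": data_yaml,   # path to a dataset YAML (hypothetical name)
        "epochs": epochs,    # placeholder, not the project's real setting
        "imgsz": imgsz,      # placeholder input resolution
        "device": "cpu",     # training on a local CPU
    }


def train_yolov8n(data_yaml: str, epochs: int = 50, imgsz: int = 640) -> float:
    """Fine-tune the pretrained YOLOv8n checkpoint and return mAP50-95 on the val split."""
    # Imported lazily so the helper above works without the package installed.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # nano variant, the lightest YOLOv8 model
    model.train(**build_train_args(data_yaml, epochs, imgsz))
    metrics = model.val()       # evaluate on the validation split
    return metrics.box.map      # mAP averaged over IoU 0.50 to 0.95
```

Calling `train_yolov8n("data.yaml")` would start such a run, where `data.yaml` stands in for whatever dataset file describes the 4 labeled categories.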
Once the model was trained, it fortunately could distinguish whether the lid is closed or open, so we decided to find the moment the label changes (closed box → box) and collect the 10 frames before and after it.
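
As a rough illustration of that first idea, here is a minimal sketch that finds the first label change in a per-frame label sequence and computes the 10-frame window around it. The function names and the one-label-per-frame input are assumptions made for the example, not the project's actual code.

```python
from typing import List, Optional, Tuple


def first_transition(labels: List[str], before: str, after: str) -> Optional[int]:
    """Return the index of the first frame labeled `after` that directly follows `before`."""
    for i in range(1, len(labels)):
        if labels[i - 1] == before and labels[i] == after:
            return i
    return None


def clip_window(n_frames: int, idx: int, margin: int = 10) -> Tuple[int, int]:
    """Return inclusive (start, end) frame indices, clamped to the video length."""
    start = max(0, idx - margin)
    end = min(n_frames - 1, idx + margin)
    return start, end


# Hypothetical per-frame labels: one label per frame, for illustration only.
labels = ["closed box"] * 15 + ["box"] * 20
idx = first_transition(labels, "closed box", "box")
if idx is not None:
    start, end = clip_window(len(labels), idx)
    print(idx, start, end)
```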
But the problem is:
So the solution is:
Set a time and confidence threshold for the label, e.g. the “closed delivery box” label has to stay above 0.75 confidence and keep appearing for over 10 frames.
Read the fps of the video and use it as a parameter in the code:

    # number of frames to keep before and after the detected moment
    pre_frames = max(1, math.floor(fps * pre_sec))
    post_frames = max(1, math.floor(fps * post_sec))
    ...
    # clamp the clip window to the available frames
    start = max(0, idx - pre_frames)
    end = min(len(frames) - 1, idx + post_frames)
    raw_video_path = os.path.join(raw_dir, 'clip_raw.mp4')
    writer = cv2.VideoWriter(raw_video_path, fourcc, fps, (w, h))
    for f in frames[start:end+1]:
        writer.write(f)
    writer.release()

Here the fps is detected by the code, and `pre_sec` is a parameter the user can change.
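
The confidence and persistence rule described above can be sketched as a small filter over per-frame detections. This is only a sketch under assumptions: it takes one `(label, confidence)` pair per frame, and the function name `stable_onset` and its parameters are hypothetical, not taken from the repo.

```python
from typing import List, Optional, Tuple


def stable_onset(
    detections: List[Tuple[str, float]],
    target: str,
    min_conf: float = 0.75,
    min_frames: int = 10,
) -> Optional[int]:
    """Return the start index of the first run in which `target` appears with
    confidence above `min_conf` for at least `min_frames` consecutive frames."""
    run_start = 0
    run_len = 0
    for i, (label, conf) in enumerate(detections):
        if label == target and conf > min_conf:
            if run_len == 0:
                run_start = i          # a new candidate run begins here
            run_len += 1
            if run_len >= min_frames:  # the label persisted long enough
                return run_start
        else:
            run_len = 0                # run broken, start over
    return None


# Hypothetical detections: a short 3-frame flicker, then a stable run.
detections = (
    [("box", 0.90)] * 5
    + [("closed delivery box", 0.80)] * 3
    + [("box", 0.90)] * 2
    + [("closed delivery box", 0.90)] * 12
)
onset = stable_onset(detections, "closed delivery box")
```

In this example the 3-frame flicker is ignored and the stable run is picked up instead, so short or low-confidence detections do not trigger a clip.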
I created the GitHub repo, attended the company meeting, and presented the test example.

Result of id7 food with boxes

picture without boxes

result for food with boxes
</aside>
<aside>
This was my first time doing object detection and vision model fine-tuning, and data labeling is so tiring. But we got a pretty good result. As my last scheduled project in DSL, it’s a perfect farewell.
Big thanks to all my teammates.

</aside>