Implementation of MTCNN Architecture on Different Pixel Dimension Classes and Plotting Multi-Face on Detection Results
Abstract
Face detection is a computer vision task that locates human faces in an image; it is the first step in systems that identify or verify a person from a photo of their face. Face detection and alignment in unconstrained environments are challenging because of variation in pose, illumination, and occlusion. The human face is difficult to model because many variables can change, such as facial expression, orientation, lighting conditions, and partial occlusion by sunglasses, scarves, masks, and other objects. Recent studies have shown that deep learning approaches can achieve impressive performance on these two tasks. In this paper, the Multi-Task Cascaded Convolutional Neural Network (MTCNN) approach is used to detect multiple faces in an image and to map each detection result to an individual face crop, for use by various systems that depend on face detection. This study aims to implement the MTCNN architecture using TensorFlow and OpenCV, with two main benefits. First, the study is expected to provide a pre-trained model that performs optimally and to strengthen the evidence from previous studies that have examined this model. Second, the model's output can be used as input for other systems. The input variable is a photo containing one or more faces to be processed. The photos have various pixel dimensions to represent different resolutions. The output variable is either the coordinates of each detected face location or the landmarks of key facial points, such as the positions of the eyes, the nose, and the corners of the mouth. Averaged over the pixel-dimension classes in the dataset, the results show an accuracy of 93%, a precision of 95%, a recall of 96%, an F1-score of 95%, and an ROC-AUC of 90.89%.
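To make the reported metrics concrete, the sketch below shows how accuracy, precision, recall, and F1-score are conventionally derived from confusion counts. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names and the example counts are hypothetical, and the abstract does not state how detections were matched to ground-truth faces.

```python
# Minimal sketch of the standard classification metrics named in the abstract.
# All function names and the example counts are illustrative assumptions;
# they are not taken from the paper.


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all decisions (face / not face) that were correct."""
    return safe_div(tp + tn, tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Share of reported detections that are real faces."""
    return safe_div(tp, tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real faces that were detected."""
    return safe_div(tp, tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return safe_div(2 * p * r, p + r)


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    tp, fp, fn, tn = 8, 2, 1, 9
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"accuracy={accuracy(tp, fp, fn, tn):.3f}")
    print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")

    # Consistency check on the rounded figures quoted in the abstract.
    print(f"F1 from P=0.95, R=0.96: {f1_score(0.95, 0.96):.3f}")
```

Applying `f1_score` to the rounded precision (0.95) and recall (0.96) quoted above gives roughly 0.955, which is in line with the reported F1-score of 95% given that the inputs are themselves rounded.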
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.