Predicting Software Size and Effort From Code Using Natural Language Processing

dc.contributor.author Tenekeci, S.
dc.contributor.author Ünlü, H.
dc.contributor.author Dikenelli, E.
dc.contributor.author Selçuk, U.
dc.contributor.author Kılınç Soylu, G.
dc.contributor.author Demirörs, O.
dc.date.accessioned 2025-02-25T19:31:40Z
dc.date.available 2025-02-25T19:31:40Z
dc.date.issued 2024
dc.description.abstract Software Size Measurement (SSM) plays a crucial role in software project management because software size is the primary input for estimating development effort and schedule. However, many small and medium-sized companies struggle to conduct objective SSM and Software Effort Estimation (SEE) due to resource constraints and a lack of expert workforce. This often leads to inaccurate estimates and projects that exceed their planned time and budget. Hence, organizations need to perform objective SSM and SEE with minimal resources and without relying on an expert workforce. In this research, we present two exploratory case studies aimed at predicting the functional size (COSMIC and Event-based size) and effort of software projects from code using a deep-learning-based NLP model: CodeBERT. For this purpose, we collected and annotated two datasets consisting of 4800 Python and 1100 C# functions. We then trained a classification model to predict COSMIC data movements (entry, exit, read, write) and four regression models to predict Event-based size (interaction, communication, process) and effort. Despite the relatively small training dataset, we achieved promising results: 84.5% accuracy for the COSMIC size, 0.13 normalized mean absolute error (NMAE) for the Event-based size, and 0.18 NMAE for the effort. These findings are particularly insightful as they demonstrate the practical utility of language models in SSM and SEE. © 2024 Copyright for this paper by its authors. en_US
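The abstract reports COSMIC size derived from predicted data movements and accuracy in terms of NMAE. The following is a minimal sketch, not the authors' implementation, of how those two quantities can be computed. It assumes the classifier emits one label per detected data movement (a hypothetical interface); under the COSMIC method each Entry, Exit, Read, or Write counts as 1 COSMIC Function Point (CFP), so size is the count of labels. The `nmae` function assumes MAE normalized by the range of the actual values, which is one common definition; this record does not state which normalization the paper used. The names `cosmic_size` and `nmae` are hypothetical.

```python
from typing import Iterable, Sequence

# The four COSMIC data-movement types; each occurrence counts as 1 CFP.
DATA_MOVEMENTS = ("entry", "exit", "read", "write")


def cosmic_size(predicted_movements: Iterable[str]) -> int:
    """Return functional size in CFP as the number of data movements.

    Assumes one label per predicted data movement (hypothetical interface).
    """
    size = 0
    for label in predicted_movements:
        if label.lower() not in DATA_MOVEMENTS:
            raise ValueError(f"unknown data movement: {label!r}")
        size += 1  # every valid data movement contributes exactly 1 CFP
    return size


def nmae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error divided by the range of the actual values.

    The range-based normalization is an assumption, not the paper's stated choice.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("range of actual values is zero")
    return mae / value_range
```

For example, a function predicted to contain one Entry, one Read, one Write, and one Exit has a size of 4 CFP, and `nmae([10, 20, 30], [12, 18, 33])` gives an MAE of 7/3 divided by a range of 20, i.e. about 0.117. Normalizing by the range makes the error comparable across targets with different scales, such as size components and effort.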
dc.identifier.issn 1613-0073
dc.identifier.scopus 2-s2.0-85212684670
dc.identifier.uri https://hdl.handle.net/20.500.14365/5940
dc.language.iso en en_US
dc.publisher CEUR-WS en_US
dc.relation.ispartof CEUR Workshop Proceedings -- Joint of the 33rd International Workshop on Software Measurement and the 18th International Conference on Software Process and Product Measurement, IWSM-MENSURA 2024 -- 30 September 2024 through 4 October 2024 -- Montreal -- 204467 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Artificial Intelligence en_US
dc.subject Effort Estimation en_US
dc.subject Natural Language Processing en_US
dc.subject Software Size Measurement en_US
dc.title Predicting Software Size and Effort From Code Using Natural Language Processing en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 57340107000
gdc.author.scopusid 57521977500
gdc.author.scopusid 59481631600
gdc.author.scopusid 59481946500
gdc.author.scopusid 55811008000
gdc.author.scopusid 55949165100
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department İzmir Ekonomi Üniversitesi en_US
gdc.description.departmenttemp Tenekeci S., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Ünlü H., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Dikenelli E., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Selçuk U., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Kılınç Soylu G., İzmir University of Economics, Balçova, İzmir, 35330, Turkey; Demirörs O., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality Q4
gdc.description.volume 3852 en_US
gdc.description.wosquality N/A
gdc.index.type Scopus
gdc.scopus.citedcount 3
gdc.virtual.author Kılınç Soylu, Görkem
relation.isAuthorOfPublication 6910c1d9-8e21-4258-bf36-21cce487c9ea
relation.isAuthorOfPublication.latestForDiscovery 6910c1d9-8e21-4258-bf36-21cce487c9ea
relation.isOrgUnitOfPublication e9e77e3e-bc94-40a7-9b24-b807b2cd0319
relation.isOrgUnitOfPublication b4714bc5-c5ae-478f-b962-b7204c948b70
relation.isOrgUnitOfPublication 26a7372c-1a5e-42d9-90b6-a3f7d14cad44
relation.isOrgUnitOfPublication.latestForDiscovery e9e77e3e-bc94-40a7-9b24-b807b2cd0319
