Description
Research software increasingly depends on structured code representations, yet publicly available graph-regression benchmarks remain concentrated in domains such as chemistry and offer limited support for studying software-centric, execution-aware graphs. We present RelSC, an open benchmark for software performance prediction that converts Java programs into graph representations and pairs them with measured execution-time labels. RelSC is released in two complementary variants: RelSC-H, a homogeneous flow-augmented abstract syntax tree (AST) representation, and RelSC-M, a multi-relational version that preserves semantically distinct relationships in program structure.
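To make the two variants concrete, the sketch below shows how a single program might be encoded as PyTorch Geometric objects; the node features, edge indices, and relation names are illustrative placeholders, not RelSC's actual schema.

```python
# Illustrative sketch only: feature dimensions, edges, and relation names
# are hypothetical, not the dataset's real schema.
import torch
from torch_geometric.data import Data, HeteroData

# RelSC-H: one node type, one edge index mixing AST, control-flow,
# and data-flow edges; y is the measured execution-time label.
relsc_h = Data(
    x=torch.randn(5, 16),                     # per-node feature vectors
    edge_index=torch.tensor([[0, 1, 2, 3],    # source nodes
                             [1, 2, 3, 4]]),  # target nodes
    y=torch.tensor([123.4]),                  # execution time (e.g., ms)
)

# RelSC-M: the same program, but semantically distinct relations are
# kept apart as separate (src_type, relation, dst_type) edge stores.
relsc_m = HeteroData()
relsc_m["node"].x = torch.randn(5, 16)
relsc_m["node", "ast", "node"].edge_index = torch.tensor([[0, 1], [1, 2]])
relsc_m["node", "control_flow", "node"].edge_index = torch.tensor([[1], [3]])
relsc_m["node", "data_flow", "node"].edge_index = torch.tensor([[2], [4]])
relsc_m.y = torch.tensor([123.4])             # graph-level regression target
```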
Beyond the dataset itself, we release a reproducible construction pipeline, standardized train/validation/test splits, and ready-to-use PyTorch Geometric objects, enabling consistent reuse and comparison across studies. We benchmark source-code-based, AST-based, homogeneous graph neural network (GNN), and heterogeneous GNN baselines on two corpora that reflect both real-world build variability and controlled execution settings. Our results show that augmenting ASTs with control-flow and data-flow information substantially improves prediction quality, while richer multi-relational structure introduces additional robustness challenges, especially in smaller projects.
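As a rough illustration of the task, the sketch below outlines a minimal homogeneous GNN regression baseline over such graph objects; the architecture, hyperparameters, and training step are assumptions chosen for brevity, not the paper's benchmarked configurations.

```python
# Minimal homogeneous GNN regression sketch; all choices here
# (GCN layers, hidden size, Adam, MSE) are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class ExecTimeGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)  # scalar execution-time output

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)           # graph-level readout
        return self.head(x).squeeze(-1)

model = ExecTimeGNN(in_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(batch):
    """One optimization step on a mini-batch of graphs from a DataLoader."""
    model.train()
    optimizer.zero_grad()
    pred = model(batch.x, batch.edge_index, batch.batch)
    loss = F.mse_loss(pred, batch.y)             # regress measured exec time
    loss.backward()
    optimizer.step()
    return loss.item()
```

A heterogeneous counterpart would swap the convolutions for relation-aware operators (for instance via PyTorch Geometric's `to_hetero` transform), mirroring the RelSC-M variant.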
We position RelSC not only as a machine learning benchmark but also as research software infrastructure: a reusable and extensible resource for reproducible evaluation of program-analysis methods, performance-regression studies, and future tools for CI/CD, code optimization, and performance-aware scheduling.