PubMed

Graph Task Level: Graph

Description

The PubMed dataset is a citation network dataset consisting of scientific publications from PubMed. Each node represents a scientific publication and edges represent citation relationships.

Dataset Overview

Key Numbers

Nodes: 2,708
Total Cells: 9,856
Max Cell Dimension: 3

Domain Statistics

Domain       0-cell   1-cell   2-cell   3-cell   Hyperedges
Cellular     2,708    5,278    2,648    0        0
Simplicial   2,708    5,278    1,630    220      0
Hypergraph   2,708    0        0        0        2,708

Lifting Methods

Structural-based Liftings
Feature Lifting
  • Projected sum
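
A minimal sketch of what a projected-sum feature lifting computes, assuming the common definition in which each higher-order cell receives the sum of the features of the lower-order cells on its boundary (equivalent to multiplying the features by the transposed absolute incidence matrix). The toy triangle, the feature values, and the function name `projected_sum` are illustrative assumptions and are not taken from the dataset or from the library's code.

```python
# Toy complex (hypothetical): one triangle with nodes 0, 1, 2,
# edges e0=(0,1), e1=(0,2), e2=(1,2), and a single 2-cell.

# Signed node-to-edge incidence: B1[i][j] != 0 when node i is an endpoint of edge j.
B1 = [
    [-1, -1,  0],
    [ 1,  0, -1],
    [ 0,  1,  1],
]
# Signed edge-to-face incidence: B2[j][k] != 0 when edge j bounds face k.
B2 = [[1], [-1], [1]]

# Node features (3 nodes, 2 features each).
x0 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]


def projected_sum(incidence, x_lower):
    """Sum the features of the lower-order cells incident to each upper cell.

    Equivalent to |B|^T @ X when the incidence entries are -1, 0, or 1.
    """
    n_lower, n_upper = len(incidence), len(incidence[0])
    dim = len(x_lower[0])
    x_upper = [[0.0] * dim for _ in range(n_upper)]
    for i in range(n_lower):
        for j in range(n_upper):
            if incidence[i][j] != 0:  # orientation sign is ignored
                for d in range(dim):
                    x_upper[j][d] += x_lower[i][d]
    return x_upper


x1 = projected_sum(B1, x0)  # edge features from node features
x2 = projected_sum(B2, x1)  # face features from edge features
print(x1)
print(x2)
```

Ignoring the orientation sign keeps the lifted features independent of how edges and faces happen to be oriented; a signed projection would instead produce differences of features.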

Model Performance

Model     Accuracy (%)   Std Dev (±)
GIN       87.21          1.89
GCN       87.09          0.20
GAT       86.71          0.95
UniGNN2   86.97          0.88
EDGNN     87.06          1.09
AST       88.92          0.44
CWN       86.32          1.38
CCCN      87.44          1.28

Key Insights

  • AST achieves the best accuracy (88.92%), about 1.5 percentage points ahead of the next-best model, CCCN (87.44%)
  • All eight models score above 86%, within a range of 2.6 percentage points (86.32% to 88.92%)
  • GIN shows the highest variability across runs (±1.89)
  • GCN gives the most stable results, with the lowest standard deviation (±0.20)
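
The observations above can be re-derived directly from the Model Performance table. The short sketch below copies the reported accuracy and standard deviation for each model and picks out the extremes; the variable names are illustrative only.

```python
# (accuracy %, std dev) per model, copied from the Model Performance table.
results = {
    "GIN": (87.21, 1.89),
    "GCN": (87.09, 0.20),
    "GAT": (86.71, 0.95),
    "UniGNN2": (86.97, 0.88),
    "EDGNN": (87.06, 1.09),
    "AST": (88.92, 0.44),
    "CWN": (86.32, 1.38),
    "CCCN": (87.44, 1.28),
}

best = max(results, key=lambda m: results[m][0])          # highest accuracy
most_stable = min(results, key=lambda m: results[m][1])   # lowest std dev
least_stable = max(results, key=lambda m: results[m][1])  # highest std dev

accuracies = [acc for acc, _ in results.values()]
spread = max(accuracies) - min(accuracies)  # gap between best and worst model

print(f"best: {best}, most stable: {most_stable}, least stable: {least_stable}")
print(f"accuracy spread: {spread:.2f} percentage points")
```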

Reproducibility

The following command runs GIN on this dataset over five data seeds. The comma-separated seed list is a Hydra sweep, so the command needs the --multirun flag:

python -m topobench \
    model=graph/gin \
    dataset=graph/cocitation_pubmed \
    optimizer.parameters.lr=0.001 \
    model.feature_encoder.out_channels=64 \
    model.backbone.num_layers=2 \
    model.feature_encoder.proj_dropout=0.5 \
    dataset.dataloader_params.batch_size=1 \
    dataset.split_params.data_seed=0,3,5,7,9 \
    trainer.max_epochs=500 \
    trainer.min_epochs=50 \
    trainer.check_val_every_n_epoch=1 \
    callbacks.early_stopping.patience=50 \
    --multirun
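
When the same configuration has to be launched for several models, it can be convenient to assemble the override list programmatically. The sketch below is one possible way to do that; the helper `build_command` is hypothetical and not part of the library, and the snippet only prints the command rather than running it. It appends `--multirun` because Hydra requires that flag to sweep over a comma-separated value such as the seed list.

```python
import shlex

# Overrides copied from the command above; keys are Hydra config paths.
overrides = {
    "model": "graph/gin",
    "dataset": "graph/cocitation_pubmed",
    "optimizer.parameters.lr": 0.001,
    "model.feature_encoder.out_channels": 64,
    "model.backbone.num_layers": 2,
    "model.feature_encoder.proj_dropout": 0.5,
    "dataset.dataloader_params.batch_size": 1,
    "dataset.split_params.data_seed": "0,3,5,7,9",  # swept values, one run per seed
    "trainer.max_epochs": 500,
    "trainer.min_epochs": 50,
    "trainer.check_val_every_n_epoch": 1,
    "callbacks.early_stopping.patience": 50,
}


def build_command(overrides, multirun=True):
    """Hypothetical helper: turn an override dict into an argv list."""
    cmd = ["python", "-m", "topobench"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    if multirun:
        cmd.append("--multirun")  # needed for comma-separated sweeps in Hydra
    return cmd


command = build_command(overrides)
print(shlex.join(command))  # pass `command` to subprocess.run to launch it
```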