PubMed
Description
PubMed is a citation network of scientific publications drawn from the PubMed database. Each node represents a publication, and each edge represents a citation between two publications.
Dataset Overview
Key Numbers
- Nodes: 2,708
- Total cells: 9,856
- Max cell dimension: 3
Domain Statistics
Domain | 0-cell | 1-cell | 2-cell | 3-cell | Hyperedges |
---|---|---|---|---|---|
Cellular | 2,708 | 5,278 | 2,648 | 0 | 0 |
Simplicial | 2,708 | 5,278 | 1,630 | 220 | 0 |
Hypergraph | 2,708 | 0 | 0 | 0 | 2,708 |
Lifting Methods
Structural-based Liftings
- Cellular: Cycle-based lifting
- Simplicial: Clique complex lifting
- Hypergraph: k-hop lifting
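To make two of these liftings concrete, the following is a minimal pure-Python sketch on a hypothetical toy graph (not PubMed). It assumes that the clique complex lifting turns every clique of size d + 1 into a d-simplex, and that the k-hop lifting creates one hyperedge per node containing that node and everything within k hops, which would be consistent with the Hypergraph row above, where the hyperedge count equals the node count. The function names `clique_complex` and `k_hop_hyperedges` are illustrative and are not taken from the library; the brute-force clique search is only suitable for tiny graphs.

```python
from itertools import combinations

# Hypothetical toy graph: a 4-clique {0, 1, 2, 3} plus a pendant node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)


def clique_complex(adj, max_dim=3):
    """Return {dim: simplices}; a simplex is a sorted tuple of pairwise
    adjacent nodes, i.e. a clique of size dim + 1 (brute-force search)."""
    vertices = sorted(adj)
    cells = {0: [(v,) for v in vertices]}
    for dim in range(1, max_dim + 1):
        cells[dim] = [
            c for c in combinations(vertices, dim + 1)
            if all(b in adj[a] for a, b in combinations(c, 2))
        ]
    return cells


def k_hop_hyperedges(adj, k=1):
    """Return one hyperedge per node: the node plus all nodes within k hops."""
    hyperedges = []
    for source in sorted(adj):
        seen, frontier = {source}, {source}
        for _ in range(k):
            frontier = {n for v in frontier for n in adj[v]} - seen
            seen |= frontier
        hyperedges.append(frozenset(seen))
    return hyperedges


sc = clique_complex(adj)
cell_counts = {dim: len(cells) for dim, cells in sc.items()}
print(cell_counts)  # {0: 5, 1: 7, 2: 4, 3: 1}

hg = k_hop_hyperedges(adj, k=1)
print(len(hg))  # 5, one hyperedge per node
```

On the toy graph the 4-clique contributes four triangles and one tetrahedron, which mirrors how the Simplicial row can contain 3-cells while the Cellular row, built from cycles, stops at 2-cells.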
Feature Lifting
- Projected sum
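As a rough illustration of a projected-sum feature lifting, the sketch below assumes that each higher-order cell receives the element-wise sum of the features of its faces, so edge features are sums of endpoint node features and triangle features are sums of edge features. The toy complex, the feature values, and the helper name `project_sum` are hypothetical; the actual implementation may differ, for example by using incidence matrices.

```python
from itertools import combinations

# Hypothetical toy complex: one triangle with 2-dimensional node features.
node_features = {(0,): [1.0, 0.0], (1,): [0.0, 1.0], (2,): [2.0, 2.0]}
edges = [(0, 1), (0, 2), (1, 2)]
triangles = [(0, 1, 2)]


def project_sum(cells, lower_features):
    """Assign each cell the element-wise sum of the features of its faces,
    where a face is the cell with exactly one vertex removed."""
    lifted = {}
    for cell in cells:
        faces = combinations(cell, len(cell) - 1)
        face_vectors = [lower_features[face] for face in faces]
        lifted[cell] = [sum(values) for values in zip(*face_vectors)]
    return lifted


edge_features = project_sum(edges, node_features)
triangle_features = project_sum(triangles, edge_features)

print(edge_features[(0, 1)])         # [1.0, 1.0]
print(triangle_features[(0, 1, 2)])  # [6.0, 6.0]
```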
Model Performance
Model | Accuracy (%) | Std Dev (±) |
---|---|---|
GIN | 87.21 | 1.89 |
GCN | 87.09 | 0.20 |
GAT | 86.71 | 0.95 |
UniGNN2 | 86.97 | 0.88 |
EDGNN | 87.06 | 1.09 |
AST | 88.92 | 0.44 |
CWN | 86.32 | 1.38 |
CCCN | 87.44 | 1.28 |
Key Insights
- AST achieves the best accuracy (88.92%), 1.48 points ahead of the next-best model, CCCN (87.44%)
- All eight models score above 86%; excluding AST, accuracies span only about 1.1 points (86.32% to 87.44%)
- GIN shows the highest variability (±1.89)
- GCN is the most stable, with the lowest standard deviation (±0.20)
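These observations can be checked directly against the Model Performance table. The short sketch below copies the reported values into a dictionary (the variable names are illustrative) and derives the best model, the extremes of the standard deviation, and the lowest accuracy.

```python
# (accuracy %, std dev) pairs copied from the Model Performance table.
results = {
    "GIN": (87.21, 1.89),
    "GCN": (87.09, 0.20),
    "GAT": (86.71, 0.95),
    "UniGNN2": (86.97, 0.88),
    "EDGNN": (87.06, 1.09),
    "AST": (88.92, 0.44),
    "CWN": (86.32, 1.38),
    "CCCN": (87.44, 1.28),
}

# Rank models by accuracy, highest first.
ranked = sorted(results, key=lambda m: results[m][0], reverse=True)
best, runner_up = ranked[0], ranked[1]
margin = round(results[best][0] - results[runner_up][0], 2)

most_stable = min(results, key=lambda m: results[m][1])
least_stable = max(results, key=lambda m: results[m][1])
lowest_accuracy = min(acc for acc, _ in results.values())

print(best, runner_up, margin)    # AST CCCN 1.48
print(most_stable, least_stable)  # GCN GIN
print(lowest_accuracy)            # 86.32
```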
Reproducibility
The comma-separated data_seed values define a sweep over five seeds, which Hydra only accepts in multirun mode, hence the --multirun flag:

python -m topobench \
    model=graph/gin \
    dataset=graph/cocitation_pubmed \
    optimizer.parameters.lr=0.001 \
    model.feature_encoder.out_channels=64 \
    model.backbone.num_layers=2 \
    model.feature_encoder.proj_dropout=0.5 \
    dataset.dataloader_params.batch_size=1 \
    dataset.split_params.data_seed=0,3,5,7,9 \
    trainer.max_epochs=500 \
    trainer.min_epochs=50 \
    trainer.check_val_every_n_epoch=1 \
    callbacks.early_stopping.patience=50 \
    --multirun
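When the same run has to be repeated for other models or seed sets, it can help to assemble the override list programmatically. The following is a minimal sketch that only builds and prints the command string; the override keys are copied from the command above, while the `overrides` and `seeds` variables are illustrative. It assumes the seed sweep is launched with Hydra's `--multirun` flag.

```python
import shlex

# Override keys and values copied from the reproducibility command.
overrides = {
    "model": "graph/gin",
    "dataset": "graph/cocitation_pubmed",
    "optimizer.parameters.lr": 0.001,
    "model.feature_encoder.out_channels": 64,
    "model.backbone.num_layers": 2,
    "model.feature_encoder.proj_dropout": 0.5,
    "dataset.dataloader_params.batch_size": 1,
    "trainer.max_epochs": 500,
    "trainer.min_epochs": 50,
    "trainer.check_val_every_n_epoch": 1,
    "callbacks.early_stopping.patience": 50,
}
seeds = [0, 3, 5, 7, 9]

args = ["python", "-m", "topobench"]
args += [f"{key}={value}" for key, value in overrides.items()]
# A comma-separated value is interpreted as a sweep in multirun mode.
args.append("dataset.split_params.data_seed=" + ",".join(map(str, seeds)))
args.append("--multirun")

command = shlex.join(args)
print(command)
```

The resulting list can also be passed to `subprocess.run(args, check=True)` instead of being printed, if launching the run from Python is preferred.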