| Aspect | Description |
|------------|-----------------|
| Scale | 45 million distinct web pages (≈ 1 TB of raw HTML + assets) |
| Modalities | HTML DOM tree, rendered screenshots, CSS style sheets, JavaScript execution traces |
| Annotations | (1) Relevance judgments for 10 million query‑page pairs (content retrieval); (2) click‑stream sequences for next‑page recommendation; (3) multi‑label semantic tags (≈ 2 500 categories) |
| Diversity | News, e‑commerce, social media, scholarly portals, governmental sites |
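The table implies one page-level record per URL plus two interaction-level annotation sets (query‑page judgments and click streams). The Python sketch below shows one possible in-memory layout for those pieces. It is not the dataset's published schema. Every class, field and function name is hypothetical, and so is the integer relevance scale. The sketch also works out the figure implied by the Scale row. If 1 TB is taken as 10^12 bytes, 45 million pages average roughly 22 KB each, counting HTML and assets together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# NOTE: all names below are hypothetical illustrations; the dataset's actual
# file format and field names are not specified in the description above.


@dataclass
class PageRecord:
    """One web page carrying the four modalities listed in the table."""
    page_id: str
    html: str                                   # raw HTML (source of the DOM tree)
    screenshot_path: Optional[str] = None       # path to a rendered screenshot
    stylesheets: List[str] = field(default_factory=list)  # CSS style sheets
    js_trace: List[str] = field(default_factory=list)     # JavaScript execution events
    tags: List[str] = field(default_factory=list)         # multi-label semantic tags


@dataclass
class RelevanceJudgment:
    """One query-page pair for content retrieval (integer label scale assumed)."""
    query: str
    page_id: str
    relevance: int


@dataclass
class ClickSequence:
    """An ordered click stream used for next-page recommendation."""
    session_id: str
    page_ids: List[str]

    def next_page_pairs(self) -> List[Tuple[List[str], str]]:
        # Each prefix of the session is a context; the page that follows is the target.
        return [(self.page_ids[:i], self.page_ids[i])
                for i in range(1, len(self.page_ids))]


def multi_hot(tags: List[str], vocab: Dict[str, int]) -> List[int]:
    """Encode a page's tags as a 0/1 vector over the tag vocabulary."""
    vec = [0] * len(vocab)
    for tag in tags:
        vec[vocab[tag]] = 1
    return vec


def avg_bytes_per_page(total_bytes: float, n_pages: int) -> float:
    """Average storage per page, e.g. ~1 TB spread over 45 million pages."""
    return total_bytes / n_pages


if __name__ == "__main__":
    # Hypothetical three-tag vocabulary standing in for the ~2 500 real categories.
    vocab = {"sports": 0, "finance": 1, "health": 2}
    page = PageRecord("p1", "<html></html>", tags=["sports", "health"])
    print(multi_hot(page.tags, vocab))                   # [1, 0, 1]
    print(ClickSequence("s1", ["p1", "p2", "p3"]).next_page_pairs())
    print(round(avg_bytes_per_page(1e12, 45_000_000)))   # 22222
```

In this sketch, the click-stream prefixes map directly onto next-page training pairs, and the multi-hot vector is the usual target format for multi-label tagging. With about 2 500 categories, a sparse encoding would probably be preferable in practice.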