Introduction

Welcome to the scrape-rs documentation. scrape-rs is a high-performance, cross-platform HTML parsing library with a pure Rust core and bindings for Python, Node.js, and WASM.

Why scrape-rs?

scrape-rs is designed to be 10-50x faster than popular HTML parsing libraries such as BeautifulSoup (Python) and Cheerio (Node.js), while keeping a consistent, idiomatic API across all platforms.

Key Features

  • Blazing Fast: Built on html5ever with SIMD-accelerated text processing
  • Cross-Platform: Identical API for Rust, Python, Node.js, and WASM
  • Memory Efficient: Arena-based DOM allocation with minimal overhead
  • Spec-Compliant: Full HTML5 parsing with comprehensive CSS selector support
  • Modern: Support for streaming parsing, compiled selectors, and parallel processing
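Arena-based allocation, the idea behind "Memory Efficient", can be shown with a minimal, std-only Rust sketch. It assumes the general technique rather than scrape-rs's internals: the names `Arena`, `NodeId`, and `NodeData` are hypothetical. All nodes live in one growable `Vec`, and parent/child links are plain indices rather than heap pointers. Node storage is therefore one contiguous buffer instead of one heap allocation per node, and dropping the arena frees the whole tree at once.

```rust
/// Index of a node inside the arena (hypothetical type, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

/// Payload of a node: an element with a tag name, or a text run.
#[derive(Debug)]
pub enum NodeData {
    Element { name: String },
    Text(String),
}

#[derive(Debug)]
pub struct Node {
    pub data: NodeData,
    pub parent: Option<NodeId>,
    pub children: Vec<NodeId>,
}

/// All nodes are stored contiguously; tree links are indices into `nodes`.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a node, links it under `parent` if given, and returns its id.
    pub fn alloc(&mut self, data: NodeData, parent: Option<NodeId>) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { data, parent, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p.0].children.push(id);
        }
        id
    }

    pub fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }

    /// Concatenates the text of `id` and its descendants in document order.
    pub fn text(&self, id: NodeId) -> String {
        let mut out = String::new();
        let mut stack = vec![id];
        while let Some(cur) = stack.pop() {
            if let NodeData::Text(t) = &self.get(cur).data {
                out.push_str(t);
            }
            // Push children in reverse so they are popped in document order.
            for &child in self.get(cur).children.iter().rev() {
                stack.push(child);
            }
        }
        out
    }
}

fn main() {
    // Hand-built tree for: <div><h2>Widget</h2><span>$19.99</span></div>
    let mut arena = Arena::new();
    let div = arena.alloc(NodeData::Element { name: "div".into() }, None);
    let h2 = arena.alloc(NodeData::Element { name: "h2".into() }, Some(div));
    arena.alloc(NodeData::Text("Widget".into()), Some(h2));
    let span = arena.alloc(NodeData::Element { name: "span".into() }, Some(div));
    arena.alloc(NodeData::Text("$19.99".into()), Some(span));

    println!("{} nodes, text = {:?}", arena.node_count(), arena.text(div));
}
```

Using indices instead of `Rc`/`RefCell` pointers also sidesteps reference-counting overhead and borrow-checker friction when walking parent and child links.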

Performance Highlights

Operation            BeautifulSoup   Cheerio   scrape-rs   Speedup (vs BeautifulSoup / vs Cheerio)
Parse 1KB HTML       0.23 ms         0.18 ms   0.024 ms    9.7x / 7.5x
Parse 100KB HTML     18 ms           12 ms     1.8 ms      10x / 6.7x
CSS selector query   0.80 ms         0.12 ms   0.006 ms    133x / 20x
Extract all links    3.2 ms          0.85 ms   0.18 ms     17.8x / 4.7x
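Each Speedup figure is the baseline library's time divided by the scrape-rs time for the same operation. The short Rust sketch recomputes both ratios per row from the timings in the table; `speedup` is a helper written only for this illustration. Since the listed timings are presumably rounded, a recomputed ratio can differ in the last digit (the 1KB row, for example, yields about 9.6x rather than 9.7x).

```rust
/// How many times faster scrape-rs is than a baseline: baseline time / scrape-rs time.
fn speedup(baseline_ms: f64, scrape_rs_ms: f64) -> f64 {
    baseline_ms / scrape_rs_ms
}

fn main() {
    // (operation, BeautifulSoup ms, Cheerio ms, scrape-rs ms), copied from the table.
    let rows = [
        ("Parse 1KB HTML", 0.23, 0.18, 0.024),
        ("Parse 100KB HTML", 18.0, 12.0, 1.8),
        ("CSS selector query", 0.80, 0.12, 0.006),
        ("Extract all links", 3.2, 0.85, 0.18),
    ];

    for (op, bs, cheerio, ours) in rows {
        println!(
            "{op}: {:.1}x vs BeautifulSoup, {:.1}x vs Cheerio",
            speedup(bs, ours),
            speedup(cheerio, ours)
        );
    }
}
```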

Quick Example

Rust

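use scrape_core::Soup;

// Returning Result lets `?` work in main (assumes the query error type
// implements std::error::Error so it converts into Box<dyn Error>).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = r#"<div class="product"><h2>Widget</h2><span class="price">$19.99</span></div>"#;
    let soup = Soup::parse(html);

    // `?` propagates a query error; `expect` panics if no element matches.
    let product = soup.find(".product")?.expect("product not found");
    let name = product.find("h2")?.expect("name not found").text();
    let price = product.find(".price")?.expect("price not found").text();

    println!("{}: {}", name, price);
    Ok(())
}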

Python

from scrape_rs import Soup

html = '<div class="product"><h2>Widget</h2><span class="price">$19.99</span></div>'
soup = Soup(html)

product = soup.find(".product")
name = product.find("h2").text
price = product.find(".price").text

print(f"{name}: {price}")

Node.js

import { Soup } from '@scrape-rs/scrape';

const html = '<div class="product"><h2>Widget</h2><span class="price">$19.99</span></div>';
const soup = new Soup(html);

const product = soup.find(".product");
const name = product.find("h2").text;
const price = product.find(".price").text;

console.log(`${name}: ${price}`);

WASM

import init, { Soup } from '@scrape-rs/wasm';

await init();

const html = '<div class="product"><h2>Widget</h2><span class="price">$19.99</span></div>';
const soup = new Soup(html);

const product = soup.find(".product");
const name = product.find("h2").text;
const price = product.find(".price").text;

console.log(`${name}: ${price}`);

Where to Go Next

Platform Support

Platform       Status   Package
Rust           Stable   scrape-core
Python 3.10+   Stable   fast-scrape
Node.js 18+    Stable   @scrape-rs/scrape
WASM           Stable   @scrape-rs/wasm

License

scrape-rs is dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT for details.