Architecture¶
Technical overview of rewget's internal architecture.
Project Structure¶
rewget/
├── crates/
│ ├── rewget/ # CLI binary
│ │ ├── src/
│ │ │ ├── main.rs # Entry point
│ │ │ ├── args.rs # Argument parsing
│ │ │ ├── cli.rs # Clap definitions
│ │ │ ├── exec.rs # wget execution
│ │ │ └── daemon.rs # Daemon communication
│ │ └── build.rs # Shell completions, man page
│ │
│ ├── rewgetd/ # Daemon binary
│ │ └── src/
│ │ ├── main.rs # Daemon entry point
│ │ ├── server.rs # IPC server
│ │ ├── impersonate.rs # Stage 2 logic
│ │ └── preflight.rs # Stage 3 logic
│ │
│ └── rewget-core/ # Shared library
│ └── src/
│ ├── lib.rs # Public API
│ ├── config.rs # Configuration types
│ ├── profile.rs # Browser profiles
│ ├── detection.rs # Bot detection patterns
│ ├── cache.rs # Domain stage cache
│ ├── ipc.rs # IPC protocol
│ └── chromium.rs # Chromium management
│
├── documentation/ # MkDocs documentation
├── Formula/ # Homebrew formula
└── scripts/ # Build and install scripts
Component Diagram¶
┌──────────────────────────────────────────────────────────────────┐
│ User │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ rewget CLI │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ args.rs │ │ exec.rs │ │ daemon.rs │ │
│ │ (parsing) │ │ (Stage 1) │ │ (IPC) │ │
│ └─────────────┘ └─────────────┘ └──────┬──────┘ │
└───────────────────────────────────────────┼──────────────────────┘
│ nng IPC
▼
┌──────────────────────────────────────────────────────────────────┐
│ rewgetd Daemon │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ server.rs │ │impersonate │ │ preflight │ │
│ │ (IPC recv) │ │ (Stage 2) │ │ (Stage 3) │ │
│ └─────────────┘ └──────┬──────┘ └──────┬──────┘ │
└──────────────────────────┼────────────────┼──────────────────────┘
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ rquest │ │ chromiumoxide │
│ (TLS emulation) │ │ (headless CDP) │
└───────────────────┘ └───────────────────┘
Data Flow¶
Stage 1: Direct wget¶
rewget args
│
├─ Parse --rewget-* flags
│
├─ Check domain cache
│ │
│ └─ If cached Stage 2/3, skip to that stage
│
├─ Spawn wget subprocess
│ │
│ ├─ Stream stdout/stderr
│ │
│ └─ Wait for exit
│
└─ Check result
│
├─ Success (exit 0) → Done
│
└─ Failure (403, etc.) → Try Stage 2
Stage 2: Impersonation¶
rewget
│
├─ Connect to rewgetd (spawn if needed)
│
├─ Send Stage 2 request
│ {url, profile, timeout, headers}
│
└─ Receive response
│
├─ Success → Write body, Done
│
└─ Failure → Try Stage 3
Stage 3: JavaScript Preflight¶
rewgetd
│
├─ Launch Chromium (download if needed)
│
├─ Navigate to URL
│
├─ Wait for condition
│ │
│ ├─ networkidle
│ ├─ selector match
│ └─ delay timeout
│
├─ Extract cookies + body
│
└─ Return response
IPC Protocol¶
rewget and rewgetd communicate via nng (nanomsg-next-gen) using JSON messages.
Request Format¶
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"command": "stage2",
"url": "https://example.com/path",
"profile": "chrome_131",
"timeout": 15000,
"headers": {
"Accept": "text/html"
},
"output_path": "/tmp/rewget-output.tmp",
"js_wait": null
}
Response Format¶
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"success": true,
"status": 200,
"headers": {
"Content-Type": "text/html",
"Content-Length": "1234"
},
"body_path": "/tmp/rewget-output.tmp",
"cookies": [
{"name": "cf_clearance", "value": "...", "domain": ".example.com"}
],
"error": null
}
Commands¶
| Command | Description |
|---|---|
stage2 | TLS impersonation request |
stage3 | JavaScript preflight request |
status | Health check |
Browser Profile Structure¶
{
"name": "chrome_131",
"version": 1,
"signature": "base64-ed25519-signature",
"browser": {
"name": "Chrome",
"version": "131.0.0.0",
"platform": "Windows",
"user_agent": "Mozilla/5.0..."
},
"tls": {
"versions": ["TLSv1.2", "TLSv1.3"],
"cipher_suites": ["TLS_AES_128_GCM_SHA256", ...],
"extensions": [0, 23, 65281, ...],
"curves": ["x25519", "secp256r1"],
"alpn": ["h2", "http/1.1"],
"grease": true
},
"http2": {
"settings": {
"HEADER_TABLE_SIZE": 65536,
"MAX_CONCURRENT_STREAMS": 1000,
...
},
"window_update": 15663105,
"pseudo_header_order": [":method", ":authority", ":scheme", ":path"]
},
"headers": {
"Accept": "text/html,...",
"Accept-Language": "en-US,en;q=0.9"
}
}
Domain Cache Structure¶
{
"example.com": {
"stage": 2,
"profile": "chrome_131",
"timestamp": 1704067200,
"expires": 1704672000
}
}
Key Dependencies¶
| Crate | Purpose |
|---|---|
rquest | HTTP client with TLS impersonation |
chromiumoxide | Chrome DevTools Protocol client |
nng | IPC transport |
clap | CLI argument parsing |
serde | JSON serialization |
ed25519-dalek | Profile signature verification |
tokio | Async runtime (daemon) |
Platform-Specific Code¶
Unix vs Windows¶
// exec.rs - Process execution
#[cfg(unix)]
fn exec_wget(...) {
// Use exec() syscall - replaces process
cmd.exec()
}
#[cfg(windows)]
fn exec_wget(...) {
// Use spawn + wait - no exec() on Windows
cmd.status()
}
// ipc.rs - Socket paths
#[cfg(unix)]
pub fn socket_path() -> PathBuf {
// ~/.cache/rewget/rewgetd.sock
dirs::runtime_dir().join("rewget/rewgetd.sock")
}
#[cfg(windows)]
pub fn socket_path() -> PathBuf {
// Named pipe
PathBuf::from(r"\\.\pipe\rewget")
}
Chromium Download¶
#[cfg(unix)]
fn download_chromium() {
// Use wget or curl
}
#[cfg(windows)]
fn download_chromium() {
// Use PowerShell Invoke-WebRequest
}
Build System¶
Workspace Structure¶
# Cargo.toml (root)
[workspace]
members = ["crates/*"]
[workspace.package]
version = "1.0.0"
[workspace.dependencies]
# Shared dependencies
Build Script¶
crates/rewget/build.rs generates:
- Shell completions (bash, zsh, fish, PowerShell)
- Man page (rewget.1)
Release Profile¶
Testing¶
# Run all tests
cargo test
# Run specific crate tests
cargo test -p rewget-core
cargo test -p rewgetd
# Run with output
cargo test -- --nocapture
Extending rewget¶
Adding a New Profile¶
- Create JSON file in
~/.local/share/rewget/profiles/ - Follow the profile structure above
- Test with
--rewget-verify-profile=name
Adding a New Detection Pattern¶
Edit crates/rewget-core/src/detection.rs:
Adding a New Command¶
- Add variant to
Commandenum inargs.rs - Add parsing logic in
Args::parse() - Add handler in
main.rs