Internals: RBF Weight Building System
This document explains the code flow in the solve system, which builds sparse weight matrices for RBF operators.
Architecture Overview
The solve system is organized into three distinct layers:
src/solve/
├── types.jl # Layer 0: Shared data structures & traits
├── stencil_math.jl # Layer 1: Pure mathematical operations
├── kernel_exec.jl # Layer 2: Parallel execution & allocation
└── api.jl # Layer 3: Entry points & routingLayer Responsibilities
Layer 0: types.jl - Shared Data Structures
Boundary condition types (
BoundaryCondition,HermiteStencilData)Stencil classification types (
InteriorStencil,DirichletStencil,HermiteStencil)Operator arity traits (
SingleOperator,MultiOperator{N})
Layer 1: stencil_math.jl - Pure Mathematics
Collocation matrix building (
_build_collocation_matrix!)RHS vector building (
_build_rhs!)Stencil assembly (
_build_stencil!)Hermite boundary modifications (dispatch-based)
No I/O, no parallelism, fully testable
Layer 2: kernel_exec.jl - Parallel Execution
Memory allocation (
allocate_sparse_arrays)Kernel launching (
launch_kernel!)Batch processing and synchronization
Sparse matrix construction
Layer 3: api.jl - Entry Points
Public
_build_weightsfunctionsRouting (standard vs Hermite paths)
Operator application to basis functions
System Flow
The system builds sparse weight matrices using exact allocation:
Interior points get k entries (full stencil)
Dirichlet boundary points get 1 entry (identity row)
Other boundary points get k entries (Hermite stencil)
User Code
↓
api.jl: Route based on boundary conditions
↓
kernel_exec.jl: Allocate memory & launch parallel kernel
↓
stencil_math.jl: Build collocation matrix A, RHS b, solve A\b
↓
kernel_exec.jl: Construct sparse matrix
↓
Return to userPart 1: Entry Points (api.jl)
Main Entry from Operators
function _build_weights(ℒ, op)
# Extract configuration from operator
data = op.data
eval_points = op.eval_points
adjl = op.adjl
basis = op.basis
return _build_weights(ℒ, data, eval_points, adjl, basis)
endApply Operator to Basis
function _build_weights(ℒ, data, eval_points, adjl, basis)
dim = length(first(data))
# Build monomial basis and apply operator
mon = MonomialBasis(dim, basis.poly_deg)
ℒmon = ℒ(mon)
ℒrbf = ℒ(basis)
return _build_weights(data, eval_points, adjl, basis, ℒrbf, ℒmon, mon)
endInterior-Only Path (No Boundary Conditions)
function _build_weights(
data, eval_points, adjl, basis, ℒrbf, ℒmon, mon;
batch_size=10, device=CPU()
)
# Create empty boundary data for interior-only case
TD = eltype(first(data))
is_boundary = fill(false, length(data))
boundary_conditions = BoundaryCondition{TD}[]
normals = similar(data, 0)
boundary_data = BoundaryData(is_boundary, boundary_conditions, normals)
return build_weights_kernel(
data, eval_points, adjl,
basis, ℒrbf, ℒmon, mon,
boundary_data;
batch_size=batch_size, device=device
)
endHermite Path (With Boundary Conditions)
function _build_weights(
data, eval_points, adjl, basis, ℒrbf, ℒmon, mon,
is_boundary, boundary_conditions, normals;
batch_size=10, device=CPU()
)
boundary_data = BoundaryData(is_boundary, boundary_conditions, normals)
return build_weights_kernel(
data, eval_points, adjl,
basis, ℒrbf, ℒmon, mon,
boundary_data;
batch_size=batch_size, device=device
)
endPart 2: Kernel Orchestration (kernel_exec.jl)
Main Orchestrator
function build_weights_kernel(
data, eval_points, adjl, basis, ℒrbf, ℒmon, mon,
boundary_data; batch_size=10, device=CPU()
)
# Extract dimensions
TD = eltype(first(data))
k = length(first(adjl))
nmon = binomial(length(first(data)) + basis.poly_deg, basis.poly_deg)
num_ops = _num_ops(ℒrbf)
N_eval = length(eval_points)
# Allocate sparse arrays with exact counting
I, J, V, row_offsets = allocate_sparse_arrays(
TD, k, N_eval, num_ops, adjl, boundary_data
)
# Launch parallel kernel
n_batches = ceil(Int, N_eval / batch_size)
launch_kernel!(
I, J, V, data, eval_points, adjl,
basis, ℒrbf, ℒmon, mon, boundary_data, row_offsets,
batch_size, N_eval, n_batches, k, nmon, num_ops, device
)
# Construct sparse matrix
nrows, ncols = N_eval, length(data)
if num_ops == 1
return sparse(I, J, V[:, 1], nrows, ncols)
else
return ntuple(i -> sparse(I, J, V[:, i], nrows, ncols), num_ops)
end
endMemory Allocation
function allocate_sparse_arrays(TD, k, N_eval, num_ops, adjl, boundary_data)
# Count exact non-zeros:
# - Interior points: k entries (full stencil)
# - Dirichlet points: 1 entry (identity row)
# - Other boundary points: k entries (Hermite stencil)
total_nnz, row_offsets = count_nonzeros(
adjl, boundary_data.is_boundary, boundary_data.boundary_conditions
)
I = Vector{Int}(undef, total_nnz)
J = Vector{Int}(undef, total_nnz)
V = Matrix{TD}(undef, total_nnz, num_ops)
return I, J, V, row_offsets
endKernel Execution
@kernel function weight_kernel(I, J, V, data, eval_points, adjl, boundary_data, ...)
batch_idx = @index(Global)
start_idx, end_idx = calculate_batch_range(batch_idx, batch_size, N_eval)
# Pre-allocate work arrays (reused within batch)
A = Symmetric(zeros(TD, n, n), :U)
b = _prepare_buffer(ℒrbf, TD, n)
for eval_idx in start_idx:end_idx
# Classify stencil type based on boundary conditions
stype = classify_stencil(
boundary_data.is_boundary, boundary_data.boundary_conditions,
eval_idx, neighbors, global_to_boundary
)
if stype isa DirichletStencil
# Identity row: only diagonal is 1.0
fill_dirichlet_entry!(I, J, V, eval_idx, start_pos, num_ops)
elseif stype isa InteriorStencil
# Standard interior stencil (no boundary points)
local_data = view(data, neighbors)
weights = _build_stencil!(A, b, ℒrbf, ℒmon, local_data, eval_point, basis, mon, k)
fill_entries!(I, J, V, weights, eval_idx, neighbors, start_pos, k, num_ops)
else # HermiteStencil
# Mixed interior/boundary stencil
update_hermite_stencil_data!(hermite_data, data, neighbors, ...)
weights = _build_stencil!(A, b, ℒrbf, ℒmon, hermite_data, eval_point, basis, mon, k)
fill_entries!(I, J, V, weights, eval_idx, neighbors, start_pos, k, num_ops)
end
end
endStencil classification via dispatch:
DirichletStencil: Identity row (only diagonal)
InteriorStencil: Standard stencil (all neighbors are interior)
HermiteStencil: Mixed interior/boundary stencil
Part 3: Stencil Mathematics (stencil_math.jl)
Stencil Assembly
function _build_stencil!(A, b, ℒrbf, ℒmon, data, eval_point, basis, mon, k)
_build_collocation_matrix!(A, data, basis, mon, k)
_build_rhs!(b, ℒrbf, ℒmon, data, eval_point, basis, k)
return (A \ b)[1:k, :]
endKey insight: Multiple dispatch on data type automatically selects:
data::AbstractVector→ Standard interior stencildata::HermiteStencilData→ Hermite boundary stencil
Collocation Matrix Building
Standard (Interior):
function _build_collocation_matrix!(A, data::AbstractVector, basis, mon, k)
AA = parent(A)
N = size(A, 2)
# RBF block (upper triangular)
for j in 1:k, i in 1:j
AA[i, j] = basis(data[i], data[j]) # Φ(xᵢ, xⱼ)
end
# Polynomial augmentation block
if basis.poly_deg > -1
for i in 1:k
a = view(AA, i, (k + 1):N)
mon(a, data[i]) # P(xᵢ)
end
end
endHermite (With Boundary Conditions):
function _build_collocation_matrix!(A, data::HermiteStencilData, basis, mon, k)
AA = parent(A)
N = size(A, 2)
# RBF block with Hermite modifications
for j in 1:k, i in 1:j
AA[i, j] = compute_hermite_rbf_entry(i, j, data, basis)
end
# Polynomial block with boundary modifications
if basis.poly_deg > -1
for i in 1:k
a = view(AA, i, (k + 1):N)
compute_hermite_poly_entry!(a, i, data, mon)
end
end
endHermite RBF Entry Dispatch
Uses 9 dispatch methods based on point types (Interior/Dirichlet/NeumannRobin):
function compute_hermite_rbf_entry(i, j, data, basis)
xi, xj = data.data[i], data.data[j]
type_i = point_type(data.is_boundary[i], data.boundary_conditions[i])
type_j = point_type(data.is_boundary[j], data.boundary_conditions[j])
return hermite_rbf_dispatch(type_i, type_j, i, j, xi, xj, data, basis)
endExample dispatches:
# Interior-Interior: Standard evaluation
hermite_rbf_dispatch(::InteriorPoint, ::InteriorPoint, ...) = basis(xi, xj)
# Interior-NeumannRobin: Apply boundary operator to second argument
hermite_rbf_dispatch(::InteriorPoint, ::NeumannRobinPoint, ...) =
α(bc_j) * φ + β(bc_j) * dot(nj, -∇φ)
# NeumannRobin-NeumannRobin: Apply to both arguments
hermite_rbf_dispatch(::NeumannRobinPoint, ::NeumannRobinPoint, ...) =
α(bc_i) * α(bc_j) * φ +
α(bc_i) * β(bc_j) * dot(nj, -∇φ) +
β(bc_i) * α(bc_j) * dot(ni, ∇φ) +
β(bc_i) * β(bc_j) * ∂i∂j_φRHS Vector Building
Similar dispatch pattern:
_build_rhs!(b::Vector, ℒrbf, ...)→ Single operator_build_rhs!(b::Matrix, ℒrbf::Tuple, ...)→ Multiple operators_build_rhs!(b, ..., data::AbstractVector, ...)→ Interior_build_rhs!(b, ..., data::HermiteStencilData, ...)→ Hermite
Part 4: Data Structures (types.jl)
Boundary Condition
struct BoundaryCondition{T<:Real}
α::T # Coefficient for value
β::T # Coefficient for normal derivative
end
# Boundary operator: Bu = α*u + β*∂ₙu
# Special cases:
# Dirichlet: α=1, β=0
# Neumann: α=0, β=1
# Robin: α≠0, β≠0Hermite Stencil Data
struct HermiteStencilData{T<:Real}
data::AbstractVector{Vector{T}} # k stencil points
is_boundary::Vector{Bool} # k flags
boundary_conditions::Vector{BoundaryCondition{T}} # k conditions
normals::Vector{Vector{T}} # k normal vectors
endStencil Classification
abstract type StencilType end
struct InteriorStencil <: StencilType end # All neighbors interior
struct DirichletStencil <: StencilType end # Eval point is Dirichlet
struct HermiteStencil <: StencilType end # Mixed interior/boundary
function classify_stencil(is_boundary, boundary_conditions, eval_idx, neighbors, ...)
if sum(is_boundary[neighbors]) == 0
return InteriorStencil()
elseif is_boundary[eval_idx] && is_dirichlet(boundary_conditions[...])
return DirichletStencil()
else
return HermiteStencil()
end
endCall Graph
User Code (e.g., Laplacian construction)
│
└─→ api.jl: _build_weights(ℒ, op)
│
├─→ Apply operator to basis: ℒrbf = ℒ(basis), ℒmon = ℒ(mon)
│
└─→ Route based on boundary conditions
│
└─→ kernel_exec.jl: build_weights_kernel(...)
│
├─→ allocate_sparse_arrays(...)
│ └─→ count_nonzeros → Exact allocation
│
├─→ launch_kernel!(...)
│ │
│ └─→ @kernel weight_kernel
│ │
│ └─→ FOR each eval_point in batch:
│ │
│ ├─→ types.jl: classify_stencil(...)
│ │ └─→ {Dirichlet, Interior, Hermite}
│ │
│ ├─→ IF DirichletStencil:
│ │ └─→ Fill identity row
│ │
│ ├─→ IF InteriorStencil:
│ │ └─→ stencil_math.jl: _build_stencil!
│ │ (standard interior)
│ │
│ └─→ IF HermiteStencil:
│ │
│ ├─→ types.jl: update_hermite_stencil_data!
│ │
│ └─→ stencil_math.jl: _build_stencil!
│ ├─→ _build_collocation_matrix!
│ │ └─→ compute_hermite_rbf_entry
│ │ └─→ hermite_rbf_dispatch
│ │ (9 point type combos)
│ │
│ ├─→ _build_rhs!
│ │ ├─→ hermite_rbf_rhs
│ │ └─→ hermite_mono_rhs!
│ │
│ └─→ solve(A, b)
│
└─→ sparse(I, J, V) → ReturnKey Concepts
1. Layer Separation
Why it matters:
stencil_math.jl contains pure functions → easily testable
kernel_exec.jl handles parallelism → can benchmark separately
api.jl routes requests → single place to trace flow
2. Stencil
A stencil is a local approximation of a differential operator at a point:
For a point x₀ with k neighbors {x₁, x₂, ..., xₖ}:
ℒu(x₀) ≈ Σᵢ₌₁ᵏ wᵢ * u(xᵢ)
where wᵢ are the weights we compute by solving A \ b.3. Collocation Matrix Structure
Standard (Interior):
┌─────────────────┬─────────┐
│ Φ(xᵢ, xⱼ) │ P(xᵢ) │ k×k RBF + k×nmon polynomial
├─────────────────┼─────────┤
│ P(xⱼ)ᵀ │ 0 │ nmon×k poly + nmon×nmon zero
└─────────────────┴─────────┘
Hermite (with Boundary):
Same structure, but entries modified by boundary operators:
- Interior-Interior: Φ(xᵢ, xⱼ)
- Interior-Boundary: BⱼΦ(xᵢ, xⱼ)
- Boundary-Boundary: BᵢBⱼΦ(xᵢ, xⱼ)
where Bᵢ = α*I + β*∂ₙᵢ is the boundary operator at point i4. Multiple Dispatch Benefits
The code uses Julia's multiple dispatch to automatically select:
Vector vs Matrix RHS (single vs tuple operators)
AbstractVector vs HermiteStencilData (interior vs boundary)
Point type combinations (9 Hermite dispatch variants)
This eliminates explicit conditionals and improves type stability.
Performance Characteristics
Memory Allocation
Exact allocation via
count_nonzeros(counts entries before allocating)Interior points: k entries per stencil
Dirichlet points: 1 entry per stencil (identity row)
Other boundary points: k entries per stencil
No over-allocation for interior-only problems
Parallelization
Batch processing prevents memory exhaustion
Work arrays reused within batch
KernelAbstractions.jl enables CPU/GPU execution
Type-stable buffers via operator arity traits
Stencil Classification
Runtime dispatch via
classify_stencilZero overhead for interior-only problems (all stencils classify as InteriorStencil)
Dirichlet points bypass expensive matrix assembly
Summary
The unified solve system achieves:
Clear organization: 3 layers with distinct responsibilities
Single code path: One allocation strategy, one kernel for all cases
Better testability: Pure math layer has no side effects
Easy navigation: "Where's X?" → Obvious file location
Zero overhead: Multiple dispatch is compile-time optimization
Exact allocation: No over-allocation for any problem type
Backward compatible: All exports maintained
File Mapping:
Want to modify math? →
stencil_math.jlNeed better parallelism? →
kernel_exec.jlAdding new operator entry point? →
api.jlNew boundary condition type? →
types.jl