Methodology - Farm Dirt

📊 Stability Analysis

Coefficient of Variation (CV)

What it measures

How consistent sample values are at each location across years. CV expresses standard deviation as a percentage of the mean, making it comparable across nutrients with different scales.

Formula

CV = (Standard Deviation / Mean) × 100

// Where:
Mean = sum(values) / count
Variance = sum((value - mean)²) / count
Standard Deviation = sqrt(Variance)

Thresholds

CV Range	Label	Interpretation
< 20%	Stable	Values are consistent year-to-year
20-30%	Moderate	Normal variability, trends may be meaningful
> 30%	Volatile	High variability, interpret trends with caution

Why it matters: High CV can indicate sampling inconsistency (different paths, depths, or moisture conditions), true field variability, or management history (variable rate applications). When CV is high, year-to-year changes may not reflect true nutrient trends.
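As a minimal sketch of the CV formula and thresholds above: the function names and sample readings are hypothetical, and treating the 20% and 30% boundaries as part of "Moderate" is an assumption, since the table does not say which side they fall on.

```python
import math

def coefficient_of_variation(values):
    """Return CV as a percentage, using the population standard deviation (divide by count)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean * 100

def stability_label(cv):
    """Map a CV percentage onto the labels in the thresholds table."""
    if cv < 20:
        return "Stable"
    if cv <= 30:  # assumption: both boundary values count as Moderate
        return "Moderate"
    return "Volatile"

# Hypothetical P readings (ppm) at one location across four years
p_by_year = [28.0, 31.0, 25.0, 30.0]
cv = coefficient_of_variation(p_by_year)
print(f"CV = {cv:.1f}% -> {stability_label(cv)}")
```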

Field Trends Stability Map Insight Messages

Standard Deviation for pH

What it measures

pH variability using standard deviation instead of CV because pH is a logarithmic scale (each unit represents a 10× change in hydrogen ion concentration).

Formula

SD = sqrt(sum((value - mean)²) / count)

Thresholds for pH

SD Range	Label	Interpretation
< 0.2	Stable	Consistent pH across samples
0.2-0.35	Moderate	Normal pH variability
> 0.35	Volatile	Significant pH variability

Why pH is different: A CV of 5% on pH 7.0 would be 0.35 units, but that same CV on pH 5.0 would only be 0.25 units. Using SD ensures consistent interpretation regardless of pH level.
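The pH rule can be sketched the same way. This is an illustration only: `ph_stability` is a hypothetical name, and placing the 0.2 and 0.35 boundaries in "Moderate" is an assumption.

```python
import math

def ph_stability(values):
    """Return (population SD, label) for a list of pH readings."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd < 0.2:
        label = "Stable"
    elif sd <= 0.35:  # assumption: boundary values count as Moderate
        label = "Moderate"
    else:
        label = "Volatile"
    return sd, label

# Hypothetical readings at one location
print(ph_stability([6.2, 6.4, 6.3]))
print(ph_stability([5.5, 6.5]))
```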

Field Trends (pH card) Stability Map

Stability Score

What it measures

A 0-100 score inversely related to CV, used for color scaling on the stability map.

Formula

Stability Score = max(0, min(100, 100 - CV))

// Examples:
// CV = 10% → Score = 90 (very stable)
// CV = 30% → Score = 70 (moderate)
// CV = 50% → Score = 50 (volatile)

Stability Map

Confidence Rating

What it measures

How much trust to place in trend analysis, based on both sample years and data stability.

Logic

Years	Rating
2 years	Low
3-4 years	Moderate
5+ years	High

Rating is downgraded one level if stability is "Volatile".
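One way to read the table and the downgrade rule together is sketched below. The function name is hypothetical, and returning no rating for fewer than 2 years and keeping "Low" as the floor are assumptions the table does not state.

```python
LEVELS = ["Low", "Moderate", "High"]

def confidence_rating(years, stability):
    """Rate trend confidence from the number of sample years and the stability label."""
    if years < 2:
        return None  # assumption: a trend needs at least 2 years
    if years >= 5:
        index = 2
    elif years >= 3:
        index = 1
    else:
        index = 0
    if stability == "Volatile":
        index = max(0, index - 1)  # downgrade one level, assuming Low is the floor
    return LEVELS[index]

print(confidence_rating(5, "Stable"))
print(confidence_rating(5, "Volatile"))
```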

Field Trends Insight Messages

📈 Trend Analysis

Linear Regression

What it measures

The rate of change (slope) in nutrient levels over time, using least-squares regression to find the best-fit line through yearly averages.

Formula

// Given data points: (year, average_value)
n = number of years
sumX = sum(years)
sumY = sum(values)
sumXY = sum(year × value)
sumXX = sum(year²)    // sum of the squared years

slope = (n × sumXY - sumX × sumY) / (n × sumXX - (sumX)²)
intercept = (sumY - slope × sumX) / n

// Slope units: ppm/year (or units/year for pH)

Field Trends Scatter Plot

R² (Coefficient of Determination)

What it measures

How well the linear model fits the data. R² of 1.0 means perfect fit; R² of 0 means no linear relationship.

Formula

meanY = sum(values) / n
SS_total = sum((value - meanY)²)
SS_residual = sum((value - predicted)²)
R² = 1 - (SS_residual / SS_total)

Interpreting R²: In agricultural data, R² above 0.5 indicates a strong linear trend. Lower values don't mean the data is wrong—they indicate factors beyond time are influencing the nutrient level.
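The slope, intercept, and R² formulas above can be sketched in a few lines. `linear_fit` and the yearly averages are hypothetical, and returning R² = 0 when every value is identical is an assumption made to avoid dividing by zero.

```python
def linear_fit(years, values):
    """Least-squares line through (year, value) pairs; returns (slope, intercept, r_squared)."""
    n = len(years)
    sum_x = sum(years)
    sum_y = sum(values)
    sum_xy = sum(x * y for x, y in zip(years, values))
    sum_xx = sum(x * x for x in years)  # sum of squares, distinct from sum_x ** 2

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n

    mean_y = sum_y / n
    ss_total = sum((y - mean_y) ** 2 for y in values)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(years, values))
    r_squared = 1 - ss_residual / ss_total if ss_total > 0 else 0.0  # assumption for flat data
    return slope, intercept, r_squared

# Hypothetical yearly P averages (ppm) falling by 2 ppm each year
slope, intercept, r2 = linear_fit([2019, 2020, 2021, 2022], [30.0, 28.0, 26.0, 24.0])
print(slope, r2)
```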

Yield Correlation Scatter Plot

Trend Direction

What it measures

Whether a nutrient is increasing, decreasing, or holding steady over time.

Logic

Why relative thresholds: A slope of 1 ppm/year is significant for Zn (often 1-3 ppm) but trivial for P (often 20-50 ppm). Relative thresholds ensure trends are meaningful at any scale.

Field Trends Insight Messages

Years to Critical

What it measures

If a nutrient is declining, estimates when it will reach the critical threshold at the current rate.

Formula

// Only calculated when:
// - Trend is declining (slope < 0)
// - Current value is above critical
// - Decline rate is meaningful
years_to_critical = (currentValue - criticalLevel) / |slope|

// Example:
// Current P = 30 ppm, Critical = 15 ppm, Slope = -2 ppm/yr
// Years = (30 - 15) / 2 = 7.5 years
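A small sketch of the guard conditions and the calculation follows. The cutoff for a "meaningful" decline is not given above, so `min_decline` is a hypothetical parameter standing in for it.

```python
def years_to_critical(current, critical, slope, min_decline=0.0):
    """Years until `current` reaches `critical` at the present slope, or None if not applicable."""
    if slope >= 0:
        return None  # not declining
    if current <= critical:
        return None  # already at or below critical
    if abs(slope) <= min_decline:
        return None  # decline too small to be meaningful (hypothetical cutoff)
    return (current - critical) / abs(slope)

print(years_to_critical(30, 15, -2))  # the worked example: 7.5
```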

Field Trends Warning Messages

Percent Change

What it measures

The relative change between the first and last year in the trend period.

Formula

percentChange = ((lastValue - firstValue) / firstValue) × 100

// Example:
// First year avg: 25 ppm, Last year avg: 30 ppm
// Change = ((30 - 25) / 25) × 100 = +20%

Field Trends Comparison View

⚖️ Nutrient Ratios

P:Zn Ratio

What it measures

The balance between phosphorus and zinc. High ratios indicate potential zinc uptake issues even when soil Zn is adequate.

Formula

P:Zn Ratio = P (ppm) / Zn (ppm)

// Example:
// P = 30 ppm, Zn = 2.0 ppm
// Ratio = 30 / 2.0 = 15:1

Thresholds

Ratio	Status	Interpretation
< 8:1	Low	P may be limiting relative to Zn
8-12:1	Optimal	Balanced P and Zn relationship
12-20:1	Elevated	Monitor Zn, especially with sensitive crops
> 20:1	High Risk	P-induced Zn deficiency likely

Why it matters: High phosphorus can interfere with zinc uptake at the root surface, even when soil Zn tests adequate. This is most common in high-P soils, sandy textures, and high-yield systems where Zn demand is elevated.

P:Zn Ratio Card Zn Insights

Dynamic Zn Target

What it measures

The recommended Zn level based on current P, ensuring a balanced ratio even when P is high.

Formula

standardMin = 1.5 ppm
ratioBasedTarget = P / 10    // Target 10:1 ratio
znTarget = max(standardMin, ratioBasedTarget)

// Examples:
// P = 15 ppm → target = max(1.5, 1.5) = 1.5 ppm
// P = 25 ppm → target = max(1.5, 2.5) = 2.5 ppm
// P = 50 ppm → target = max(1.5, 5.0) = 5.0 ppm

Sufficiency Thresholds (using dynamic target)

Zn Level	Status
< 50% of target	Low (deficient)
50-100% of target	Marginal (below target)
≥ 100% of target	Adequate

Why dynamic: With P at 50 ppm, a Zn level of 2.0 ppm gives a ratio of 25:1—high risk for Zn deficiency. The dynamic target of 5.0 ppm ensures balanced uptake regardless of P level.
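The dynamic target, the sufficiency bands, and the P:Zn ratio status can be combined in a short sketch. The function names are hypothetical, and the handling of exact boundary ratios (12:1 and 20:1) is an assumption.

```python
def zn_target(p_ppm, standard_min=1.5, target_ratio=10):
    """Recommended Zn (ppm): the larger of the standard minimum and P / 10."""
    return max(standard_min, p_ppm / target_ratio)

def zn_status(zn_ppm, p_ppm):
    """Classify Zn against the dynamic target."""
    target = zn_target(p_ppm)
    if zn_ppm < 0.5 * target:
        return "Low"
    if zn_ppm < target:
        return "Marginal"
    return "Adequate"

def p_zn_status(p_ppm, zn_ppm):
    """Classify the P:Zn ratio using the ratio thresholds table."""
    ratio = p_ppm / zn_ppm
    if ratio < 8:
        return "Low"
    if ratio <= 12:  # assumption: boundary ratios fall in the lower band
        return "Optimal"
    if ratio <= 20:
        return "Elevated"
    return "High Risk"

# The case from the paragraph above: P = 50 ppm, Zn = 2.0 ppm
print(zn_target(50), zn_status(2.0, 50), p_zn_status(50, 2.0))
```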

Zn Field Trends Card Zn Insights

🎯 Threshold Logic

Critical vs Optimal vs Ideal

Definitions

Level	Definition	Example (P)
Critical	Below this, yield loss is likely	15 ppm
Optimal Min	Lower bound of ideal range	25 ppm
Optimal Max	Upper bound of ideal range	50 ppm
Ideal	Midpoint of optimal range	37.5 ppm

Formula for Ideal

ideal = (optimalMin + optimalMax) / 2

Settings Color Scales Insight Messages

Nutrient Behavior Types

What it determines

How trends are interpreted depends on whether more is better, less is better, or a specific target is best.

Behavior Categories

Type	Nutrients	Trend Interpretation
More is OK	P, K, OM, S, Zn, Cu, Mn, Fe, B, K_Sat	Increasing = good, decreasing = concern
Target Specific	pH, Ca_sat, Mg_sat	Moving toward target = good, away = concern
Lower is Better	H_Sat	Decreasing = good, increasing = concern

Why this matters: A pH declining from 7.5 to 7.0 is different from one declining from 6.5 to 6.0. The first is moving toward optimal; the second is moving away from it and is a concern. Behavior type ensures the trend is interpreted correctly.

Trend Insights Badge Logic

Badge (Urgency) Assignment

Logic

// Based on current level relative to thresholds + trend direction
if below_critical:
    badge = "Action Required"
else if below_optimal AND declining:
    badge = "Needs Attention"
else if below_optimal OR declining_from_optimal:
    badge = "Review"
else:
    badge = "Good"

// Override: If variability is high (volatile), cannot be "Good"
if stability == "Volatile" AND badge == "Good":
    badge = "Review"

Badge Definitions

Badge	Meaning
Good	At or above target, stable or improving
Review	Minor concern, worth monitoring
Needs Attention	Below optimal and/or declining
Action Required	Below critical threshold
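The badge logic can be sketched as a single function. This is an illustration for "More is OK" nutrients only; it assumes `below_optimal` means below Optimal Min and that `declining_from_optimal` means a declining trend while at or above Optimal Min. Neither definition is spelled out above.

```python
def assign_badge(value, critical, optimal_min, declining, stability):
    """Return the urgency badge for a 'More is OK' nutrient."""
    below_critical = value < critical
    below_optimal = value < optimal_min  # assumption: measured against Optimal Min

    if below_critical:
        badge = "Action Required"
    elif below_optimal and declining:
        badge = "Needs Attention"
    elif below_optimal or declining:  # here `declining` implies declining from optimal
        badge = "Review"
    else:
        badge = "Good"

    # Override: volatile data can never be rated "Good"
    if stability == "Volatile" and badge == "Good":
        badge = "Review"
    return badge

# Using the example P thresholds: critical 15 ppm, optimal min 25 ppm
print(assign_badge(20, 15, 25, declining=True, stability="Stable"))
```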

Field Trends Cards

📐 Field Averages & Aggregation

Mean (Average)

Formula

mean = sum(values) / count

// Zeros are excluded for certain attributes where 0 = "not tested"
// (Zn, Cu, Mn, Fe, B, S, Ca_sat, Mg_sat, K_sat, H_Sat)

Field Averages Map Markers Trend Charts

Median

What it measures

The middle value when samples are sorted. Less affected by outliers than mean.

Formula

sorted = values.sort()    // ascending numeric order
n = count
if n is odd:
    median = sorted[floor(n/2)]
else:
    median = (sorted[n/2 - 1] + sorted[n/2]) / 2

When to use median: If one sample in a field shows P = 200 ppm (possible manure spot), the mean is skewed. Median gives a more representative "typical" value for the field.
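The mean (with zero exclusion) and the median can be sketched together to show the manure-spot effect. Function names and sample values are hypothetical, and returning `None` when no tested values remain is an assumption.

```python
# Attributes where a stored 0 means "not tested" (list taken from the Mean formula)
ZERO_MEANS_NOT_TESTED = {"Zn", "Cu", "Mn", "Fe", "B", "S", "Ca_sat", "Mg_sat", "K_sat", "H_Sat"}

def field_mean(attribute, values):
    """Average the values, dropping zeros for attributes where 0 means 'not tested'."""
    if attribute in ZERO_MEANS_NOT_TESTED:
        values = [v for v in values if v != 0]
    return sum(values) / len(values) if values else None  # assumption: None when empty

def field_median(values):
    """Middle value of the sorted samples (mean of the two middle values when n is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# One hypothetical manure spot at 200 ppm pulls the mean far above the typical value
p_samples = [22, 25, 200, 24, 23]
print(field_mean("P", p_samples), field_median(p_samples))
```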

Field Trends Color Scaling

Location Grouping

How it works

Samples are grouped by proximity to track the same location across years. Uses a grid-based approach for efficiency.

Algorithm

proximityFeet = 50                       // Default: samples within 50 ft are same location
CELL_SIZE = proximityFeet / 364000       // Convert to degrees (~364,000 ft per degree of latitude; 50 ft ≈ 15 m cells)

// Assign each sample to a grid cell
gridLat = floor(latitude / CELL_SIZE)
gridLon = floor(longitude / CELL_SIZE)
cellKey = `${gridLat},${gridLon}`

// Samples in same cell are considered same location
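The grid assignment can be sketched as follows. The sample records and the `group_by_cell` name are hypothetical; coordinates are written as multiples of the cell size only so that the grouping is easy to follow.

```python
import math
from collections import defaultdict

FEET_PER_DEGREE = 364000  # conversion factor used in the algorithm above

def group_by_cell(samples, proximity_feet=50):
    """Group samples by the grid cell that contains their coordinates."""
    cell_size = proximity_feet / FEET_PER_DEGREE
    groups = defaultdict(list)
    for sample in samples:
        key = (math.floor(sample["lat"] / cell_size), math.floor(sample["lon"] / cell_size))
        groups[key].append(sample["id"])
    return dict(groups)

cell = 50 / FEET_PER_DEGREE
samples = [
    {"id": "2021-A", "lat": 302000.2 * cell, "lon": -680000.7 * cell},
    {"id": "2023-A", "lat": 302000.6 * cell, "lon": -680000.5 * cell},  # same cell as 2021-A
    {"id": "2023-B", "lat": 302005.5 * cell, "lon": -680000.5 * cell},  # five cells north
]
groups = group_by_cell(samples)
print(groups)
```

A property of any fixed grid is that two samples closer than the cell size can still straddle a cell boundary and land in different groups, so the cell is an approximation of "within 50 ft" rather than a strict distance test.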

Stability Map CV Calculation

🌾 Yield Analysis

Pearson Correlation Coefficient (r)

What it measures

The strength and direction of the linear relationship between a soil nutrient and yield.

Formula

n = count of paired values
sumX = sum(nutrient values)
sumY = sum(yield values)
sumXY = sum(nutrient × yield)
sumXX = sum(nutrient²)    // sum of squared nutrient values
sumYY = sum(yield²)       // sum of squared yield values

r = (n × sumXY - sumX × sumY) / sqrt((n × sumXX - (sumX)²) × (n × sumYY - (sumY)²))

// r ranges from -1 to +1
// +1 = perfect positive correlation
// -1 = perfect negative correlation
//  0 = no linear relationship

Significance Levels

|r| Value	Significance
> 0.7	High (strong relationship)
0.4-0.7	Medium (moderate relationship)
0.2-0.4	Low (weak relationship)
< 0.2	None (no meaningful relationship)

Correlation ≠ Causation: A high correlation between P and yield doesn't prove P is limiting yield. The correlation could be driven by other factors (soil type, drainage, management) that happen to correlate with P.
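A compact sketch of the r calculation and the significance bands follows. The names are hypothetical; returning 0 when either variable has no variation and assigning the exact boundary values 0.7, 0.4, and 0.2 to the band whose range lists them are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired nutrient (xs) and yield (ys) values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator if denominator else 0.0  # assumption for constant input

def significance(r):
    """Map |r| onto the significance table."""
    strength = abs(r)
    if strength > 0.7:
        return "High"
    if strength >= 0.4:
        return "Medium"
    if strength >= 0.2:
        return "Low"
    return "None"

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, significance(r))
```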

Yield Correlation Table Scatter Plot

Normalized (Field-Relative) Correlation

What it measures

Correlation after removing between-field differences, showing only within-field relationships.

Method

// For each sample:
fieldMean = mean(all samples in that field)
normalizedValue = (value / fieldMean) × 100    // As % of field mean

// Then calculate correlation on normalized values

Why normalize: Raw correlations include field-to-field differences. A high-yielding field might have high P simply because it's a better field overall. Normalization isolates whether P variation within a field affects yield, which is more actionable for variable-rate decisions.
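The normalization step can be illustrated with two hypothetical fields whose P levels differ by a factor of two. The `(field_id, value)` input shape and the function name are assumptions for the sketch.

```python
from collections import defaultdict

def normalize_by_field(samples):
    """Express each (field_id, value) sample as a percentage of its own field's mean."""
    by_field = defaultdict(list)
    for field_id, value in samples:
        by_field[field_id].append(value)
    means = {field_id: sum(vals) / len(vals) for field_id, vals in by_field.items()}
    return [value / means[field_id] * 100 for field_id, value in samples]

# Field B has twice the P of field A, but the within-field pattern is identical
samples = [("A", 20), ("A", 30), ("B", 40), ("B", 60)]
print(normalize_by_field(samples))
```

After normalization both fields are on the same scale, so the correlation reflects only how values vary around each field's own mean.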

Yield Correlation (normalized mode)

95% Confidence Interval

What it shows

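The range where we expect 95% of individual observations to fall, shown as a band around the regression line. Strictly speaking this is a prediction interval: the leading 1 under the square root in the formula accounts for scatter in individual points, not just uncertainty in the fitted line.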

Formula

tValue = 1.96                          // Normal approximation for 95%; the exact t value is larger for small n
MSE = sum(residuals²) / (n - 2)        // Mean squared error
SE = sqrt(MSE)                         // Standard error
SS_x = sum((x - meanX)²)               // Spread of the x values

// For each point x:
SE_y = SE × sqrt(1 + 1/n + (x - meanX)² / SS_x)
upper = predicted + tValue × SE_y
lower = predicted - tValue × SE_y
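The band calculation can be sketched end to end. The function name and data are hypothetical, and `SS_x` is taken to be the sum of squared deviations of x from its mean, which is the standard definition but is not spelled out in the formula.

```python
import math

def prediction_band(xs, ys, x_new, t_value=1.96):
    """Return (lower, predicted, upper) for a new x, using the formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)  # assumed definition of SS_x
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ss_x
    intercept = mean_y - slope * mean_x

    mse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_y = math.sqrt(mse) * math.sqrt(1 + 1 / n + (x_new - mean_x) ** 2 / ss_x)

    predicted = intercept + slope * x_new
    return predicted - t_value * se_y, predicted, predicted + t_value * se_y

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(prediction_band(xs, ys, 3))   # narrowest at the mean of x
print(prediction_band(xs, ys, 10))  # wider when extrapolating
```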

Scatter Plot

Yield by Nutrient Level (Bucket Analysis)

What it measures

Average yield at different nutrient levels (Low, Medium, High) based on agronomic thresholds.

Method

// Classify each sample into buckets based on thresholds:
Low = value < critical threshold
Medium = critical ≤ value < optimal max
High = value ≥ optimal max

// Calculate average yield in each bucket:
avgYield_low = mean(yield of all Low samples)
avgYield_medium = mean(yield of all Medium samples)
avgYield_high = mean(yield of all High samples)

// Calculate yield difference:
yieldDiff = avgYield_high - avgYield_low

Interpretation

Pattern	What it suggests
Low < Medium < High	Classic response - nutrient is limiting in low areas
Low ≈ Medium ≈ High	No yield response to this nutrient (not limiting)
High < Low	Possible toxicity, imbalance, or confounding factor

Sample size matters: Buckets with fewer than 10 samples may not be reliable. Look for consistent patterns across multiple years before making management changes.
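A minimal sketch of the bucket method follows, returning the sample count with each average so that thin buckets are easy to spot. The function name, the `(nutrient, yield)` input shape, and the data are hypothetical.

```python
def yield_by_bucket(samples, critical, optimal_max):
    """Average yield per nutrient bucket; returns {bucket: (avg_yield or None, count)}."""
    buckets = {"Low": [], "Medium": [], "High": []}
    for nutrient, crop_yield in samples:
        if nutrient < critical:
            buckets["Low"].append(crop_yield)
        elif nutrient < optimal_max:
            buckets["Medium"].append(crop_yield)
        else:
            buckets["High"].append(crop_yield)
    return {
        name: (sum(ys) / len(ys) if ys else None, len(ys))
        for name, ys in buckets.items()
    }

# Hypothetical (P ppm, bu/ac) pairs with the example P thresholds: critical 15, optimal max 50
samples = [(10, 150), (12, 160), (20, 180), (30, 190), (55, 200), (60, 210)]
result = yield_by_bucket(samples, critical=15, optimal_max=50)
print(result)
print("yieldDiff:", result["High"][0] - result["Low"][0])
```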

Yield by Nutrient Level Tab

Breakpoint Analysis

What it measures

The critical nutrient threshold where yield response changes - below = yield penalty, above = diminishing returns.

Algorithm (Binning with Bootstrap)

// Step 1: Test each unique nutrient value as potential breakpoint
// Require minimum samples per side (15% of total, min 5)
bestPenalty = 0
for each candidate threshold t:
    below = samples where nutrient < t
    above = samples where nutrient ≥ t
    if below.count < minPerSide OR above.count < minPerSide: skip
    penalty = mean(yield_above) - mean(yield_below)
    if penalty > bestPenalty AND penalty ≥ MIN_PENALTY:
        bestPenalty = penalty
        bestBreakpoint = t

// Step 2: Bootstrap stability test (50 iterations)
for i = 1 to 50:
    subset = random 80% of samples
    bootBreakpoint = run algorithm on subset
    if |bootBreakpoint - bestBreakpoint| ≤ tolerance: nearCount++
stabilityPct = nearCount / 50 × 100

Minimum Penalty Thresholds

Crop	Min Penalty	Why
Corn	5 bu/ac	Smaller differences not economically significant
Soybeans	2 bu/ac	Lower yield baseline, smaller absolute differences

Confidence Levels

Stability %	Confidence	Meaning
≥ 70%	High	Breakpoint found consistently across resamples
50-69%	Medium	Breakpoint is likely but variable
30-49%	Medium-Low	Weak signal, needs more data
< 30%	Low	No reliable breakpoint detected

Data-driven thresholds: Unlike fixed textbook thresholds, breakpoint analysis finds YOUR threshold based on YOUR data. This accounts for soil type, climate, hybrid/variety, and management that make your operation unique.
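The scan and the bootstrap check can be sketched as below. This is an illustration under stated assumptions: the names are hypothetical, the 15% minimum is rounded up, the 80% subset is drawn without replacement, the tolerance value is arbitrary, and a fixed seed is used only to make the example repeatable.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def find_breakpoint(samples, min_penalty):
    """Scan each unique nutrient value; return the threshold with the largest yield gap."""
    min_per_side = max(5, math.ceil(0.15 * len(samples)))  # assumption: round 15% up
    best_t, best_penalty = None, 0.0
    for t in sorted({x for x, _ in samples}):
        below = [y for x, y in samples if x < t]
        above = [y for x, y in samples if x >= t]
        if len(below) < min_per_side or len(above) < min_per_side:
            continue
        penalty = mean(above) - mean(below)
        if penalty > best_penalty and penalty >= min_penalty:
            best_t, best_penalty = t, penalty
    return best_t

def bootstrap_stability(samples, min_penalty, tolerance, iterations=50, seed=0):
    """Return (breakpoint, % of 80% resamples whose breakpoint lands within tolerance)."""
    rng = random.Random(seed)
    best = find_breakpoint(samples, min_penalty)
    if best is None:
        return None, 0.0
    near = 0
    subset_size = int(0.8 * len(samples))
    for _ in range(iterations):
        subset = rng.sample(samples, subset_size)
        boot = find_breakpoint(subset, min_penalty)
        if boot is not None and abs(boot - best) <= tolerance:
            near += 1
    return best, near / iterations * 100

# Synthetic data: yield steps from 150 to 170 bu/ac once the nutrient reaches 20
samples = [(x, 150.0 if x < 20 else 170.0) for x in range(1, 41)]
best, stability_pct = bootstrap_stability(samples, min_penalty=5, tolerance=2)
print(best, stability_pct)
```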

Breakpoint Analysis Tab

Multivariate Regression (MVR)

What it measures

The combined effect of multiple soil nutrients on yield, accounting for the influence of each variable while controlling for others.

Algorithm (Ordinary Least Squares)

// Multiple linear regression model:
Yield = β₀ + β₁×P + β₂×K + β₃×OM + β₄×pH + ... + ε

// Solve using matrix algebra:
β = (X'X)⁻¹ × X'Y

// Where:
X = matrix of nutrient values (with intercept column of 1s)
Y = vector of yield values
β = vector of coefficients (slopes for each nutrient)

Key Statistics

Statistic	What it tells you
R² (R-squared)	% of yield variation explained by the model (higher = better fit)
Adjusted R²	R² adjusted for number of variables (penalizes overfitting)
Coefficient (β)	Expected yield change per 1-unit increase in nutrient
p-value	Probability of seeing an effect at least this large if the true coefficient were zero (< 0.05 = statistically significant)
VIF	Variance Inflation Factor - detects collinearity (> 5 = concern)

Collinearity Check (VIF)

// For each variable Xj, regress it against all other variables:
Xj = α₀ + α₁×X₁ + ... + αₖ×Xₖ

// Calculate R² of this auxiliary regression:
VIF_j = 1 / (1 - R²_j)

// VIF > 5: moderate collinearity
// VIF > 10: severe collinearity - consider removing variable

Why multivariate? Single-nutrient correlations can be misleading. For example, P and K might both correlate with yield simply because high-fertility fields have both. MVR isolates each nutrient's unique contribution, controlling for the others.

Multivariate Regression Tab

Hinge-MVR (Segmented Regression)

What it measures

A two-segment linear model that captures different yield responses below vs. above a breakpoint. Also called "piecewise regression" or "bent-stick model."

Algorithm

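// Create hinge features from the breakpoint (t):
lowPart = max(0, t - x)     // Distance below breakpoint
highPart = max(0, x - t)    // Distance above breakpoint

// Regression model:
Yield = β₀ + β₁×lowPart + β₂×highPart + β₃×cov₁ + ... + ε

// Interpretation:
β₁ = yield change per unit BELOW breakpoint (deficiency response)
β₂ = yield change per unit ABOVE breakpoint (luxury response)

// Sign convention: lowPart grows as x falls below t, so the raw coefficient on
// lowPart is negative when yield rises with the nutrient. The diagram and table
// below describe slopes in the direction of increasing nutrient.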

Visual Interpretation

                        ╱  β₂ slope (above)
                      ╱
                    ∙  ← Breakpoint (t)
                  ╱
                ╱  β₁ slope (below)
───────────────┴──────────────────
Low          Nutrient          High

What the coefficients mean

Scenario	β₁ (below)	β₂ (above)	Interpretation
Classic deficiency	Large positive	Small/zero	Strong response below threshold, plateau above
Linear response	≈ equal	≈ equal	No breakpoint needed, use simple regression
Toxicity	Small	Negative	Yield decreases at high levels

When to use: Hinge-MVR is most useful when breakpoint analysis finds a stable threshold. It quantifies HOW MUCH yield responds on each side, while controlling for other nutrients.
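The hinge features themselves are simple to build. The sketch below covers only the feature construction, not the regression fit; the function name and values are hypothetical.

```python
def hinge_features(x, t):
    """Split a nutrient value into (distance below breakpoint, distance above breakpoint)."""
    low_part = max(0.0, t - x)
    high_part = max(0.0, x - t)
    return low_part, high_part

# With a breakpoint at 15 ppm, exactly one of the two features is non-zero
for p in (10, 15, 20):
    print(p, hinge_features(p, 15))
```

Each sample contributes to only one segment, which is what lets the regression estimate a separate slope on each side of the breakpoint.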

Breakpoint Analysis Tab (toggle)

🗺️ Map & Spatial

Point-in-Polygon (Field Detection)

What it does

Determines if a soil sample or yield point falls within a field boundary.

Algorithm (Ray Casting)

// Cast a ray from point to infinity (eastward)
// Count how many boundary edges the ray crosses
// Odd count = inside, Even count = outside
inside = false
for each edge (yi, xi) to (yj, xj):
    if ((yi > lat) != (yj > lat)) AND
       (lon < (xj - xi) × (lat - yi) / (yj - yi) + xi):
        inside = !inside
return inside
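A runnable version of the ray-casting test might look like the following. It assumes the boundary is a list of `(lat, lon)` vertices with the last vertex implicitly connected back to the first; that input shape and the function name are not specified above.

```python
def point_in_polygon(lat, lon, boundary):
    """Ray casting: True if (lat, lon) lies inside the polygon given as (lat, lon) vertices."""
    inside = False
    j = len(boundary) - 1  # start with the edge from the last vertex to the first
    for i in range(len(boundary)):
        yi, xi = boundary[i]
        yj, xj = boundary[j]
        crosses_latitude = (yi > lat) != (yj > lat)
        # The division is safe: crosses_latitude guarantees yi != yj
        if crosses_latitude and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

square = [(0, 0), (0, 10), (10, 10), (10, 0)]  # hypothetical field boundary
print(point_in_polygon(5, 5, square), point_in_polygon(5, 15, square))
```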

Import (yield matching) Sample-to-field assignment

Haversine Distance

What it calculates

The great-circle distance between two geographic points, accounting for Earth's curvature.

Formula

R = 3959    // Earth radius in miles
dLat = (lat2 - lat1) × π / 180
dLon = (lon2 - lon1) × π / 180
a = sin(dLat/2)² + cos(lat1 × π/180) × cos(lat2 × π/180) × sin(dLon/2)²
distance = R × 2 × atan2(sqrt(a), sqrt(1-a))    // Result in miles
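The same formula in runnable form, as a sketch (the function name and test coordinates are hypothetical):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2, radius_miles=3959):
    """Great-circle distance in miles between two points given in decimal degrees."""
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(d_lon / 2) ** 2)
    return radius_miles * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# One degree of latitude is roughly 69 miles
print(haversine_miles(41.0, -93.0, 42.0, -93.0))
```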

Sample grouping Yield matching

Dynamic Zoom-Based Color Scaling

What it does

Adjusts the color scale based on currently visible fields, not all fields. This reveals within-view variation.

Logic

// Get field averages for fields currently visible on map
visibleAvgs = getAveragesForVisibleFields()

// Calculate min/max from visible fields only
minAvg = min(visibleAvgs)
maxAvg = max(visibleAvgs)
range = maxAvg - minAvg

// Scale colors relative to visible range
position = (fieldAvg - minAvg) / range    // 0 to 1
color = getGradientColor(position)

Why dynamic: If one field has P=100 and all others have P=20-30, a static scale would make all the normal fields look identical (all red). Dynamic scaling reveals the meaningful variation within the current view.
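The scaling step can be sketched as below. The logic above does not say what happens when every visible field has the same average (a range of zero), so placing those fields at the midpoint is an assumption, as are the function name and field values.

```python
def scale_positions(visible_avgs):
    """Map each visible field's average to a 0..1 position within the visible range."""
    low = min(visible_avgs.values())
    high = max(visible_avgs.values())
    span = high - low
    if span == 0:
        # Assumption: with no variation in view, use the middle of the gradient
        return {field: 0.5 for field in visible_avgs}
    return {field: (avg - low) / span for field, avg in visible_avgs.items()}

# With the P = 100 field out of view, the 20-30 ppm fields spread across the full scale
print(scale_positions({"North": 20, "Creek": 25, "Home": 30}))
```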

Main Map

🎨 Color Scales

Nutrient Status Gradient

Color stops

Position	Color	Meaning
0%	Red (#dc2626)	Critical/Deficient
25%	Orange (#f97316)	Below optimal
50%	Yellow (#eab308)	Marginal
75%	Lime (#84cc16)	Good
100%	Green (#16a34a)	Optimal

Color Interpolation

// Linear interpolation between adjacent color stops
factor = (position - stop1.pos) / (stop2.pos - stop1.pos)
R = round(R1 + (R2 - R1) × factor)
G = round(G1 + (G2 - G1) × factor)
B = round(B1 + (B2 - B1) × factor)
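A sketch of the gradient lookup using the five color stops from the table follows. The function name is hypothetical, and clamping positions outside 0..1 is an assumption. Python's `round` rounds exact halves to the nearest even integer, so a channel can differ by one from an implementation that rounds halves up.

```python
# (position, (R, G, B)) pairs taken from the color stops table
STOPS = [
    (0.00, (0xDC, 0x26, 0x26)),  # red
    (0.25, (0xF9, 0x73, 0x16)),  # orange
    (0.50, (0xEA, 0xB3, 0x08)),  # yellow
    (0.75, (0x84, 0xCC, 0x16)),  # lime
    (1.00, (0x16, 0xA3, 0x4A)),  # green
]

def gradient_color(position):
    """Linearly interpolate an (R, G, B) tuple for a position in 0..1."""
    position = min(1.0, max(0.0, position))  # assumption: clamp out-of-range input
    for (pos1, rgb1), (pos2, rgb2) in zip(STOPS, STOPS[1:]):
        if position <= pos2:
            factor = (position - pos1) / (pos2 - pos1)
            return tuple(round(c1 + (c2 - c1) * factor) for c1, c2 in zip(rgb1, rgb2))
    return STOPS[-1][1]

print(gradient_color(0.0), gradient_color(0.6), gradient_color(1.0))
```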

Map Markers Sample Points

IQR-Based Color Scaling

What it does

For attributes without agronomic thresholds (CEC, micronutrients), uses the interquartile range to handle outliers.

Algorithm

sorted = values.sort()
Q1 = sorted[floor(n × 0.25)]    // 25th percentile
Q3 = sorted[floor(n × 0.75)]    // 75th percentile
IQR = Q3 - Q1

lowerBound = max(min, Q1 - 1.5 × IQR)
upperBound = min(max, Q3 + 1.5 × IQR)

// Clamp value to bounds, then scale
clampedValue = clamp(value, lowerBound, upperBound)
position = (clampedValue - lowerBound) / (upperBound - lowerBound)

Why IQR: For CEC, values can range from 5 to 40+ meq/100g. Without agronomic "optimal", we use the distribution itself. IQR-based bounds prevent one extreme value from compressing the entire color scale.
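The IQR-based scaling can be sketched as follows, using the same index-based quartiles as the algorithm above. The function name and CEC values are hypothetical, and returning the midpoint when the bounds collapse to a single value is an assumption.

```python
import math

def iqr_position(value, values):
    """Scale `value` to 0..1 within IQR-based bounds computed from `values`."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[math.floor(n * 0.25)]
    q3 = ordered[math.floor(n * 0.75)]
    iqr = q3 - q1
    lower = max(ordered[0], q1 - 1.5 * iqr)
    upper = min(ordered[-1], q3 + 1.5 * iqr)
    if upper == lower:
        return 0.5  # assumption: no spread, use the middle of the scale
    clamped = min(upper, max(lower, value))
    return (clamped - lower) / (upper - lower)

# Hypothetical CEC values with one extreme sample at 60
cec = [8, 10, 12, 14, 16, 18, 20, 60]
print(iqr_position(20, cec), iqr_position(60, cec))
```

Here the upper bound becomes 32 rather than 60, so the outlier is pinned to the top of the scale and the remaining values keep a usable color spread.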

CEC Map Micronutrient Maps

Change Gradient (Year-to-Year)

Color stops for percent change

Change	Color
-30% or worse	Dark red (#b91c1c)
-15%	Light red (#f87171)
0% (neutral)	Gray (#d1d5db)
+15%	Light green (#86efac)
+30% or better	Dark green (#15803d)

Comparison View Change Maps