This guide provides solutions to common operational issues encountered when running Lamassu IoT in production environments.

Database Issues

Connection Pool Exhausted

Symptoms:
  • HTTP 500 errors from services
  • Logs showing “connection pool exhausted” or “too many clients”
  • Slow API response times
Diagnosis:
# Check PostgreSQL connection count
psql -h localhost -U postgres -c \
  "SELECT count(*) FROM pg_stat_activity;"

# Check max connections
psql -h localhost -U postgres -c \
  "SHOW max_connections;"

# Identify connections by application
psql -h localhost -U postgres -c \
  "SELECT application_name, count(*) FROM pg_stat_activity 
   GROUP BY application_name;"
Solutions:
  1. Increase PostgreSQL max_connections:
    # postgresql.conf
    max_connections = 200  # Default is often 100
    
    # Restart PostgreSQL
    systemctl restart postgresql
    
  2. Configure connection pooling in services:
    postgres:
      max_open_connections: 25
      max_idle_connections: 5
      connection_max_lifetime_minutes: 10
    
  3. Use PgBouncer for connection pooling:
    # /etc/pgbouncer/pgbouncer.ini
    [databases]
    lamassu = host=localhost port=5432 dbname=lamassu
    
    [pgbouncer]
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 25
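Whichever option you choose, the per-service pool sizes still have to fit under PostgreSQL's limit. As a rough sketch in Python, the check below multiplies replicas by max_open_connections and compares the sum with max_connections minus the slots PostgreSQL reserves for superusers (superuser_reserved_connections, 3 by default). The service names and replica counts are hypothetical; substitute the values from your own deployment.

```python
def connection_budget(max_connections, pools, reserved=3):
    """Return (demand, available, fits) for a set of service pools.

    pools maps a service name to (replicas, max_open_connections).
    reserved mirrors PostgreSQL's superuser_reserved_connections.
    """
    demand = sum(replicas * max_open for replicas, max_open in pools.values())
    available = max_connections - reserved
    return demand, available, demand <= available


# Hypothetical deployment: names and replica counts are illustrative only.
pools = {
    "ca": (2, 25),
    "dms-manager": (2, 25),
    "device-manager": (2, 25),
}

demand, available, fits = connection_budget(100, pools)
print(f"demand={demand} available={available} fits={fits}")
# demand=150 available=97 fits=False
```

If the demand does not fit, either lower max_open_connections per service, raise max_connections, or place PgBouncer in front so that many client connections share a smaller server-side pool.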
    

Slow Queries

Symptoms:
  • High API latency
  • Database CPU usage at 100%
  • Long-running queries in pg_stat_activity
Diagnosis:
-- Find slow running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

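-- List index candidates: high-cardinality columns with low physical correlation
-- (pg_stats does not show whether an index already exists)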
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
  AND n_distinct > 100
ORDER BY abs(correlation) ASC;

-- Analyze query plan
EXPLAIN ANALYZE
SELECT * FROM certificates WHERE status = 'ACTIVE' AND expiration < NOW();
Solutions:
  1. Add missing indexes:
    -- Index on frequently queried columns
    -- (on busy production tables, use CREATE INDEX CONCURRENTLY to avoid blocking writes)
    CREATE INDEX idx_certificates_status ON certificates(status);
    CREATE INDEX idx_certificates_expiration ON certificates(expiration);
    CREATE INDEX idx_devices_dms_id ON devices(dms_id);
    
  2. Update table statistics:
    ANALYZE certificates;
    ANALYZE devices;
    ANALYZE cas;
    
  3. Optimize configuration for workload:
    # postgresql.conf (example values assume a host with 16GB of RAM)
    shared_buffers = 4GB              # 25% of RAM
    effective_cache_size = 12GB       # 75% of RAM
    work_mem = 64MB                   # Per-operation memory
    maintenance_work_mem = 1GB        # For VACUUM, indexes
    random_page_cost = 1.1            # For SSD storage
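The percentage comments above can be turned into a small calculation. This Python sketch applies only the 25% and 75% rules of thumb from the example; the function name is illustrative, and the results are starting points rather than tuned values.

```python
def suggest_pg_memory(ram_gb):
    """Apply the 25% / 75% rules of thumb to a host's RAM (whole GB)."""
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
    }


for key, value in suggest_pg_memory(16).items():
    print(f"{key} = {value}")
# shared_buffers = 4GB
# effective_cache_size = 12GB
```

Note that work_mem is allocated per sort or hash operation, so it should be sized against the number of concurrent queries rather than as a fixed share of RAM.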
    

Database Migration Failures

Symptoms:
  • Service fails to start
  • Logs showing “migration failed” or “schema version mismatch”
Diagnosis:
-- Check current schema version
SELECT * FROM goose_db_version ORDER BY version_id DESC LIMIT 5;

-- Check for failed migrations
SELECT * FROM goose_db_version WHERE is_applied = false;
Solutions:
  1. Manually run migration:
    # Using goose-lamassu tool
    goose-lamassu -dir ./engines/storage/postgres/migrations/ca \
      postgres "host=localhost user=postgres dbname=ca sslmode=disable" up
    
  2. Fix failed migration and retry:
    -- Mark migration as not applied to retry
    DELETE FROM goose_db_version WHERE version_id = 20250309120000;
    
  3. Restore from backup if corruption occurred:
    pg_restore -d lamassu /backup/lamassu_latest.dump
    
Always back up your database before attempting manual migration fixes.

Certificate Issuance Issues

CA Not Found

Symptoms:
  • HTTP 404 when signing certificates
  • Error: “CA with id ‘xxx’ not found”
Diagnosis:
# List all CAs
curl -H "Authorization: Bearer $TOKEN" \
  https://lamassu.example.com/api/ca/v1/cas | jq

# Get specific CA
curl -H "Authorization: Bearer $TOKEN" \
  https://lamassu.example.com/api/ca/v1/cas/{ca-id} | jq
Solutions:
  1. Verify CA exists in database:
    SELECT id, subject_common_name, status FROM cas WHERE id = 'your-ca-id';
    
  2. Check CA status:
    # Ensure CA is in ACTIVE status, not EXPIRED or REVOKED
    curl -H "Authorization: Bearer $TOKEN" \
      https://lamassu.example.com/api/ca/v1/cas/{ca-id} | jq '.status'
    
  3. Recreate CA if missing:
    curl -X POST https://lamassu.example.com/api/ca/v1/cas \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "id": "replacement-ca",
        "type": "MANAGED",
        "subject": {
          "common_name": "Replacement CA",
          "organization": "YourOrg"
        },
        "engine_id": "vault-engine",
        "key_metadata": {"type": "RSA", "bits": 4096}
      }'
    

Crypto Engine Failures

Symptoms:
  • Certificate signing fails with crypto errors
  • Timeouts during CA operations
  • Errors mentioning PKCS#11, Vault, or AWS KMS
PKCS#11 HSM Issues:
# Test PKCS#11 module
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so --list-slots

# Check HSM connectivity
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
  --slot 0 --login --pin 1234 --list-objects

# Verify token PIN
pkcs11-tool --module /usr/lib/softhsm/libsofthsm2.so \
  --slot 0 --login --pin 1234 --test
Common PKCS#11 fixes:
  • Incorrect PIN: Update crypto engine configuration
  • Token not initialized: Initialize token with pkcs11-tool
  • HSM disconnected: Check network/USB connection
  • Session limit reached: Restart HSM or service
HashiCorp Vault Issues:
# Check Vault status
vault status

# Test authentication
vault write auth/approle/login role_id="$ROLE_ID" secret_id="$SECRET_ID"

# List secrets
vault kv list lamassu-pki/

# Check Vault logs
journalctl -u vault -n 100
Common Vault fixes:
  1. Vault sealed:
    vault operator unseal
    # Or enable auto-unseal with cloud KMS
    
  2. Token expired:
    # Generate new AppRole credentials
    vault write -f auth/approle/role/lamassu/secret-id
    # Update service configuration
    
  3. Permission denied:
    # Verify policy allows CA operations
    vault policy read lamassu-ca
    
    # Update policy if needed
    vault policy write lamassu-ca - <<EOF
    path "lamassu-pki/*" {
      capabilities = ["create", "read", "update", "delete", "list"]
    }
    EOF
    
AWS KMS Issues:
# Test KMS access
aws kms describe-key --key-id alias/lamassu-ca

# Test encryption (requires a key with ENCRYPT_DECRYPT usage; signing keys reject this call)
echo "test" | base64 > /tmp/plaintext.txt
aws kms encrypt \
  --key-id alias/lamassu-ca \
  --plaintext fileb:///tmp/plaintext.txt \
  --query CiphertextBlob \
  --output text | base64 -d > /tmp/encrypted.bin

# Check IAM permissions
aws iam get-user
aws iam list-attached-user-policies --user-name lamassu-service
Common AWS KMS fixes:
  1. Insufficient permissions:
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
          "kms:Decrypt",
          "kms:Encrypt",
          "kms:GenerateDataKey",
          "kms:DescribeKey",
          "kms:CreateAlias",
          "kms:Sign",
          "kms:Verify"
        ],
        "Resource": "arn:aws:kms:us-east-1:123456789012:key/*"
      }]
    }
    
  2. Region mismatch:
    # Ensure crypto engine config matches KMS key region
    crypto_engines:
      aws_kms:
        - id: "aws-kms"
          region: "us-east-1"  # Must match key region
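A region mismatch can be spotted by comparing the region embedded in the key ARN with the configured one. The Python sketch below assumes you have copied the ARN from the output of aws kms describe-key; the helper names and the sample ARN are hypothetical and not part of Lamassu.

```python
def kms_region_from_arn(arn):
    """Extract the region from a KMS ARN: arn:<partition>:kms:<region>:<account>:<resource>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS ARN: {arn!r}")
    return parts[3]


def region_matches(arn, configured_region):
    """True when the crypto engine's configured region equals the key's region."""
    return kms_region_from_arn(arn) == configured_region


# Illustrative ARN; the account ID and key ID are placeholders.
SAMPLE_ARN = "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

print(region_matches(SAMPLE_ARN, "us-east-1"))  # True
print(region_matches(SAMPLE_ARN, "eu-west-1"))  # False
```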
    

EST Enrollment Issues

400 Bad Request

Symptoms:
  • EST enrollment fails with HTTP 400
  • Error: “Invalid request body” or “Malformed CSR”
Diagnosis:
# Verify CSR format
base64 -d device.b64 | openssl req -inform DER -text -noout

# Check for newlines in base64 (common issue)
wc -l < device.b64
# Should output 0 (no trailing newline, as produced by base64 -w 0) or 1;
# anything higher means the base64 is wrapped across lines
Solutions:
  1. Ensure base64 has no newlines:
    # Correct: single-line base64
    openssl req -in device.csr -outform DER | base64 -w 0 > device.b64
    
    # Wrong: multi-line base64
    openssl req -in device.csr -outform DER | base64 > device.b64
    
  2. Verify Content-Type header:
    curl -v -H "Content-Type: application/pkcs10" \
      --data-binary "@device.b64" \
      "https://est.example.com/.well-known/est/dms-01/simpleenroll"
    
  3. Validate CSR before sending:
    # Check CSR is valid DER format
    base64 -d device.b64 > device.der
    openssl req -inform DER -in device.der -text -noout
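The checks above can be combined into one pre-flight test before calling the EST endpoint. The Python sketch below is only an approximation: it confirms that the body is single-line base64, carries no PEM armor, and decodes to bytes that begin with a DER SEQUENCE tag (0x30). It does not parse the CSR itself, so openssl req remains the authoritative check. The function name and sample bytes are illustrative.

```python
import base64
import binascii


def check_est_payload(payload):
    """Return a list of problems found in a base64-encoded DER CSR body."""
    body = payload.rstrip(b"\r\n")  # tolerate trailing newlines only
    if body.startswith(b"-----BEGIN"):
        return ["PEM armor present; expected bare base64 of DER"]
    problems = []
    if b"\n" in body or b"\r" in body:
        problems.append("base64 spans multiple lines")
    try:
        # Strip whitespace before decoding so wrapping is reported only once.
        der = base64.b64decode(b"".join(body.split()), validate=True)
    except binascii.Error:
        problems.append("not valid base64")
        return problems
    if not der or der[0] != 0x30:
        problems.append("decoded bytes do not start with a DER SEQUENCE (0x30)")
    return problems


# A tiny DER SEQUENCE used as a stand-in; it is not a real CSR.
SAMPLE = base64.b64encode(bytes([0x30, 0x03, 0x02, 0x01, 0x00]))

print(check_est_payload(SAMPLE))                           # []
print(check_est_payload(SAMPLE[:4] + b"\n" + SAMPLE[4:]))  # ['base64 spans multiple lines']
```

To check a real request body, pass the bytes read from device.b64 opened in binary mode.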
    

401 Unauthorized

Symptoms:
  • EST enrollment rejected
  • Error: “Client certificate not trusted” or “Authentication failed”
Diagnosis:
# Test TLS handshake
openssl s_client -connect est.example.com:443 \
  -cert bootstrap.crt -key bootstrap.key -showcerts

# Verify client certificate chain
openssl verify -CAfile ca-bundle.pem bootstrap.crt

# Check certificate issuer
openssl x509 -in bootstrap.crt -noout -issuer
Solutions:
  1. Verify DMS validation CA list:
    curl -H "Authorization: Bearer $TOKEN" \
      https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id} | \
      jq '.settings.enrollment_settings.est_rfc7030_settings.authentication.client_certificate.validation_cas'
    
  2. Add bootstrap CA to validation list:
    curl -X PATCH https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id} \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json-patch+json" \
      -d '[{
        "op": "add",
        "path": "/settings/enrollment_settings/est_rfc7030_settings/authentication/client_certificate/validation_cas/-",
        "value": "bootstrap-ca-id"
      }]'
    
  3. Check certificate expiration:
    openssl x509 -in bootstrap.crt -noout -dates
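To turn the notAfter line printed by the command above into a number you can alert on, a small helper can compute the remaining days. This Python sketch relies on the standard library's ssl.cert_time_to_seconds, which parses the date format OpenSSL prints; the function name and the sample date are illustrative.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after_line, now=None):
    """Whole days left before the notAfter date printed by `openssl x509 -dates`.

    Negative values mean the certificate has already expired.
    """
    _, _, value = not_after_line.strip().partition("=")
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


# Fixed reference time so the example is reproducible.
reference = datetime(2018, 1, 1, tzinfo=timezone.utc)
print(days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT", now=reference))  # 4
```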
    

404 Not Found

Symptoms:
  • EST endpoint returns 404
  • Error: “DMS not found”
Diagnosis:
# Verify DMS exists
curl -H "Authorization: Bearer $TOKEN" \
  https://lamassu.example.com/api/dmsmanager/v1/dms | jq '.dms[].id'

# Check DMS status
curl -H "Authorization: Bearer $TOKEN" \
  https://lamassu.example.com/api/dmsmanager/v1/dms/{dms-id}
Solutions:
  1. Verify correct DMS ID in URL:
    # Correct format
    https://est.example.com/.well-known/est/{dms-id}/simpleenroll
    
    # Check available DMS instances
    curl -H "Authorization: Bearer $TOKEN" \
      https://lamassu.example.com/api/dmsmanager/v1/dms
    
  2. Create missing DMS:
    curl -X POST https://lamassu.example.com/api/dmsmanager/v1/dms \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "id": "production-dms",
        "name": "Production DMS",
        "settings": {
          "enrollment_settings": {
            "protocol": "EST",
            "device_provisioning_profile_id": "iot-profile"
          }
        }
      }'
    

Service Startup Failures

Service Won’t Start

Symptoms:
  • Systemd service fails to start
  • Service crashes immediately after launch
Diagnosis:
# Check service status
systemctl status lamassu-ca

# View recent logs
journalctl -u lamassu-ca -n 100 --no-pager

# Check for port conflicts
sudo netstat -tlnp | grep :8080

# Verify configuration file syntax
yq eval '.' /etc/lamassu/ca-config.yaml
Common issues:
  1. Port already in use:
    # Find process using port
    sudo lsof -i :8080
    
    # Change port in configuration
    # /etc/lamassu/ca-config.yaml
    http:
      port: 8081
    
  2. Database connection failure:
    # Test database connectivity
    psql -h localhost -U postgres -d lamassu -c "SELECT 1;"
    
    # Check database credentials in config
    cat /etc/lamassu/ca-config.yaml | grep -A 5 postgres
    
  3. Missing environment variables:
    # Check service environment
    systemctl show lamassu-ca | grep Environment
    
    # Set required variables in systemd unit, then apply with:
    #   systemctl daemon-reload && systemctl restart lamassu-ca
    # /etc/systemd/system/lamassu-ca.service
    [Service]
    Environment="VAULT_TOKEN=s.xxxxx"
    Environment="DB_PASSWORD=secret"
    
  4. File permissions:
    # Check config file ownership
    ls -l /etc/lamassu/ca-config.yaml
    
    # Fix permissions
    sudo chown lamassu:lamassu /etc/lamassu/ca-config.yaml
    sudo chmod 640 /etc/lamassu/ca-config.yaml
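If you want to check permissions from a script rather than reading ls output, the mode bits can be compared directly. The Python sketch below assumes the 640 mode recommended above; the function name is hypothetical, and the temporary file stands in for /etc/lamassu/ca-config.yaml.

```python
import os
import stat
import tempfile


def config_mode_problems(path, expected_mode=0o640):
    """Compare a file's permission bits with the expected mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    if mode & stat.S_IROTH:
        problems.append("world-readable")
    if mode != expected_mode:
        problems.append(f"mode is {mode:o}, expected {expected_mode:o}")
    return problems


# Demonstration on a throwaway file instead of the real configuration.
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o644)
    print(config_mode_problems(tmp.name))  # ['world-readable', 'mode is 644, expected 640']
    os.chmod(tmp.name, 0o640)
    print(config_mode_problems(tmp.name))  # []
```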
    

Memory Issues

Symptoms:
  • Service OOM (out of memory) killed
  • Logs showing “cannot allocate memory”
Diagnosis:
# Check memory usage
free -h

# Monitor process memory
top -p "$(pgrep -d, lamassu-ca)"

# Check OOM killer logs
dmesg | grep -i oom
journalctl -k | grep -i oom
Solutions:
  1. Increase container/VM memory:
    # Kubernetes
    resources:
      limits:
        memory: 2Gi
      requests:
        memory: 1Gi
    
  2. Tune Go garbage collector:
    # Increase GC target percentage (default 100)
    export GOGC=200
    
    # Set memory limit
    export GOMEMLIMIT=1800MiB  # Leave headroom
    
  3. Reduce database connection pool:
    postgres:
      max_open_connections: 10  # Reduce from default 25
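The GOMEMLIMIT value in step 2 is chosen to sit below the container memory limit so the Go runtime collects garbage before the kernel's OOM killer steps in. As a sketch, the Python helper below derives such a value from the limit in MiB; the 10% default headroom is an assumption for illustration (the 1800MiB example above leaves roughly 12% of a 2Gi limit), and the function name is hypothetical.

```python
def gomemlimit_for(limit_mib, headroom_pct=10):
    """Return a GOMEMLIMIT string leaving headroom_pct percent of the limit free."""
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    return f"{limit_mib * (100 - headroom_pct) // 100}MiB"


print(gomemlimit_for(2048))  # 1843MiB
```

GOMEMLIMIT is a soft limit, so memory held outside the Go heap (for example by cgo-based PKCS#11 modules) still needs to fit in the remaining headroom.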
    

Monitoring and Observability Issues

Metrics Not Appearing

Symptoms:
  • Grafana shows no data
  • OTLP exporter errors in logs
Diagnosis:
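# Test OTLP collector connectivity (the endpoint expects POST, so any HTTP
# response to this GET, such as 405, still confirms it is reachable)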
curl http://otel-collector:4318/v1/metrics

# Check service OTEL configuration
cat /etc/lamassu/ca-config.yaml | grep -A 10 otel

# Verify collector is receiving data
curl http://otel-collector:8888/metrics | grep lamassu
Solutions:
  1. Enable OTEL in service config:
    otel:
      metrics:
        enabled: true
        hostname: "otel-collector"
        port: 4318
        scheme: "http"
    
  2. Check OTLP collector configuration:
    # otel-collector-config.yaml
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
    
    exporters:
      prometheus:
        endpoint: 0.0.0.0:9090
    
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
    
  3. Verify network connectivity:
    # From Lamassu service container
    telnet otel-collector 4318
    
    # Check DNS resolution
    nslookup otel-collector
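Minimal container images often ship without telnet or nslookup. Where a Python interpreter is available, the same reachability check can be sketched with the standard library; the function name is illustrative, and the host and port are the ones used in the examples above.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    DNS failures, refused connections, and timeouts all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_open("otel-collector", 4318) should return True from inside the Lamassu service container when the collector is reachable. A False result does not distinguish DNS failure from a closed port, so follow up with a name lookup if needed.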
    

Traces Missing Context

Symptoms:
  • Distributed traces show disconnected spans
  • No parent-child relationships in traces
Solutions:
  1. Enable trace propagation:
    otel:
      traces:
        enabled: true
        hostname: "otel-collector"
        port: 4318
    
  2. Verify HTTP instrumentation:
    // Services use otelhttp for automatic propagation
    import "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    
  3. Check propagation headers:
    curl -v https://lamassu.example.com/api/ca/v1/cas \
      -H "traceparent: 00-<trace-id>-<span-id>-01"
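The placeholder header above follows the W3C Trace Context format: version 00, a 32-character lowercase hex trace ID, a 16-character lowercase hex parent span ID, and a 2-character flags field (01 means sampled). As a sketch for generating a concrete test value and for checking headers captured from logs, the Python helpers below can be used; their names are illustrative.

```python
import re
import secrets

_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Build a version-00 traceparent header value with random IDs."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def is_valid_traceparent(value):
    """Check the version-00 layout and reject all-zero trace or span IDs."""
    match = _TRACEPARENT.fullmatch(value)
    if not match:
        return False
    trace_id, span_id, _flags = match.groups()
    return trace_id != "0" * 32 and span_id != "0" * 16


print(new_traceparent())
```

Send the generated value in the traceparent header of a test request, then search for its trace ID in your tracing backend to confirm that downstream spans are attached to it.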
    

Performance Issues

Slow API Responses

Diagnosis checklist:
  1. Check HTTP metrics:
    histogram_quantile(0.95,
      sum by (le, http_route) (rate(http_server_duration_bucket[5m]))
    )
    
  2. Analyze distributed traces:
    Find slow spans in Tempo/Jaeger to identify the bottleneck (DB, crypto, network).
    
  3. Check database performance:
    -- Requires the pg_stat_statements extension
    SELECT query, calls, mean_exec_time, max_exec_time
    FROM pg_stat_statements
    ORDER BY mean_exec_time DESC LIMIT 10;
    
  4. Monitor crypto engine latency:
    histogram_quantile(0.95,
      sum by (le, engine_id) (rate(crypto_operation_duration_seconds_bucket[5m]))
    )
    
Common fixes:
  • Add database indexes for frequently queried fields
  • Increase database shared_buffers and work_mem
  • Scale HSM/Vault infrastructure if crypto operations are slow
  • Add caching layer for frequently accessed CAs
  • Horizontal scaling of Lamassu services
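When reading the p95 values from the checklist, it helps to remember that histogram_quantile returns an estimate interpolated within bucket boundaries, not an exact measurement. The Python sketch below approximates that calculation for a single series of cumulative buckets; it is a simplified illustration, not Prometheus's implementation, and the bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank,
    treating 0 as the lower bound of the first bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                return lower  # fall back to the highest finite bound seen
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return lower


# Made-up latency buckets in seconds: (le, cumulative count).
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, BUCKETS))
```

Because the result can never be more precise than the bucket layout, a p95 that sits exactly on a bucket boundary often means the buckets are too coarse for the latencies being measured.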

Getting Help

If you’re unable to resolve an issue:
  • Check logs: Review service logs with journalctl or your log aggregation system. Set the log level to debug temporarily.
  • GitHub Issues: Search existing issues or open a new one: github.com/lamassuiot/lamassuiot/issues
  • Community discussions: Ask questions in GitHub Discussions: github.com/lamassuiot/lamassuiot/discussions
  • Documentation: Consult the official documentation: www.lamassu.io/docs
When reporting issues, include:
  • Lamassu version (git describe --tags)
  • Deployment method (Docker, Kubernetes, monolithic)
  • Relevant configuration (redact secrets)
  • Complete error messages and stack traces
  • Steps to reproduce the issue
  • Logs from affected services