Highlights

Scale Up Up Up

Scale to hundreds or thousands of cores per server with user-defined thread topologies.

Lowest Overhead Ever

Be amazed at how little space your existing data requires. Serve the same data while saving dozens to hundreds of gigabytes of RAM per server.

Security For All

Built-in TLS combines with security-minded configuration guards. Encryption isn't optional anymore, and neither are configuration files without automatic sanity checks.

Network Flexibility

Serve from multiple IPs and ports, provision independent TLS configs, make networks "stats only" so monitoring systems can't read data, and even set some networks to read-only while others retain full access.

Features

Carrier Grade In-Memory Caching

Built for Production at Scale


Carrier Cache is the most space-efficient in-memory caching server.

All Servers Big or Small

Scale from a few cores to thousands of cores easily with Carrier Cache's independent async thread architecture. (This can save you dozens of hours a year due to improved operational efficiency and less server hand-holding.)

Smallest Memory Usage Possible

Carrier Cache stores your data as small as possible without impacting speed. Carrier Cache's custom CPU-cache aware data structures are designed for minimum overhead and maximum efficiency. (This can save you tons per year in server costs.)

Security Built-In

Carrier Cache includes full TLS capability with support for fast elliptic curves to minimize encryption latency. (This helps with PCI compliance and also helps you be more professionally responsible with customer data.)

memcached protocol

Distribute your data across servers using any memcached consistent hashing client for widely compatible scalability. (This lets you use existing clients without refactoring your systems.)
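To illustrate how consistent-hashing clients spread keys across cache servers, here is a minimal Python sketch of a hash ring (illustrative only, not any particular client library; the server names and hashing choices are made up for the example):

```python
import bisect
import hashlib

class HashRing:
    """Illustrative consistent-hash ring. Each server gets many points on
    the ring; a key belongs to the first server point at or after the
    key's own hash position, wrapping around at the end."""

    def __init__(self, servers, points_per_server=100):
        self.ring = sorted(
            (self._hash("%s:%d" % (server, i)), server)
            for server in servers
            for i in range(points_per_server)
        )
        self.positions = [position for position, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key):
        # bisect past the last point wraps back to the start of the ring.
        index = bisect.bisect(self.positions, self._hash(key)) % len(self.ring)
        return self.ring[index][1]
```

Because only a removed server's points leave the ring, removing one server remaps only the keys that server held; keys on the remaining servers stay put.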

Save the Planet (and your budget)

Carrier Cache is so efficient you can reduce your server count because Carrier Cache's memory layouts store the same data in less space than your existing caches. (This lets you be environmentally and fiscally responsible at the same time.)

Support

Carrier Cache Server includes a 3-day email support turnaround time. Next business day turnaround and realtime support are available as add-ons. (This helps fit the needs of your company depending on your required support policies and growth stages.)

Subscribe to receive announcements, release notes, feature updates, progress reports, and maybe even some surprises along the way.

Internals
Pre-Cached Everything

Carrier Cache can handle most requests without needing to allocate any memory at all. Carrier Cache uses pre-cached internal objects to avoid in-flight memory allocation jitter.

You can grow or shrink the size of pre-cached object pools based on your platform requirements with a simple config option.

Object Pool Pre-Allocation Config:

runtime {
    precacheMultiplier [0+] (default: 1)
}

Multiply the default cache pool size by 10 to store 100 internal objects per thread, ready to use without needing allocation during requests:

runtime {
    precacheMultiplier 10;
}

Or, disable pre-allocation (multiply the default pool size by 0) so allocation happens during requests (for more memory-limited environments):

runtime {
    precacheMultiplier 0;
}

Lock-Free Everything

Carrier Cache has a massively parallel threaded architecture, and every internal data structure is lock-free for the highest possible performance.

Carrier Cache threads use pre-allocated message queues to communicate. You can use a simple config option to set the size of message queues at startup based on your platform and memory requirements.

Internal message queues are allocated in powers of two. The msgQMultiplier config option specifies how many times to grow the default message queue size of 4096 entries.

Inter-Thread Message Queue Config:

runtime {
    msgQMultiplier [0+] (default: 0)
}

The formula for message queue size is 4096 * 2^x, where x is msgQMultiplier (equivalently, 2^(12 + x)). This example provisions 32,768 message queue slots (because 4096 * 2^3 = 2^(12 + 3) = 2^15 = 32,768).

runtime {
    msgQMultiplier 3
}
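The sizing formula can be sanity-checked in a couple of lines of Python:

```python
def msgq_capacity(multiplier):
    """Message queue slots per thread: 4096 * 2^multiplier,
    which is the same as 2^(12 + multiplier)."""
    return 4096 * 2 ** multiplier
```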

3-Stage Asynchronous Multi-Threaded Processing Architecture

Carrier Cache maintains three independent thread pools: one for processing incoming data, one for running client commands, and one for responding to clients.

You can grow each thread pool independently based on your usage requirements.

When you have TLS enabled, Carrier Cache decrypts client communications on incoming data threads and encrypts data to clients on reply threads.

With Carrier Cache's unique async thread pool architecture, encryption and decryption operations happen concurrently and don't block each other!

Carrier Cache's unique async thread pool architecture also means you can have more reply concurrency than input concurrency. Cached values clients retrieve are usually larger than cache keys, so you can easily allocate more resources to encrypting outbound client responses than to decrypting inbound client requests, thereby minimizing response latency and increasing performance.

There are no practical limits on the number of threads you can provision. You can increase thread counts until your operating system complains. The minimum number of threads required is one for each processing level.

Thread Config:

runtime {
    protocolParserInstances [1+]
    dataWorkerInstances [1+]
    networkReplierInstances [1+]
}

Use 122 threads with bias towards replier concurrency:

runtime {
    protocolParserInstances 12
    dataWorkerInstances 72
    networkReplierInstances 38
}

Use 600 threads:

runtime {
    protocolParserInstances 100
    dataWorkerInstances 300
    networkReplierInstances 200
}

Use just one thread per processing level:

runtime {
    protocolParserInstances 1
    dataWorkerInstances 1
    networkReplierInstances 1
}

Multi-Network Capable with Capability Controls


Carrier Cache can listen for requests across multiple networks. Each network can be configured with unique options including TLS, disabled data access, stats access, and admin commands.

Network statistics are calculated for each network individually as well as aggregated into a global summary for all networks combined.

Carrier Cache also includes full support for IPv6.

Carrier Cache has no default port number, so you must always specify a port number for your servers. Carrier Cache rejects having a default port number because globally declaring connection endpoints for internal databases is a huge and unnecessary security risk.

Multi-Network Config:

network [networkName] {
    host [hostname or IP] (default: 127.0.0.1)
    port [port] (default: 0; random port assignment)
    maxSessions [count] (default: unlimited)

    # TLS Options
    enabletls [yes|no] (default: no)
    certChain [path]
    privateKey [path]

    # Simple Control Options
    enableStats [yes|no] (default: no)
    disableData [yes|no] (default: no)
    readonly [yes|no] (default: no)

    securityOverrideAllowPublicIP [yes|no] (default: no)
    securityOverrideListenOnAllInterfaces [yes|no] (default: no)
}

Multi-Network Config Examples:

network localClientAccess {
    host 127.0.0.1
    port 1010

    # TLS Options
    enabletls yes
    certChain /etc/ssl/cert.pem
    privateKey /etc/ssl/key.pem

    # Allow regular data access, but no stats
    enableStats no
    disableData no
    readonly no
}

network remoteMonitoringSystemAccess {
    host 4.2.2.2
    port 1020

    # TLS Options
    enabletls yes
    certChain /etc/ssl/cert.pem
    privateKey /etc/ssl/key.pem

    # Allow Stats, but disable data access
    enableStats yes
    disableData yes

    # Because 'host' is a public IP address, we must
    # tell Carrier Cache we want to expose our data
    # to the world
    securityOverrideAllowPublicIP yes
}

network interSiteDRLink {
    host 0.0.0.0
    port 2010

    # TLS Options
    enabletls yes
    certChain /etc/ssl/cert.pem
    privateKey /etc/ssl/key.pem

    # Allow Stats, Allow Data
    enableStats yes
    disableData no

    # 0.0.0.0 IP for 'host' means Carrier Cache will reject
    # this configuration unless we also specify we accept
    # the responsibility of exposing data on unknown and
    # potentially globally routable IP addresses
    securityOverrideListenOnAllInterfaces yes
}

Stats For The Modern Age

Carrier Cache provides metrics and stats in easy-to-parse, easy-to-read JSON output.

Unlike memcached, Carrier Cache doesn't require writing your own hand-rolled parser to consume statistics and metrics from your server. Just use standard JSON readers and everything works immediately.

Carrier Cache stats command takes the form of:

  • STATS [[NOEND] [VISUAL]] [SECTION...]
For Example:
  • STATS
  • STATS VISUAL
  • STATS NOEND VISUAL cpu memory

Options mean:

  • NOEND — don't write END at the end of the output. The memcached protocol requires stats to end with END, but that breaks command line pipe processing. The NOEND option allows you to process STATS output on the command line with tools like jq:
    • printf "STATS VISUAL NOEND cpu memory\r\n" |nc localhost [PORT] |jq
    • printf "STATS NOEND VISUAL\r\n" |ncat -C --ssl 127.0.0.1 [TLS PORT] |jq
  • VISUAL — indent and format the JSON output for human reading. By default, the output will be delivered flattened on one line.
  • SECTION... — sections are one or more of: license, process, cpu, memory, startup, os, log, keyspace, network.
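For example, a captured STATS response (with or without the END terminator) can be handed straight to a standard JSON parser; the small helper below is illustrative, not part of any official client:

```python
import json

def parse_stats(raw):
    """Parse a STATS response body into a dict. If the NOEND option was
    not used, the memcached-style END terminator is stripped before
    handing the rest to a standard JSON parser."""
    body = raw.strip()
    if body.endswith("END"):
        body = body[:-len("END")].rstrip()
    return json.loads(body)
```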

Full STATS output example (comments are here for documentation, they are not present in actual server output):

license

License Details:

  • owner
  • unique credential tied back to the owner's license entitlements

{
  "license": {
    "legalOwner": "<your registered owner name>",
    "credential": "<per build state tracker>",
    "version": "CC-1.3"
  }
}

process

Process-level details including PIDs, start time, and uptime. Note that human-readable approximations accompany the numeric fields.

{
  "process": {
    "pid": {
      "server": 9155,
      "logger": 9156
    },
    "started": {
      "epoch": 1506523134.734523,
      "approx": "Wed Sep 27 14:38:54+0000 2017"
    },
    "uptime": {
      "seconds": 70.639711,
      "approx": "1.1773 minutes"
    },
    "cwd": "/mnt/ramdisk",
    "binary": "./carrier"
  }
}

cpu

CPU usage stats showing how long Carrier Cache has been running on your CPUs

{
  "cpu": {
    "system": {
      "seconds": 131.34800720214844,
      "approx": "2.1891 minutes"
    },
    "process": {
      "seconds": 36.79199981689453,
      "approx": "36.7920 seconds"
    }
  }
}

memory

Memory Stats for Carrier Cache are a bit more involved than other stats.

Carrier Cache uses three to five different memory allocation systems and all of them are accounted for below.

  • memory.fromPool is the primary memory pool where user data is allocated. The pool itself has a current total size of memory.fromPool.requestedFromSystem, but only memory.fromPool.used is the amount of memory checked out. The memory pool total size will grow as necessary as more memory is requested from the system. "available" is simply "requestedFromSystem" - "used".

  • Note: memory.fromPool.requestedFromSystem is memory _already_ requested from the operating system, but the operating system may not actually materialize the memory until it gets written into. You'll see a side effect of this when the total physical bytes used in memory.usage.current.physical (RSS) is _less than_ the pool size.

  • memory.fromSystem is memory Carrier Cache requested directly from the operating system outside of the memory pool. Certain objects like the inter-thread message queues and inter-process message queues are allocated directly from the system at startup since we know their sizes are constant and will never change. This is also another case where your operating system may not materialize the memory requested immediately, so it's possible total bytes as reported by memory.usage.current.physical (RSS) can be lower than memory.fromSystem + memory.fromPool.requestedFromSystem even though that's the amount of memory already requested.

  • memory.usage.current.logical is memory.fromPool.used + memory.fromSystem, but because of system allocator behavior, it may be less than the current physical process size as reported by memory.usage.current.physical. Also note: in the presence of excessive memory fragmentation, the current physical size may grow beyond the current logical size.

  • memory.usage.highest is the highest value ever seen for memory.usage.current.physical

  • memory.usage.current.potential is memory.fromPool.requestedFromSystem + memory.fromSystem

  • memory.softLimit is the memory limit Carrier Cache will observe. Note it is a _soft_ limit and Carrier Cache may grow above the limit at times. If LRU features are not enabled, your clients will be denied writes when the memory limit is reached. If LRU features are enabled, clients will always be allowed to write and existing memory will be deleted in the background to remain around the requested memory.softLimit. The limit may only be set at startup from your configuration file.

  • memory.lru details the LRU algorithm chosen and whether the LRU is enabled. LRU options may only be set at startup from your configuration file.

{
  "memory": {
    "fromPool": {
      "requestedFromSystem": {
        "bytes": 784973824,
        "approx": "748.60938 MiB"
      },
      "used": {
        "bytes": 475614408,
        "approx": "453.58125 MiB"
      },
      "available": {
        "bytes": 309359416,
        "approx": "295.02813 MiB"
      }
    },
    "fromSystem": {
      "bytes": 32552760,
      "approx": "31.04473 MiB"
    },
    "usage": {
      "current": {
        "logical": {
          "bytes": 508167168,
          "approx": "484.62598 MiB"
        },
        "physical": {
          "bytes": 519356416,
          "approx": "495.29688 MiB"
        },
        "potential": {
          "bytes": 817526584,
          "approx": "779.65411 MiB"
        }
      },
      "highest": {
        "bytes": 520212480,
        "approx": "496.11328 MiB"
      }
    },
    "softLimit": {
      "bytes": 2199023255552,
      "approx": "2.00000 TiB"
    },
    "lru": {
      "enabled": true,
      "mode": "snlru"
    }
  }
}
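The relationships above hold exactly for this sample output; a few lines of Python make the bookkeeping explicit (figures copied from the JSON above):

```python
# Figures from the sample memory section, in bytes.
from_pool_requested = 784_973_824   # memory.fromPool.requestedFromSystem
from_pool_used      = 475_614_408   # memory.fromPool.used
from_pool_available = 309_359_416   # memory.fromPool.available
from_system         = 32_552_760    # memory.fromSystem
logical             = 508_167_168   # memory.usage.current.logical
potential           = 817_526_584   # memory.usage.current.potential

# available = requestedFromSystem - used
assert from_pool_available == from_pool_requested - from_pool_used
# logical = fromPool.used + fromSystem
assert logical == from_pool_used + from_system
# potential = fromPool.requestedFromSystem + fromSystem
assert potential == from_pool_requested + from_system
```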

startup

The startup section saves details of Carrier Cache recorded during its startup phases.

  • startup.cores is the number of cores as reported by your operating system to Carrier Cache. Heuristics use this value to auto-determine thread pool counts in the absence of a user-specified thread topology.
  • startup.threadCount is your running thread topology at each processing level.
  • startup.log records details of the secondary process logger including its custom allocated message passing heap and independent inter-process message queue.
  • startup.interthreadMsgQ is derived from your configured runtime {msgQMultiplier N} settings. Initial message queue size is 4096 per thread and you can grow it by powers of 2 from there.
  • startup.objectPools is derived from your configured object pool count from:
    runtime {
        precacheMultiplier  5
    }
  • startup.checkpointDuringStartup is the physical size of Carrier Cache at multiple stages throughout the startup process. You can use these checkpoint records to help adjust how high to set your precache and message queue multipliers.

{
  "startup": {
    "cores": {
      "detected": 128,
      "assumed": 128
    },
    "threadCount": {
      "parse": 16,
      "work": 101,
      "reply": 16
    },
    "log": {
      "heap": {
        "bytes": 1048576,
        "approx": "1.00000 MiB"
      },
      "interprocessMsgQCapacity": 8192
    },
    "interthreadMsgQ": {
      "multiplier": 0,
      "2^": 12,
      "capacity": 4096
    },
    "objectPools": {
      "multiplier": 5,
      "capacity": 50
    },
    "checkpointDuringStartup": {
      "initial": {
        "bytes": 5353472,
        "approx": "5.10547 MiB"
      },
      "afterCachePoolInit": {
        "bytes": 286044160,
        "approx": "272.79297 MiB"
      },
      "afterMsgQInit": {
        "bytes": 323301376,
        "approx": "308.32422 MiB"
      }
    }
  }
}

os

Basic OS details useful for debugging and support requests

{
  "os": {
    "ulimit": {
      "files": {
        "requested": 537,
        "actual": 1024,
        "status": ""
      }
    },
    "system": {
      "sys": "Linux",
      "kernel": "4.4.0-1022-aws",
      "arch": "x86_64",
      "node": "ip-172-31-5-232",
      "memory": {
        "bytes": 2063091798016,
        "approx": "1.87637 TiB"
      }
    },
    "thp": {
      "enabled": true
    }
  }
}

log

Current logging configuration. Log level may be changed at runtime, if you have enableadmin configured for your network, by running ADMIN LOGLEVEL SET [level]

{
  "log": {
    "level": "INFO",
    "file": "",
    "stdout": true,
    "syslog": false
  }
}

keyspace

Keyspace details

  • keyspace.count.current is the number of keys currently resident in memory.
  • keyspace.count.total is the total number of keys to ever have existed regardless of deletions or LRU evictions. (Note: a key being overwritten is not counted as new)
  • keyspace.hit is the total number of times a client command found a key
  • keyspace.miss is the total number of times a client command asked for a key but didn't find one

{
  "keyspace": {
    "count": {
      "current": 2122892,
      "total": 2122892
    },
    "hit": 0,
    "miss": 0
  }
}

network

Network details are presented per user-configured network.

  • network.[name].clients.current is the number of clients currently connected to [name]
  • network.[name].clients.total is the total number of clients ever connected to [name]
  • network.[name].role provides details of the access restrictions set by your configuration
  • network.[name].input is the total number of bytes received by [name]
  • network.[name].output is the total number of bytes sent by [name]

{
  "network": {
    "regularLocalConnections": {
      "tls": false,
      "host": "127.0.0.1",
      "port": 7778,
      "clients": {
        "current": 1,
        "limit": 0,
        "total": 3
      },
      "role": {
        "admin": true,
        "stats": true,
        "data": {
          "read": true,
          "readonly": false
        }
      },
      "input": {
        "bytes": 110143014,
        "approx": "105.04056 MiB"
      },
      "output": {
        "bytes": 2421,
        "approx": "2.36426 KiB"
      }
    }
  }
}

networkTotal

networkTotal is the total network activity through this server since it started. This section is most useful when you have multiple user-defined cache serving networks, each with its own individual network statistics. This section gives you a quick summary of server traffic without needing to add up each individual network input/output figure yourself.

{
  "networkTotal": {
    "input": {
      "bytes": 110143014,
      "approx": "105.04056 MiB"
    },
    "output": {
      "bytes": 2421,
      "approx": "2.36426 KiB"
    },
    "clients": {
      "current": 1,
      "total": 3
    }
  }
}
LRU

Carrier Cache supports three LRU algorithms, each with its own unique advantages. Your LRU algorithm options are:

  • random — pick random key and delete it
  • random-with-memory — pick random key, but don't delete if in per-worker list of most active keys
  • snlru — a true LRU with user-defined number of priority boosting segments

The benefits of each LRU algorithm:

  • random — zero memory overhead, zero in-flight processing overhead; fast and efficient
  • random-with-memory — low overhead; minimal processing during requests to maintain
  • snlru — an exceptionally accurate way of evicting least used keys

The drawbacks of each LRU algorithm:

  • random — deletes without regard for key hit rate, but still fast and useful
  • random-with-memory — requires updating fixed records during each data access
  • snlru — highest memory overhead and highest accounting overhead during requests; snlru memory grows with number of keys stored

Summary:

Algorithm           Space                          Speed    Accuracy
random              zero                           fastest  low
random-with-memory  fixed; 3 MB per data thread    fast     medium
snlru               linear; grows with each key    slow     high

LRU Config:

lru {
    enable [yes|no] (default: no)
    mode [random|random-with-memory|snlru] (default: random)
    runBeforeChanges [yes|no] (default: no)
    snlruDepth [1+] (default: 7)
}

LRU enable random-with-memory:

lru {
    enable yes
    mode random-with-memory
}

LRU enable snlru with a default depth of 7 (note: if you set depth 1, snlru becomes a classical LRU with no level-based boosting):

lru {
    enable yes
    mode snlru
}

LRU enable random and also attempt LRU deletes before writes when server is above memory limit (instead of on regular cleanup timer):

lru {
    enable yes
    mode random
    runBeforeChanges yes
}
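To make the segment mechanics concrete, here is a toy segmented-LRU sketch in Python (illustrative only, not Carrier Cache's implementation): keys enter the lowest segment, each hit promotes a key one segment, and eviction takes the oldest entry from the lowest non-empty segment. With depth 1 it degenerates into a classical LRU, as noted above.

```python
from collections import OrderedDict

class SegmentedLRU:
    """Toy segmented LRU. New keys enter segment 0; each hit promotes a
    key one segment (capped at depth - 1). Eviction removes the oldest
    entry from the lowest non-empty segment."""

    def __init__(self, capacity, depth=7):
        self.capacity = capacity
        self.segments = [OrderedDict() for _ in range(depth)]

    def _find(self, key):
        for level, segment in enumerate(self.segments):
            if key in segment:
                return level
        return None

    def get(self, key):
        level = self._find(key)
        if level is None:
            return None
        value = self.segments[level].pop(key)
        # Promote one segment on hit; most recent entries sit at the end.
        target = min(level + 1, len(self.segments) - 1)
        self.segments[target][key] = value
        return value

    def put(self, key, value):
        level = self._find(key)
        if level is not None:
            # Overwrite in place; the key keeps its current segment.
            self.segments[level][key] = value
            return
        if sum(len(segment) for segment in self.segments) >= self.capacity:
            self._evict()
        self.segments[0][key] = value

    def _evict(self):
        for segment in self.segments:
            if segment:
                segment.popitem(last=False)  # oldest key, lowest segment
                return
```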

Runtime

Carrier Cache supports standard server process options via the process config block:

process {
    background [yes|no] (default: no)
    pidfile [path] (default: none)
}

Example of launching Carrier Cache as a daemon with pid file:

process {
    background yes
    pidfile cache.pid
}
Log level may be changed at runtime using ADMIN LOGLEVEL SET [LEVEL] if your network has admin access.

Logging

Carrier Cache uses an out-of-process logging architecture so writing logs has no impact on the request performance of the cache itself.

Carrier Cache can be configured to log to any of stdout, syslog, or a file:

log {
    level [LOGLEVEL] (default: INFO)
    file [PATH] (default: none)
    syslog (yes|no) (default: no)
    facility [FACILITY] (default: USER)
    ident [STRING] (default: "carrier")
    stdout (yes|no) (default: yes)
}
Where LOGLEVEL is one of:
  • Emergency, Alert, Critical, Error, Warning, Notice, Info, Debug
Where FACILITY is one of:
  • AUTH, AUTHPRIV, CRON, DAEMON, FTP, KERN, LPR, MAIL, NEWS, USER, UUCP, LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7

Example of requesting Carrier Cache write debug logs to a file:

log {
    level DEBUG
    stdout no
    file logfile.txt
}
If your network has admin access to Carrier Cache, log levels may be changed at runtime using ADMIN LOGLEVEL SET [LEVEL]. Each logging output (stdout, syslog, file) maintains the same log level and multiple logging outputs may be active simultaneously (e.g. stdout and file at the same time).

The only automatic adjustment to logging is that when running Carrier Cache as a background daemon process, stdout is forced to no regardless of configuration settings.

Command Line Arguments

Carrier Cache allows a limited set of configuration options from the command line.

From the command line you can define a simple network using:

  • --host [hostname or ip] — IP to listen on (also accepted as -h, -l, --listen, -b, or --bind)
  • --port [port] — port number (also accepted as -p)
  • --tls — enable TLS (also accepted as -t)
  • --cert [filename] — TLS certificate chain
  • --key [filename] — TLS private key for certificate
Note: the command line network will have no stats or admin access. Production setups should define networks in a configuration file.

Loading a configuration file into Carrier Cache happens by:

  • --config [filename] — also accepted as -c

Because Carrier Cache is an in-memory cache server, the speed of your memory is of paramount importance to the performance of your cache. You can run a diagnostic memory speed test using:

  • --speed — test memory speed by writing 300 MB for 300 iterations then exit
  • --speed [MB] — run a [MB] MB test then exit
If your memory is fast, your cache will be fast too. Be careful about running on over-provisioned or noisy-neighbor cloud hosts where memory speeds can fluctuate wildly; cloud hosts are also notorious for poor memory performance. A modern non-cloud server can easily exceed 16 gigabytes per second of memory throughput, while a top-of-the-line "cloud server" may only deliver 5 gigabytes per second (around the performance of a server from 2009). Test your provider and see how they measure up.
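As a rough illustration of what a memory-throughput diagnostic measures, here is a Python sketch (not the actual --speed implementation, which writes 300 MB for 300 iterations; fewer here for brevity):

```python
import time

def memory_write_speed(mb=300, iterations=3):
    """Time bulk writes into a preallocated buffer and return an
    approximate write throughput in gigabytes per second."""
    buf = bytearray(mb * 1024 * 1024)
    chunk = b"\xff" * (1024 * 1024)  # write one mebibyte at a time
    start = time.perf_counter()
    for _ in range(iterations):
        for offset in range(0, len(buf), len(chunk)):
            buf[offset:offset + len(chunk)] = chunk
    elapsed = time.perf_counter() - start
    return (mb * iterations) / 1024 / elapsed  # GB written per second
```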

Ready Now?


Carrier Cache

The world's most efficient caching platform is waiting for you.


Grab a license or six today
