Lexicographic encoding of floating-point numbers — a single-function JavaScript implementation of section 6 of Peter Seymour's 2008 paper.
Reference paper: Efficient Lexicographic Encoding of Numbers (Peter Seymour, 2008).
Encodes a finite-expansion decimal as a string whose lexicographic (dictionary) order matches the numeric order of the value. Sort the encoded strings as text and you get the numbers in numeric order — useful any time text is the only ordering primitive available, such as keys in a sorted store.
Every non-zero number is written normalised as f = 10e × ±m with mantissa m ∈ [1/10, 1). The encoded string is sign · exponent · mantissa, with the mantissa-sign moved to the front, the exponent length-prefixed (so it is self-delimiting), and a terminator closing the mantissa. Negative numbers also negate the exponent and invert each digit (d → 9-d) so reversed numeric order becomes forward lex order.
The scheme needs two reserved characters — one that sorts before all digits (negative-side marker) and one that sorts after all digits (positive-side marker). The paper writes them as - and +, but in ASCII/UTF-16 those characters actually sort before the digits (both are below 0x30), so a plain string sort would not give numeric order. This implementation defaults to / (0x2F, just before 0) and : (0x3A, just after 9) so the encoded strings sort correctly under default string comparison. You can override with any pair: elen('encode', value, neg, pos).
Accepts a decimal string. Sign optional (defaults to +). No scientific notation.
The middle column is the paper’s notation (using + and -). The right column is the default output of this implementation (using : and /) — identical structure, just remapped so default string sort gives numeric order.
| Input | Paper (+ / -) | This impl (: / /) |
|---|---|---|
| 0 | 0 | 0 |
| +1 | ++11- | ::11/ |
| +1.3 | ++113- | ::113/ |
| +1.4 | ++114- | ::114/ |
| +0.123 | +0123- | :0123/ |
| +0.0123 | +-8123- | :/8123/ |
| +0.00123 | +-7123- | :/7123/ |
| +0.001233 | +-71233- | :/71233/ |
| −1 | --88+ | //88: |
| −1.4 | --885+ | //885: |
| −0.123 | -0876+ | /0876: |
| −0.0123 | -+1876+ | /:1876: |
| −0.00123 | -+2876+ | /:2876: |
One self-contained function, elen(mode, value), with mode being "encode" or "decode". Pure string and integer arithmetic, no Number conversion of the actual value, so 1000-digit inputs round-trip losslessly.