elen

Lexicographic encoding of floating-point numbers — a single-function JavaScript implementation of section 6 of Peter Seymour's 2008 paper.

Reference paper: Efficient Lexicographic Encoding of Numbers (Peter Seymour, 2008).

What it does

Encodes a finite-expansion decimal as a string whose lexicographic (dictionary) order matches the numeric order of the value. Sort the encoded strings as text and you get the numbers in numeric order — useful any time text is the only ordering primitive available, such as keys in a sorted store.

Every non-zero number is written normalised as f = 10e × ±m with mantissa m ∈ [1/10, 1). The encoded string is sign · exponent · mantissa, with the mantissa-sign moved to the front, the exponent length-prefixed (so it is self-delimiting), and a terminator closing the mantissa. Negative numbers also negate the exponent and invert each digit (d → 9-d) so reversed numeric order becomes forward lex order.

The scheme needs two reserved characters — one that sorts before all digits (negative-side marker) and one that sorts after all digits (positive-side marker). The paper writes them as - and +, but in ASCII/UTF-16 those characters actually sort before the digits (both are below 0x30), so a plain string sort would not give numeric order. This implementation defaults to / (0x2F, just before 0) and : (0x3A, just after 9) so the encoded strings sort correctly under default string comparison. You can override with any pair: elen('encode', value, neg, pos).

Live demo

Accepts a decimal string. Sign optional (defaults to +). No scientific notation.

Test vectors

The middle column is the paper’s notation (using + and -). The right column is the default output of this implementation (using : and /) — identical structure, just remapped so default string sort gives numeric order.

InputPaper (+ / -)This impl (: / /)
0 0 0
+1 ++11- ::11/
+1.3 ++113- ::113/
+1.4 ++114- ::114/
+0.123 +0123- :0123/
+0.0123 +-8123- :/8123/
+0.00123 +-7123- :/7123/
+0.001233 +-71233- :/71233/
−1 --88+ //88:
−1.4 --885+ //885:
−0.123 -0876+ /0876:
−0.0123 -+1876+ /:1876:
−0.00123 -+2876+ /:2876:

The function

One self-contained function, elen(mode, value), with mode being "encode" or "decode". Pure string and integer arithmetic, no Number conversion of the actual value, so 1000-digit inputs round-trip losslessly.