ScriptinScript value representation

As part of my long-running side quest to make a safe, usable environment for user-contributed scripted widgets for Wikipedia and other web sites, I’ve started working on ScriptinScript, a modern JavaScript interpreter written in modern JavaScript.

It’ll be a while before I have it fully working, as I’m moving from a seat-of-the-pants proof of concept into something actually based on the language spec… After poking a lot at the spec details of how primitives and objects work, I’m pretty sure I have a good idea of how to represent guest JavaScript values using host JavaScript values in a safe, spec-compliant way.

Primitives

JavaScript primitive types (numbers, strings, booleans, symbols, null, and undefined) are suitable to represent themselves; pretty handy! They’re copyable and don’t expose any host environment details.

Note that when you do things like reading str.length or calling str.charCodeAt(index), per spec the primitive value is boxed into a String object and the property or method is looked up on that! The primitive string value itself has no properties or methods.
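The boxing behavior can be observed from plain host JavaScript. This is a small illustrative sketch, not part of ScriptinScript; the helper name storeOnPrimitiveThrows is made up for the example.

```javascript
const str = 'hello';

// Explicit boxing produces a wrapper object distinct from the primitive.
const boxed = Object(str);
console.log(typeof str);   // "string"
console.log(typeof boxed); // "object"

// Methods are found on String.prototype, not on the primitive itself.
const viaProto = String.prototype.charCodeAt.call(str, 0);
console.log(str.charCodeAt(0) === viaProto); // true

// In strict mode, storing a property on a primitive throws a TypeError,
// since only a temporary wrapper exists to receive it.
function storeOnPrimitiveThrows(s) {
    'use strict';
    try {
        s.extra = 1;
        return false;
    } catch (e) {
        return e instanceof TypeError;
    }
}
console.log(storeOnPrimitiveThrows(str)); // true
```

An interpreter that represents guest strings as host strings has to do the same boxing step itself whenever guest code reads a property from one.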

Objects

Objects, though. Ah now that’s tricky. A JavaScript object is roughly a hash map of properties indexed with string or symbol primitives, plus some internal metadata such as a prototype chain relationship with other objects.

The prototype chain is similar to, but oddly unlike, the class-based inheritance typical of many other languages.

Somehow we need to implement the semantics of JavaScript objects as JavaScript objects, though the actual API visible to other script implementations could be quite different.

First draft: spec-based

My initial design modeled the spec behavior pretty literally, with prototype chains and property descriptors to be followed step by step in the interpreter.

Guest property descriptors live as properties of a this.props sub-object created with a null prototype, so things on the host Object prototype or the custom VMObject wrapper class don’t leak in.

If a property doesn’t exist on this.props when looking it up, the interpreter will follow the chain down through this.Prototype. Once a property descriptor is found, it has to be examined for the value or get/set callables, and handled manually.

// VMObject is a regular class
[VMObject] {
    // "Internal slots" and implementation details
    // as properties directly on the object
    machine: [Machine],
    Prototype: [VMObject] || null,

    // props contains only own properties
    // so prototype lookups must follow this.Prototype
    props: [nullproto] {
        // prop values are virtual property descriptors
        // like you would pass to Object.defineProperty()
        aDataProp: {
            value: [VMObject],
            writable: true,
            enumerable: true,
            configurable: true,
        },
        anAccessorProp: {
            get: [VMFunction],
            set: [VMFunction],
            enumerable: true,
            configurable: true,
        },
    },
}
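To make the step-by-step lookup concrete, here is a minimal sketch of a property read against that layout. The names getProp, makeObj and callGuest are hypothetical, and guest functions are stood in for by plain objects with an impl method; the real interpreter would call into its own machinery instead.

```javascript
// Hypothetical stand-in for the interpreter's "call a guest function" step.
const callGuest = (fn, thisVal, args) => fn.impl.apply(thisVal, args);

// Cut-down first-draft object: just Prototype and a null-prototype props.
const makeObj = (proto) => ({ Prototype: proto, props: Object.create(null) });

// Walk the guest prototype chain by hand and interpret the descriptor.
function getProp(obj, key) {
    for (let o = obj; o !== null; o = o.Prototype) {
        // props has a null prototype, so `in` only sees own guest properties
        if (key in o.props) {
            const desc = o.props[key];
            if ('value' in desc) {
                return desc.value; // data property
            }
            // accessor property: the getter runs with the original receiver
            return desc.get ? callGuest(desc.get, obj, []) : undefined;
        }
    }
    return undefined;
}

const base = makeObj(null);
base.props.kind = { value: 'base', writable: true, enumerable: true, configurable: true };
base.props.described = {
    get: { impl() { return 'kind=' + getProp(this, 'kind'); } },
    set: undefined,
    enumerable: true,
    configurable: true,
};

const child = makeObj(base);
child.props.kind = { value: 'child', writable: true, enumerable: true, configurable: true };

console.log(getProp(child, 'described')); // "kind=child"
console.log(getProp(child, 'missing'));   // undefined
```

Every read pays for the loop and the descriptor inspection, which is the cost the following sections try to push down into the host engine.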

Prototype chains

Handling of prototype chains in property lookups can be simplified by using native host prototype chains on the props object that holds the property descriptors.

Instead of Object.create(null) to make props, use Object.create(this.Prototype ? this.Prototype.props : null).

The object layout looks about the same as above, except that props itself has a prototype chain.
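As a rough sketch of that change (the class here is cut down to the two fields that matter, and the lookup and hasOwn method names are invented for illustration):

```javascript
// Cut-down VMObject: only the parts needed to show chained props.
class VMObject {
    constructor(proto) {
        this.Prototype = proto; // guest prototype: a VMObject or null
        // props inherits from the parent's props, so the host engine
        // walks the guest prototype chain during ordinary lookups.
        this.props = Object.create(proto ? proto.props : null);
    }

    // Find the descriptor for `key` anywhere on the guest chain.
    lookup(key) {
        return this.props[key];
    }

    // Own-property check must not rely on inherited helpers,
    // because props ultimately has a null prototype.
    hasOwn(key) {
        return Object.prototype.hasOwnProperty.call(this.props, key);
    }
}

const animal = new VMObject(null);
animal.props.legs = { value: 4, writable: true, enumerable: true, configurable: true };

const dog = new VMObject(animal);
dog.props.name = { value: 'Rex', writable: true, enumerable: true, configurable: true };

console.log(dog.lookup('legs').value); // 4
console.log(dog.hasOwn('legs'));       // false
console.log(dog.lookup('toString'));   // undefined
```

One consequence of this layout is that a guest-side prototype change would also have to be mirrored onto props with Object.setPrototypeOf, not just stored in this.Prototype.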

Property descriptors

We can go a step further and use native property descriptors, which lets us model property accesses as direct loads and stores.

Object.defineProperty can be used directly on this.props to add native property descriptors including support for accessors by using closure functions to wrap calls into the interpreter.

This should make property gets and sets faster and awesomer!

Proper behavior should be retained as long as operations that can affect property descriptor handling are forwarded to props, such as calling Object.preventExtensions(this.props) when the equivalent guest operation is called on the VMObject.
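A minimal sketch of what that could look like follows. Everything named here (ownerOf, makeObj, defineAccessor, guestPreventExtensions, callGuest) is hypothetical. The WeakMap is one assumed way to recover the receiving guest object inside an inherited accessor, since the host passes the props object as `this`.

```javascript
// Hypothetical stand-in for the interpreter's "call a guest function" step.
const callGuest = (fn, thisVal, args) => fn.impl.apply(thisVal, args);

// Maps each props object back to the VMObject-like wrapper that owns it.
const ownerOf = new WeakMap();

function makeObj(proto) {
    const vmobj = {
        Prototype: proto,
        props: Object.create(proto ? proto.props : null),
    };
    ownerOf.set(vmobj.props, vmobj);
    return vmobj;
}

// Install a native accessor whose closures call back into the interpreter.
function defineAccessor(vmobj, key, guestGet, guestSet) {
    Object.defineProperty(vmobj.props, key, {
        get: guestGet
            ? function () { return callGuest(guestGet, ownerOf.get(this), []); }
            : undefined,
        set: guestSet
            ? function (v) { callGuest(guestSet, ownerOf.get(this), [v]); }
            : undefined,
        enumerable: true,
        configurable: true,
    });
}

// Guest Object.preventExtensions(o) forwards to the props object.
function guestPreventExtensions(vmobj) {
    return Reflect.preventExtensions(vmobj.props);
}

const proto = makeObj(null);
defineAccessor(proto, 'doubled', { impl() { return this.props.n * 2; } }, undefined);

const inst = makeObj(proto);
inst.props.n = 21;                 // data property: a direct store
console.log(inst.props.doubled);   // 42, via the inherited native accessor

guestPreventExtensions(inst);
console.log(Reflect.set(inst.props, 'extra', 1)); // false
```

Data properties need no wrapping at all here; only accessors pay for a closure.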

Native objects

At this point, our inner props object is pretty much the “real” guest object, with all its properties and an inheritance chain.

We could instead have a single object which holds both “internal slots” and the guest properties…

let MachineRef = Symbol('MachineRef');

// VMObject is prototyped on a null-prototype object
// that does not descend from host Object, and which
// is named 'Object' as well from what guest can see.
// Null-proto objects can also be used, as long as
// they have the marker slots.
let VMObject = function Object(val) {
    return VMObject[MachineRef].ToObject(val);
};
VMObject[MachineRef] = machine;
VMObject.prototype = Object.create(null);
VMObject.prototype[MachineRef] = machine;
VMObject.prototype.constructor = VMObject;

[VMObject] || [nullproto] {
    // "Internal slots" and implementation details
    // as properties indexed by special symbols.
    // These will be excluded from enumeration and
    // the guest's view of own properties.
    [MachineRef]: [Machine],

    // prop values are stored directly on the object
    aDataProp: [VMObject],
    // use native prop descriptors, with accessors
    // as closures wrapping the interpreter.
    get anAccessorProp: [Function],
    set anAccessorProp: [Function],
}

The presence of the symbol-indexed [MachineRef] property tells host code in the engine that a given object belongs to the guest and is safe to use — this should be checked at various points in the interpreter like setting properties and making calls, to prevent dangerous scenarios like exposing the native Function constructor to create new host functions, or script injection via DOM innerHTML properties.
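The marker check might be sketched as below. The Machine class and its isGuestVal method are assumptions for illustration (isGuestVal is the name used in later examples in this post); the real check may need to be stricter.

```javascript
const MachineRef = Symbol('MachineRef');

// Hypothetical minimal Machine with just the guest-value check.
class Machine {
    isGuestVal(val) {
        // Primitives represent themselves and are always safe.
        if (val === null ||
            (typeof val !== 'object' && typeof val !== 'function')) {
            return true;
        }
        // Objects and functions must carry this machine's marker.
        // Note: this plain read also sees an inherited marker, so a
        // stricter version might want an own-property check in places.
        return val[MachineRef] === this;
    }
}

const machine = new Machine();

const guestObj = Object.create(null);
guestObj[MachineRef] = machine;

console.log(machine.isGuestVal(42));        // true
console.log(machine.isGuestVal(guestObj));  // true
console.log(machine.isGuestVal({}));        // false
console.log(machine.isGuestVal(Function));  // false
```

Because the marker key is a symbol that guest code never receives, guest code cannot forge it or see it during enumeration by string keys.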

Functions

There’s an additional difficulty, which is function objects.

Various properties will want to be host-callable functions — things like valueOf and toString. You may also want to expose guest functions directly to host code… but if we use VMObject instances for guest function objects, then there’s no way to make them directly callable by the host.

Function re-prototyping

One possibility is to outright represent guest function objects using host function objects! They’d be closures wrapping the interpreter, and would ‘just work’ from host code (though they’d need to be careful about how they accept input).

However we’d need a function object that has a custom prototype, and there’s no way to create a function object that way… but you can change the prototype of a function that already has been instantiated.

Everyone says don’t do this, but you can. ;)

let MachineRef = Symbol('MachineRef');

// Create our own prototype chain...
let VMObjectPrototype = Object.create(null);
let VMFunctionPrototype = Object.create(VMObjectPrototype);

function guestFunc(func) {
    // ... and attach it to the given closure function!
    // Use VMFunctionPrototype directly: VMFunction itself
    // hasn't been created yet when this first runs.
    Reflect.setPrototypeOf(func, VMFunctionPrototype);

    // Also save our internal marker property.
    func[MachineRef] = machine;
    return func;
}

// Create our constructors, which do not descend from
// the host Function but rather from VMFunction!
let VMObject = guestFunc(function Object(val) {
    let machine = VMObject[MachineRef];
    return machine.ToObject(val);
});

let VMFunction = guestFunc(function Function(src) {
    throw new Error('Function constructor not yet supported');
});

VMFunction.prototype = VMFunctionPrototype;
VMFunctionPrototype.constructor = VMFunction;

VMObject.prototype = VMObjectPrototype;
VMObjectPrototype.constructor = VMObject;

This seems to work but feels a bit … freaky.
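A standalone way to see what re-prototyping does to a host function, independent of the engine code above (the names here are only for the demonstration):

```javascript
// A prototype chain that does not descend from the host Function.prototype.
const customProto = Object.create(null);

const add = function add(a, b) { return a + b; };
Reflect.setPrototypeOf(add, customProto);

// Still a callable host function...
console.log(typeof add);  // "function"
console.log(add(2, 3));   // 5

// ...but nothing from the host Function.prototype is reachable any more.
console.log(add.call);                 // undefined
console.log(add.constructor);          // undefined
console.log(add instanceof Function);  // false

// Host code can still invoke it with an explicit this value.
console.log(Reflect.apply(add, undefined, [4, 5])); // 9
```

The missing constructor property is the interesting part for sandboxing: there is no path from the function object back to the host Function constructor through its prototype chain.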

Function proxying

An alternative is to use JavaScript’s Proxy feature to make guest function objects into a composite object that works transparently from the outside:

let MachineRef = Symbol('MachineRef');

// Helper function to create guest objects
function createObj(proto) {
    let obj = Object.create(proto);
    obj[MachineRef] = machine;
    return obj;
}

// We still create our own prototype chain...
let VMObjectPrototype = createObj(null);
let VMFunctionPrototype = createObj(VMObjectPrototype);

// Wrap our host implementation functions...
function guestFunc(func) {
    // Create a separate VMFunction instance instead of
    // modifying the original function.
    //
    // This object is not callable, but will hold the
    // custom prototype chain and non-function properties.
    let obj = createObj(VMFunctionPrototype);

    // ... now wrap the func and the obj together!
    return new Proxy(func, {
        // In order to make the proxy object callable,
        // the proxy target is the native function.
        //
        // The proxy automatically forwards function calls
        // to the target, so there's no need to include an
        // 'apply' or 'construct' handler.
        //
        // However we have to divert everything else to
        // the VMFunction guest object.
        defineProperty: function(target, key, descriptor) {
            if (target.hasOwnProperty(key)) {
                return Reflect.defineProperty(target, key, descriptor);
            }
            return Reflect.defineProperty(obj, key, descriptor);
        },
        deleteProperty: function(target, key) {
            if (target.hasOwnProperty(key)) {
                return Reflect.deleteProperty(target, key);
            }
            return Reflect.deleteProperty(obj, key);
        },
        get: function(target, key) {
            if (target.hasOwnProperty(key)) {
                return Reflect.get(target, key);
            }
            return Reflect.get(obj, key);
        },
        getOwnPropertyDescriptor: function(target, key) {
            if (target.hasOwnProperty(key)) {
                return Reflect.getOwnPropertyDescriptor(target, key);
            }
            return Reflect.getOwnPropertyDescriptor(obj, key);
        },
        getPrototypeOf: function(target) {
            return Reflect.getPrototypeOf(obj);
        },
        has: function(target, key) {
            if (target.hasOwnProperty(key)) {
                return Reflect.has(target, key);
            }
            return Reflect.has(obj, key);
        },
        isExtensible: function(target) {
            return Reflect.isExtensible(obj);
        },
        ownKeys: function(target) {
            return Reflect.ownKeys(target).concat(
                Reflect.ownKeys(obj)
            );
        },
        preventExtensions: function(target) {
            return Reflect.preventExtensions(target) &&
                Reflect.preventExtensions(obj);
        },
        set: function(target, key, val, receiver) {
            if (target.hasOwnProperty(key)) {
                return Reflect.set(target, key, val, receiver);
            }
            return Reflect.set(obj, key, val, receiver);
        },
        setPrototypeOf: function(target, proto) {
            return Reflect.setPrototypeOf(obj, proto);
        },
    });
}

// Create our constructors, which now do not descend from
// the host Function but rather from VMFunction!
let VMObject = guestFunc(function Object(val) {
    // The actual behavior of Object() is more complex ;)
    return VMObject[MachineRef].ToObject(val);
});

let VMFunction = guestFunc(function Function(args, src) {
    // Could have the engine parse and compile a new guest func...
    throw new Error('Function constructor not yet supported');
});

// Set up the circular reference between
// the constructors and prototypes.
VMFunction.prototype = VMFunctionPrototype;
VMFunctionPrototype.constructor = VMFunction;
VMObject.prototype = VMObjectPrototype;
VMObjectPrototype.constructor = VMObject;

There are more details to work out, like filling out the VMObject and VMFunction prototypes, ensuring that created functions always have a guest prototype property, etc.
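To see the composite behavior in isolation, here is a deliberately reduced sketch with only two traps. The miniGuestFunc name and the describe property are invented for the demonstration, and this is not a substitute for the full handler above.

```javascript
// Reduced version of the proxy idea: calls go to the native function,
// property reads and prototype queries go to a separate guest object.
function miniGuestFunc(func, guestProto) {
    const obj = Object.create(guestProto);
    const own = (t, k) => Object.prototype.hasOwnProperty.call(t, k);
    return new Proxy(func, {
        get(target, key) {
            return own(target, key)
                ? Reflect.get(target, key)
                : Reflect.get(obj, key);
        },
        getPrototypeOf() {
            return Reflect.getPrototypeOf(obj);
        },
    });
}

const guestProto = Object.create(null);
guestProto.describe = 'guest function';

const double = miniGuestFunc(function double(x) { return x * 2; }, guestProto);

console.log(double(21));       // 42, forwarded to the target with no apply trap
console.log(double.name);      // "double", an own property of the target
console.log(double.describe);  // "guest function", from the guest chain
console.log(double.call);      // undefined, host Function.prototype is hidden
console.log(Object.getPrototypeOf(double) === guestProto); // true
```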

Note that implementing the engine in JS’s “strict mode” means we don’t have to worry about bridging the old-fashioned arguments and caller properties, which otherwise couldn’t be replaced by the proxy because they’re non-configurable.
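The strict-mode half of this is easy to check from host code. The sketch below only inspects a strict function; whether a non-strict function carries its own arguments and caller properties varies between engines, so that side is left out.

```javascript
function makeStrict() {
    'use strict';
    return function strictFn() {};
}

const strictFn = makeStrict();
const ownNames = Object.getOwnPropertyNames(strictFn);

// Strict functions carry no own 'arguments' or 'caller' properties,
// so there is nothing non-configurable for a proxy to work around.
console.log(ownNames.includes('arguments')); // false
console.log(ownNames.includes('caller'));    // false
```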

My main worries with this layout are that it’ll be hard to tell host from guest objects in the debugger, since the internal constructor names are the same as the external constructor names… the [MachineRef] marker property should help though.

And secondarily, it’s easier to accidentally inject a host object into a guest object’s properties or a guest function’s arguments…

Blocking host objects

We could protect guest objects from injection of host objects using another Proxy:

function wrapObj(obj) {
    return new Proxy(obj, {
        defineProperty: function(target, key, descriptor) {
            let machine = target[MachineRef];
            if (!machine.isGuestVal(descriptor.value) ||
                !machine.isGuestVal(descriptor.get) ||
                !machine.isGuestVal(descriptor.set)
            ) {
                throw new TypeError('Cannot define property with host object as value or accessors');
            }
            return Reflect.defineProperty(target, key, descriptor);
        },
        set: function(target, key, val, receiver) {
            // invariant: key is a string or symbol
            let machine = target[MachineRef];
            if (!machine.isGuestVal(val)) {
                throw new TypeError('Cannot set property to host object');
            }
            return Reflect.set(target, key, val, receiver);
        },
        setPrototypeOf: function(target, proto) {
            let machine = target[MachineRef];
            if (!machine.isGuestVal(proto)) {
                throw new TypeError('Cannot set prototype to host object');
            }
            return Reflect.setPrototypeOf(target, proto);
        },
    });
}

This may slow down access to the object, however. Need to benchmark and test some more and decide whether it’s worth it.

For functions, we can also include the `apply` and `construct` traps to check for host objects in arguments:

function guestFunc(func) {
    let obj = createObj(VMFunctionPrototype);
    return new Proxy(func, {
        //
        // ... all the same traps as wrapObj and also:
        //
        apply: function(target, thisValue, args) {
            // The marker lives on the guest object, not the native target.
            let machine = obj[MachineRef];
            if (!machine.isGuestVal(thisValue)) {
                throw new TypeError('Cannot call with host object as "this" value');
            }
            for (let arg of args) {
                if (!machine.isGuestVal(arg)) {
                    throw new TypeError('Cannot call with host object as argument');
                }
            }
            return Reflect.apply(target, thisValue, args);
        },
        construct: function(target, args, newTarget) {
            // The marker lives on the guest object, not the native target.
            let machine = obj[MachineRef];
            for (let arg of args) {
                if (!machine.isGuestVal(arg)) {
                    throw new TypeError('Cannot construct with host object as argument');
                }
            }
            if (!machine.isGuestVal(newTarget)) {
                throw new TypeError('Cannot construct with host object as new.target');
            }
            return Reflect.construct(target, args, newTarget);
        },
    });
}

Exotic objects

There are also “exotic objects”, proxies, and other funky things like Arrays that need to handle properties differently from a native object… I’m pretty sure they can all be represented using proxies.
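As a very rough sketch of the idea for arrays, a proxy can keep length in sync when an index is assigned. The names isArrayIndex and createGuestArray are hypothetical, and this covers only plain assignment; real Array semantics (truncation when length shrinks, the defineProperty path, and so on) would need more traps.

```javascript
// True for canonical array index keys: "0" through "4294967294".
function isArrayIndex(key) {
    if (typeof key !== 'string') {
        return false;
    }
    const n = key >>> 0;
    return String(n) === key && n !== 0xFFFFFFFF;
}

function createGuestArray() {
    const target = Object.create(null);
    Object.defineProperty(target, 'length', { value: 0, writable: true });
    return new Proxy(target, {
        set(t, key, val) {
            if (!Reflect.set(t, key, val)) {
                return false;
            }
            // Grow length when writing at or past the current end.
            if (isArrayIndex(key) && (key >>> 0) >= t.length) {
                t.length = (key >>> 0) + 1;
            }
            return true;
        },
    });
}

const arr = createGuestArray();
arr[0] = 'a';
arr[4] = 'e';
arr.label = 'not an index';

console.log(arr.length);        // 5
console.log(Object.keys(arr));  // [ '0', '4', 'label' ]
```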

Next steps

I need to flesh out the code a bit more using the new object model, and start on spec-compliant versions of interpreter operations to get through a few simple test functions.

Once that’s done, I’ll start pushing up the working code and keep improving it. :)

Update (benchmarks)

I did some quick benchmarks and found that, at least in Node 11, swapping out the Function prototype doesn’t appear to harm call performance while using a Proxy adds a fair amount of overhead to short calls.

$ node protobench.js 
 empty in 22 ms
 native in 119 ms
 guest in 120 ms

$ node proxybench.js 
 empty in 18 ms
 native in 120 ms
 guest in 1075 ms

This may not be significant when functions have to go through the interpreter anyway, but I’ll consider whether the proxy is needed and weigh the options…

Update 2 (benchmarks)

Note that the above benchmarks don’t reflect another issue: de-optimization of call sites that accept user-provided callbacks. If you sometimes pass them regular functions and other times pass them re-prototyped or proxied objects, they can switch optimization modes and end up slightly slower even when passed regular functions.

If you know you’re going to pass a guest object into a separate place that may be interchangeable with a native host function, you can make a native wrapper closure around the guest call and it should avoid this.
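Such a wrapper could be as small as the following sketch. The hostCallable name is made up, and the re-prototyped closure stands in for a real guest function.

```javascript
// Wrap a guest function in an ordinary host closure, so call sites
// that also receive native callbacks only ever see regular functions.
function hostCallable(guestFn) {
    return function wrapper(...args) {
        // Reflect.apply works even if guestFn has no .call or .apply
        return Reflect.apply(guestFn, this, args);
    };
}

// Stand-in guest function: a closure moved off the host Function prototype.
const guestAdd = function (a, b) { return a + b; };
Reflect.setPrototypeOf(guestAdd, Object.create(null));

const add = hostCallable(guestAdd);
console.log([1, 2, 3].map((n) => add(n, 10))); // [ 11, 12, 13 ]
console.log(add instanceof Function);          // true
```

The wrapper adds one extra call per invocation, which would have to be weighed against the call-site behavior described above.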