Party Like It's 1992

April 7, 2004

I've been using CAPICOM at work. Since most COM objects are designed to work with VB, the string values returned by COM methods (in my case CAPICOM::Certificate.Export()) have some bizarre and baroque semantics when you use them from C++. One quirk I found particularly amusing was the memory allocation behind BSTRs; here's what "Eric's Complete Guide to BSTR Semantics" has to say about what's happening under the hood:

COM code uses the BSTR to store a Unicode string, short for "Basic String". (So called because this method of storing strings was developed for OLE Automation, which was at the time motivated by the development of the Visual Basic language engine.)

...

  1. If you write a function which takes an argument of type BSTR then you are required to accept NULL as a valid BSTR and treat it the same as a pointer to a zero-length BSTR. COM uses this convention, as does Visual Basic and VBScript, so if you want to play well with others you have to obey this convention. If a string variable in VB happens to be an empty string then VB might pass it as NULL or as a zero-length buffer -- it is entirely dependent on the internal workings of the VB program.
  2. BSTRs are always allocated and freed with SysAllocString, SysAllocStringLen, SysFreeString and so on. The underlying memory is cached by the operating system and it is a serious, heap-corrupting error to call "free" or "delete" on a BSTR. Similarly it is also an error to allocate a buffer with "malloc" or "new" and cast it to a BSTR. Internal operating system code makes assumptions about the layout in memory of a BSTR which you should not attempt to simulate.
  3. The number of characters in a BSTR is fixed. A ten-byte BSTR contains five Unicode characters, end of story.
  4. A BSTR always points to the first valid character in the buffer. This is not legal:

    BSTR bstrName = SysAllocString(L"John Doe");
    BSTR bstrLast = &bstrName[5]; // ERROR

    bstrLast is not a legal BSTR
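To see why rules 1 and 4 matter, here is a rough, portable sketch of my own (it is not from the guide and it does not call the real Windows API). It assumes the length-prefixed layout the guide describes further down, and every name in it (make_mock_block, data_of, mock_byte_len) is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: lays out [4-byte byte count][UTF-16 data][2 zero bytes]
// in a byte vector, imitating the layout the quoted guide describes.
std::vector<unsigned char> make_mock_block(const std::u16string& text) {
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    // The vector is zero-filled, so the trailing two bytes are already zero.
    std::vector<unsigned char> block(
        sizeof(byte_len) + byte_len + sizeof(char16_t), 0);
    std::memcpy(block.data(), &byte_len, sizeof(byte_len));
    std::memcpy(block.data() + sizeof(byte_len), text.data(), byte_len);
    return block;
}

// Pointer to the first character, which is what a caller would be handed.
const unsigned char* data_of(const std::vector<unsigned char>& block) {
    assert(block.size() >= sizeof(std::uint32_t) + sizeof(char16_t));
    return block.data() + sizeof(std::uint32_t);
}

// Assumed behaviour of a length lookup: read the four bytes that sit just
// before the data pointer. NULL is accepted and reported as zero-length,
// which is the convention rule 1 asks for.
std::uint32_t mock_byte_len(const unsigned char* data) {
    if (data == nullptr) {
        return 0;
    }
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len, data - sizeof(byte_len), sizeof(byte_len));
    return byte_len;
}
```

With a block built from u"John Doe", mock_byte_len on the real data pointer gives 16, while the same lookup on a pointer advanced five characters (the &bstrName[5] case) reads the bytes of "n " as if they were the length and comes back with junk instead of the 6 you might hope for. That is presumably the sort of thing that goes wrong if you hand an interior pointer to code expecting a BSTR.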

...

When you call SysAllocString(L"ABCDE") the operating system actually allocates sixteen bytes. The first four bytes are a 32 bit integer representing the number of valid bytes in the string -- initialized to ten in this case. The next ten bytes belong to the caller and are filled in with the data passed in to the allocator. The final two bytes are filled in with zeros. You are then given a pointer to the data, not to the header.

(Emphasis is mine)
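For the curious, the arithmetic in that paragraph can be sketched in portable C++. This is only my imitation of the described layout built on malloc, not the operating system's allocator (which, per the guide, caches memory and must not be mixed with malloc or free). The mock_ names and MockBstr are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for BSTR; the real type comes from the Windows headers.
using MockBstr = char16_t*;

constexpr std::size_t kPrefixBytes = sizeof(std::uint32_t);  // 4-byte length header
constexpr std::size_t kTerminatorBytes = sizeof(char16_t);   // 2 zero bytes

// Total bytes needed for a string of `chars` UTF-16 code units.
std::size_t mock_alloc_size(std::size_t chars) {
    return kPrefixBytes + chars * sizeof(char16_t) + kTerminatorBytes;
}

// Imitates the described allocation: header, then data, then terminator.
// Returns a pointer to the data, not to the header.
MockBstr mock_sys_alloc_string(const std::u16string& text) {
    assert(text.size() <= 0x7fffffffu / sizeof(char16_t));
    const std::uint32_t byte_len =
        static_cast<std::uint32_t>(text.size() * sizeof(char16_t));
    unsigned char* block =
        static_cast<unsigned char*>(std::malloc(mock_alloc_size(text.size())));
    if (block == nullptr) {
        return nullptr;
    }
    std::memcpy(block, &byte_len, kPrefixBytes);
    std::memcpy(block + kPrefixBytes, text.data(), byte_len);
    std::memset(block + kPrefixBytes + byte_len, 0, kTerminatorBytes);
    return reinterpret_cast<MockBstr>(block + kPrefixBytes);
}

// Reads the byte count from the header just before the data; NULL counts as empty.
std::uint32_t mock_sys_string_byte_len(const char16_t* s) {
    if (s == nullptr) {
        return 0;
    }
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len,
                reinterpret_cast<const unsigned char*>(s) - kPrefixBytes,
                kPrefixBytes);
    return byte_len;
}

// Frees from the start of the block, which is why calling plain free on the
// data pointer would be wrong even in this toy version.
void mock_sys_free_string(MockBstr s) {
    if (s != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(s) - kPrefixBytes);
    }
}
```

For u"ABCDE", mock_alloc_size(5) works out to 4 + 10 + 2 = 16 bytes and the stored length is 10, matching the numbers in the quote.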

Strings with both a length prefix and a two-byte NULL suffix. Now that's what I call efficient use of memory! Seriously though, this is like some sort of programming time warp; it reminds me of the Pascal-style strings with a single-byte length prefix that the Mac Toolbox calls used, and of the associated (and equally wacky) string-conversion functions. Ah, history.