VSPy #1 – A basic language service

This is the first post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


The very first step is to install Visual Studio 2008 Standard or Professional (the Express editions are not suitable) and the SDK. (I assume everyone following along is capable of doing this without specific instructions.) When I refer to the install directory of the SDK, I will use $(VSSDK90Install), which is the same macro that the SDK sets inside VS.

Once you’re all set up, you’ll find a set of new project options. Under “Other Project Types\Extensibility”, we want to start a new “Visual Studio Integration Package”. The wizard allows you to select your language (I’ll be using C#) and to specify some details.

Visual Studio Integration Package wizard step 2

The following steps allow you to automatically generate various functionality, none of which is essential. Personally I also uncheck the sets of tests that are available. This generates an empty package, which, when executed, will start an isolated instance of VS with your package loaded. (This “experimental hive” stores all settings in a different location from your main instance, so the first execution will need to be as administrator.)

Before we get too far into coding, Microsoft provide a handy set of base classes known as Managed Babel. These can be found in $(VSSDK90Install)VisualStudioIntegration\Common\Source\CSharp\Babel. Add all of these files to a Babel subfolder in your project, except for Package.cs (some of the later tasks will make it easier to use a central package file, rather than the Babel one). You’ll also need references to Microsoft.VisualStudio.Package.LanguageService.9.0.dll and Microsoft.VisualStudio.TextManager.Interop.8.0.dll.

Into another subfolder (I called it LanguageService), add a new class Babel.LanguageService that inherits from Babel.BabelLanguageService, give it a GuidAttribute and implement GetFormatFilterList(). This method returns a string containing the set of files that this language service should handle. For Python, “Python File (*.py)\n*.py” is sufficient (note the newline termination, rather than the vertical bar used by the common dialog controls).

Before we can get even a basic build working, we need to provide a scanner and parser for Python. My versions for this step are here. Extract the files into your LanguageService folder and add them to your project by manually editing the csproj file and adding the following element:

<ItemGroup>
    <MPLexCompile Include="LanguageService\lexer.lex" />
    <MPPGCompile Include="LanguageService\parser.y" />
    <Compile Include="LanguageService\Configuration.cs" />
    <Compile Include="LanguageService\ErrorHandler.cs" />
    <Compile Include="LanguageService\LexDefs.cs" />
    <Compile Include="LanguageService\LanguageService.cs" />
    <Compile Include="LanguageService\Resolver.cs" />
</ItemGroup>

The first two children associate the grammar files with the lexer and parser generators, while the rest of the files are added normally. (The next post will have a fuller explanation of these files. For now, we are simply getting the package started. Both the scanner and parser have some minor issues that will be fixed.)

At this point, we should be able to build successfully. However, we are not ready to begin testing. The basic language service is available but is not yet exposed through the package. Exposing a language service is a case of setting the following attributes:

[ProvideService(typeof(Babel.LanguageService))]
[ProvideLanguageExtension(typeof(Babel.LanguageService), ".py")]
[ProvideLanguageService(typeof(Babel.LanguageService), "Python", 0)]

and adding the following code to the already-overridden Initialize method:

var serviceContainer = this as IServiceContainer;
var langService = new Babel.LanguageService();
langService.SetSite(this);
serviceContainer.AddService(typeof(Babel.LanguageService), langService, true);

At this stage, it is simply a copy-paste job. In time I will cover the purpose of these sections more thoroughly, but for now we should have a working language service that highlights eight keywords (listed near the start of lexer.lex), strings and numbers in a Python file. You’ll need your own test file to open, since we don’t have any support for starting a new .py file (yet).

Next post I’ll go through this version of the lexer and parser, point out some of the shortcuts and shortcomings and finish highlighting the rest of the keywords (in a more efficient way than what I have done already).

February 7, 2010 • Posted in: Uncategorized

Adding new languages to Visual Studio

Recently I’ve been playing with the Visual Studio 2008 SDK. This SDK supports a lot of functionality, even allowing you to create and distribute independent applications based on the VS2008 IDE (titled “isolated” mode).

One of the primary use cases for the SDK is implementing domain-specific languages. An example of one of these is the class diagram view, which shows a visual representation that is converted to plain code (I personally use this view for documentation purposes only, but it is possible to add elements and relationships to classes using it). However, the use that has my interest is supporting new languages, specifically Python.

VS2008 supports at least ten different editors (VB, C#, C++, binary, HTML, resource files, XML, XSLT, XSD, XAML) each of which has its own syntax validation and highlighting requirements. Providing this support comes through language services. Depending on installation settings, a variety of project types and languages are available. These come from templates, project nodes and factories. Debugging support is provided for native and managed code by debugging engines.

To fully implement a new language in this context, quite a large amount of work is required. At this stage, I’ve done about half of what I intend, so it’s time to start writing blog entries. This entry will be the contents page with links to each other page, once they are written.

My tentative plan is:

  1. A basic language service
  2. Syntax highlighting
  3. Hierarchical parsing
  4. Bracket matching
  5. Name completion
  6. Parameter information
  7. New projects
  8. Project properties
  9. Start the project (without debugging)
  10. Start the project (with debugging)
  11. Breakpoints
  12. Deployment

As I have only actually completed seven of these (the seven easiest, not the first seven) and this project isn’t my highest priority, there may not be one post per week. If I miss a week, I’ll try and rant about something to fill the space.

February 7, 2010 • Posted in: Uncategorized

How To Use Design Patterns Correctly

Short answer: as comments.

The standard definition of a design pattern is “a general reusable solution to a commonly occurring problem in software design,” but a better definition is “a good title with a useful description and a pile of other rubbish.”

Take “Singleton” for example. A singleton is a class that has one instance which is used for all references to that class. That’s all you need. The noun, singleton, and the definition. Everything else simply gets in the way of programming. Singletons don’t need diagrams, books or university subjects. The one line definition is sufficient.

What about “Factory”? A factory is a way to create an object that isn’t a constructor. “Constructor” is an instance method that is always called during its object’s instantiation. And here’s the example that makes everything clear.

“Constructor” isn’t a design pattern anymore. It’s been absorbed so thoroughly into development languages and tools that it’s simply a word with a definition. If you mention “constructor” to someone who knows what it means, they will understand. They don’t need to look up their notes from university to find a class diagram explaining it. If you mention “singleton” or “factory” to the same person, they should be able to understand what you mean without using a reference.

(Oh wait, what about the different type of factories? You mean like default constructors, copy constructors, move constructors [coming soon to a C++0x near you] and static constructors?)

Design patterns provide a common vocabulary and occasionally an example. Which is exactly what you find if you look up constructors on MSDN.

Elevating design patterns above the level given to classes, constructors, operators and other language concepts leads to such unproductive ideas as class MySingleton or worse, template<typename T> class Singleton. Anything more than a brief // VectorCreator is a factory for Vector objects is just unnecessary.

January 24, 2010 • Posted in: Thinking Out Loud

Teaching Good Interface Design

We all know just how hard it is to design good interfaces. Conflicting issues of performance, extensibility, readability and abstraction all influence the final design, and poor decisions early on can continue to cause issues long into the future (for example, the Windows API). The importance of finding a suitable balance is often understated, in favour of drawing diagrams that simplify the interfaces to a solid line.

So where does interface design fit into the teaching process? Software engineering typically involves learning the mechanics of programming languages, architecture design and implementation paradigms. The end result is a C++ programmer who can invent a bunch of components and implement them as classes, complete with getter and setter methods on every instance variable.

That last comment makes it seem obvious that interface design belongs with implementation paradigms. That, however, doesn’t work. Interface design is not unique to object-oriented programming (which is [still!] unjustifiably dominant [from an education perspective]) nor any other paradigm. Interface design is unique to anything that has humans involved, which, for the foreseeable future, includes all programming methodologies.

So what about architecture design? An area currently filled with various patterns and methods for finding objects from requirements and determining who is allowed to talk to who. Much like arranging seats at a dinner party to try and prevent fights. Except, at a dinner party, you know to introduce yourself, wait politely for the other person to finish talking to someone else and avoid asking about whether their nasty… err… infection has cleared up. Seating plans leave this out because it’s too low level – much the same reason that architecture design leaves it out (in favour of fancy drawings, usually).

Therefore, by process of elimination, interface design belongs with learning the language. Except that interfaces make no sense without some knowledge of architecture. And the implementation paradigm will determine the mechanics of the interface. Clearly, designing a usable, understandable and performant interface is a subject area in itself.

Is it worth a whole subject (24 hours of lectures, 12-24 hours of labs, 3 hours of exam)? What material would be included? How would the assessment of such highly subjective and speculative designs be valid?

Interface design isn’t so much a case of “do it this way”. The approach required is to ensure students understand the implications of their interfaces before the complaints start flowing in. Plenty of examples and alternatives for real designs.

Windowing toolkits (Windows Forms, MFC, wxWidgets, Qt, etc.) are ideal examples for basically every aspect of interface design. They are always packed with far too many classes, and most provide plenty of other facilities that go well beyond the scope of a “windowing” toolkit. The underlying operating system has a window model that is abstracted by the toolkit, providing ample discussion of suitable levels of abstraction. Source code generators can be advantaged by consistent method signatures that might be counter-intuitive to human developers. Layout managers depend on polymorphism, but not every control has a position or size.

Math libraries (Intel MKL, ACML, etc.) provide plenty of examples of single-instruction-multiple-data support and cases where passing arrays to methods performs far more efficiently than single values. Compatibility between the math library and the code using the results can be heavily influenced by the design of library-specific data types.

As for assessment, a subject like this demands what are effectively comprehension tests. Provide a code snippet (C++ header file) or API documentation and a set of questions (e.g. “Which method will make a button invisible?” “Should Mutex::Release() be called immediately before Mutex::Destroy()?”). Include too many questions to be answered in the time (for most people) to encourage speed. Run at least two of these during the subject (including the exam) or run a 5-minute one each week. In such a short time, ten questions would be enough.

Throw in a design test to make a simpler wrapper for a provided class. Hand the submissions out the next week as a quick comprehension task and then as a longer analysis task – get each student to analyse (anonymously) someone else’s interface and rate it. Reward interfaces that are easily and quickly understood, penalise those that confuse the analyst.

The most important aim of the subject would be to ensure that students have had exposure to a range of both good and bad interfaces and are able to determine for themselves how intuitive, and to a lesser extent performant, an interface will be. Designing for the system is important, but designing for humans is the aim of the game.

January 16, 2010 • Posted in: Uncategorized

Secure Erase

Since I’ve made it to Sunday night without a post, I’m reproducing one of my older efforts that wasn’t migrated over. It was originally written in 2007, though I have made some minor edits to clarify dates and fix dead links. With a bit of luck and a bit more organisation, I’ll have something new and exciting for next week.


I recently found out about this not very well known technology (it currently has no Wikipedia page) with so many benefits. It’s titled Secure Erase and has been part of the ATA standard since ATA/ATAPI-4 (1997) [1]. The general principle was to introduce a hardware implementation of software “shredding” or “wiping” applications.

The concept was investigated and developed by the Centre for Magnetic Recording Research at the University of California, in conjunction with the U.S. Federal Government and hard drive manufacturers [2]. The major advantages of a hardware implementation would be increased speed and the ability to wipe areas of the disk not made available externally, such as bad sectors. The time taken to perform a secure erase is estimated to be between 10 and 60 minutes [3] depending on capacity and speed, with the common software triple overwrite (very often incorrectly referred to as DoD Standard 5220) taking up to eight times longer.

On the topic of the US Department of Defence’s National Industrial Security Program Operating Manual (NISPOM2006-5220), almost universally referred to as a standard requiring that hard drives be wiped using a triple overwrite method: not only is it not a standard, it has only two paragraphs relating to data sanitisation (section 8-301), which make no mention of how a hard drive should be wiped before declassification. As of late June 2007, magnetic “Rigid Disk(s)” can no longer be declassified by overwriting [4]. The remaining options are degaussing, which often destroys the attached controller along with the data and so makes the hard drive completely useless, and destruction, whether by incineration, smelting, abrasion or the use of chemicals.

Back on Secure Erase, one of the biggest advantages of implementing the overwrite in hardware is the ability to write at a different frequency from normal. A very basic method of retrieving even data that has been overwritten is to read at an offset from the track. As time passes with data stored on a disk, the magnetic domains aligned to represent that data affect surrounding domains, causing a spreading effect. Immediately after overwriting, the domains slightly offtrack still represent the old data. The triple overwrite method causes large fluctuations in the ontrack domains, with the desired effect being that the offtrack ones change at a greater rate. However, writing with a different frequency is akin to writing over a wider area (or alternatively, a localised degaussing effect). It has been found that lower frequencies result in higher signal reduction [4] and it is expected that hard drive manufacturers will implement Secure Erase in this way.

The actual instructions required to initiate a secure erase are known as Secure Erase Prepare and Secure Erase Unit. Secure Erase Prepare is given first and may be refused, in which case the drive is not able to perform a secure erase at this time (either through lack of authentication or because the BIOS has locked it). When the Secure Erase Prepare has been accepted, Secure Erase Unit can be issued with a parameter indicating the type of erase: normal or enhanced [5]. A normal erase writes zeros to all user data locations. An enhanced erase, according to [5], writes a user-defined pattern. The enhanced erase appears to be left open to manufacturers’ interpretation, and a DC erase combined with an arbitrary pattern can (and probably should) be used instead [2].
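The two-step handshake described above can be sketched as a tiny state machine. This is only an illustration: SimulatedDrive, Status, EraseMode and SecureErase are hypothetical names invented here, nothing talks to real hardware, and the rule that Erase Unit is accepted only straight after a successful Prepare is an assumption of the sketch.

```cpp
#include <cassert>

enum class EraseMode { Normal, Enhanced };
enum class Status { Ok, Aborted };

// Hypothetical in-memory stand-in for a drive; not a real driver interface.
class SimulatedDrive {
public:
    // `frozen` models a drive that the BIOS has locked against erasing.
    explicit SimulatedDrive(bool frozen) : frozen_(frozen) {}

    // Secure Erase Prepare: may be refused, e.g. when the drive is locked.
    Status SecurityErasePrepare() {
        if (frozen_) return Status::Aborted;
        prepared_ = true;
        return Status::Ok;
    }

    // Secure Erase Unit: in this sketch, only accepted after a Prepare.
    Status SecurityEraseUnit(EraseMode mode) {
        if (!prepared_) return Status::Aborted;
        prepared_ = false;   // a new Prepare is needed for the next erase
        lastMode_ = mode;
        erased_ = true;
        return Status::Ok;
    }

    bool erased() const { return erased_; }
    EraseMode lastMode() const { return lastMode_; }

private:
    bool frozen_;
    bool prepared_ = false;
    bool erased_ = false;
    EraseMode lastMode_ = EraseMode::Normal;
};

// Issues the two commands in order and reports whether the erase ran.
bool SecureErase(SimulatedDrive& drive, EraseMode mode) {
    if (drive.SecurityErasePrepare() != Status::Ok) return false;
    const bool done = drive.SecurityEraseUnit(mode) == Status::Ok;
    assert(!done || drive.erased());  // a successful erase must be recorded
    return done;
}
```

Calling SecureErase on a SimulatedDrive constructed with frozen set to true returns false without touching the data, which mirrors the "refused" case in the text.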

The CMRR has a freeware utility available for performing a normal or enhanced erase (untested, since I have no hard drive I wish to erase right now), along with more reading at their Secure Erase page.

So now you know. If you’re selling/donating/disposing of a hard drive and are worried about what information may be stored on there, don’t bother wasting hours or days with a triple overwrite method or a program claiming to comply with a US standard that isn’t a standard. The erase program is built into the disk and the utility above will let you get to it. Hopefully the option will be exposed in operating systems soon, since the functionality is already widely available.


References:

1. C. E. Stevens, Mass Storage Media Locking, http://t10.t10.org/ftp/t10/document.05/05-438r0.pdf

2. G. Hughes & T. Coughlin, Secure Erase of Disk Drive Data, http://www.tomcoughlin.com/Techpapers/Secure%20Erase%20Article%20for%20IDEMA,%20042502.pdf

3. G. Hughes & T. Coughlin, Technical Proposal on ATA Secure Erase, http://www.t13.org/Documents/UploadedDocuments/docs2004/e04147r0-TechProposalFreezeLockSecureErase.doc

4. Defense Security Service, Updated DSS Clearing and Sanitization Matrix (June 28, 2007), http://www.dss.mil/isp/odaa/documents/clearing_and_sanitization_matrix.pdf, http://it.ouhsc.edu/policies/documents/infosecurity/DoD_5220.pdf

5. D. Colegrove, Enhanced Security Erase Unit Proposal, http://t13.org/Documents/UploadedDocuments/technical/d96156r0.pdf

January 10, 2010 • Posted in: Uncategorized

Ideal Interfaces

So, last time I discussed a few different approaches to implementing interfaces, though didn’t explicitly state which one is best. (Obviously this is all my opinion. I’m simply saving time and easing the wording by stating everything as fact.)

The first and most essential thing to define is the purpose of an interface: an interface guarantees the provision of a minimum set of behaviours. (Contrast with inheritance: a base class provides a minimum set of behaviours and an interface.)

Obviously, this sort of guarantee has no place in dynamically typed languages. The point of a guarantee is to provide early detection of errors in a simple manner: does this object claim to implement IMyInterface? If so, we know for certain that all the members of IMyInterface are available. Further, we know that any pointer to IMyInterface provides all the members. Otherwise, we can fail immediately.

With duck-typing, there is no need for an early guarantee. Throw in multithreading and you can’t be sure of the object type until you’re attempting to call the method, which is the earliest you could detect an incorrect interface anyway.

So now we’re only looking at statically typed languages and hence, statically typed projects. ‘Toy’ projects, while useful and important, have no need for interfaces. Small projects with only a couple of co-located developers have no need for interfaces. Large projects or distributed projects, however, absolutely require strictly controlled interfaces.

The only reasonable way to subdivide effort on a software project is to identify components and assign developers to each. As soon as you discover components, you need to define interfaces. Any interacting components require an interface, and non-interacting components are usually wasted effort. Add to this that any serious project will have further development done by people other than the original developers, and strict interfaces are the only way to exert any control.

As real life examples, both Windows and Office are huge networks of interacting components. The Windows shell, effectively every Explorer window, view or extension, is COM based. If you want to create a special object for your camera/MP3 player/card reader, create an object implementing IShellFolder. If you want to add a cascading context menu to your file type, implement IContextMenu. Need newer features? Use IContextMenu2 instead.

Of course, the name is not essential. If you want IContextMenu, you actually want 000214e4-0000-0000-c000-000000000046. If you want IContextMenu2, you actually want 000214f4-0000-0000-c000-000000000046. Most importantly, there are no language restrictions. Any language producing an executable file can implement a COM object, which is great when you want or expect other people to develop for your interfaces. In an uncontrolled context, COM interfaces are undeniably the most robust.

The next step up from COM interfaces is .NET interfaces. While it is possible to guarantee that an application running on Windows can use COM, that guarantee does not yet exist for .NET. However, the versioning of interfaces in .NET is far more robust.

In general, most of .NET is more robust than its predecessors, which is why it tends to be the best thing going around at the moment. Looking at introduction dates, it’s easy to see why: C++ came about in the early 1980s, COM appeared about 1993 and .NET came out in 2002 but was significantly re-done for version 2.0 in 2005. That’s 25 years of research and experience in C++. All the good habits that the best development teams use in C++ have been rolled into the .NET Framework and languages. Mono support is good enough now to make it as much a cross-platform solution as C++ with some other framework.

I’ll save the full Microsoft love-in for a later post, but consider this my final word: the best interfaces about are in the .NET languages.

January 3, 2010 • Posted in: Uncategorized

Interface Identification

In something of a follow-up to my last discussion, in this post I intend to look at a couple of ways that interfaces are identified, found and versioned.

The simplest implementation of the interface pattern is C++’s (possibly Java’s, but I haven’t used Java enough to speak authoritatively on it). Since C++ doesn’t have interfaces as part of the language, it is still implementing a pattern – a class with all functions being pure-virtual/abstract that is implemented by another class. This approach identifies the interface with a name. If you have the name, you can have the interface.

class IPrinter
{
public:
    virtual void Print() = 0;
};
 
extern IPrinter* GetPrinter();
 
IPrinter* myPrinter = GetPrinter();

I’ve deliberately used an external GetPrinter() method to illustrate the problem. That method lives in a separate module (DLL/so/dylib) and is exported using C++ name mangling (I’m not even going near the issue of different compilers doing that differently). Mangling schemes encode the types of all parameters (and, with some compilers, the return type), so we definitely obtain the correct overload of GetPrinter(). However, what if the other module looks like this:

class IPrinter
{
public:
    virtual void Open() = 0;
    virtual int Print() = 0;
};
 
IPrinter* GetPrinter()
{
    // return something
}

The external module has a newer version of IPrinter, with an extra method and different return value. Unfortunately, this doesn’t change the mangled name of GetPrinter(), so our linking works correctly, but we get a pointer to a different interface.

This can be solved using naming conventions, for example, IPrinter2/IPrinter3/etc. The problem with conventions is that they are generally not well defined. The naming convention used in COM, however, is quite clear:

Each interface — the immutable contract of a functional group of methods — is referred to at run time with a globally unique interface identifier (IID). This IID … allows a client to ask an object precisely whether it supports the semantics of the interface. [Emphasis mine, source: MSDN]

Again, this is only a convention. However, it is supported by registration, type libraries and other tools that complicate the process of stuffing up. (For those who haven’t ever used COM, don’t get scared off. The C++ support is really quite good – most of the time you can hardly tell that it’s not C++ classes.) The rules are straightforward – if you’ve released an interface into the wild, you can only change it if you change its identifier.

In fact, you generally add new interfaces to your object and never remove old ones, since that way you get good backwards compatibility. A C++ implementation requires a new definition, but an old COM definition is still valid for new implementations, since construction and accessing interfaces is handled by the implementation.
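As a rough illustration of "add new interfaces, never remove old ones", here is a portable sketch in the spirit of COM’s QueryInterface. It is not real COM: IObject, the string identifiers and both printer classes are hypothetical, strings stand in for GUIDs, and reference counting is left out entirely.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical identifiers standing in for real GUIDs.
const char* const IID_IPrinter  = "IPrinter/v1";
const char* const IID_IPrinter2 = "IPrinter/v2";

// Minimal stand-in for IUnknown: objects are only reachable by asking.
struct IObject {
    virtual ~IObject() = default;
    // Returns the requested interface, or nullptr if it is not supported.
    virtual void* QueryInterface(const char* iid) = 0;
};

// The original, published interface. It never changes again.
struct IPrinter {
    virtual void Print() = 0;
protected:
    ~IPrinter() = default;
};

// The newer contract gets a new name and a new identifier.
struct IPrinter2 : IPrinter {
    virtual void Open() = 0;
protected:
    ~IPrinter2() = default;
};

class OldPrinter : public IObject, public IPrinter {
public:
    void* QueryInterface(const char* iid) override {
        assert(iid != nullptr);
        if (std::strcmp(iid, IID_IPrinter) == 0) return static_cast<IPrinter*>(this);
        return nullptr;
    }
    void Print() override { ++printed; }
    int printed = 0;
};

class NewPrinter : public IObject, public IPrinter2 {
public:
    void* QueryInterface(const char* iid) override {
        assert(iid != nullptr);
        // The old identifier is still honoured, so old clients keep working.
        if (std::strcmp(iid, IID_IPrinter) == 0) return static_cast<IPrinter*>(this);
        if (std::strcmp(iid, IID_IPrinter2) == 0) return static_cast<IPrinter2*>(this);
        return nullptr;
    }
    void Open() override { opened = true; }
    void Print() override { ++printed; }
    bool opened = false;
    int printed = 0;
};

// A client written against the original interface only.
bool PrintOnce(IObject& object) {
    auto* printer = static_cast<IPrinter*>(object.QueryInterface(IID_IPrinter));
    if (printer == nullptr) return false;  // fail immediately, not mid-call
    printer->Print();
    return true;
}
```

PrintOnce works unchanged with both classes, while a client that asks OldPrinter for IID_IPrinter2 gets nullptr back instead of a pointer to the wrong layout.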

Of course, nobody uses COM any more, right? We’ve all moved on to .NET and C#… right? Interfaces in managed code are first-class citizens, unlike in C++, but they have many levels of robustness for identification.

Due to the detailed metadata made available in managed class libraries, you can import the definition of an interface from the file containing the implementation. When you compile your project, the version of the implementation file is stored and checked at runtime. Mismatched versions will cause a crash.

However, version numbers do not necessarily increment for each build. Rest assured, if you change the members of IPrinter behind the client’s back, they will notice and crash anyway (eventually), but if you change the version number as well the crash will be neater (as in, it provides an explanation of what’s wrong).

Ensuring that version numbers increment automatically on the implementation helps, as long as the client application takes it into account. The reference to the implementation contains the current version number, but also a flag indicating whether that version number is relevant. Disable that, and any version of a file with that name will be used. (This could also be referred to as the “asking for trouble” flag, though there are so many of those around that we’ll call this one “Specific Version” to disambiguate.)

You can go one step further and sign your assembly with a strong name key, which doesn’t guarantee that the contents haven’t been altered, but does guarantee that this is the interface you are looking for. Whether this is better or worse than using a GUID, I’m not 100% sure. It’s certainly more work, so I like to think that it’s better.

An interesting recent approach in the area of interfaces has come out of Google’s Go language. The explanation for C++ programmers is here, but the short story is that while interfaces are defined in the same way, objects are automatically associated with them when they implement the same methods. So the following Printer interface applies automatically to both structures shown:

type Printer interface {
    Open()
    Print() int
}

type BoringOldPrinter struct{ printerHandle int }

// The methods use pointer receivers, so it is *BoringOldPrinter that satisfies Printer.
func (p *BoringOldPrinter) Open()      {}
func (p *BoringOldPrinter) Print() int { return 0 }

type MultiFunctionPrinter struct{ printerHandle, scannerHandle int }

// The extra Scan method does not stop *MultiFunctionPrinter from being a Printer.
func (p *MultiFunctionPrinter) Open()      {}
func (p *MultiFunctionPrinter) Scan() int  { return 0 }
func (p *MultiFunctionPrinter) Print() int { return 0 }

I’ll leave the discussion of Go here and move onto Python, primarily because Go is not yet mature enough to discuss versioning. It appears that at present, changing the implementation requires creating new function names, which is a step backwards from C++. I will wait and see.

Can Python really be said to have interfaces? Probably not. It seems similar to Go (“seems” because I don’t understand Go that well, not because I’m inventing stuff about Python) in that if you know the method you want, any object implementing that method will do. Go at least requires all the methods in the interface to be present. However, neither of these languages provides strict interfaces. Which in Python’s case is good. Strict interfaces add overhead that Python does not need (yet…).

C++ is where huge projects and libraries are done, with C# on the rise and COM on the decline. Google is trying to position Go as a C++-replacement but seem to have missed the importance of strict interfaces, while Python does a great job of filling their niche. Strict and controlled interfaces are important for large, multi-developer projects (and essential when developer communication is limited, such as when someone releases a library). However, since I’ve just hit 1000 words, I’ll leave that discussion for next time.

December 19, 2009 • Posted in: Thinking Out Loud

Better Cohesion

One of the issues that I’ve come across multiple times this year is abstracting multi-purpose hardware. When I say multi-purpose here, I’m talking about custom hardware that has multiple, unrelated, functions. A multi-function printer/scanner/copier (multi-function device, hence, MFD) is a good enough example, so I’ll stick with that throughout this post, even though I’ve never owned or programmed a multi-function printer. (The one limitation of the MFD example is that, generally, you can’t do any of these functions simultaneously. Most of the hardware I worked with absolutely required the separate functions to operate simultaneously.)

Ideally (academically, at least), we would want to abstract our printer/scanner/copier into three classes, Printer, Scanner and Copier. That way, we can give an instance of Printer to the application that’s printing, and it won’t affect some other application that is using an instance of Scanner. Our Printer.Open() and Scanner.Open() functions perform different operations as you’d expect and everyone is happy.

Well, not everyone. The person who has to write these three classes is planning a murderous rampage against whoever decided that cohesion trumps ease, or possibility, of implementation. The problem is that while we have three logical devices, there is only one physical device. All commands given to Printer, Scanner or Copier need to be passed through the same USB connection to the actual hardware.

With our three separate objects, these commands could literally come at any time, which means we need locking on the hardware connection. Inter-process synchronisation is expensive (in the Windows world we’re looking at a mutant/mutex (kernel object) versus a critical section (compare-and-swap operation)). Then there’s the other problem of device sharing between processes which, if solved at all, involves more synchronisation.
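A minimal sketch of that locking is shown below. It only covers threads inside one process: std::mutex is an in-process lock, so sharing the device between processes would still need a named kernel object as discussed above. Connection and its command strings are hypothetical stand-ins for the real USB link.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the single USB connection to the device.
class Connection {
public:
    // Every command from every logical device funnels through this lock.
    void Send(const std::string& command) {
        assert(!command.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        sent_.push_back(command);  // a real version would write to the device
    }

    // Returns a copy of the commands sent so far.
    std::vector<std::string> Sent() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return sent_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> sent_;
};

// Logical printer: high cohesion for the user, shared connection underneath.
class Printer {
public:
    explicit Printer(Connection& connection) : connection_(connection) {}
    void Open() { connection_.Send("printer:open"); }
    void Print(const std::string& text) { connection_.Send("printer:print " + text); }
private:
    Connection& connection_;
};

// Logical scanner using the very same connection object.
class Scanner {
public:
    explicit Scanner(Connection& connection) : connection_(connection) {}
    void Open() { connection_.Send("scanner:open"); }
    void Scan() { connection_.Send("scanner:scan"); }
private:
    Connection& connection_;
};
```

Each caller sees only Printer or Scanner, but every call pays for the lock, which is the cost the paragraph above complains about.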

The best solution to this comes from COM. Once you get past the horrible syntax (brought about by creating an object model with no language support), you find that COM makes it easy to tie an object instance to a separate process (server) and let other processes access it transparently. However, this isn’t the really juicy bit.

“Cohesion” normally refers to how focused an object is. When separate, the Printer class is dedicated solely to operating the printer function, and hence has high cohesion. Every member of Printer relates to the printer. Every line of code in Printer relates to the printer. If something is going wrong with the printer, look inside Printer.

If we combine all of our Printer, Scanner and Copier classes into a single Device class, instead of Printer.Open() and Scanner.Open(), we have Device.OpenPrinter() and Device.OpenScanner() (or worse, Device.Open(enum DeviceType type)). The interface is now massively fragmented and client-side cohesion has gone out the window. On the bright side, all the hardware communication is now in the one place, making the internal cohesion high.

The fundamental problem is that cohesion from the implementer’s point-of-view is different to cohesion from the user’s point-of-view. The implementer loves having all the synchronisation and driver communication in the one place, while the user doesn’t understand why they need to dig through all these printer methods just to get to Device.MakeColourCopy(1).

So how does COM solve this? In short, by not allowing access to the class. If we create an instance of Device in C++ or C#, we have access to every member. If we create an instance of Device using COM, we get an opaque object pointer until we request an interface. How does this look in C#?

interface IPrinter
{
    void OpenPrinter();
    void ClosePrinter();
    void Print(string text);
}
 
interface IScanner
{
    void OpenScanner();
    void CloseScanner();
    Bitmap Scan();
}
 
class Device : IPrinter, IScanner
{
    public void OpenPrinter() { }  // reordered slightly, because it happens to make
    public void OpenScanner() { }  // more sense to put the "Open"s together
    public void ClosePrinter() { }
    public void CloseScanner() { }
    public void Print(string text) { }
    public Bitmap Scan() { return null; }  // placeholder so the sketch compiles
}
 
Device device = new Device();
// Only use 'device' to assign to 'printer' and 'scanner'
IPrinter printer = (IPrinter)device;
IScanner scanner = (IScanner)device;

So we create an instance of Device, but then restrict our view of the interface to the part we are interested in, either IPrinter or IScanner. If instead of Device we instantiate a class specific to our hardware, it doesn’t have to implement IScanner, and anyone trying to use it as a scanner will get a nice little exception (an InvalidCastException, in C#).
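
As a rough illustration of that last point, the sketch below uses a hypothetical PrintOnlyDevice that implements only IPrinter. The interfaces are trimmed down, and Scan() returns a string rather than a Bitmap purely to keep the example free of a System.Drawing reference. The TryGetScanner helper is my own invention; it uses the as operator, which is the closest C# gets to asking a COM object for an interface and checking the result instead of catching an exception.

```csharp
using System;

interface IPrinter { void Print(string text); }
interface IScanner { string Scan(); }

// Hypothetical print-only hardware: it deliberately does not implement IScanner.
class PrintOnlyDevice : IPrinter
{
    public void Print(string text) { Console.WriteLine("printing: " + text); }
}

static class DeviceQueries
{
    // Returns true and the scanner view if the device supports scanning;
    // returns false (and null) otherwise, without throwing.
    public static bool TryGetScanner(object device, out IScanner scanner)
    {
        scanner = device as IScanner;   // null when the interface is missing
        return scanner != null;
    }
}
```

With object device = new PrintOnlyDevice(), the direct cast (IScanner)device throws InvalidCastException at run time, while DeviceQueries.TryGetScanner(device, out scanner) simply returns false.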

The one limitation here is that the interface of Device exposes all members, which means we must avoid naming collisions. COM doesn’t have this issue, because it never exposes the entire interface, only the part which has been requested. Luckily, C# can do the same.

interface IPrinter
{
    void Open();
    void Close();
    void Print(string text);
}
 
interface IScanner
{
    void Open();
    void Close();
    Bitmap Scan();
}
 
class Device : IPrinter, IScanner
{
    void IPrinter.Open() { }
    void IScanner.Open() { }
    void IPrinter.Close() { }
    void IScanner.Close() { }
    void IPrinter.Print(string text) { }
    Bitmap IScanner.Scan() { return null; }  // placeholder so the sketch compiles
}

This is known as explicit interface implementation, and it means that the following are true:

Device device = new Device();
IPrinter printer = (IPrinter)device;
IScanner scanner = (IScanner)device;
 
device.Open();      // compiler error
printer.Open();     // calls IPrinter.Open()
scanner.Open();     // calls IScanner.Open()

So now, the only way to access the printer or scanner is to instantiate a Device and specifically request access to the subset that is required. IPrinter has high cohesion, since it only contains members related to printing, as does IScanner. Meanwhile, a murderous rampage is avoided as the developer is also allowed an implementation with high cohesion.
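
To show what the implementer gains, here is a small sketch in which every explicitly implemented member funnels through one private, locked Send method. The command strings, the trimmed-down interfaces and the public Log property are assumptions made for illustration only; Log is there just so the effect can be observed, and a real Device would talk to the hardware instead.

```csharp
using System;
using System.Collections.Generic;

interface IPrinter { void Open(); void Print(string text); }
interface IScanner { void Open(); }

class Device : IPrinter, IScanner
{
    private readonly object _gate = new object();
    private readonly List<string> _log = new List<string>();

    // The single place where synchronisation and hardware access live.
    private void Send(string command)
    {
        lock (_gate) { _log.Add(command); }
    }

    // Explicit implementations: reachable only through the interfaces.
    void IPrinter.Open() { Send("printer/open"); }
    void IPrinter.Print(string text) { Send("printer/print " + text); }
    void IScanner.Open() { Send("scanner/open"); }

    // Demonstration only: a snapshot of the commands sent so far.
    public IList<string> Log
    {
        get { lock (_gate) { return _log.ToArray(); } }
    }
}
```

Calling ((IPrinter)device).Open() and ((IScanner)device).Open() on the same instance records two different commands through the same lock, while device.Open() still does not compile.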

The Revival

Well, after effectively taking a year off to handle my final year (undergraduate) project, and on the eve of my entry into “grad-land”, I’m making a return to blogging.

And this time, I’m serious.

Well, a bit. I mean, how serious can you be on your third or so attempt?

The plan this time is to really find my voice and stick to a schedule. I’m aiming somewhere in the middle of multi-daily-titbits, semi-annual epics and daily inanity. I’m thinking weekly. Or possibly a weekly “big one” and other little ideas as they come up.

I’m also keen to get some discussion going, which means a combination of re-opening the spam floodgates (my life got so much more pleasant when I just turned commenting off completely) and taking overly-controversial stands on topics in order to provoke people into replying. Personally, I don’t mind whether I have loyal readers or an angry mob. As long as they’re angry, I know they’re reading.

I’ve already got a bit of an idea of some architectural stuff I want to talk about, largely as a result of having spent the year programming in VHDL, C, C++, C++/CLI, C#, VB, Ada and Python and having dealt with many horrible systems (mostly created by myself).

So please, drop a comment here just saying “hi”, so I know that people do exist. With any luck, I’ll have my first post up this weekend.

December 12, 2009 • Posted in: Uncategorized • 4 Comments

Clip Organizer displays "no such interface supported" error

Excuse the Google-bait title, but since the solution to this apparently does not exist anywhere else on the internet, I am keen to make it available. My initial support request is here, and though the initial response left a bit to be desired, by the time I reached "elevated phone support" we all knew what we were talking about.

To summarise the issue, when attempting to insert clip art in any of the Office 2007 products, or when trying to open the Clip Organizer directly, the following error is displayed:

Clip Organizer cannot complete the operation, No such interface supported, Error Code 0x80004002

The only solution support could offer is reinstalling Windows. Apparently, doing a repair of Vista loses absolutely no settings, though it does remove all updates, including service packs. Not being willing to test this, I was happy to go without clip art.

However, I have since figured out how to fix the issue without reinstalling. The solution has proven to me that reinstalling is the only reasonable approach for most people, so for their sakes I hope it really doesn’t mess anything up. For the more technical of us, there is another way.

At this page, there is a list of all the files included in the Microsoft Data Access Components (MDAC) 2.5 stack, including their directories (all under %ProgramFiles%\Common Files). MDAC 2.5 is not the most recent MDAC release (that is 2.8), but its list was the only one I could find that included the folders. (The release manifest for MDAC 2.8 has all the files.)

To make things harder, MDAC 2.8 isn’t the most recent data access stack either: Windows Vista comes with Windows Data Access Components (WDAC), which are not available as a separate download (nor is there any list of files and locations that I could find). They are not included in Office 2007, but are simply assumed to be present. In my opinion that is a bad thing when the files can (apparently) be deleted as easily as I (allegedly) managed it, keeping in mind that I run as a limited user, not merely a UAC-encumbered administrator.

Luckily, thanks to Vista’s new approach to the servicing stack, the files aren’t really stored in %ProgramFiles%\Common Files. They are actually hidden away in %SystemRoot%\WinSxS (%SystemRoot% being the Windows directory, for example, C:\Windows). Unfortunately, so is everything else.

WinSxS, 44139 files, 11180 folders

My fully up-to-date installation of Vista Ultimate 32-bit has 11,180 folders and 44,139 files in %SystemRoot%\WinSxS. Here is where the list of files in MDAC comes in. Luckily, few of the filenames have changed. There are a few new files, easily identifiable through version numbers in the filename, which are stored in the same folder as the earlier versions. So, making use of Vista’s search feature (much improved over XP’s dog), it was possible to copy the files into their rightful places.

Most of these files are COM objects – the "no such interface supported" message is referring to COM interfaces (as far as I can tell, the interfaces in msdaps.dll, being IDBCreateSession, IDBDataSourceAdmin, IDBInitialize and IDBProperties). Some of the files need to be registered with regsvr32.exe. However, I spent zero time figuring out which ones and just registered everything (which is the easiest way to figure out what needs registering anyway).

Suddenly, clip art worked and they all lived happily ever after.

The End.

January 17, 2009 • Posted in: Uncategorized • 2 Comments