VSPy #6 – Over Too Soon…

So, it’s been fun while it lasted, but I’m calling off this series early. Why? Because VS2010 is almost out, and the extension framework has changed considerably. I missed it when it was first posted about 10 months ago, but Terry Clancy has provided a (long) summary of the new developments in this area (such as removing the package load key we messed around with last time).

So my plan now is to really dig into the release candidate of VS2010 and the SDK and reappear later (hopefully soon) with a new series on integrating Python into Visual Studio. Some of the further developments I’ve made with VSPy that didn’t make it into this series is quite nice IronPython integration for dependency mapping and Intellisense purposes, as well as almost having debugging support (and an option to use CPython 2.6 or CPython 3.0, if you prefer).

Depending on how VS2010 works out, it is likely that I will declare most of that code dead and possibly release the entire thing for anyone who desperately wants to use or develop it. I am quite keen to start fresh, now that I have a better understanding of what is required, and will hopefully return with a much more robust addin for VS2010 (which will eventually end up on Codeplex or similar, obviously).

So, this blog will be making a return to regular programming (whatever that is) for some time. See you next week!

March 28, 2010 • Tags: , , • Posted in: Uncategorized • Comments Off

VSPy #5 – Deployment

I’ve done some reshuffling of the original post order, since I’ve discovered a much more accurate way of handling name completion but haven’t yet got it working properly. Also, since we’ve got enough highlighting going on, deployment becomes more of a concern, so I’ve bumped that up, then I’ll cover adding a Python Project type and being able to run it.

This is the fifth post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


Throughout development so far, you’ve been able to run VSPy from Visual Studio in an “Experimental Hive,” isolated from your main installation. But if you want to use it all the time, or share it with others, what we really want is a setup program.

There are no shortage of installer generators, and while NSIS has always previously been my preference, the Setup Project in Visual Studio (Professional, possibly Standard editions, but I don’t think it’s in the Express editions) is generally more convenient. WiX is the high-powered/low-level version of a Setup Project.

For this, we’ll be using a Setup Project. With your VSPy project open, create a new Setup Project (under Other Project Types, Setup and Deployment) and select Add to Solution.

Add project dialog box

The first view that appears is the File System, which lets you specify the location on disk to install your files to. The easiest option here is to add your project output to the Application Folder folder. First, however, we need to add the project output to the setup project (right-click, Add, Project Output, select VSPy and click OK). A whole selection of dependencies will be detected, none of which are required (since Visual Studio is required, we can safely assume all of them) so they can all be excluded except for the .NET Framework (in Solution Explorer, select all dependencies, right-click, “Exclude”).

Add project output

By default, the project output will be installed to a user-selected directory under Program Files. Alternatively, it is permissible to install to either Visual Studio’s PrivateAssemblies folder (%ProgramFiles%\Microsoft Visual Studio x.00\Common7\IDE\PrivateAssemblies) or the global assembly cache (GAC). Installing to PrivateAssemblies is virtually identical to using a separate application folder (assuming an Administrator is doing the installation) but the GAC requires a few changes to the rest of the process. For now, I’ll assume that an application folder is being used and (may) cover the required modifications later.

The next step is to produce the required registry entries. So far, the RegPkg utility has been called automatically to set the registry of the isolated IDE, but it can also be used to produce a plain-old .REG file:

RegPkg.exe /regfile:VSPy.reg /codebase "{full path to VSPy.dll}"

This registry file is not perfect, specifically, it has two hardcoded paths that need to be changed to be resolved at install time. However, it is good enough to import into out installer. Right-click the Setup Project in Solution Explorer and select “View, Registry”. HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER have some standard keys which we don’t need, so feel free to delete them. Right-click on “Registry on Target Machine”, select Import and open the .REG file created earlier.

Registry settings for VSPy

In the screenshot above I have already made the two modifications required: The CodeBase value contains the path to our class library, which will be installed to a user-customisable folder. The actual folder selected will be in the [TARGETDIR] property.

Secondly, the InprocServer32 value originally has a full path (typically C:\Windows\System32\MSCOREE.DLL) that should be changed to use the [SystemFolder] property instead (because Windows doesn’t have to be on the C drive).

(According to the MSDN documents, some further steps need to be included. However, I haven’t included these and everything appears to work fine. Possibly these are required for more complicated add-ins or for add-ins that aren’t automatically loaded as language services are.)

Pull up project properties and set your company name, etc, and now everything else should be fine, so go ahead, build and install it!

Oh, you’re back? See something like this? If not on your own machine, then on somebody else’s? (To reproduce it, you can start devenv.exe with the “/novsip” option.)

Package load error

Yes, I did forget one little thing. Visual Studio requires packages to have a load key. A load key ties together the package name, version, GUID and target version of VS to ensure that packages are only run on platforms that support it. This prevents people from complaining when your package for VS2008 makes VS2005 crash, because VS2005 will just refuse to load it.

You can obtain a load key for free from this page at the Visual Studio Extensibility centre. You need to provide some information about your package, but as far as I can tell there is nothing personally identifiable included. The GUID is that of the VSPyPackage class.

The load key that is given back is a string of alphabetic characters. Our VSPyPackage already includes a ProvideLoadKey attribute with resource identifer 1, so open up Resources.resx and add a string resource 1 with the value set to the load key (you may need to remove newline characters in Notepad first). The ProvideLoadKey attribute also specifies information that must precisely match what you enter into the load key request form.

Rebuild everything and you can now happily deploy your syntax highlighter to another machine.

March 14, 2010 • Tags: , , , • Posted in: Uncategorized • Comments Off

VSPy #4 – Bracket Matching

This is the fourth post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


A nice simple task for today. One of the most common and useful features in today’s IDEs is bracket matching. In Python it is somewhat less important than indent guides (generally vertical lines appearing every four characters of whitespace, see Notepad2 or Notepad++ for example), but they are so trivial to add in that it can’t hurt.

Indicating brackets to be matched involves activating matching in the package options, passing a list of matching spans to the AuthoringSink class and responding to a request for token information with TokenTriggers.MatchBraces. Luckily for us, bracket matching is enabled by default and the other two are already nicely wrapped up for us.

These two lines are from the Configuration class I’ve provided earlier:

ColorToken((int)'(', TokenType.Delimiter, TokenColor.Text, TokenTriggers.MatchBraces);
ColorToken((int)')', TokenType.Delimiter, TokenColor.Text, TokenTriggers.MatchBraces);

We’ve already discussed most of these values, but that last one is now interesting to us. Most token types have no triggers, that is, they are just text. However, some tokens automatically activate member lists or parameter information. This activation is handled by specifying triggers.

The MatchBraces trigger tells Visual Studio to search through its list of matching text spans to find the partner to the current one (VS has obviously come from a C#/C++ background, but MatchBraces can be used to match anything). Currently, only parentheses and braces are handled, so you will need to add two more similar lines for square brackets.

Finally, the AuthoringSink needs to be informed of the matching spans. This is part of the parsing process, and the Parser class (which is mostly automatically generated) has a Sink member that returns the current AuthoringSink (provided by the IDE). The provided Parser, however, already has bracket matching all wrapped up.

The parser.y file includes a method in the preamble simply called Match(). Some parameter conversion and a pass-through to DefineMatch() later and all the information about our brackets is ready. All that we need to do is add calls to Match() into our parser.

This is pretty straightforward. As with the scanner, each line of the parser can include some C# code to perform specific handling. Anywhere where brackets of any type appear, we need to add a call to Match() with the location information. Included below is the updated DefStatement rule.

DefStatement
    : KW_DEF NAME ':' Block
    | KW_DEF NAME '(' ')' ':' Block                                         { Match(@3, @4); }
    | KW_DEF NAME '(' ParameterList ')' ':' Block                           { Match(@3, @5); }
    | KW_DEF NAME '(' ParameterOverflow ')' ':' Block                       { Match(@3, @5); }
    | KW_DEF NAME '(' ParameterList ',' ParameterOverflow ')' ':' Block     { Match(@3, @7); }
    ;

The @<number> notation is converted to a LexLocation object containing all the location information about the token identified by the number. So @1 for any of the rules of DefStatement will return the span containing “def”. (Not shown here, the $<number> notation is converted to a LexValue object with the text of the token, though whether it works at all depends very much on the rule it is applied to.)

Add similar calls to Match to all the relevant production rules and we’re done! Rebuild and open up a file and highlight some brackets. (BIG warning: If the file you open has incorrect syntax, the parser will probably fail and no brackets will be matched. Also, there is a subtle bug in the scanner that will put the right-hand bracket in the wrong place when the brackets are split over multiple lines. Good luck!)

March 7, 2010 • Posted in: Uncategorized • 1 Comment

VSPy #3 – Hierarchical Parsing

This is the third post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


As of last time, we had a working syntax highlighter for Python. While it only contained a few hardcoded keywords, it’s possible to see how this takes shape. For handling the full set of keywords, as well as supporting future metadata (such as a different display name or help text for Intellisense), it is worth moving the keyword definitions into a separate file and loading them. I’ve already done this work, and the link is hidden somewhere inside this post, so you’ll have to read the whole thing to get it!

The aim of this post is to improve our parser. What has already been provided is a bare minimum to ensure some level of sensibility. However, since we have plans to include name completion, member lists and some sort of hints about variable values, we need a better understanding of the code structure. While we’re here, we’ll also add support for implicit and explicit line continuation and semicolon separators.

Python’s structure is indicated by whitespace. Where C-style languages use ‘{‘ and ‘}’ to indicate where a block begins and ends, Python’s blocks are simply indented deeper than the previous and next statements. The main issues to be faced here are determining the level of indent on a particular line and calculating how much it has changed from the previous line.

First, we will define some new tokens. In the parser file, add this line:

%token INDENT UNINDENT

This defines two new tokens for our scanner to return. INDENT represents when there is more whitespace at the start of the line than on the previous line, and UNINDENT represents when there is less whitespace. Adding these rules into the lexer is not trivial, but still reasonably straightforward.

First, we need to define some instance variables for tracking our indent level. Instance variables/methods are written between the first pair of %{ and %}.

Stack Indents = new Stack();
int nextIndent = 0;

The stack will be used because we need to keep track of how many indents were used. Otherwise, it will be impossible to unindent by more than one level. nextIndent is also used for unindenting by more than one level, as we’re about to see.

In the next %{ %} block, add the following code:

if (lineStartNum == cNum &amp;&amp; (chr != ' ' &amp;&amp; chr != '\t' &amp;&amp; chr != '\r' &amp;&amp; chr != '\n' &amp;&amp; chr != '#'))
{
    nextIndent = 0;
}
if (Indents.Count &gt; 0 &amp;&amp; nextIndent &lt; Indents.Peek())
{
    Indents.Pop();
    return (int)Tokens.UNINDENT;
}

The first line handles a case that we can’t write into the regular expression: if there is no whitespace at the start of the line (indent of zero). (The extra conditions are included because blank lines and lines with comments only are not relevant to the indent level.) If the line has no indentation, we reset nextIndent to zero.

The second if statement checks whether we are currently unindenting. Since our scanner can only return one token at a time, this mechanism lets us continue to return UNINDENT until the stack is at the right level.

The last section of code is a regular expression followed by a significant block of code:

^{WS}+/[^\r\n#]             {
                                nextIndent = yytext.Length;
                                if (Indents.Count == 0 || nextIndent &gt; Indents.Peek())
                                {
                                    Indents.Push(nextIndent);
                                    return (int)Tokens.INDENT;
                                }
                                else if (nextIndent &lt; Indents.Peek())
                                {
                                    Indents.Pop();
                                    return (int)Tokens.UNINDENT;
                                }
                            }

{WS} represents a single whitespace character, and the expression matches at least one of these at the start of the line and followed by a single character that isn’t a newline or comment. This final character is not included in the match.

The handler is simply a case of setting nextIndent to the length of the matched whitespace and either returning an INDENT token or an UNINDENT token. Multiple unindents will be handled by the code earlier (multiple indents don’t exist).

You’ll find now that parsing of files fails spectacularly, since the parser does not ever expect to see an INDENT or UNINDENT (and it won’t see an UNINDENT, because it will give up at the first INDENT). Currently, our compound statements (if, else, while, etc.) are single line only, for example:

IfStatement: KW_IF Expression ':'

What we want is rather than an if statement being followed by a newline and then just another statement, we require a newline, then an INDENT, some number of statements followed by an UNINDENT. The easiest way to achieve this is to create a Block rule:

StatementBlock: UNINDENT | Statement StatementBlock;
Block: EOL INDENT StatementBlock;

StatementBlock contains multiple statements terminated by an UNINDENT, and Block starts with a newline, followed by an indent, statements and unindent. The change to most of the compound statements is now fairly trivial:

IfStatement: KW_IF Expression ':' Block

But Python also supports a single statement on the same line. Luckily, with a small change to Block, so can we:

Block: EOL INDENT StatementBlock | Statement;

Line continuations are easy, compared to handling indentation. Explicit line continuations (where the line ends with a backslash) are caught and ignored by adding the following line to the lexer (the second line was already there and is shown as a comparison).

\\{WS}*{EOL} { }
{EOL} { return (int)Tokens.EOL; }

Some implicit continuations can also be handled in this way. For example, a line ending in a period or comma is assumed to continue, so the line added above becomes:

[.,\\]{WS}*{EOL} { }
Apart from the code block not displaying properly anyway, Python doesn’t actually allow period and comma implicit continuations.

The trickier implicit continuation is those inside some form of brackets, such as parentheses for a parameter or argument list, square brackets for a list or braces for a dictionary definition. To handle these correctly we need to know when we are in a set of brackets and correctly handle when we get out of them.

As with keeping track of indents, we simply add another stack to the scanner. This time, we’ll keep the character that closes the brackets as the value, to simplify things later.

Stack Nesting = new Stack();

Now our lexer code for scanning brackets has to change. (Note that this example doesn’t check that the closing bracket matches the opening bracket. The parser will also check this, but it is easier to provide a useful error message for this in the scanner. The implementation of such an error message is left as an exercise to the reader.)

\( { Nesting.Push(')'); return (int)'('; }
\) { Nesting.Pop(); return (int)')'; }
\[ { Nesting.Push(']'); return (int)'['; }
\] { Nesting.Pop(); return (int)']'; }
\{ { Nesting.Push('}'); return (int)'{'; }
\} { Nesting.Pop(); return (int)'}'; }

We update the newline scanner from above to make sure we don’t pass through any EOL tokens when we are inside brackets:

{EOL} { if (Nesting.Count == 0) return (int)Tokens.EOL; }

And finally, we add if (Nesting.Count == 0) blocks around all the indentation code we added earlier. Otherwise, the unimportant indents that occur “on a single line” will be detected and everything will break (trust me, it breaks).

So for everyone following along at home who didn’t bother to write any of this themselves (and for those who skipped down to the bottom to find the link), this version of the language service files can be found here. (Note that Keywords.cs and KeywordData.xml weren’t in the last one – they handle highlighting all of Python’s keywords [note that Python doesn't consider None, True, False, etc. to be keywords, but add them if you like]. KeywordData.xml needs to be added to the resource file, as the build error will tell you when you don’t do this.)

February 28, 2010 • Posted in: Uncategorized • 1 Comment

Goto Design Pattern #3

It’s been a while since my first post on this topic, where I presented a genuine situation where goto is the most reasonable solution. I then posted a follow-up showing a real-life example (well, for those of us using the Win32 API from assembly language). There was some discussion and alternate suggestions involving massively generic (no, not an oxymoron) object models. Unfortunately, due to the server changeover, that has been lost (the links above go to static pages with my original posts).

Well, now the team at Microsoft responsible for the Parallel Extensions to .NET 4.0 have come out with their own solution (yes, this link is the whole point of the post, so follow it. I’ll wait).

While I don’t agree that the solution is any improvement over my one, they’ve certainly given the issue much better coverage than I did. However, have a look at their second last code snippet and ask yourself whether you’re more likely to get it correct than my version:

bool MakeSandwich()
{
    ret = GetBreadFromFridge();
    if (!ret) goto Abort;
 
    ret = GetHamFromFridge();
    if (!ret) goto ReturnBread;
 
    ret = AssembleSandwich();
    if (!ret) goto ReturnHam;
 
    return true;
 
ReturnHam:
    ReturnHamToFridge();
ReturnBread:
    ReturnBreadToFridge();
Abort:
    return false;
}
February 21, 2010 • Tags: , , • Posted in: Uncategorized • Comments Off

VSPy #2 – Syntax highlighting

This is the second post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


The syntax highlighting mechanism in Visual Studio is based on the IVsColorizer interface and its ColorizeLine method. Language services provide an object that implements this interface to the IDE, which uses it to obtain colour information. The default implementation uses an object implementing IScanner with the aptly named function, ScanTokenAndProvideInfoAboutIt. Managed Babel’s implementation of this interface is called LineScanner.

LineScanner is a very simple wrapper that connects up a couple of other classes. The Scanner class is automatically generated from lexer.lex (discussed below), where Scanner.GetNext() finds the next token/keyword and returns its type and location. The Configuration class contains definitions for colourable items, linking the token identifier with the colour it should be displayed as.

To make this slightly clearer, a few lines from the Configuration.cs file provided last time are shown. These associate three different tokens with a particular type, colour and trigger (we will see triggers later, but just ignore it for now).

ColorToken((int)Tokens.KW_IF, TokenType.Keyword, TokenColor.Keyword, TokenTriggers.None);
ColorToken((int)Tokens.KW_ELIF, TokenType.Keyword, TokenColor.Keyword, TokenTriggers.None);
ColorToken((int)Tokens.KW_ELSE, TokenType.Keyword, TokenColor.Keyword, TokenTriggers.None);

For a language service, the most important part is the scanner/parser pair. The parser file (parser.y) defines the members of the Tokens enumeration, the scanner (lexer.lex) describes how to identify a token from the plain text, and then the parser determines whether the token is in a logical place. This back-and-forth is not very pleasant (even worse is that Tokens needs to be cast to int effectively everywhere) but it is manageable. The provided generators make life a bit easier.

In part one I provided a simplified parser/lexer implementation. This parser should be sufficient for providing syntax highlighting, but it does not perform anything approaching syntax validation. For now, however, we’re going to look at the lexer.

Managed Babel is based in C#, so rather than using any of the “standard” lexer generators (such as flex), the MPLex tool (provided with the SDK) is used. The syntax is very similar, however, all code sections are written in C# rather than C++, as the generated scanner is in C#.

Ignoring the code sections for now, let’s look at the first few lexer rules:

#.*$                        { return (int)Tokens.LEX_COMMENT; }
{EOL}                       { return (int)Tokens.EOL; }
{WS}+                       { }
[a-zA-Z_][a-zA-Z0-9_]*      { return CheckName(yytext); }

Each rule is a regular expression followed by a snippet of C# code. This code either returns the value of the token or does nothing.

The first rule matches a hash symbol, followed by anything up until the end of the line. In Python, this represents a comment, so we return Tokens.LEX_COMMENT (I don’t see the need for the LEX_ prefix either, but that’s how it came).

The second rule matches any standard line ending and returns an end of line token. ({EOL} has been defined earlier in the file as (\n|\r\n?).)

The third rule matches any whitespace but consumes it silently. Python’s structure is determined by indentation, which means that eating whitespace silently is a good way to make it really hard to validate structure. For now, however, we only care about highlighting.

The fourth rule is a very general purpose rule. A letter (or an underscore) followed by any other letters, digits or underscores represents most of the Python language. All keywords, built in functions and variables, as well as any user defined variables will match this rule. The performance of the scanner generated by MPLex degrades with size, so defining individual rules for all possible names is not a good idea. Instead, once we’ve identified a name, we can pass the text to a method defined earlier in the lexer file.

internal int CheckName(string name)
{
    switch(name)
    {
    case "if":
        return (int)Tokens.KW_IF;
    case "elif":
        return (int)Tokens.KW_ELIF;
    case "else":
        return (int)Tokens.KW_ELSE;
    case "while":
        return (int)Tokens.KW_WHILE;
    case "for":
        return (int)Tokens.KW_FOR;
    case "class":
        return (int)Tokens.KW_CLASS;
    case "def":
        return (int)Tokens.KW_DEF;
    case "return":
        return (int)Tokens.KW_RETURN;
    default:
        return (int)Tokens.NAME;
    }
}

The CheckName method shown above is what is currently provided. Rather than defining all the matching rules as regular expressions, we can perform straight string comparisons to determine what keyword has been found. As a fallback, we can assume that the token is just a user variable.

While this may be more efficient than defining rules, it is not very pleasant to write, and any performance optimisations (such as grouping by length or the first character) will only make readability worse, particularly as the list of tokens gets longer. The alternative that I have used (which will appear in a later version of the lexer) is to create a static Dictionary object which contains all the keywords. The “new” version of the above CheckName method looks like this:

int CheckName(string name)
{
    Tokens t;
    if (Keywords.TryGetValue(name, out t)) return (int)t;
    return (int)Tokens.NAME;
}
 
// Using braces instead of angle brackets for the generics, since WP can't show them
 
static Dictionary{string, Tokens} Keywords = new Dictionary{string, Tokens}
{
    { "if", Tokens.KW_IF }, { "elif", Tokens.KW_ELIF }, { "else", Tokens.KW_ELSE },
    { "while", Tokens.KW_WHILE }, { "for", Tokens.KW_FOR },
    { "class", Tokens.KW_CLASS }, { "def", Tokens.KW_DEF },
    { "return", Tokens.KW_RETURN }
};

This is easily expanded to handle all of Python’s keywords, as well as built-in methods and variables.

Meanwhile, over in the parser, you can see the complete definition. Even though the scanner doesn’t identify most keywords, they are still included in the language grammar.

Reading a BNF grammar is best done from the top, which is effectively how the parser does it as well.

PythonFile
    : EOF
    | Statement PythonFile
    ;
 
Statement
    : DefStatement | ClassStatement | ImportStatement EOL
    | IfStatement | ElifStatement | ElseStatement
    | ForStatement | WhileStatement | TryStatement | ExceptStatement | FinallyStatement
    | PrintStatement EOL | PassStatement EOL | ReturnStatement EOL
    | ExecStatement EOL | AssertStatement EOL | RaiseStatement EOL
    | GlobalStatement EOL | YieldStatement EOL | BreakStatement EOL
    | ContinueStatement EOL | DelStatement EOL
    | Expression EOL | Assignment EOL
    | EOL
    ;

The first element is PythonFile. All valid files must match this rule, which means that all files must either be at the end of the file, or they contain a single Statement followed by a PythonFile. BNF rules are very often recursive in this manner. It allows an arbitrary amount of statements terminated by an EOF, which is the only rule in PythonFile that doesn’t recurse.

Each statement must match the Statement rule. This is quite a massive list of the different statements that Python includes. You will notice that some statements require an EOL immediately following while some others don’t. Those that don’t are terminated with a colon, allowing another statement on the same line if desired. (There is currently no support for handling the use of the semicolon as a command separator, which is probably best implemented by broadening the definition of EOL in the scanner, but since we only care about highlighting at this stage, it is irrelevant.)

IfStatement : KW_IF Expression ':';
ElifStatement : KW_ELIF Expression ':';
ElseStatement : KW_ELSE ':';
ForStatement : KW_FOR NAME KW_IN Expression ':';

Skipping ahead, we see four relatively simple definitions. An if statement matches the IfStatement rule if it the scanner returns KW_IF, followed by any expression (see the rest of the parser file for Expression – it’s scary) and then a colon character. The rules are very similar for elif, else and for.

You may notice that these last four rules don’t enforce that the following line must be indented. Since all whitespace is ignored, it is actually impossible at this stage to detect indents. For syntax highlighting, this is not a problem: a keyword is still a keyword regardless of scope. However, when we come to generating lists of members, we will want to know the scope of declarations. To support this, I had to do some serious modification to the scanner template, which I will go through in a later post (unless by that stage I have completely rewritten the scanner, which I am quite seriously considering…).

February 14, 2010 • Tags: , , • Posted in: Uncategorized • 2 Comments

VSPy #1 – A basic language service

This is the first post in a series adding Python support to Visual Studio. See the table of contents for a brief overview.


The very first step is to install Visual Studio 2008 Standard or Professional (the Express editions are not suitable) and the SDK. (I assume everyone following along is capable of doing this without specific instructions.) When I refer to the install directory of the SDK, I will use $(VSSDK90Install), which is the same macro that the SDK sets inside VS.

Once you’re all set up, you’ll find a set of new project options. Under “Other Project Types\Extensibility”, we want to start a new “Visual Studio Integration Package”. The wizard allows you to select your language (I’ll be using C#) and to specify some details.

Visual Studio Integration Package wizard step 2

The following steps allow you to automatically generate various functionality, none of which is essential. Personally I also uncheck the sets of tests that are available. This generates an empty package, which, when executed, will start an isolated instance of VS with your package loaded. (This “experimental hive” stores all settings in a different location from your main instance, so the first execution will need to be as administrator.)

Before we get too far into coding, Microsoft provide a handy set of base classes known as Managed Babel. These can be found in $(VSSDK90Install)VisualStudioIntegration\Common\Source\CSharp\Babel. Add all of these files to a Babel subfolder in your project, except for Package.cs (some of the later tasks will make it easier to use a central package file, rather than the Babel one). You’ll also need references to Microsoft.VisualStudio.Package.LanguageService.9.0.dll and Microsoft.VisualStudio.TextManager.Interop.8.0.dll.

Into another subfolder (I called it LanguageService), add a new class Babel.LanguageService that inherits from Babel.BabelLanguageService, give it a GuidAttribute and implement GetFormatFilterList(). This method returns a string containing the set of files that this language service should handle. For Python, “Python File (*.py)\n*.py” is sufficient (note the newline termination, rather than the vertical bar used by the common dialog controls).

Before we can get even a basic build working, we need to provide a scanner and parser for Python. My versions for this step are here. Extract the files into your LanguageService folder and add them to your project by manually editing the csproj file and adding the following element:

<ItemGroup>
    <MPLexCompile Include=”LanguageService\lexer.lex” />
    <MPPGCompile Include=”LanguageService\parser.y” />
    <Compile Include=”LanguageService\Configuration.cs” />
    <Compile Include=”LanguageService\ErrorHandler.cs” />
    <Compile Include=”LanguageService\LexDefs.cs” />
    <Compile Include=”LanguageService\LanguageService.cs” />
    <Compile Include=”LanguageService\Resolver.cs” />
</ItemGroup>

The first two children associate the grammar files with the lexer and parser generators, while the rest of the files are added normally. (The next post will have a fuller explanation of these files. For now, we are simply getting the package started. Both the scanner and parser have some minor issues that will be fixed)

At this point, we should be able to build successfully. However, we are not ready to begin testing. The basic language service is available but is not yet exposed through the package. Exposing a language service is a case of setting the following attributes:

[ProvideService(typeof(Babel.LanguageService))]
[ProvideLanguageExtension(typeof(Babel.LanguageService), ".py")]
[ProvideLanguageService(typeof(Babel.LanguageService), "Python", 0, RequestStockColors=true)]

Edit: Added RequestStockColors flag and updated scanner/parser download above.

and adding the following code to the already-overridden Initialize method:

var serviceContainer = this as IServiceContainer;
var langService = new Babel.LanguageService();
langService.SetSite(this);
serviceContainer.AddService(typeof(Babel.LanguageService), langService, true);

At this stage, it is simply a copy-paste job. In time I will cover the purpose of these sections more thoroughly, but for now we should have a working language service that highlights eight keywords (listed near the start of lexer.lex), strings and numbers in a Python file. You’ll need your own test file to open, since we don’t have any support for starting a new .py file (yet).

Next post I’ll go through this version of the lexer and parser, point out some of the shortcuts and shortcomings and finish highlighting the rest of the keywords (in a more efficient way than what I have done already).

February 7, 2010 • Tags: , • Posted in: Uncategorized • 1 Comment

Adding new languages to Visual Studio

Recently I’ve been playing with the Visual Studio 2008 SDK. This SDK supports a lot of functionality, even allowing you to create and distribute independent applications based on the VS2008 IDE (titled “isolated” mode).

One of the primary use cases for the SDK is implementing domain-specific languages. An example of one of these is the class diagram view, which shows a visual representation that is converted to plain code (I personally use this view for documentation purposes only, but it is possible to add elements and relationship to classes using it). However, the use that has my interest is supporting new languages, specifically, Python.

VS2008 supports at least ten different editors (VB, C#, C++, binary, HTML, resource files, XML, XSLT, XSD, XAML) each of which has its own syntax validation and highlighting requirements. Providing this support comes through language services. Depending on installation settings, a variety of project types and languages are available. These come from templates, project nodes and factories. Debugging support is provided for native and managed code by debugging engines.

To fully implement a new language in this context, quite a large amount of work is required. At this stage, I’ve done about half of what I intend, so it’s time to start writing blog entries. This entry will be the contents page with links to each other page, once they are written.

My tentative plan is:

  1. A basic language service
  2. Syntax highlighting
  3. Hierarchical parsing
  4. Bracket matching
  5. Deployment
  6. New projects
  7. Project properties
  8. Start the project (without debugging)
  9. Name completion
  10. Parameter information
  11. Start the project (with debugging)
  12. Breakpoints

As I have only actually completed seven of these (the seven easiest, not the first seven) and this project isn’t my highest priority, there may not be one post per week. If I miss a week, I’ll try and rant about something to fill the space.

February 7, 2010 • Tags: , , • Posted in: Uncategorized • 1 Comment

How To Use Design Patterns Correctly

Short answer: as comments.

The standard definition of a design pattern is “a general reusable solution to a commonly occurring problem in software design,” but a better definition is “a good title with a useful description and a pile of other rubbish.”

Take “Singleton” for example. A singleton is a class that has one instance which is used for all references to that class. That’s all you need. The noun, singleton, and the definition. Everything else simply gets in the way of programming. Singletons don’t need diagrams, books or university subjects. The one line definition is sufficient.

What about “Factory”? A factory is a way to create an object that isn’t a constructor. “Constructor” is an instance method that is always called during its object’s instantiation. And here’s the example that makes everything clear.

“Constructor” isn’t a design pattern anymore. It’s been absorbed so thoroughly into development languages and tools that it’s simply a word with a definition. If you mention “constructor” to someone who knows what it means, they will understand. They don’t need to look up their notes from university to find a class diagram explaining it. If you mention “singleton” or “factory” to the same person, they should be able to understand what you mean without using a reference.

(Oh wait, what about the different type of factories? You mean like default constructors, copy constructors, move constructors [coming soon to a C++0x near you] and static constructors?)

Design patterns provide a common vocabulary and occasionally an example. Which is exactly what you find if you look up constructors on MSDN.

Elevating design patterns above the level given to classes, constructors, operators and other language concepts leads to such unproductive ideas as class MySingleton or worse, template<typename T> class Singleton. Anything more than a brief // VectorCreator is a factory for Vector objects is just unnecessary.

January 24, 2010 • Tags: , , • Posted in: Thinking Out Loud • Comments Off

Teaching Good Interface Design

We all know just how hard it is to design good interfaces. Conflicting issues of performance, extensibility, readability and abstraction all influence the final design, and poor decisions early on can continue to cause issues long into the future (for example, the Windows API). The importance of finding a suitable balance is often understated, in favour of drawing diagrams that simplify the interfaces to a solid line.

So where does interface design fit into the teaching process? Software engineering typically involves learning the mechanics of programming languages, architecture design and implementation paradigms. The end result is a C++ programmer who can invent a bunch of components and implement them as classes, complete with getter and setter methods on every instance variable.

That last comment makes it seem obvious that interface design belongs with implementation paradigms. That, however, doesn’t work. Interface design is not unique to object-oriented programming (which is [still!] unjustifiably dominant [from an education perspective]) nor any other paradigm. Interface design is unique to anything that has humans involved, which, for the foreseeable future, includes all programming methodologies.

So what about architecture design? An area currently filled with various patterns and methods for finding objects from requirements and determining who is allowed to talk to who. Much like arranging seats at a dinner party to try and prevent fights. Except, at a dinner party, you know to introduce yourself, wait politely for the other person to finish talking to someone else and avoid asking about whether their nasty… err… infection has cleared up. Seating plans leave this out because its too low level – much the same reason that architecture design leaves it out (in favour of fancy drawings, usually).

Therefore, by process of elimination, interface design belongs with learning the language. Except that interfaces make no sense without some knowledge of architecture. And the implementation paradigm will determine the mechanics of the interface. Clearly, designing a usable, understandable and performant interface is a subject area in itself.

Is it worth a whole subject (24 hours of lecture, 12-24 hours of lab, 3 hour of exam)? What material would be included? How would the assessment of such highly subjective and speculative designs be valid?

Interface design isn’t so much a case of “do it this way”. The approach required is to ensure students understand the implications of their interfaces before the complaints start flowing in. Plenty of examples and alternatives for real designs.

Windowing toolkits (Windows Forms, MFC, wxWidgets, Qt, etc.) are ideal examples for basically every aspect of interface design. They are always packed with far too many classes, and most provide plenty of other facilities that go well beyond the scope of a “windowing” toolkit. The underlying operating system has a window model that is abstracted by the toolkit, providing ample discussion of suitable levels of abstraction. Source code generators can be advantaged by consistent method signatures that might be counter-intuitive to human developers. Layout managers depend on polymorphism, but not every control has a position or size.

Math libraries (Intel MKL, ACML, etc.) provide plenty of examples of single-instruction-multiple-data support and cases where passing arrays to methods performs far more efficiently than single values. Compatibility between the math library and the code using the results can be heavily influenced by the design of library-specific data types.

As for assessment, a subject like this demands what are effectively comprehension tests. Provide a code snippet (C++ header file) or API documentation and a set of questions (eg. “Which method will make a button invisible?” “Should Mutex::Release() be called immediately before Mutex::Destroy()?”). Include too many questions to be answered in the time (for most people) to encourage speed. Run at least two of these during the subject (including the exam) or run a 5-minute one each week. In such a short time ten questions would be enough.

Throw in a design test to make a simpler wrapper for a provided class. Hand the submissions out the next week as a quick comprehension task and then as a longer analysis task – get each student to analyse (anonymously) someone else’s interface and rate it. Reward interfaces that are easily and quickly understood, penalise those that confuse the analyst.

The most important aim of the subject would be to ensure that students have had exposure to a range of both good and bad interfaces and are able to determine for themselves how intuitive, and to a lesser extent performant, an interface will be. Designing for the system is important, but designing for humans is the aim of the game.

January 16, 2010 • Tags: , , , • Posted in: Uncategorized • Comments Off