
Parsing type-metadata about Generics on .NET

© Miguel Garcia, LAMP,

École Polytechnique Fédérale de Lausanne (EPFL), http://lamp.epfl.ch/~magarcia

April 12th, 2010

Contents

1 Recap
1.1 Setting up the IDE
1.2 The debug configuration

2 Making type-parsing generics-aware
2.1 Handling instantiated types
2.2 GUI to navigate PE files
2.3 Advantages of CCI

3 Handling TypeSpec signatures

4 Handling generic parameters

5 Handling mscorlib.dll v4.0

6 GenericParamConstraint metadata table
6.1 Meaning of the bytes to parse
6.2 Code to parse the bytes
6.3 Data structures to hold the parsed bytes

7 Related Work: Parsing type metadata as Query answering

8 Future Work: New features in CLR v4 for compiler writers

9 IDE tips and tricks

Abstract

These notes chart the component in the Scala.Net compiler in charge of parsing type metadata, i.e., metadata from assemblies referenced by the program under compilation. This component was written before Generics became mainstream. We focus on updating it to handle metadata involving type params and arguments. A caveat: propagating that additional type info through the compilation pipeline is left for another write-up :-)


1 Recap

1.1 Setting up the IDE

The instructions to debug the Scala.Net cross-compiler were previously scattered over several write-ups.

1. IntelliJ IDEA Community Edition EAP: http://confluence.jetbrains.net/display/IDEADEV/Maia+EAP

2. Scala Plugin Nightly Builds: http://confluence.jetbrains.net/display/SCA/Scala+Plugin+Nightly+Builds

3. Extract the Scala folder from the .zip file above into the plugins folder of the IDEA installation. Restart IDEA.

4. Obtain the sources for .NET of the Scala distro (compiler, library, etc.): svn co http://lampsvn.epfl.ch/svn-repos/scala/scala-msil/trunk scalamsil

5. Back in IDEA, choose File | Open Project and pick the .ipr file in the src\intellij folder (located inside the folder where the svn command above downloaded the sources).

6. The Project Structure can be shown, as well as the compilation unit for a type (such as GenMSIL, after typing CTRL + N). In both cases, squigglies are shown below identifiers referring to the Scala standard library until . . .

7. In the Project Structure view (Alt + 1), right-click on the compiler module and choose Module Settings. In the Dependencies tab, add a Module Dependency to library (the module in the current IDEA project that contains the Scala library sources).

8. Trying to launch the compiler in a debug session would greet us with Exception in thread "main" java.lang.NoClassDefFoundError: ch/epfl/lamp/compiler/msil/Type

To avoid that, choose File | New Module, choose Create from scratch, and pick the src\msil folder as “content root”. Java files are detected, so the only thing left to do is clicking OK.

9. Add another Module Dependency in the compiler module, this time to the just-created msil module. In the Dependencies tab, msil now appears at the bottom of the list. Move it up so that it appears above the entry Scala Project SDK; otherwise, lookups will go first to msil.jar rather than to the sources.

10. The msil module needs a Scala facet, because its package ch.epfl.lamp.compiler.msil.emit is written in Scala (the rest in Java). Finally, the msil module also depends on the library module.

11. To get rid of the few remaining squigglies (broken use-defs), see Sec. 9.

12. Restart IDEA.


1.2 The debug configuration

1. Main class: scala.tools.nsc.Main

2. VM Parameters: add -Xbootclasspath/a: with the following jars:

• Z:\scalaproj\nscnet\lib\scala-compiler.jar;

• Z:\scalaproj\nscnet\lib\scala-library.jar;

• Z:\scalaproj\nscnet\lib\fjbg.jar;

• Z:\scalaproj\nscnet\lib\msil.jar

3. Program parameters: -target:msil -Xassem-extdirs Z:\scalaproj\mscor HelloWorld.scala

4. Working directory: Z:\scalaproj\mscor, which contains mscorlib.dll (copied from the distro’s very own lib folder), predef.dll and scalaruntime.dll (ditto). For any but the simplest programs, throw in System.dll too.

• After applying the patches in this report, the dlls above may be version 2.0 or higher (i.e., containing generics).

5. At first, we’re not going to modify the sources of the compiler, and thus it’s advisable to uncheck Before launch Make to save startup time.

2 Making type-parsing generics-aware

2.1 Handling instantiated types

The current type parser in the Scala.Net compiler was written before Generics hit the scene. Therefore, declarations involving generics are unexpected input, which makes the parser get out of sync and miss subsequent declarations.

For example, the field signature for PeReader.InternedIdToModuleMap (whose type is Hashtable<Module>, as shown in Figure 1) cannot be parsed correctly. Let’s see (a) what the byte chunk for that signature looks like, and (b) what code has to be added for the compiler to parse it.

This particular library, CCI, could be used from Scala.Net without making the compiler generics-aware, by creating (non-generic) wrappers for the methods of interest. But by following that path we wouldn’t learn nearly as much about compiler internals, right?

A useful starting point in the debug session is TypeParser.parseClass(typ: MSILType). Shortly after parsing System.Object, scala.ScalaObject, and others, we get to Microsoft.Cci.PeReader. The method goes on to obtain the parsed type’s attributes, interfaces, supertype, nested types, and then fields. That’s where the problem manifests, upon invocation of getFields() in ch.epfl.lamp.compiler.msil.Type (Listing 1).

More precisely, as part of the initFields() invocation, the line highlighted in Figure 2 obtains a byte chunk (representing a field signature) that decodeFieldType() later cannot parse (because of the type argument). The resources listed after Listing 1 provide more details about the byte layout to parse; a visual depiction of that byte layout is the topic of Sec. 2.2.


Figure 1: InternedIdToMap

Listing 1: getFields

```java
/**
 * Return only the fields declared in this type.
 */
public FieldInfo[] getFields() {
    initFields();
    FieldInfo[] fields = new FieldInfo[this.fields.length];
    System.arraycopy(this.fields, 0, fields, 0, fields.length);
    return fields;
}
```

• Signatures under the hood

– http://www.codeproject.com/KB/dotnet/dotNetSignatures1.aspx

– http://www.codeproject.com/KB/dotnet/dotNetSignatures2.aspx

• More general article about the whole PE file format: http://www.codeproject.com/KB/dotnet/dotnetformat.aspx

The field signature that couldn’t be parsed can be inspected with ildasm (Listing 2) and with CFFExplorer (Sec. 2.2). That signature is to be parsed as per the following grammar production (reproduced from §23.2.12 (“Type”) in Partition II of the CLR standard).

ELEMENT_TYPE_GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type*
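Every number inside that production (the TypeDefOrRefEncoded token and GenArgCount) is stored as a compressed unsigned integer (Partition II, §23.2), which is what decodeInt() has to undo. Below is a minimal standalone sketch of that decoding, plus the split of a TypeDefOrRefEncoded value into table tag and row, assuming the standard encoding rules; the class name CompressedInt is hypothetical and not part of compiler.msil.

```java
/** Hypothetical helper: ECMA-335 compressed unsigned integers (Partition II, §23.2). */
final class CompressedInt {
    private CompressedInt() {}

    /** Decodes the value starting at sig[pos]; returns {value, bytesConsumed}. */
    static int[] decode(byte[] sig, int pos) {
        int b0 = sig[pos] & 0xFF;
        if ((b0 & 0x80) == 0) {                      // 0xxxxxxx: 1 byte, up to 0x7F
            return new int[] {b0, 1};
        }
        if ((b0 & 0xC0) == 0x80) {                   // 10xxxxxx: 2 bytes, up to 0x3FFF
            return new int[] {((b0 & 0x3F) << 8) | (sig[pos + 1] & 0xFF), 2};
        }
        if ((b0 & 0xE0) == 0xC0) {                   // 110xxxxx: 4 bytes, up to 0x1FFFFFFF
            int v = ((b0 & 0x1F) << 24)
                  | ((sig[pos + 1] & 0xFF) << 16)
                  | ((sig[pos + 2] & 0xFF) << 8)
                  |  (sig[pos + 3] & 0xFF);
            return new int[] {v, 4};
        }
        throw new IllegalArgumentException("not a compressed integer: 0x" + Integer.toHexString(b0));
    }

    /** TypeDefOrRefEncoded: the 2 low bits name the table, the remaining bits the row. */
    static String typeDefOrRef(int encoded) {
        String[] tables = {"TypeDef", "TypeRef", "TypeSpec"};
        int tag = encoded & 0x3;
        if (tag == 3) {
            throw new IllegalArgumentException("invalid TypeDefOrRef tag in " + encoded);
        }
        return tables[tag] + "[" + (encoded >>> 2) + "]";
    }
}
```

Applied to the bytes of Listing 2, decode on 81 f5 yields 501 (two bytes consumed), and typeDefOrRef(501) yields TypeRef[125]; likewise 81 90 yields 400, i.e. TypeDef[100]. These row numbers follow from the encoding rules alone; they have not been cross-checked in CFFExplorer.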

Supporting that grammar production requires adding the constant ELEMENT_TYPE_GENERICINST in Signature.java as well as the associated case handler in PEFile.Sig.decodeType() (shown in Listing 3 for illustration; by itself it does not yet work, because Type has not been extended to consider type args).

To avoid side effects when the IDE’s debugger calls toString() to display a Sig, PEFile.Sig.toString() should restore the buffer position, as shown in Listing 4.
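Listing 4 saves and restores the position by hand. Since Listing 6 builds a Sig over a java.nio.ByteBuffer (via ByteBuffer.wrap), another option is to read the bytes through buf.duplicate(), which shares the content but has its own position, so the original buffer is never moved. This is a minimal sketch; HexDump is a hypothetical name, and start/length stand for wherever the signature begins and how long it is.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: hex-dumps a slice of a buffer without side effects on it. */
final class HexDump {
    private HexDump() {}

    static String hex(ByteBuffer buf, int start, int length) {
        ByteBuffer view = buf.duplicate();   // same bytes, independent position and limit
        view.position(start);
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02x", view.get() & 0xFF));
        }
        return sb.append(')').toString();
    }
}
```

With this design the debugger may call the dump as often as it likes, since only the throwaway view's position advances.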


Figure 2: loadFields

2.2 GUI to navigate PE files

Unlike other GUI tools that present the results of parsing assembly metadata, CFFExplorer exposes the underlying representation, with only minimal structure on top of it (Figure 19).

To be more clear: in order to extend ch.epfl.lamp.compiler.msil as reported in these notes, none of the Visual Studio Object Browser, ILDAsm, or .NET Reflector is as useful as CFFExplorer.

In general, CFFExplorer simplifies debugging type decoding, whether done with TypeParser or with the Common Compiler Infrastructure, and supports both PE32 and PE64. Additionally, it provides navigation mechanisms, in-place editing, scripting, disassembling, and dependency walking, among other features.

Coming back to our first example, we find the signature of the field of interest as follows:

• Right-click on the Field table and choose Find, entering the start of the field name (InternedIdToModuleMapInILAsm, Figure 3).


Listing 2: InternedIdToModuleMap in ildasm

```
Field #3 (0400018f)
-------------------------------------------------------
    Field Name: InternedIdToModuleMap (0400018F)
    Flags     : [Private] [InitOnly]  (00000021)
    CallCnvntn: [FIELD]
    Field type: GenericInst Class Microsoft.Cci.UtilityDataStructures.Hashtable`1< Class Microsoft.Cci.MetadataReader.ObjectModelImplementation.Module>
    Signature : 06 15 12 81 f5 01 12 81 90
```

Listing 3: Case handler for ELEMENT_TYPE_GENERICINST

```java
// a grammar production from 23.2.12 Type
// GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type*
case ELEMENT_TYPE_GENERICINST: // i.e. 0x15
    int b = readByte(); // (ELEMENT_TYPE_CLASS | ELEMENT_TYPE_VALUETYPE) i.e. (0x12 | 0x11)
    /*- TODO don't ignore b as done above */
    Type instantiatedType =
        pemodule.getTypeDefOrRef(decodeInt()); // TypeDefOrRefEncoded e.g. 0x81 0xf5
    int numberOfTypeArgs = decodeInt(); // GenArgCount e.g. 0x01
    Type[] typeArgs = new Type[numberOfTypeArgs];
    for (int iarg = 0; iarg < numberOfTypeArgs; iarg++) {
        typeArgs[iarg] = decodeType(); // Type* e.g. 0x12 0x81 0x90
    }
    type = instantiatedType; /*- <--- TODO add the type arguments */
    break;
```

• The table entry thus found indicates that the field signature is located at position 126B in the Blob stream.

• After opening that stream (located in MetaData Streams in .NET Directory), click on the arrow (“Go to Offset”) as shown in Figure 4.

• The signature has to be parsed in order to know where it ends . . .
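Knowing where a signature ends means decoding it. Below is a minimal recursive-descent sketch of the Type production (§23.2.12), restricted to the element types met in these notes (int32, string, object, CLASS, VALUETYPE, VAR, MVAR, GENERICINST); it also checks the CLASS/VALUETYPE byte that Listing 3 still ignores. SigDecoder is a hypothetical standalone class, not the PEFile.Sig API; custom modifiers and all other element types are rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone decoder for a subset of the Type production (Partition II, §23.2.12). */
final class SigDecoder {
    private final byte[] sig;
    private int pos;

    private SigDecoder(byte[] sig) { this.sig = sig; }

    private int readByte() { return sig[pos++] & 0xFF; }

    /** Compressed unsigned integer (§23.2): 1, 2 or 4 bytes. */
    private int decodeInt() {
        int b0 = readByte();
        if ((b0 & 0x80) == 0) return b0;
        if ((b0 & 0xC0) == 0x80) return ((b0 & 0x3F) << 8) | readByte();
        int v = (b0 & 0x1F) << 24;
        v |= readByte() << 16;
        v |= readByte() << 8;
        return v | readByte();
    }

    private static String typeDefOrRef(int coded) {
        String[] tables = {"TypeDef", "TypeRef", "TypeSpec"};
        if ((coded & 0x3) == 3) throw new IllegalArgumentException("bad TypeDefOrRef tag");
        return tables[coded & 0x3] + "[" + (coded >>> 2) + "]";
    }

    private String decodeType() {
        int et = readByte();
        switch (et) {
            case 0x08: return "int32";
            case 0x0E: return "string";
            case 0x1C: return "object";
            case 0x12: return "class " + typeDefOrRef(decodeInt());      // CLASS
            case 0x11: return "valuetype " + typeDefOrRef(decodeInt());  // VALUETYPE
            case 0x13: return "!" + decodeInt();                         // VAR
            case 0x1E: return "!!" + decodeInt();                        // MVAR
            case 0x15: {                                                 // GENERICINST
                int kind = readByte();                                   // must be CLASS or VALUETYPE
                if (kind != 0x12 && kind != 0x11) {
                    throw new IllegalArgumentException("GENERICINST without CLASS/VALUETYPE");
                }
                String head = (kind == 0x12 ? "class " : "valuetype ") + typeDefOrRef(decodeInt());
                int argCount = decodeInt();                              // GenArgCount
                List<String> args = new ArrayList<>();
                for (int i = 0; i < argCount; i++) args.add(decodeType()); // Type*
                return head + "<" + String.join(", ", args) + ">";
            }
            default:
                throw new IllegalArgumentException("unsupported element type 0x" + Integer.toHexString(et));
        }
    }

    /** FieldSig ::= FIELD Type (custom modifiers not handled); returns the decoded type. */
    static String decodeFieldSig(byte[] sig) {
        SigDecoder d = new SigDecoder(sig);
        if (d.readByte() != 0x06) throw new IllegalArgumentException("not a FieldSig");
        String type = d.decodeType();
        if (d.pos != sig.length) throw new IllegalStateException("signature ends at byte " + d.pos);
        return type;
    }
}
```

Fed the nine bytes of Listing 2, decodeFieldSig returns class TypeRef[125]<class TypeDef[100]> and consumes exactly nine bytes. By the tags, Hashtable`1 is reached through TypeRef (a type defined outside this module), while Module comes from TypeDef (defined in it).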

2.3 Advantages of CCI

CCI is actively maintained by a team of Microsoft developers. Its users include Spec#, VCC, Sandcastle, FxCop, and Code Contracts, among others. Details appear in §6.1 of the write-up Decoding external types on JVM and CLR¹.

It implements concepts like “as-seen-from” that we won’t necessarily use (the Scala counterpart surpasses that), but it’s reassuring to know the library is that complete. The same goes for the 64-bit format PE64. As another example, when getting fields CCI takes into account whether UseFieldPtrTable holds. Similarly, a CCI-wide lock is kept on an assembly while loading its symbols (otherwise the assembly could be recompiled in between). These quality details are not documented, but they can be discovered from the source code (as I have done).

Pending more detailed examination :-), it looks like the place to plug in CCI instead of the type-metadata parsing component of compiler.msil is collectTypes() (Listing 5).

¹ http://www.sts.tu-harburg.de/people/mi.garcia/ScalaCompilerCorner/TypeDecoding.pdf


Listing 4: PEFile.Sig.toString()

```java
public String toString() {
    StringBuffer b = new StringBuffer("(");
    int savedPos = buf.position();
    reset();
    for (int i = 0; i < length; i++) {
        b.append(byte2hex(readByte()));
        if (i < length - 1)
            b.append(" ");
    }
    buf.position(savedPos);
    return b.append(")").toString();
}
```

Listing 5: collectTypes() in MsilClassPath

```scala
object MsilClassPath {
  def collectTypes(assemFile: AbstractFile) = {
    var res: Array[MSILType] = MSILType.EmptyTypes
    val assem = Assembly.LoadFrom(assemFile.path)
    if (assem != null) {
      // DeclaringType == null: true for non-inner classes
      res = assem.GetTypes() filter (_.DeclaringType == null)
      Sorting.stableSort(res, (t1: MSILType, t2: MSILType) => (t1.FullName compareTo t2.FullName) < 0)
    }
    res
  }
  . . .
```

3 Handling TypeSpec signatures

From the CCI sources we know that Microsoft.Cci.MutableCodeModel.Assembly is a public sealed class that extends Module (also in that namespace) and implements the interfaces IAssembly and ICopyFrom<IAssembly> (I guess this type causes the parser exception), where IAssembly belongs to the Microsoft.Cci namespace and ICopyFrom to the Microsoft.Cci.MutableCodeModel namespace. In terms of physical layout, Assembly (as above) is defined in row 46 (see Figure 6) of the TypeDef table of the DLL Microsoft.Cci.MutableMetadata.dll.

To find where parsing gets out of sync, we have to start this high up in the containment hierarchy. Here we go. The layout of TypeDef can be found in §22.37 of Partition II of the CLR standard. To refresh our minds:

For any given type, there are two separate and distinct chains of pointers to other types (the pointers are actually implemented as indexes into metadata tables). The two chains are:

• Extension chain – defined via the Extends column of the TypeDef table. Typically, a derived Class extends a base Class (always one, and only one, base Class)

• Interface chains – defined via the InterfaceImpl table. Typically, a Class implements zero, one or more Interfaces


Figure 3: Step 1 of navigating to InternedIdToModuleMap

Also relevant for our purposes (although not quite yet):

If a type is generic, its parameters are defined in the GenericParam table (§22.20). Entries in the GenericParam table reference entries in the TypeDef table; there is no reference from the TypeDef table to the GenericParam table.

Rows 235 to 246 (both inclusive) of the InterfaceImpl table hold the interface indexes of Assembly, mostly into the TypeRef table. For comparison, the supertypes (both superclasses and interfaces) of Assembly are shown in Figure 20. Most entries refer to the TypeRef table (with indexes 5, 6, 7, 15, 55, 56, 57, 58, 59, 60, 61), while the lonely index 48 points to the TypeSpec table. Precisely this pointer causes parsing to derail, which can be overcome by handling case Table.TypeSpec.ID in getTypeDefOrRef() as shown in Listing 6.

BTW, the order in which the indexes above appear matches the order in which ildasm shows the supertypes of Assembly (Figure 9). Because of this, the last index (48) denotes ICopyFrom<IAssembly>.
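Unlike the compressed TypeDefOrRefEncoded form used inside signatures, the Interface column of InterfaceImpl stores a TypeDefOrRef coded index of fixed width, 2 or 4 bytes depending on table sizes (Partition II, §24.2.6). Index 48 above is thus the row carried by a coded value whose 2 low bits select TypeSpec, the case getTypeDefOrRef() did not handle. Below is a minimal sketch of both rules, assuming the spec's tag order TypeDef = 0, TypeRef = 1, TypeSpec = 2; the class name is hypothetical.

```java
/** Hypothetical helper for the TypeDefOrRef coded index (Partition II, §24.2.6). */
final class TypeDefOrRefIndex {
    private TypeDefOrRefIndex() {}

    enum Target { TYPE_DEF, TYPE_REF, TYPE_SPEC }

    /** 2 bytes if every target table has fewer than 2^(16-2) rows, else 4 bytes. */
    static int width(int typeDefRows, int typeRefRows, int typeSpecRows) {
        int max = Math.max(typeDefRows, Math.max(typeRefRows, typeSpecRows));
        return max < (1 << (16 - 2)) ? 2 : 4;
    }

    /** The 2 low bits select the table. */
    static Target target(int coded) {
        switch (coded & 0x3) {
            case 0: return Target.TYPE_DEF;
            case 1: return Target.TYPE_REF;
            case 2: return Target.TYPE_SPEC;
            default: throw new IllegalArgumentException("invalid tag in " + coded);
        }
    }

    /** The remaining bits are the 1-based row in that table. */
    static int row(int coded) { return coded >>> 2; }
}
```

For example, the coded value (48 << 2) | 2 = 194 decodes to row 48 of TypeSpec.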

Relevant fragments of the spec

• What on earth is TypeSpec for? You’re not alone . . . (Figure 5).

• The Blob byte sequence pointed to by the TypeSpec entry above (as shown in Figure 7) starts with 07 15 12 08 01 12 80 F1. §23.2.14 (“TypeSpec”) is reproduced in Figure 8. One more blob of the spec is needed to decipher it (Figure 10).
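Blob heap entries are prefixed by their length as a compressed unsigned integer, so the leading 07 above says that seven bytes follow (15 12 08 01 12 80 F1). Read by the same rules, those seven bytes give GENERICINST CLASS TypeDef[2] with one type argument CLASS TypeRef[60]; row 60 is also one of the TypeRef indexes listed above for the supertypes of Assembly, which would fit IAssembly. Listing 6 obtains the blob with its length stripped and re-prepends the length via compressUInt(), whose body is not shown in these notes. Below is a minimal sketch of what such an encoder computes, assuming it follows §23.2; the class name is hypothetical.

```java
/** Hypothetical encoder for ECMA-335 compressed unsigned integers (Partition II, §23.2). */
final class CompressUInt {
    private CompressUInt() {}

    static byte[] encode(int v) {
        if (v < 0 || v > 0x1FFFFFFF) {
            throw new IllegalArgumentException("not representable: " + v);
        }
        if (v <= 0x7F) {                          // 0xxxxxxx
            return new byte[] {(byte) v};
        }
        if (v <= 0x3FFF) {                        // 10xxxxxx xxxxxxxx
            return new byte[] {(byte) (0x80 | (v >>> 8)), (byte) v};
        }
        return new byte[] {                       // 110xxxxx + 3 more bytes, big-endian
            (byte) (0xC0 | (v >>> 24)), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }
}
```

For the TypeSpec blob above, encode(7) gives the single byte 07 that precedes the signature.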


Figure 4: Step 2 of navigating to InternedIdToModuleMap

Figure 5: Text from the spec

4 Handling generic parameters

There are two more ELEMENT_TYPE_... constants to add to Signature.java. They are described in §23.1.16 as follows:

• ELEMENT_TYPE_VAR (0x13): generic parameter in a generic type definition, represented as number (compressed unsigned integer)

• ELEMENT_TYPE_MVAR (0x1e): generic parameter in a generic method definition, represented as number (compressed unsigned integer)

Again in PEFile.decodeType0(), two additional case handlers are needed: (a) Listing 7, to parse what Figure 11 describes; and (b) Listing 8. The exception originally occurred while loading the methods of type ICopyFrom<ImmutableObject> (reproduced in Listing 9).
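Listings 7 and 8 map every type variable to System.Object "just to keep going", which loses which parameter was meant. Below is a minimal sketch of a placeholder that keeps the ordinal and the kind (type vs. method parameter), so that a later phase could look the parameter up in GenericParam; TypeVarRef is a hypothetical name, not a compiler.msil class, and the sketch only accepts 1- or 2-byte ordinals.

```java
/** Hypothetical CST placeholder for a type variable, instead of System.Object. */
final class TypeVarRef {
    static final int ELEMENT_TYPE_VAR = 0x13, ELEMENT_TYPE_MVAR = 0x1e;

    final boolean ofMethod;  // MVAR: parameter of a generic method; VAR: of a generic type
    final int number;        // zero-based ordinal, matching the Number column of GenericParam

    private TypeVarRef(boolean ofMethod, int number) {
        this.ofMethod = ofMethod;
        this.number = number;
    }

    /** Decodes VAR/MVAR at sig[pos]; the ordinal is a compressed unsigned integer. */
    static TypeVarRef decode(byte[] sig, int pos) {
        int et = sig[pos] & 0xFF;
        if (et != ELEMENT_TYPE_VAR && et != ELEMENT_TYPE_MVAR) {
            throw new IllegalArgumentException("not VAR/MVAR: 0x" + Integer.toHexString(et));
        }
        int b0 = sig[pos + 1] & 0xFF;
        int number;
        if ((b0 & 0x80) == 0) {
            number = b0;
        } else if ((b0 & 0xC0) == 0x80) {
            number = ((b0 & 0x3F) << 8) | (sig[pos + 2] & 0xFF);
        } else {
            throw new IllegalArgumentException("4-byte ordinals not handled in this sketch");
        }
        return new TypeVarRef(et == ELEMENT_TYPE_MVAR, number);
    }

    /** ILAsm-style rendering: !n for a type's parameter, !!n for a method's. */
    @Override
    public String toString() { return (ofMethod ? "!!" : "!") + number; }
}
```

In the String.Join signature discussed in Sec. 5, the trailing bytes 1e 00 would come out as !!0.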


Figure 6: See Sec. 3

Figure 7: A TypeSpec entry

5 Handling mscorlib.dll v4.0

Example 1

The metadata for the field shown in Figure 12 was not being parsed. The type of that field is given by the entry in the TypeRef table shown in Figure 13.

Oh well. The problem is due to System.Collections.Generic not being present in the mscorlib.dll delivered with the Scala.Net distro (that dll predates generics).

A detour from Example 1

Once mscorlib.dll v4.0 is added to -Xassem-extdirs, we’re in for another surprise: one of the Join methods in System.String (at this point during debugging it was not clear which) causes an exception when trying to access the param types past the first param. Probably the info in ParamList (see Figure 14) is not being parsed correctly.

Retrieving the two entries that ParamList points to shows nothing wrong there. Actually, the exception (ArrayIndexOutOfBounds) occurs while reading the type of the second param (the array in question, parsed from ParamTypes, has a single entry). The statement int paramCount = sig.decodeInt(); works OK. The signature of the method (obtained from its entry in MethodDef) reads (10 01 02 0e 0e 15 12 81 20 01 1e 00).

That also looks all right. But the statements commented out in Listing 10 read one entry too many in the Param table, because of the way paramListEnd is computed. Why not use paramCount instead, which is correct anyway? See Listing 10.
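A side note on the signature bytes themselves: per §23.2.1, a MethodDefSig starts with a calling-convention byte, and when its GENERIC bit (0x10) is set, a GenParamCount precedes ParamCount. Read that way, 10 01 02 means one method type parameter and two value parameters, followed by the return type 0e (string), a string parameter 0e, and a GENERICINST whose single type argument is 1e 00 (MVAR 0). We have not checked how PEType handles that bit; if GenParamCount were not skipped, 01 would be read as the parameter count, which would also produce the single-entry array observed above. Below is a minimal sketch of the header decoding; MethodSigHeader is a hypothetical name.

```java
/** Hypothetical decoder for the fixed header of a MethodDefSig (Partition II, §23.2.1). */
final class MethodSigHeader {
    static final int GENERIC = 0x10;

    final int callingConvention;
    final int genericParamCount;  // 0 unless the GENERIC bit is set
    final int paramCount;
    final int headerLength;       // bytes consumed; RetType starts here

    private MethodSigHeader(int cc, int gen, int params, int len) {
        callingConvention = cc;
        genericParamCount = gen;
        paramCount = params;
        headerLength = len;
    }

    static MethodSigHeader parse(byte[] sig) {
        int[] pos = {0};
        int cc = sig[pos[0]++] & 0xFF;
        int gen = (cc & GENERIC) != 0 ? decodeInt(sig, pos) : 0;  // GenParamCount
        int params = decodeInt(sig, pos);                          // ParamCount
        return new MethodSigHeader(cc, gen, params, pos[0]);
    }

    /** Compressed unsigned integer (§23.2); advances pos[0]. */
    private static int decodeInt(byte[] sig, int[] pos) {
        int b0 = sig[pos[0]++] & 0xFF;
        if ((b0 & 0x80) == 0) return b0;
        if ((b0 & 0xC0) == 0x80) return ((b0 & 0x3F) << 8) | (sig[pos[0]++] & 0xFF);
        int v = (b0 & 0x1F) << 24;
        v |= (sig[pos[0]++] & 0xFF) << 16;
        v |= (sig[pos[0]++] & 0xFF) << 8;
        return v | (sig[pos[0]++] & 0xFF);
    }
}
```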

Back to Example 1

This time the entry for the System assembly in the AssemblyRef table is not being handled properly (Figure 15), in that the assembly is not found on disk (it’s not there). The System assembly is decoded as part of finding a Type instance for System.Collections.Generic.Stack`1, which for some reason is not in mscorlib.dll (I checked with ILDAsm). OH WELL. That Stack class lives not in mscorlib.dll but in System.dll, which I did not copy to -Xassem-extdirs. But the errors reported by the compiler keep shrinking: no more type-metadata parsing errors!

Figure 8: BlobSignatureIndexedFromTypeSpec

Figure 9: Assembly supertypes as shown by ILDASM

```
HelloILScala.scala:55: error: Microsoft.Cci.ILGeneratorMethodBody does not have a constructor
    var body = new ILGeneratorMethodBody(ilGenerator, true, 1);
    ^
HelloILScala.scala:60: error: overloaded method value GetMethod with alternatives
  (x$1: System.Collections.Generic.IEnumerable`1, x$2: Microsoft.Cci.IName,
   x$3: Array[Microsoft.Cci.ITypeReference])Microsoft.Cci.IMethodDefinition
  <and>
  (x$1: Microsoft.Cci.ITypeDefinition, x$2: Microsoft.Cci.IName,
   x$3: Array[Microsoft.Cci.ITypeReference])Microsoft.Cci.IMethodDefinition
 cannot be applied to
  (Microsoft.Cci.INamedTypeDefinition,
   Microsoft.Cci.IName,
   Microsoft.Cci.INamespaceTypeReference)
    var writeLine = TypeHelper.GetMethod(systemConsole, nameTable.GetNameFor("WriteLine"),
                                         host.PlatformType.SystemString);
    ^
HelloILScala.scala:66: error: value peStream is not a member of object System.IO.Stream
    Stream peStream = File.Create("HelloILScala.exe");
    ^
HelloILScala.scala:67: error: not found: value PeWriter
    PeWriter.WritePeToStream(assembly, host, peStream);
    ^
four errors found
```


Figure 10: TypeDefOrRefEncoded text from the spec

6 GenericParamConstraint metadata table

6.1 Meaning of the bytes to parse

Quoting from [1, Ch. 11]:

The GenericParamConstraint metadata table contains inheritance and implementation constraints imposed on the generic parameters. An inheritance constraint imposed on a generic parameter means that the type substituting for the parameter in a generic instantiation must be derived from the specified type. An implementation constraint means that the type substituting for this parameter must implement the specified interface. Each record in this table has two entries:

• Owner (RID [row id] in the GenericParam table). The index of the GenericParam record describing the generic parameter to which this constraint is attributed.

• Constraint (coded token of type TypeDefOrRef). A token of the constraining type, which can reside in the TypeDef, TypeRef, or TypeSpec table. The nature of the constraint (inheritance or implementation) is defined by the constraining type: if it is an interface, then it’s an implementation constraint; otherwise it’s an inheritance constraint. . . .

In the optimized metadata model, the GenericParamConstraint records must be sorted by their Owner field.
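Because the records are sorted by Owner, the constraints of each generic parameter form one contiguous run, so a single pass can group them. Below is a minimal sketch, assuming small (2-byte) indexes, which holds when GenericParam has fewer than 2^16 rows and TypeDef, TypeRef and TypeSpec each have fewer than 2^14; ConstraintTable is a hypothetical name. Whether a constraint is an inheritance or an implementation constraint cannot be decided from the row alone, since that depends on whether the constraining type is an interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for GenericParamConstraint rows (Partition II, §22.21). */
final class ConstraintTable {
    private ConstraintTable() {}

    /**
     * Assumes 2-byte columns: Owner (row in GenericParam) and Constraint
     * (TypeDefOrRef coded index). Returns owner row -> constraints, in table order.
     */
    static Map<Integer, List<String>> groupByOwner(byte[] table, int rows) {
        String[] targets = {"TypeDef", "TypeRef", "TypeSpec"};
        Map<Integer, List<String>> byOwner = new LinkedHashMap<>();
        int previousOwner = 0;
        for (int r = 0; r < rows; r++) {
            int owner = u16(table, 4 * r);
            int constraint = u16(table, 4 * r + 2);
            if (owner < previousOwner) {
                throw new IllegalStateException("rows not sorted by Owner at row " + (r + 1));
            }
            if ((constraint & 0x3) == 3) {
                throw new IllegalArgumentException("bad TypeDefOrRef tag at row " + (r + 1));
            }
            previousOwner = owner;
            byOwner.computeIfAbsent(owner, k -> new ArrayList<>())
                   .add(targets[constraint & 0x3] + "[" + (constraint >>> 2) + "]");
        }
        return byOwner;
    }

    /** Metadata table columns are little-endian. */
    private static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }
}
```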

All of Ch. 11 in Lidin’s book is a must-read to understand how parsing of generics should work. A few of the most relevant points follow:


Listing 6: Handler for TypeSpec

```java
case Table.TypeSpec.ID:
    Table.TypeSpec ts = pefile.TypeSpec;
    ts.readRow(row);
    int posInBlobStream = ts.Signature;
    byte[] blobArrWithLengthStripped = pefile.Blob.getBlob(posInBlobStream);
    byte[] compressedUInt = compressUInt(blobArrWithLengthStripped.length);
    byte[] byteArr = new byte[blobArrWithLengthStripped.length + compressedUInt.length];
    System.arraycopy(compressedUInt, 0, byteArr, 0, compressedUInt.length);
    System.arraycopy(blobArrWithLengthStripped, 0, byteArr, compressedUInt.length, blobArrWithLengthStripped.length);
    ByteBuffer buf = ByteBuffer.wrap(byteArr);
    Sig sig = pefile.new Sig(buf);
    int desc = sig.readByte();
    switch (desc) {
        // GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type*
        case Signature.ELEMENT_TYPE_GENERICINST: // i.e. 0x15
            int b = sig.readByte(); // (ELEMENT_TYPE_CLASS | ELEMENT_TYPE_VALUETYPE) i.e. (0x12 | 0x11)
            /*- TODO don't ignore b as done above */
            Type instantiatedType = getTypeDefOrRef(sig.decodeInt()); // TypeDefOrRefEncoded
            int numberOfTypeArgs = sig.decodeInt(); // GenArgCount
            Type[] typeArgs = new Type[numberOfTypeArgs];
            for (int iarg = 0; iarg < numberOfTypeArgs; iarg++) {
                typeArgs[iarg] = sig.decodeType(); // Type*
            }
            type = instantiatedType; // TODO add the type arguments
            break;
        default:
            // TODO handle remaining grammar productions in 23.2.14
            throw new RuntimeException("PEModule.getTypeDefOrRef(): TypeSpec");
    }
    break;
```

The high-level languages bypass this limitation and allow you to define types G and G<T> (and G<T,U>, and so on) in the same module by mangling the names of generic types, usually adding the generic arity (the number of type parameters) to the type name. For example, VB and C# emit type G as G, type G<T> as G`1, type G<T,U> as G`2, and so on (now you probably have guessed why the backtick symbol was added as a legal identifier symbol in ILAsm 2.0).

. . .

The IL assembler does not do the type name mangling automatically, leaving it to the programmer or to the tool (for example, a compiler) generating ILAsm code.
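Since the arity suffix is only a compiler convention, and (per the nested-class caveat quoted next) may not even count all the parameters a nested type has, the authoritative arity is presumably the number of GenericParam rows owned by the TypeDef; the name suffix is handy mainly for display and for matching names written in source. Below is a minimal sketch of splitting such a name; MangledName is a hypothetical helper.

```java
/** Hypothetical helper for the C#/VB arity-mangling convention (e.g. Hashtable`1). */
final class MangledName {
    private MangledName() {}

    /** Index of the backtick that starts a numeric arity suffix, or -1 if there is none. */
    private static int suffixStart(String name) {
        int tick = name.lastIndexOf('`');
        if (tick < 0 || tick == name.length() - 1) return -1;
        for (int i = tick + 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < '0' || c > '9') return -1;   // ASCII digits only
        }
        return tick;
    }

    static String baseName(String name) {
        int tick = suffixStart(name);
        return tick < 0 ? name : name.substring(0, tick);
    }

    static int arity(String name) {
        int tick = suffixStart(name);
        return tick < 0 ? 0 : Integer.parseInt(name.substring(tick + 1));
    }
}
```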

Nested classes deserve their own section of comments:

I must warn you about one helpful feature of the C# compiler. When you declare a class nested in a generic class, the compiler presumes that the nested class needs access to the type parameters of the encloser and makes the nested type generic. . . . Note that the C# compiler mangles the nested type’s name according to its own declared generic arity, not according to the summary encloser’s and nested type’s arity . . . ILAsm can reference the type parameters by ordinal as well as by name, so duplicate names of type parameters don’t prevent these parameters from being addressed.

Listing 7: Handler for TypeVar

```java
// another grammar production from 23.2.12 Type
// ELEMENT_TYPE_VAR number    The number non-terminal following MVAR
//                            or VAR is an unsigned integer value (compressed).
case ELEMENT_TYPE_VAR:
    int typeArgAsZeroBased = decodeInt();
    // TODO pending adding another CST node kind
    type = Type.GetType("System.Object"); // just to keep going
    break;
```

Listing 8: Handler for MVar

```java
// another grammar production from 23.2.12 Type
// ELEMENT_TYPE_MVAR number   The number non-terminal following MVAR
//                            or VAR is an unsigned integer value (compressed).
case ELEMENT_TYPE_MVAR:
    // reuses the local declared in the ELEMENT_TYPE_VAR case (same switch block)
    typeArgAsZeroBased = decodeInt();
    // TODO pending adding another CST node kind
    type = Type.GetType("System.Object"); // just to keep going
    break;
```

The flags for type variables are depicted graphically in Figure 17.
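The flag values are those of §23.1.7 (GenericParamAttributes): the low two bits hold the variance (VarianceMask 0x0003) and the next three the special constraints (SpecialConstraintMask 0x001C). Below is a minimal sketch that names the bits; the class name and the output words are our own choice, not taken from compiler.msil or CCI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for GenericParamAttributes (Partition II, §23.1.7). */
final class GenericParamFlagsDecoder {
    static final int VARIANCE_MASK = 0x0003;
    static final int COVARIANT = 0x0001;
    static final int CONTRAVARIANT = 0x0002;
    static final int REFERENCE_TYPE_CONSTRAINT = 0x0004;
    static final int NOT_NULLABLE_VALUE_TYPE_CONSTRAINT = 0x0008;
    static final int DEFAULT_CONSTRUCTOR_CONSTRAINT = 0x0010;

    private GenericParamFlagsDecoder() {}

    static String describe(int flags) {
        List<String> parts = new ArrayList<>();
        switch (flags & VARIANCE_MASK) {
            case COVARIANT: parts.add("covariant"); break;
            case CONTRAVARIANT: parts.add("contravariant"); break;
            default: break;  // 0 = non-variant; 3 is not a valid value
        }
        if ((flags & REFERENCE_TYPE_CONSTRAINT) != 0) parts.add("reference-type");
        if ((flags & NOT_NULLABLE_VALUE_TYPE_CONSTRAINT) != 0) parts.add("value-type");
        if ((flags & DEFAULT_CONSTRUCTOR_CONSTRAINT) != 0) parts.add("default-ctor");
        return String.join(" ", parts);
    }
}
```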

6.2 Code to parse the bytes

The code snippets shown thus far (to parse specific fragments of type metadata) run while loadTypes() (Listing 11) is on the call stack. After getTypeDef(row) has run for all rows in the TypeDef table, parsing of the constraints on type variables (if any) can begin (the bounds they refer to will already have CST nodes built for them). A module that declares neither generic types nor generic methods won’t have a GenericParam table.

A type variable read from GenericParam never belongs to a type declaration or method declaration located in another module; only intra-module references originate from this table [1, Ch. 11]: “TypeRefs and MemberRefs, even those of generic types and methods, don’t have their generic parameters represented in the GenericParam table; the generic parameters and their constraints are always defined together with their owners, in the metadata of the same module.” The bounds in the GenericParamConstraint table, in contrast, may be represented not only in the TypeDef but also in the TypeRef and TypeSpec tables, i.e., they can be inter-module references.

Listing 9: An interface with a method with a param with a generic type arg

```csharp
public interface ICopyFrom<ImmutableObject> {
    /// <summary>
    /// Makes this mutable object a copy of the given immutable object.
    /// </summary>
    /// <param name="objectToCopy">An immutable object that implements the same object model interface as this mutable object.</param>
    /// <param name="internFactory">The intern factory to use for computing the interned identity (if applicable) of this mutable object.</param>
    void Copy(ImmutableObject objectToCopy, IInternFactory internFactory);
}
```

Figure 11: What the spec says for GenericParam

Figure 12: ILGenerator problem field

The byte layout of the GenericParam table is defined in §22.20 of Partition II, that of GenericParamConstraint in §22.21. The parsing implementation in CCI is shown in Listing 12 (for GenericParam rows) and in Listing 13 (for GenericParamConstraint rows).

TODO parse the GenericParam table

TODO parse the GenericParamConstraint table
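As a starting point for the first TODO, here is a minimal sketch that mirrors Listing 12's 1-based row arithmetic in Java, assuming all index columns are 2 bytes wide (an 8-byte row: Number, Flags, Owner, Name). Owner is a TypeOrMethodDef coded index with a single tag bit (0 = TypeDef, 1 = MethodDef, §24.2.6); GenericParamRow is a hypothetical class, not a compiler.msil one.

```java
/** Hypothetical reader for one GenericParam row (Partition II, §22.20), small indexes only. */
final class GenericParamRow {
    final int number;             // ordinal of the parameter, numbered from 0
    final int flags;              // GenericParamAttributes
    final boolean ownerIsMethod;  // tag bit of the TypeOrMethodDef coded index
    final int ownerRow;           // row in TypeDef or MethodDef
    final int nameIndex;          // offset into the #Strings heap

    private GenericParamRow(int number, int flags, boolean ownerIsMethod, int ownerRow, int nameIndex) {
        this.number = number;
        this.flags = flags;
        this.ownerIsMethod = ownerIsMethod;
        this.ownerRow = ownerRow;
        this.nameIndex = nameIndex;
    }

    /** Reads row rowId (1-based, like CCI) from the raw bytes of the table. */
    static GenericParamRow read(byte[] table, int rowId) {
        final int rowSize = 8;                    // 2 + 2 + 2 + 2 bytes
        int off = (rowId - 1) * rowSize;
        int owner = u16(table, off + 4);          // TypeOrMethodDef coded index
        return new GenericParamRow(
            u16(table, off), u16(table, off + 2),
            (owner & 1) == 1, owner >>> 1, u16(table, off + 6));
    }

    private static int u16(byte[] b, int off) {   // little-endian
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }
}
```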

6.3 Data structures to hold the parsed bytes

Sorry about so much detail, but I want to avoid re-discovery time later. In file Table.java, the class in Listing 14 was added.


Figure 13: More on the ILGenerator problem field

Figure 14: JoinSigA

7 Related Work: Parsing type metadata as Query answering

These notes considered just two ways (two APIs, compiler.msil and CCI) to perform type discovery on .NET assemblies, among the many alternatives out there. Wouldn’t it be more productive to have a conceptual model of assembly metadata, thus enabling declarative rather than programmatic queries?

That idea is explored in the blog series “Exploring relational schema for .NET Assemblies in SQL Server Modeling Services” at http://blogs.msdn.com/sonuarora/archive/2010/03/01/exploring-relational-data-schema-for-net-assemblies.aspx, enabling use cases like those shown in Figure 18.

Coming back to Scala, the only way I know to explore ASTs is from inside a compiler plugin (there’s no external representation for ASTs, “external” as in a queryable database, as e.g. this one for Java: http://semmle.com/semmlecode/documentation/ql/). These endeavors fall under the umbrella of code repositories, where “code” is construed to be either “source” or “ASTs”. Some resources follow:

• Workshop Query Technologies and Applications for Program Comprehension, part of the International Conference on Program Comprehension, http://www.program-comprehension.org/

• Tiago L. Alves, Peter Rademaker, and Jurriaan Hage, Comparative Study of Code Query Technologies, draft, December 2009, http://wiki.di.uminho.pt/twiki/pub/Personal/Tiago/Publications/Alves09b-draft.pdf

Listing 10: A snippet from loadMethods() in PEType

```java
ParameterInfo[] params = new ParameterInfo[paramCount];
int paramListBeg = file.MethodDef.ParamList;
int paramListEnd = paramListBeg + paramCount;
// if not the last method
// Miguel says: paramCount is correct, and the following leads to
// ArrayIndexOutOfBounds in mscorlib.dll v4
// if (file.MethodDef.currentRow() < file.MethodDef.rows) {
//     paramListEnd = file.MethodDef(mrow + 1).ParamList;
// }
for (int i = paramListBeg; i < paramListEnd; i++) {
    // (loop body not shown in this snippet)
}
```

Figure 15: System in AssemblyRef

Figure 16: These are Types, but CST-level ones!

8 Future Work: New features in CLR v4 for compiler writers

CLR homepage: http://msdn.microsoft.com/en-us/netframework/aa663296.aspx

Joshua Goodman: What’s new in CLR 4 for languages².

² http://www.langnetsymposium.com/2009/talks/20-JoshuaGoodman-CLR.html


Figure 17: GenericParamFlags

There are too many new features to cover in a short talk, so I’ll focus on those that will be most interesting to language designers. First, we added several new types, including Tuple, BigInteger, and Complex. Tuple is a good example of a feature that seems very simple, but becomes interesting when you have to design it to work across different languages that may or may not natively support the type. Fortunately, those few types and a few minor improvements to the runtime were all that we needed to do to support functional languages like F# . . . There is a new exception type, Corrupted State Exceptions, designed to prevent a common developer mistake. And we’ve added support for type equivalence across libraries; in combination with related work, this makes the experience of deploying COM addins much lower cost.

Ian Carmichael: The History and Future of the CLR, http://channel9.msdn.com/posts/Charles/Ian-Carmichael-The-History-and-Future-of-CLR/

9 IDE tips and tricks

I’m writing down these cookbook recipes because I keep forgetting them, so please feel free to skip this section if you don’t use any IDE.

Visual Studio

I was trying to step into a method, but instead of doing that the output window showed “Stepping over method without symbols”, although the Modules window (visible only in Break mode) showed that the PDB had been loaded for the assembly containing the method. After some trial and error it turned out that the method returned an IEnumerable, and because of the method’s yield return, execution never got into the method without forcing the enumeration (say, with ToList). The debugger was right after all.

Listing 11: loadTypes

```java
/** Load information about the types defined in this module,
 *  from the TypeDef table.
 */
protected void loadTypes() {
    typeRefs = new Type[pefile.TypeRef.rows];
    final int nbTypes = pefile.TypeDef.rows;
    // row 1 of TypeDef is the <Module> pseudo-type, so real types start at row 2
    for (int row = 2; row <= nbTypes; row++) {
        String name = pefile.TypeDef(row).getFullName();
        typesMap.put(name, new Integer(row));
    }
    this.types = new Type[nbTypes - 1];
    for (int row = 2; row <= nbTypes; row++) {
        getTypeDef(row);
    }
}
```

Another debugger-related trap is System.Diagnostics.DebuggerStepThrough³, because “The CLR attaches no semantics to this attribute. It is provided for use by source code debuggers. For example, the Visual Studio debugger does not stop in a method marked with this attribute but does allow a breakpoint to be set in the method.” Not to be found in CCI, fortunately.

For some reason the editor shortcut (hotkey in VS-speak) to navigate backwards gets lost, and has to be reassigned as follows:

1. Tools, Customize, Commands

2. In Menu bar pick ReSharper | Navigate

3. From the list, notice ReSharper_NavigateBackward. This was just to learn the command’s internal name; afterwards, it can be reached more directly with Tools, Options, Environment, Keyboard. But for now just press Keyboard.

4. Pick the (Default) scheme, pick the ReSharper_NavigateBackward command,and press the chosen shortcut keys. Don’t forget to click Assign.

5. Ditto for ReSharper_NavigateForward

IDEA

In file Trees.scala (in the folder for scala.tools.nsc.ast), IDEA can’t get use-defs right unless trait Trees extends reflect.generic.Trees is rewritten to the more explicit trait Trees extends scala.reflect.generic.Trees. The same goes for other cases (UnPickler extends scala.reflect.generic.UnPickler, etc.).

³ http://msdn.microsoft.com/en-us/library/system.diagnostics.debuggerstepthroughattribute.aspx


Listing 12: How CCI reads a GenericParam row

```csharp
internal GenericParamRow this[uint rowId] // This is 1 based...
{
    get {
        int rowOffset = (int)(rowId - 1) * this.RowSize;
        ushort number =
            this.GenericParamTableMemoryReader.PeekUInt16(rowOffset + this.NumberOffset);
        GenericParamFlags flags =
            (GenericParamFlags)this.GenericParamTableMemoryReader.PeekUInt16(
                rowOffset + this.FlagsOffset);
        uint owner =
            this.GenericParamTableMemoryReader.PeekReference(
                rowOffset + this.OwnerOffset,
                this.IsTypeOrMethodDefRefSizeSmall);
        owner = TypeOrMethodDefTag.ConvertToToken(owner);
        uint name =
            this.GenericParamTableMemoryReader.PeekReference(
                rowOffset + this.NameOffset,
                this.IsStringHeapRefSizeSmall);
        GenericParamRow genericParamRow = new GenericParamRow(number, flags, owner, name);
        return genericParamRow;
    }
}
```

Listing 13: How CCI reads a GenericParamConstraint row

```csharp
internal uint GetConstraint(
    uint rowId
) {
    int rowOffset = (int)(rowId - 1) * this.RowSize;
    uint constraint =
        this.GenericParamConstraintTableMemoryReader.PeekReference(
            rowOffset + this.ConstraintOffset,
            this.IsTypeDefOrRefRefSizeSmall);
    constraint = TypeDefOrRefTag.ConvertToToken(constraint);
    return constraint;
}
```

Quoting from scala-internals (lukas): It’s possible to use fsc for compiling the scala compiler inside IDEA. One just has to start the CompileServer on the command line using the command below, then tick “Use fsc” in the project settings.

```sh
/home/rytz/Applications/java-1.6/bin/java \
  -Denv.classpath="%CLASSPATH%" -Denv.emacs="%EMACS%" \
  -Didea.launcher.port=7534 -Didea.launcher.bin.path=/home/rytz/Applications/idea-IC-95.24/bin \
  -Dfile.encoding=UTF-8 \
  -classpath /home/rytz/scala/trunk/build/locker/classes/library:/home/rytz/scala/trunk/build/locker/classes/compiler:/home/rytz/scala/trunk/lib/fjbg.jar:/home/rytz/Applications/idea-IC-95.24/lib/idea_rt.jar \
  com.intellij.rt.execution.application.AppMain scala.tools.nsc.CompileServer
```


Listing 14: GenericParamConstraint extends Table

```java
//##########################################################################
// table GenericParamConstraint; ID=0x2c; p139, 22.20
// NOTE: populateFields() below reads the columns of the GenericParam table
// (§22.20: Number and Flags, 2 bytes each, then Owner and Name). A
// GenericParamConstraint row (§22.21) has only Owner (index into GenericParam)
// and Constraint (TypeDefOrRef coded index), so this class is still work in progress.
public static final class GenericParamConstraint extends Table {
    public static final int ID = 0x2c;
    public GenericParamConstraint(PEFile file, int rows) { super(file, ID, rows); }
    protected void populateFields() {
        Number = readInt();
        Flag = readInt();
        Owner = readTypeOrMethodDefCodedIndex(); // a TypeOrMethodDef (24.2.6) coded index
        Name = readStringIndex(); // (a non-null index into the String heap)
    }
    protected int getRowSize() {
        return 4; // constant size
        // + file.getStringIndexSize() + file.getTableSetIndexSize(_Implementation)
    }
} // class GenericParamConstraint
```

References

[1] Serge Lidin. Expert .NET 2.0 IL Assembler. Apress, Berkeley, CA, USA, 2006.


Figure 18: MetadataQuery


Figure 19: CFFExplorer


Figure 20: Super types of Assembly
