Dcspm develop and Compile Subset of pascal language to msil


Conditional Branching Instructions



Yüklə 1,28 Mb.
səhifə3/4
tarix07.11.2018
ölçüsü1,28 Mb.
#78911
1   2   3   4

Conditional Branching Instructions:


  • brfalse (brnull, brzero)  (0x39). Branch if  is 0. *

  • brfalse.s (brnull.s, brzero.s)  (0x2C). The short-parameter form of brfalse. I used brfalse.s in my project is an improvement in the If /Else statement MSIL, I will talk about it later.

  • brtrue (brinst)  (0x3A). Branch if  is nonzero.


* is obtained from top value or the stack
brtrue.s (brinst.s)  (0x2D). The short-parameter form of brtrue.
        1. Comparative Branching Instructions


Comparative branching instructions take two values (, ) from the evaluation stack and compare them according to the  specified by the opcode. Not all combinations of types of  and  are valid. These are the ones I used in my project:

  • bgt.s  (0x30). The short-parameter form of bgt.

  • blt.s  (0x32). The short-parameter form of blt.

  • beq.s  (0x2E). The short-parameter form of beq.

  • bne.un.s  (0x33). The short-parameter form of bne.un.

  • ble.s  (0x31). The short-parameter form of ble.

  • bge.un.s  (0x34). The short-parameter form of bge.un.
        1. Constant Loading


Constant loading instructions take at most one parameter (the constant to load) and load it on the evaluation stack. The ILAsm syntax requires explicit specification of the constants (in other words, you cannot use a variable or argument name), in decimal or hexadecimal form:

Some instructions have no parameters because the value to be loaded is specified by the opcode itself.

Note that for integer and floating-point values, the slots of the evaluation stack are either 4- or 8-bytes wide, so the constants being loaded are converted to the suitable size.


  • ldc.i4  (0x20). Load  on the stack.

  • ldc.i4.s  (0x1F). Load  on the stack.

  • ldc.i4.m1 (ldc.i4.M1) (0x15). Load –1 on the stack.

  • ldc.i4.0 (0x16). Load 0.

  • ldc.i4.1 (0x17). Load 1.

  • ldc.i4.2 (0x18). Load 2.

  • ldc.i4.3 (0x19). Load 3.

  • ldc.i4.4 (0x1A). Load 4.

  • ldc.i4.5 (0x1B). Load 5.

  • ldc.i4.6 (0x1C). Load 6.

  • ldc.i4.7 (0x1D). Load 7.
        1. Logical Condition Check Instructions


Logical condition check operations are similar to comparative branching instructions except that they result not in branching but in putting the condition check result on the stack. The result type is int32, and its value is equal to 1 if the condition checks and 0 otherwise; in other words, logically the result is a Boolean value. The two operands being compared are taken from the stack, and since no branching is performed, the condition check instructions have no parameters.

The logical condition check instructions are useful when you want to store the result of the condition check for multiple use or for later use. If you need the condition check to decide only once and on the spot whether you need to branch, you would be better off using a comparative branching instruction.”[10]





  • ceq (0xFE 0x01). Check whether the two values on the stack are equal.

  • cgt (0xFE 0x02). Check whether the first value is greater than the second value. It’s the stack we are working with, so the “second” value is the one on the top of the stack.

  • clt (0xFE 0x04). Check whether the first value is less than the second value.
        1. Local Variable Loading


Local variable loading instructions are similar to argument loading instructions except that no “invisible” items appear among the local variables, so local variable number 0 is always the first one specified in the local variable signature.


  • ldloc  (0xFE 0x0C). Load the value of local variable number  on the stack. Like the argument numbers, local variable numbers can range from 0 to 65534 (0xFFFE). The value 65535, also admissible for unsigned 2-byte integers, is excluded because otherwise the counter of local variables would have to be 4 bytes wide. Limiting the number of the local variables, however standardized, seems arbitrary and implementation specific, because the number of local variables of a method is not stored in the metadata or in the method header, so this limitation comes purely from one particular implementation of the JIT compiler.

  • ldloc.s  (0x11). The short-parameter form of ldloc.

  • ldloc.0 (0x06). Load the value of local variable number 0 on the stack.

  • ldloc.1 (0x07). Load the value of local variable number 1 on the stack.

  • ldloc.2 (0x08). Load the value of local variable number 2 on the stack.

  • ldloc.3 (0x09). Load the value of local variable number 3 on the stack.”[10]



3.1.5: MSIL (Microsoft Intermediate Language) Design
After the source code has been tokenized, the parsing phase commences. At the end of this stage, if the source code is syntactically valid, the compiler has generated: (1) an abstract syntax tree (AST) of the source code and (2) Microsoft Intermediate Language (MSIL). The parser phase starts with the Program Function which the source code has to start with the “Program” keyword. When the parser is calling the nextToken() method to read the next string, MSIL is ready to call the newlabel method that set up a new label and then call the emit method to combine the new label with the opcode. When the Program function matches the “Program” keyword, it is going to call declaration function which matches “Var” keyword then calls “identifier list“ function which has a unique number is 256 and number has a unique number too is 257 after matching “;’ and “Type” function is to recognize the identifier type. After calling declaration function, “Program” function calls “Compound statements” function so, MSIL starts from here for being compiled. The Parser phase has some constructor and many methods.

3.1.5.1. Constructors:

Constructors are class methods that are executed when an object of a given type is created. Constructors have the same name as the class, and usually initialize the data members of the new object.

1- ldloc Table: it is used to save the variable with its load local location

2-Stloc Table: it is used for save the variable with its store local location




        1. Methods:




<program::=

Program <identifier; <block.







  1. Program method:





.



;

ID

Program




<program::=

Program <identifier; <block.



  1. Declaration method











:

Var



< declaration> ::=

var  :  
    



  1. Identifier List method







ID

Ldloc.Add( lookahead.attr,”ldloc.jj);

Stloc.Add( lookahead.attr,”stloc.jj); jj++;



| ,





<Identifier list::=

ID <Identifier list> | , <Identifier list



< type>
Type method










|

Integer | real |



;





:

ID




  1. Standard Type method







Integer| real :

Inqueue(“.locals init ( [1] int32, [2] int32,…)



  1. Compound Statements Method










emit(IL_####, " ", "ret", "", "\n")

End ;

Begin



Prgoram function after calling Compand statements function, Compound function matches “Begin” keyword and then calls Statements List function for the other statements such as Writeline statement, if statement, if/else statement,..etc. We will talk about MSIL statements in a bit. Next, matches End of Begin of our program and then it will do emit “IL_####:” and “ret” instruction which return from method, possibly with a value.





  1. Statements List Method









>



; Semicolon

After Compound Statements function called Statements list function, Statements list function is going to call Statement function which decides or parses and compiles a statement to MSIL. After that, matching the semicolon of statement and then calling Statements List again for another statement this function is a recursion function because calling itself.




  1. Statement Method









While

Writeline

|

Case

|

>

|

Begin

|

If

|

|

For

expression <simple expression>


<Newlabel>: “<” : emit(“IL_##”,”clt”)

“>” : emit(“IL_##”,”cgt”)

“<=” : emit(“IL_##”,”cgt”)

“>=” : emit(“IL_##”,”clt”)

“==” : emit(“IL_##”,”ceq”)

“<>” : emit(“IL_##”,”ceq”)










simple expression


<Newlabel> “+” : emit(“IL_##”,”add”)

“-” : emit(“IL_##”,”sub”)




<simple expression>



<factor>



<term>


<Newlabel> “*” : emit(“IL_##”,”mul”)

“/” : emit(“IL_##”,”div”)




<term>


<factor>

ID : “*” : emit(“IL_##”,”ldloc.##”)


<factor>



( <IFlogic> )






NUM : “*” : emit(“IL_##”,”ldc.i4.Num”)



IFlogic

<AndLogic>


OR : <NewLabel> if “bge.s”:emit(IL_##,”blt.s IL_##”)

else “ble.s”:emit(IL_##,”bgt.s IL_##”)

else “blt.s”:emit(IL_##,”bge.s IL_##”)

else “bgt.s”:emit(IL_##,”ble.s IL_##”)

else “bne.un.s”:emit(IL_##,”bne.un.s IL_##”)

else “beq.s”:emit(IL_##,”beq.s IL_##”) }

“brtrue.s”:emit(IL_##,”brtrue.s IL_##”)





<expression list>






<IFlogic>


<ANDlogic>




AND: <NewLabel> if “bge.s”:emit(IL_##,”bge.s IL_##”)

else “ble.s”:emit(IL_##,”ble.s IL_##”)

else “blt.s”:emit(IL_##,”blt.s IL_##”)

else “bgt.s”:emit(IL_##,”bgt.s IL_##”)

else “bne.un.s”:emit(IL_##,”beq.s IL_##”)

else “beq.s”:emit(IL_##,”bne.un.s IL_##”) }

“brtrue.s”:emit(IL_##,”brtrue.s IL_##”)




<ANDlogic>





ID | NUM: <expression>



<expression list>


,’: <expresson>











If stat.


If LOGIC==false: <NewLabel> emit(IL_##,”ldc.i4.0”);

emit(IL_##,”ceq”);

if LOGIC == true: <NewLabel> emit(IL_##,”br.s IL_”count+3”);

AND: <NewLabel> emit(IL_##,”ldc.i4.0”);

OR: <NewLabel> emit(IL_##,”ldc.i4.1”);



If IFLOGIC==true: <NewLabel> emit(IL_##,”br.s IL_count+3”);

<NewLabel> emit(IL_##,”ceq”);

<NewLabel> emit(IL_##,”ldc.i4.0”);


<NewLabel> emit(IL_##,”stloc.ii.ToString”);

<NewLabel> emit(IL_##,”ldloc.ii.Tostring”);

ii++;



<NewLabel> emit(IL_##,”brtrue.s IL_”BIF” ”);

Then ;



else : Brif =count; <NewLabel> emit(IL_##,”br.s IL_”BrIF” ”); BIF=count;









( emit(“IL_##”, “br.s IL_ForwordLabel”Brture=count;






While stat.


f= lookahead.code; d= lookahead.attr; )


Do Begin ForwordLabel = count;

emit(“IL_##,”ldloc.”+d.ToString())


If f==NUM; emit(“IL_##,”ldc.i4.”+f.ToString())

If f==ID; emit(“IL_##,”ldc.i4.”+f.ToString())


<Newlabel>: “<” : emit(“IL_##”,”clt”);

“>” : emit(“IL_##”,”cgt”);

“<=” : {emit(“IL_##”,”cgt”);

emit(“IL_##,”ldc.i4.0”);

emit(“IL_##,”ceq”);}

“>=” : {emit(“IL_##”,”clt”);

emit(“IL_##,”ldc.i4.0”);

emit(“IL_##,”ceq”);}

“==” : emit(“IL_##”,”ceq”);

“<>” : emit(“IL_##”,”ceq”);




<NewLabel> emit(IL_##,”stloc.ii.ToString”);

<NewLabel> emit(IL_##,”ldloc.ii.Tostring”);

ii++;




<NewLabel> emit(IL_##,”brtrue.s IL_”Brtrue” ”);

End ;



  1. Implementation

DCSPM is programmed in Microsoft visual C# Express 2010 that is contained in the MSDN Library, which you can install locally on your own computer or network, and which is also available on the internet at http://msdn.microsoft.com/library.

“C# (pronounced "C sharp") is a programming language that is designed for building a variety of applications that run on the .NET Framework. C# is simple, powerful, type-safe, and object-oriented. The many innovations in C# enable rapid application development while retaining the expressiveness and elegance of C-style languages.

Visual C# is an implementation of the C# language by Microsoft. Visual Studio supports Visual C# with a full-featured code editor, compiler, project templates, designers, code wizards, a powerful and easy-to-use debugger, and other tools. The .NET Framework class library provides access to many operating system services and other useful, well-designed classes that speed up the development cycle significantly” [18].

“This effectively reduces the refactoring capabilities of Visual C# Express to Renaming and Extracting Methods. Developers state the reason of this removal as "to simplify the C# Express user experience". However this created a controversy as some end users claim it is an important feature, and instead of simplifying it cripples the user experience.

The ability to attach the debugger to an already-running process has also been removed, hindering scenarios such as writing Windows services and re-attaching a debugger under ASP.NET when errors under the original debugging session cause breakpoints to be ignored.

Additionally it has been observed that the express version requires that the time between builds be greater than approximately 20 seconds. If a project is rapidly modified and rebuilt the target will not be updated even though the source has been modified and saved.”[19]

 

The steps required to create a .NET application :



  1. Application code is written using a .NET-compatible language such as C#.

  2. That code is compiled into CIL, which is stored in an assembly such as Figure 10.

referenced image



Figure 10: Application Code using .NET



  1. When this code is executed (either in its own right if it is an executable or when it is used from other code), it must first be compiled into native code using a JIT compiler such as Figure 11.


referenced image



Figure 11: JIT Compilation




  1. The native code is executed in the context of the managed CLR, along with any other running applications or processes, as shown in such as Figure 12.


referenced image


Figure 12: .NET CLR

Microsoft Visual C# is a programming environment used to create computer applications for the Microsoft Windows family of operating systems. It combines the C# language and the .NET Framework.

To test the MSIL, I general used the MSIL Disassembler (Ildasm.exe) tool that is included with the .NET Framework SDK. The Ildasm.exe parses any .NET Framework .exe or .dll assembly, and shows the information in human-readable format. Ildasm.exe shows more than just the Microsoft intermediate language (MSIL) code — it also displays namespaces and types, including their interfaces. You can use Ildasm.exe to examine native .NET Framework assemblies, such as Mscorlib.dll, as well as .NET Framework assemblies provided by others or created yourself. Most .NET Framework developers will find Ildasm.exe indispensable. You can find this tool FrameworkSDK\Bin\ildasm.exe in your computer as I explained that in Background section 2.
ILAsm has the same instruction set as the native assembly language. You can write code for ILAsm in any text editor like notepad and then can use the command line compiler (ILAsm.exe) provided by the .NET framework to compile that. ILAsm.exe is a command line tool shipped with the .NET Framework and can be located at \Microsoft.NET\Framework\ folder. You can include this path in your path environment variable. When you have finished compiling your .IL file, then it will output the exe with the same name as that of .IL file. You can specify the output file name using /OutPut= switch like ILAsm Test.il /output=MyFile.exe. To run the output exe file, just type the name of the exe and hit return. Output will be before you on the screen. [11]
When the .il file is compiled it needs the Fusion.dll file. “Fusion.dll is an assembly manager module used with the .net framework of Microsoft. The Common Language Runtime (CLR) contains a system component called the assembly manager that takes on the responsibilities of storing assembly files in the Global Assembly Cache (GAC) and loading them at run time when they are first used by an application. The Global Assembly Cache is the central repository for assemblies installed on a Windows machine. It provides a uniform, versioned and safe access of assemblies by their strong assembly name. The assembly manager is loaded from the system component fusion.dll.”[20]


  1. Improvements and Evaluations




    1. Improvements




    1. Lexical Analysis Improvement

In the lexical analysis, the symbol table used the Array list data structure which may not always offer the best performance for a given task. The symbol table is a data structure , where each keyword and identifier in a program's source code is associated with information relating to its declaration or appearance in the source, such as its type, scope level and sometimes its location.

public static ArrayList a = new ArrayList();

public static void insertKeyword()

{
a.Add(new SymbolTable("Begin", 300));

a.Add(new SymbolTable("And", 301));

a.Add(new SymbolTable("Case", 302));

a.Add(new SymbolTable("Const", 303));

a.Add(new SymbolTable("Div", 304));

.

.



.

}

Arrays provide random access of a sequential set of data. Dictionaries (or associative arrays) provide a map from a set of keys to a set of values. Most of the time a dictionary-like type is built as a hash table - this type is very useful as it provides very fast lookups on average (depending on the quality of the hashing algorithm). I have found dictionary data structure faster than array list data structure when lookup in symbol table for keywords and identifiers. Array lists just store a set of objects (that can be accessed randomly). Dictionaries store pairs of objects. This makes array/lists more suitable when you have a group of objects in a set (prime numbers, colors, students, etc.). Dictionaries are better suited for showing relationships between a pair of objects. The Dictionary class constructor takes two parameters (generic type), first for the type of the key and second for the type of the value. The following code snippet creates a Dictionary where keys are strings and values are short. 

public static Dictionary a = new Dictionary();

public static void insertKeyword()

{
a.Add("Begin", 300);

a.Add("And", 301);

a.Add("Case", 302);

a.Add("Const", 303);

a.Add("Div", 304);

.

.



.

.

}



So, in dictionary data structure doesn’t have a struct type to represent its objects.


    1. Microsoft Intermediate Language (MSIL) of If Statement Improvement

We can optimize “if statement” MSIL by removing ldc.i4.0 instruction and ceq instruction and replacing brtrue.s with brfalse.s and getting same results that before optimization. We can see this in sample code below.

Sample code:
int a = 0, b = 1, c=2;

if (a == 1)

{

a = b + c;



}
Improvement MSIL Of Code:
IL_0000: nop

IL_0001: ldc.i4.0

IL_0002: stloc.0

IL_0003: ldc.i4.1

IL_0004: stloc.1

IL_0005: ldc.i4.2

IL_0006: stloc.2

IL_0007: ldloc.0

IL_0008: ldc.i4.1


We can Remove these instructions to improve the space and the time so after removing them we have to replace “brtrue.s IL_18” instruction with “ brfalse.s IL_18” instruction
IL_0009: ceq


IL_000b: ldc.i4.0

IL_000c: ceq


IL_000e: stloc.3

IL_000f: ldloc.3


IL_0010: brtrue.s IL_0018
IL_0010: brfalse.s IL_0018
IL_0012: nop

IL_0013: ldloc.1

IL_0014: ldloc.2

IL_0015: add

IL_0016: stloc.0

IL_0017: nop

IL_0018: ret


    1. Evaluations and performance

The section describes the evaluation and performance of the DCSPM Compiler in two stages. First stage is the symbol table of lexical analysis and second stage is the MSIL results. I have tested different types of code such as 11,22,33,44,55,66,77,88, and 99 lines in lexical analysis phase which is using array list data structure and lexical analysis phase which is using dictionary data structure. Table 1 is shows the results of the array list and dictionary.



#Lines

Array List

Dictionary

11

7.7702 ms

6.0066 ms

22

7.8529 ms

6.5299 ms

33

7.9264 ms

6.6787 ms

44

8.0363 ms

6.9415 ms

55

8.4518 ms

7.1428 ms

66

8.4946 ms

7.2742 ms

77

8.6187 ms

7.2959 ms

88

8.9369 ms

7.4568 ms

99

9.2126 ms

7.5075 ms




Table 1: Array list data structure vs. Dictionary data structure

It’s obvious that when lexical analysis is using dictionary data structure is faster than array list data structure such as the chart shown in Figure 14.




Time ms

Lines of Program




Figure 14: Array list data structure vs. Dictionary data structure

“The Dictionary is probably the most used associative container class. The Dictionary is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers. Because the keys are hashed, the key type should correctlyimplement GetHashCode() and Equals() appropriately or you should provide an external IEqualityComparer to the dictionary on construction. The insert/delete/lookup time of items in the dictionary is amortized constant time - O(1) - which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant. This is highly desirable for high-speed lookups. The only downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order.” [23]
These are differences between dictionary and array list what we've learned in a quick reference table. [23]

Collection

Ordering

Contiguous Storage?

Direct Access?

Lookup Efficiency

Manipulate

Efficiency



Notes

Dictionary

Unordered

Yes

Via Key

Key:

O(1)


O(1)

Best for high performance lookups.

ArrayList

User has precise control over element ordering

Yes

Via Index

O(n)

O(n)


Best for smaller lists

“ArrayList  resizes dynamically. As elements are added, it grows in capacity to accommodate them. It is most often used in older C# programs. It stores a collection of elements of type object. This makes casting necessary.” [24]



Second stage is that comparison between the unimproved if/else MSIL results and improved if/else MSIL results.


  1. Lessons Learned




  1. Future Works




  1. Conclusion



  1. Yüklə 1,28 Mb.

    Dostları ilə paylaş:
1   2   3   4




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©www.genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə