CompactTutorial - all about Computers

Posts mit dem Label java werden angezeigt. Alle Posts anzeigen

Dienstag, 11. August 2015

Previsit a datastructure

Recently I had to write a small and relatively simple compiler and came across a problem. For the function prologue the stack pointer should be decreased by 4 * number of local variables. Compilers work with a datastructure so called "Abstract Syntax Tree". The important part is that it was realized with the Visitor design pattern. I was looking for a simple solution to know the number of local variables in the moment I want to decrease the stack pointer. I couldn't think of non-complex code and came across this elegant and simple solution:

Here we have the normal Visitor:

public interface ASTVisitor {

    public void visit(FunctionDefinitionExpression e);
    public void visit(DeclarationStatement s);
    // ...

}

All subtypes implement the accept method for the Visitor:

public class PrimaryExpression extends Expression {

    // ...
 
    @Override
    public void accept(ASTVisitor v) {
        v.visit(this);
    }

}

Here we have the implemention and the main logic:

public class ASTVisitorImpl implements ASTVisitor {

    @Override
    public void visit(FunctionDefinitionExpression e) {

        int n = 0;

        // here we will use our custom PreVisitor
        // and override our desired method
        funcDefinition.accept(new PreVisitor() {
         
            @Override
            public void visit(DeclarationStatement decl) {

                // add here your logic e.g.
                n++;
    
            }
         });

    }
}

That's our simple PreVisitor:

public class PreVisitor implements ASTVisitor {

    @Override
    public void visit(DeclarationStatement declarationStatement) {
    }

    @Override
    public void visit(BinaryExpression binaryExpression) {
        binaryExpression.elem1.accept(this);
        binaryExpression.elem2.accept(this);
    }

    // ...

}

So the solution was simply to create another subtype of the the same ASTVisitor interface and implement there all methods and just the logic to traverse the whole datastructure (Abstract Syntax Tree). At the moment you want to have a specific information without continuing at this point, you call the PreVisitor and override the desired method - et voilà you have an elegant solution!

Sonntag, 26. Juli 2015

Java: How to read files quickly

Recently I had a project in which a graph with nodes and edges had too be build. The biggest file was even to big for Eclipse to open. It contained about 2.970.000 lines. The lines began with 'N' or 'E' and had additional fields like identifier etc. The first approach was really slow and took about 17 seconds to read in the big file. So I was forced to research for faster methods to read in files.
Here we go:

Surprisingly the MappedByteReader was slower than the BufferedReader. In average it took 1.873 seconds to read the large file.

MappedByteReader and StringTokenizer:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.Charset;
import java.util.StringTokenizer;

...

FileInputStream file = null;
InputStreamReader ir = null;
BufferedReader br = null;
  
final String charsetName = "UTF-8";
  
try {
   
   file = new FileInputStream(Path);
   ir = new InputStreamReader(file);
   br = new BufferedReader(ir);
   FileChannel ch = file.getChannel();
   MappedByteBuffer mbb = ch.map(MapMode.READ_ONLY, 0L, ch.size());
   
   String text = null;
   
   while (mbb.hasRemaining())  {
    
      CharBuffer cb =  Charset.forName(charsetName).decode(mbb);
      text = cb.toString();
      strTokenizer = new StringTokenizer(text);
    
      while (strTokenizer.hasMoreTokens()) {
    
         String nextToken = strTokenizer.nextToken();
         // put in here your logic
      }
    
   }
   
   file.close();
   ir.close();
   br.close();
   
   } catch (FileNotFoundException e) {
      e.printStackTrace();
   } catch (IOException e) {
      e.printStackTrace();
   }
}

The BufferedReader was the fastest solution. It took about 1.451 seconds.

BufferedReader and StringTokenizer:

FileInputStream file = null;
InputStreamReader ir = null;
BufferedReader br = null;
  
try {
   
   file = new FileInputStream(Path);
   ir = new InputStreamReader(file);
   br = new BufferedReader(ir);
   
   String line = null;
   
   while (((line = br.readLine()) != null))  {
    
      strTokenizer = new StringTokenizer(line);
    
      while (strTokenizer.hasMoreTokens()) {
    
         String nextToken = strTokenizer.nextToken();
         // put in here your logic
      }
    
   }
   
   file.close();
   ir.close();
   br.close();
   
   } catch (FileNotFoundException e) {
      e.printStackTrace();
   } catch (IOException e) {
      e.printStackTrace();
   }
}

The java.util.Scanner solution was the slowest one. It took about 17.127 seconds.

Scanner:

FileInputStream file = null;
  
try {
   
    file = new FileInputStream(Path);
    
    scannerFile = new Scanner(file);
    scannerFile.useLocale(Locale.US);
    
    while (scannerFile.hasNext()) {
       // put in here your logic
    }
    
    file.close();
   
    } catch (FileNotFoundException e) {
       e.printStackTrace();
    } catch (IOException e) {
       e.printStackTrace();
    } finally {
       if (scannerFile != null) {
        scannerFile.close();
       }
    }
}

private static Node getNodeFromLine() {
  
   final long item1 = scannerFile.nextLong();
   final double item2 = scannerFile.nextDouble();
   ...
}

private static Edge getEdgeFromLine() {

   final long item1 = scannerFile.nextLong();
   final boolean item2 = getBooleanFromInt(scannerFile.nextInt());
   ...
}

The StringTokenizer solutions shared the same Node and Edge methods:

private static Node getNodeFromLine() {
  
   final long item1 = Long.parseLong(strTokenizer.nextToken());
   final double item2 = Double.parseDouble(strTokenizer.nextToken());
   ...
}

private static Edge getEdgeFromLine() {

   final long item1 = Long.parseLong(strTokenizer.nextToken());
   final boolean item2 = getBooleanFromString(strTokenizer.nextToken());
   ...
}

I removed the whole logic just to focus on the differences between the implementations.

You can easily see that in this case the Scanner wasn't fast enough and the MappedByteReader and the BufferedReader solutions were about 17 times faster. I chose the StringTokenizer because it is supposed to be faster than String.split(), but I didn't test it (have a look here http://stackoverflow.com/questions/691184/scanner-vs-stringtokenizer-vs-string-split).

I hope that give your implementation a performance boost!

Montag, 27. April 2015

HowTo#1 Avoid inflexible several if else statements/ switches

There are sometimes situations when you want to check an object and depending of the object you want a corresponding result, so you have some kind of mapping. Therefore you want to check the input if it is this or that etc. To be more specific let's consider the following situation:

void ProcessSomething(Color color) {

    FlowerType flowerType;

    if (color == Color.Red) {
        flowerType = Red_Rose;
    } else if (color == Color.Blue) {
        flowerType = Blue_Rose;
    } else if (color == Color.Grey) {
        flowerType = Grey_Rose;
    } else if (color == Color.Yellow) {
        flowerType = Yellow_Rose;
    }  

    // Do something
}

Obviously this is a bad style and leads to inflexible code, but what is the alternative? You could use a switch statement, but this wouldn't lead to a more elegant code.
If you have some kind of mapping for example between Enums, you can do the following:

// Java
public static enum Color {
    Red, Blue, Grey, Yellow;

    private static EnumMap map = new EnumMap(Color.class);

    static {
        map.put(Color.Red, FlowerType.Red_Rose);
        map.put(Color.Blue, FlowerType.Blue_Rose);
        map.put(Color.Grey, FlowerType.Grey_Rose);
        map.put(Color.Yellow, FlowerType.Yellow_Rose);
    }

    public static FlowerType getCorrespondigValue(Color value) {
        return map.get(value);
    }
}
// Apparently there is a bug with the syntaxhighlighter
// In line 5 the correct version is "FlowerType"
// And it adds  in the end, probably it has a problem with "<"

In the case of Enums we can use an EnumMap and use a static block to map the corresponding types. As a result we have a neverchanging and flexible static getCorrespondigValue method, which returns reliably the FlowerType. If you don't use Enums you can use instead of EnumMap a HashMap.

Here is an extract from my Pokemon game in C++ Unreal Engine:

// C++
static TMap MapNameEItemEnum;

UENUM(BlueprintType)
enum class EItemEnum : uint8
{
    Potion                 UMETA(DisplayName = "Potion"),
    Pokeball             UMETA(DisplayName = "Pokeball"),
    Superball            UMETA(DisplayName = "Superball"),
    None                UMETA(DisplayName = "None")
};

static void init()
{
    MapNameEItemEnum.Add(FString(TEXT("Potion")), EItemEnum::Potion);
    MapNameEItemEnum.Add(FString(TEXT("Pokeball")), EItemEnum::Pokeball);
    MapNameEItemEnum.Add(FString(TEXT("Superball")), EItemEnum::Superball);
    MapNameEItemEnum.Add(FString(TEXT("None")), EItemEnum::None);
}


EItemEnum GetCorrespondingStringAsEItemEnum(const FString Str)
{
    EItemEnum* Item = MapNameEItemEnum.Find(Str);

    if (Item)
        return *Item;
    else
        return EItemEnum::None;
}
// Again a bug with the syntaxhighlighter in line 2
// "FString, EItemEnum" is the correct version

Of course you can use the same concept in normal C++, if you don't work with Unreal Engine. Note that there are no simple static blocks in C++ like in Java.

CompactTutorial - all about Computers - by l1lykn8