Here we go:
Surprisingly, the MappedByteBuffer solution was slower than the BufferedReader. On average it took 1.873 seconds to read the large file.
MappedByteBuffer and StringTokenizer:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.Charset;
import java.util.StringTokenizer;

...

FileInputStream file = null;
final String charsetName = "UTF-8";
try {
    file = new FileInputStream(Path);
    FileChannel ch = file.getChannel();
    MappedByteBuffer mbb = ch.map(MapMode.READ_ONLY, 0L, ch.size());
    // look up the charset once, outside the loop
    final Charset charset = Charset.forName(charsetName);
    // decode() consumes the whole buffer, so this loop body runs only once
    while (mbb.hasRemaining()) {
        CharBuffer cb = charset.decode(mbb);
        String text = cb.toString();
        strTokenizer = new StringTokenizer(text);
        while (strTokenizer.hasMoreTokens()) {
            String nextToken = strTokenizer.nextToken();
            // put your logic in here
        }
    }
    file.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
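Note that this maps the entire file and decodes it into one String, so peak memory usage grows with the file size; for inputs larger than the available heap you would have to map and decode the file in smaller chunks.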
The BufferedReader was the fastest solution. It took about 1.451 seconds.
BufferedReader and StringTokenizer:
FileInputStream file = null;
InputStreamReader ir = null;
BufferedReader br = null;
try {
    file = new FileInputStream(Path);
    ir = new InputStreamReader(file);
    br = new BufferedReader(ir);
    String line = null;
    while ((line = br.readLine()) != null) {
        strTokenizer = new StringTokenizer(line);
        while (strTokenizer.hasMoreTokens()) {
            String nextToken = strTokenizer.nextToken();
            // put your logic in here
        }
    }
    // closing the outermost reader also closes the wrapped streams
    br.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
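On Java 7 or later, the same loop can be written with try-with-resources so the streams are closed automatically even when an exception is thrown. A minimal sketch, using the same Path variable as above:

try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream(Path)))) {
    String line;
    while ((line = br.readLine()) != null) {
        StringTokenizer strTokenizer = new StringTokenizer(line);
        while (strTokenizer.hasMoreTokens()) {
            String nextToken = strTokenizer.nextToken();
            // put your logic in here
        }
    }
} catch (IOException e) {
    // FileNotFoundException is a subclass of IOException
    e.printStackTrace();
}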
The java.util.Scanner solution was the slowest one. It took about 17.127 seconds.

Scanner:
FileInputStream file = null;
try {
    file = new FileInputStream(Path);
    scannerFile = new Scanner(file);
    scannerFile.useLocale(Locale.US);
    while (scannerFile.hasNext()) {
        // put your logic in here
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} finally {
    if (scannerFile != null) {
        // closing the Scanner also closes the underlying stream
        scannerFile.close();
    }
}

private static Node getNodeFromLine() {
    final long item1 = scannerFile.nextLong();
    final double item2 = scannerFile.nextDouble();
    ...
}

private static Edge getEdgeFromLine() {
    final long item1 = scannerFile.nextLong();
    final boolean item2 = getBooleanFromInt(scannerFile.nextInt());
    ...
}
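Much of Scanner's overhead comes from its regex-based token parsing, so there is only limited room to tune it, but wrapping the unbuffered FileInputStream in a BufferedInputStream is a cheap experiment (a sketch, not benchmarked on this data set):

import java.io.BufferedInputStream;
...
scannerFile = new Scanner(new BufferedInputStream(new FileInputStream(Path)));
scannerFile.useLocale(Locale.US);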
Both StringTokenizer solutions shared the same Node and Edge parsing methods:
private static Node getNodeFromLine() {
    final long item1 = Long.parseLong(strTokenizer.nextToken());
    final double item2 = Double.parseDouble(strTokenizer.nextToken());
    ...
}

private static Edge getEdgeFromLine() {
    final long item1 = Long.parseLong(strTokenizer.nextToken());
    final boolean item2 = getBooleanFromString(strTokenizer.nextToken());
    ...
}
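The getBooleanFromInt and getBooleanFromString helpers aren't shown above; a minimal sketch of what they might look like, assuming the file encodes booleans as 0 and 1:

private static boolean getBooleanFromInt(final int value) {
    // assumption: the file encodes booleans as 0/1
    return value != 0;
}

private static boolean getBooleanFromString(final String token) {
    // assumption: the file encodes booleans as "0"/"1"
    return Integer.parseInt(token) != 0;
}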
I removed the actual parsing logic to focus on the differences between the implementations.
You can see that in this case the Scanner simply wasn't fast enough: the MappedByteBuffer and BufferedReader solutions were roughly 9 to 12 times faster. I chose the StringTokenizer because it is supposed to be faster than String.split() (a split() version is sketched below), but I didn't benchmark that myself; have a look here: http://stackoverflow.com/questions/691184/scanner-vs-stringtokenizer-vs-string-split
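For comparison, the String.split() variant of the inner loop would look like this (not benchmarked here):

// trim() avoids an empty first token when the line has leading whitespace
String[] tokens = line.trim().split("\\s+");
for (String nextToken : tokens) {
    // put your logic in here
}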
I hope this gives your implementation a performance boost!