Getting Started

Japaki helps to perform two tasks:

Parsing: Read the content of a text file and create a Java data structure from it, filled with the corresponding values.
Formatting: Write the content of a Java data structure into a file, following given formatting rules.

For parsing, you need

a file to be parsed,
an object where the parsing result is stored,
a syntax description,
a main program.

For formatting, you need

an object to be formatted,
a syntax description,
a filename for the formatting result,
a main program.

Target files

Japaki is a library for parsing data structures in text form that are meant to be read by computer software. In principle, Japaki can parse almost any sort of data, but it is best suited for

simple data structures like comma separated files (csv),
proprietary data structures, where no specialized parsing framework exists.

It is less efficient or not feasible for

parsing natural language texts,
XML.

Structure of the sample files

The tarball contains four samples. The samples only show the parsing case; formatting is described later in this text.

Input file	Data class	Syntax definition	Example file (main class)	Illustrates
names.txt	Human.java	in code	ReflectionExample.java	How to parse a csv file without a header and store its content in a list of beans.
names.txt	Person.java	in code	BeanExample.java	The same as the reflection example, but with generic beans.
painters.txt	People.java	people.sdf	FileExample.java	How to use a syntax definition from a file to parse a csv file with a header.
capitals.txt	in code	in code	PropertiesExample.java	How to read a properties file and store it in a Hashtable, as the Properties#load method does.

Examples step by step

csv files without header

A typical use case for the japaki library is a csv file. The sample csv file "names.txt" looks as follows:


Frederic;Chopin;1810;
Giuseppe;Verdi;1813;
Wolfgang Amadeus;Mozart;1756;
Franz;Liszt;1711;

This csv file contains four records (lines) of person data. For each person, three attributes are given:

the first name,
the last name and
the year of birth.

How is this represented in Java? The data of one person are combined in a Java bean like this:


public class Human{

    private String firstname;
    private String lastname;
    private Integer birthYear;
…
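
The listing above elides the rest of the class. As a minimal sketch, a complete bean could look like the following. Only the three fields come from the text; the getters, setters and the toString method are conventional JavaBean boilerplate assumed here, not copied from the sample's Human.java.

```java
// Hypothetical completion of the bean: only the three fields are taken from the text.
class Human {

    private String firstname;
    private String lastname;
    private Integer birthYear;

    public String getFirstname() { return firstname; }
    public void setFirstname(String firstname) { this.firstname = firstname; }

    public String getLastname() { return lastname; }
    public void setLastname(String lastname) { this.lastname = lastname; }

    public Integer getBirthYear() { return birthYear; }
    public void setBirthYear(Integer birthYear) { this.birthYear = birthYear; }

    // Assumed helper for readable output; not required by anything in the text.
    @Override
    public String toString() {
        return firstname + " " + lastname + " (" + birthYear + ")";
    }

    public static void main(String[] args) {
        Human human = new Human();
        human.setFirstname("Frederic");
        human.setLastname("Chopin");
        human.setBirthYear(1810);
        System.out.println(human);
    }
}
```

The property names "firstname", "lastname" and "birthYear" are the ones the tags in the syntax description refer to.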

So the task is to store the data from the csv file in objects of type “Human”. Because there are four of them, we use a list of type ArrayList<Human>.
This is where japaki comes into play. The following steps have to be performed:

1. Declare the syntax of the input file. One record is declared like this: "<firstname>;<lastname>;<birthYear>;\n", where
   a. <firstname> is a tag that tells japaki to store the content from the csv file in the property “firstname”.
   b. The semicolon “;” is expected to be in the input file exactly this way.
   c. The “\n” denotes a line break. In the example the syntax is written as a Java string literal, so the backslash has to be escaped: “\\n”.
2. Create an instance of the parser bench.
3. Tell the bench what a human is. This consists of three things:
   a. A string to refer to this declaration later, “human” in this case.
   b. A class (Human.class) to tell the bench where to store the parsing results. The bench already assumes that the file contains more than one record, so the list does not have to be declared.
   c. The syntax definition from above, again for a single record, not for the list.
4. Start parsing with the following parameters:
   a. The string declared above (“human”).
   b. The name of the input file.
   c. An instance of a list of the bean class.

So this sums up to the following code fragment:


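// Create the parser bench and register the record syntax under the name "human".
ParserBench bench = new ParserBench();
bench.add("human", Human.class, "<firstname>;<lastname>;<birthYear>;\\n");
// The list receives one Human per record of the input file.
ArrayList<Human> list = new ArrayList<Human>();
bench.parse("human", "names.txt", list);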

The complete class can be found in ReflectionExample.java. It is called this way because the toolkit uses reflection to store the data in the beans.
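
To illustrate what "uses reflection" means here, the following plain-Java sketch maps a tag name such as "firstname" to the matching setter and invokes it. It is an illustration of the general technique only, not japaki's actual implementation. The class ReflectionSketch, the method setProperty and the accessors of the nested Human stand-in are hypothetical.

```java
import java.lang.reflect.Method;

// Illustration only: shows how a tag name can be mapped to a bean setter
// via reflection. This is not japaki's actual implementation.
class ReflectionSketch {

    // Minimal stand-in for the sample bean (hypothetical accessors).
    public static class Human {
        private String firstname;
        private Integer birthYear;

        public String getFirstname() { return firstname; }
        public void setFirstname(String firstname) { this.firstname = firstname; }
        public Integer getBirthYear() { return birthYear; }
        public void setBirthYear(Integer birthYear) { this.birthYear = birthYear; }
    }

    // Finds the setter of the given property and invokes it. The text is
    // converted to an Integer when the setter expects one, otherwise passed as is.
    static void setProperty(Object bean, String property, String text) {
        String setter = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (Method method : bean.getClass().getMethods()) {
            if (method.getName().equals(setter) && method.getParameterCount() == 1) {
                Class<?> type = method.getParameterTypes()[0];
                Object value = (type == Integer.class) ? Integer.valueOf(text) : text;
                try {
                    method.invoke(bean, value);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot set property " + property, e);
                }
                return;
            }
        }
        throw new IllegalArgumentException("No setter for property " + property);
    }

    public static void main(String[] args) {
        Human human = new Human();
        setProperty(human, "firstname", "Frederic");
        setProperty(human, "birthYear", "1810");
        System.out.println(human.getFirstname() + " " + human.getBirthYear());
    }
}
```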

The opposite direction – Storing data

Storing data works the same way as parsing, except that the write method is used instead of the parse method. Steps 1 to 3 are therefore the same as for parsing. After that, the declared parser “human” can be used to write the data from a list of humans to an output file. As with the parse method, three parameters are required:

The string declared above (“human”).
The name of the output file.
An instance of a list of the bean class, which holds the data to be written.

Continuing the example, the code is


bench.write("human", "names.txt", list);
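
What the record syntax means in the writing direction can be sketched by hand in plain Java: every field is followed by a semicolon and every record by a line break. The class FormatSketch and its format method are hypothetical and only illustrate the expected shape of the output. They are not part of japaki.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: hand-written equivalent of what the record syntax
// "<firstname>;<lastname>;<birthYear>;\n" describes when data is written out.
class FormatSketch {

    // Each row holds firstname, lastname and birth year, in that order.
    static String format(List<String[]> rows) {
        StringBuilder out = new StringBuilder();
        for (String[] row : rows) {
            for (String field : row) {
                out.append(field).append(';');   // every field is terminated by ";"
            }
            out.append('\n');                    // every record ends with a line break
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Frederic", "Chopin", "1810"},
                new String[] {"Giuseppe", "Verdi", "1813"});
        System.out.print(format(rows));
    }
}
```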

Csv files with headers

Csv files typically have a header like this:


Firstname;Lastname;Birthyear;
Frederic;Chopin;1810;
Giuseppe;Verdi;1813;
Wolfgang Amadeus;Mozart;1756;
Franz;Liszt;1711;

If the first line is always the same constant text, it can be parsed as a constant. The input file consists of the header line and a list of humans, so the syntax description is

"Firstname;Lastname;Birthyear\n<humans,,human>"

The input file is no longer a simple list of beans, therefore the list of beans has to be contained in another bean, with a property humans.


public class People{
    private List humans;
…

<humans,,human> is the syntax definition for a list, where

the first parameter is the name of the property, where the result is stored,
the second parameter would be the name of the parser to be used. If it is not specified, it is derived from the type of the property. humans is a list, therefore a list parser is used.
the third parameter is the name of the subparser used to parse elements of the list. It refers to the existing definition from above.

Using syntax definition files

Now we have more than one parser definition. Instead of adding all the syntax definitions in the Java program, it is more convenient to store the syntax in a separate file. An entry in the syntax definition file contains the same information as a call of the add method of the parser bench:

the name,
the element class,
the syntax definition string.

So the method call


bench.add("human", Human.class, "<firstname>;<lastname>;<birthYear>;\\n");

translates to

 
human,Human := <firstname>;<lastname>;<birthYear>;\n

and the complete file looks as follows:


human,Human := <firstname>;<lastname>;<birthYear>;\n
people,People := Firstname;Lastname;Birthyear\n<humans,,human>

This file can now be imported into the parser bench by calling the loadSyntax method:


bench.loadSyntax(filename);
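
The layout of one entry (name, element class, ":=", syntax string) can be made concrete with a small plain-Java sketch that splits an entry into its three parts. This is not japaki's loader. The class SyntaxEntrySketch and its split method are hypothetical, and trimming the whitespace around ":=" is an assumption made for this illustration.

```java
// Illustration only: splits one syntax definition entry into the three parts
// named above. This is not japaki's loader, just a sketch of the entry layout.
class SyntaxEntrySketch {

    // Returns { name, element class, syntax definition string }.
    static String[] split(String entry) {
        int assign = entry.indexOf(":=");
        if (assign < 0) {
            throw new IllegalArgumentException("Missing ':=' in entry: " + entry);
        }
        String head = entry.substring(0, assign).trim();
        // Assumption: whitespace directly around ":=" is not part of the syntax.
        String syntax = entry.substring(assign + 2).trim();
        int comma = head.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Missing ',' before ':=' in entry: " + entry);
        }
        String name = head.substring(0, comma).trim();
        String elementClass = head.substring(comma + 1).trim();
        return new String[] {name, elementClass, syntax};
    }

    public static void main(String[] args) {
        // In the file the line break is written as the two characters '\' and 'n'.
        String[] parts = split("human,Human := <firstname>;<lastname>;<birthYear>;\\n");
        System.out.println("name:   " + parts[0]);
        System.out.println("class:  " + parts[1]);
        System.out.println("syntax: " + parts[2]);
    }
}
```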

Categories – a variable header

Now we look at an example where the header is no longer fixed, but tells something about the file content: the type of persons in the file.
In the file "painters.txt" all the people in the file are painters:


Painters
Otto;Dix;1891;
Claude;Monet;1840;
Andy;Warhol;1928;

How is this handled? First the bean needs a new property for the category:


public class People{
    private String category;
    private List humans;
…

Then the people line in the syntax description is changed


people,People := <category>\n<humans,,human>

and that's it.
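
Read by hand, the syntax "<category>\n<humans,,human>" says that the first line fills the category property and every following line is one element of the humans list. The plain-Java sketch below spells this out. The class CategorySketch and its simplified People stand-in (with string arrays instead of Human beans) are hypothetical and only illustrate the resulting data structure.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hand-written reading of "<category>\n<humans,,human>".
class CategorySketch {

    // Simplified stand-in for the People bean; each human is kept as
    // { firstname, lastname, birthYear } to keep the sketch short.
    static class People {
        String category;
        List<String[]> humans = new ArrayList<>();
    }

    static People parse(String text) {
        String[] lines = text.split("\n");
        People people = new People();
        people.category = lines[0];              // <category>\n
        for (int i = 1; i < lines.length; i++) {
            // One record "<firstname>;<lastname>;<birthYear>;" per remaining line.
            people.humans.add(lines[i].split(";"));
        }
        return people;
    }

    public static void main(String[] args) {
        People people = parse("Painters\nOtto;Dix;1891;\nClaude;Monet;1840;\nAndy;Warhol;1928;\n");
        System.out.println(people.category + ": " + people.humans.size() + " entries");
    }
}
```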

Variants

Optional fields

Very often csv files are not complete, because information is not known or not applicable, like the birth year in the following file:


Frederic;Chopin;1810;
Giuseppe;Verdi;;
Wolfgang Amadeus;Mozart;1756;
Franz;Liszt;;

In this case the corresponding field is made optional by enclosing it in square brackets:


human,Human := <firstname>;<lastname>;[<birthYear>];\n
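
The effect of an optional field can be sketched in plain Java with a regular expression in which the digits of the third field may be absent. The sketch assumes that a missing optional value leaves the property unset (null); the text does not state what japaki stores in that case. The class OptionalFieldSketch and its birthYear method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics "<firstname>;<lastname>;[<birthYear>];" by hand.
class OptionalFieldSketch {

    // Three semicolon-terminated fields; the digits of the third one may be missing.
    private static final Pattern RECORD = Pattern.compile("([^;]*);([^;]*);(\\d*);");

    // Returns the birth year, or null when the optional field is empty (assumption).
    static Integer birthYear(String line) {
        Matcher matcher = RECORD.matcher(line);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a record: " + line);
        }
        String year = matcher.group(3);
        return year.isEmpty() ? null : Integer.valueOf(year);
    }

    public static void main(String[] args) {
        System.out.println(birthYear("Frederic;Chopin;1810;"));
        System.out.println(birthYear("Giuseppe;Verdi;;"));
    }
}
```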

Parameters

Now it is time for a deeper look at the parameters of a single syntax definition expression and how a parser is created from them.

Basic parameters: property and parser name

Property

The first parameter is the property. Normally this is the name of a property of the bean, but sometimes predefined special properties are needed. Three special properties are defined:

dummy: for parsing, if the value does not need to be stored,
key: the key in a key-value pair,
value: the value in a key-value pair.

Parser Name

The parser name can either be

one of the predefined parsers – typically predefined parsers are not referenced by name, but by the class they belong to;
a parser defined before – like “human” above
omitted – the parser is then derived from the type of the property. This requires that exactly one parser, or a default parser, is defined for the property type.

At least one of the two parameters, the property or the parser, has to be specified. If both are specified, the property type must match the parser type.
If the property type is a collection or a map, two further parameters are needed:

The parser for the elements. It can only be referenced by name, because Java generics exist only at compile time, so the class of the elements cannot be detected at run time. In the case of a map, the element parser parses both the key and the value. The idea is that a map entry is treated like a bean with two properties.
A parser for the delimiter, for example a comma in a comma-separated list. If there is no delimiter, it can be omitted. The delimiter is typically a constant that can be entered directly, like “,”.
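
The idea of a map entry as a bean with the two properties key and value can be sketched in plain Java: each "key=value" line becomes one entry of a Hashtable, which for simple input gives the same result as the JDK's Properties#load. The class KeyValueSketch is hypothetical, and the two input lines are made up for this illustration, since the content of capitals.txt is not shown in this text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Hashtable;
import java.util.Map;
import java.util.Properties;

// Illustration only: key-value lines stored in a Hashtable, next to the
// JDK method Properties#load that does the same for properties files.
class KeyValueSketch {

    // Treats every "key=value" line as one entry with the properties key and value.
    static Map<String, String> parse(String text) {
        Map<String, String> table = new Hashtable<>();
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                table.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return table;
    }

    // The JDK counterpart: Properties extends Hashtable.
    static Properties load(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        String text = "France=Paris\nItaly=Rome\n";   // made-up sample data
        System.out.println(parse(text));
        System.out.println(load(text));
    }
}
```

The hand-written parse method ignores the escapes, comments and alternative separators that Properties#load supports, so the two only agree on simple input like the one shown.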

Predefined Parsers

This is a list of the predefined parsers

Name	Base Type	Format	parameter
date	Date	SimpleDateFormat	pattern
decimal	Number	DecimalFormat	pattern
duration	Duration	DurationFormat	pattern
string	String	RegexFormat	delimiter as a regular expression
class	Class	ClassFormat	none
message	String	MessageFormat	pattern
multiline	List of String	none	delimiter
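
Three of the classes in the Format column, SimpleDateFormat, DecimalFormat and MessageFormat, are standard JDK classes from java.text, so the meaning of their pattern parameter can be tried out independently of japaki. The sketch below does that. The class FormatPatternSketch and the chosen patterns are examples made up for this illustration, and how the pattern is passed inside a japaki tag is not shown here. DurationFormat, RegexFormat and ClassFormat are not JDK classes and are left out.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.MessageFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Illustration only: the JDK format classes behind the predefined parsers
// date, decimal and message, each driven by a pattern.
class FormatPatternSketch {

    // Parses a date with one SimpleDateFormat pattern and prints it with another.
    static String convertDate(String inPattern, String outPattern, String text) {
        SimpleDateFormat in = new SimpleDateFormat(inPattern, Locale.US);
        SimpleDateFormat out = new SimpleDateFormat(outPattern, Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses a number with a DecimalFormat pattern, using US separators.
    static double parseDecimal(String pattern, String text) {
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Extracts the arguments {0}, {1}, ... of a MessageFormat pattern from a text.
    static Object[] parseMessage(String pattern, String text) {
        try {
            return new MessageFormat(pattern).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convertDate("dd.MM.yyyy", "yyyy-MM-dd", "01.03.1810"));
        System.out.println(parseDecimal("#,##0.00", "1,234.50"));
        Object[] parts = parseMessage("{0} was born in {1}", "Chopin was born in 1810");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```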

Blanks

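Blanks need some care in the japaki framework. With the knowledge from above, an arbitrary number of blanks that are to be ignored can be handled as follows: a dummy property consumes the input with the string parser, whose delimiter is the regular expression \S (any non-whitespace character).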


<dummy,string,"\S">
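
What the regular expression \S does as a delimiter can be tried out with java.util.regex alone: the first non-whitespace character marks the end of the run of blanks to be discarded. The sketch below is an illustration of the regular expression only. The class BlankSkipSketch and its skipBlanks method are hypothetical, and the assumption is that japaki's string parser stops in front of the delimiter match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: shows where the delimiter \S matches in an input
// that starts with blanks. Not japaki code.
class BlankSkipSketch {

    // \S matches any single non-whitespace character.
    private static final Pattern NON_BLANK = Pattern.compile("\\S");

    // Returns the rest of the input after the leading run of whitespace.
    static String skipBlanks(String input) {
        Matcher matcher = NON_BLANK.matcher(input);
        // Everything before the first match is whitespace and is dropped.
        return matcher.find() ? input.substring(matcher.start()) : "";
    }

    public static void main(String[] args) {
        System.out.println("[" + skipBlanks("   Chopin") + "]");
        System.out.println("[" + skipBlanks("Verdi") + "]");
    }
}
```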