• DKVP: Key-value pairs • NIDX: Index-numbered (toolkit style) • CSV/TSV/etc. • Tabular JSON • Single-level JSON objects • Nested JSON objects • Formatting JSON options • JSON non-streaming • PPRINT: Pretty-printed tabular • XTAB: Vertical tabular • Markdown tabular
Examples
$ mlr --usage-data-format-examples
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
+---------------------+
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
| fox jumped | Record 2: "1" => "fox", "2" => "jumped"
+---------------------+
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog |
| 1,2,3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4,5,6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
Tabular JSON: nested objects are supported, although arrays within them are not:
+---------------------+
| { |
| "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| "bat": 2, |
| "cog": 3 |
| } |
| { |
| "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
| "egg": 7, |
| "flint": 8 |
| }, |
| "garlic": "" |
| } |
+---------------------+
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog |
| 1 2 3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| 4 5 6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| bat 2 |
| cog 3 |
| |
| dish 7 | Record 2: "dish" => "7", "egg" => "8"
| egg 8 |
+---------------------+
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | --- | --- | --- | |
| | 1 | 2 | 3 | | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| | 4 | 5 | 6 | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+-----------------------+
DKVP: Key-value pairs
Miller’s default file format is DKVP, for delimited key-value pairs. Example:
$ mlr cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
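The DKVP record model shown in the usage examples can be sketched in a few lines of Python. This is only an illustration of the mapping from a line to key-value pairs, not Miller's implementation; the function name `parse_dkvp_line` and its `ifs`/`ips` parameters are hypothetical. It assumes, following the `dish=7,egg=8,flint` example, that a field with no `=` is keyed by its 1-based position.

```python
def parse_dkvp_line(line, ifs=",", ips="="):
    """Parse one DKVP line into a dict of string keys to string values.

    Illustrative sketch only; ifs and ips are hypothetical parameter names
    for the field separator and the pair separator.
    """
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        if ips in field:
            # Split on the first pair separator only, so values may contain "=".
            key, value = field.split(ips, 1)
        else:
            # Assumption: a field lacking a pair separator is keyed by its
            # 1-based position, as in the "flint" usage example.
            key, value = str(position), field
        record[key] = value
    return record


print(parse_dkvp_line("apple=1,bat=2,cog=3"))
print(parse_dkvp_line("dish=7,egg=8,flint"))
```

Python dicts preserve insertion order, so the fields come back in the order they appear on the line.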
puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}"
puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')
echo "type=3,user=$USER,date=$date\n";
logger.log("type=3,user=$USER,date=$date\n");
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100, resource=/path/to/file
resource=/some/other/path,loadsec=0.97,ok=false
NIDX: Index-numbered (toolkit style)
With --inidx --ifs ' ' --repifs, Miller splits lines on whitespace and assigns integer field names starting with 1. This recapitulates Unix-toolkit behavior. Example with index-numbered output:
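The NIDX splitting rule described above can be sketched in Python as follows. This illustrates the record model only and is not Miller output; `parse_nidx_line` is a hypothetical helper. Note one difference in the sketch: Python's argument-free `str.split()` splits on runs of any whitespace (tabs included), not just spaces.

```python
def parse_nidx_line(line):
    """Map whitespace-separated fields to keys "1", "2", "3", ...

    Illustrative sketch only. str.split() with no argument collapses runs
    of whitespace, which stands in for --ifs ' ' --repifs here.
    """
    return {str(i): field for i, field in enumerate(line.split(), start=1)}


print(parse_nidx_line("the quick brown"))
print(parse_nidx_line("fox   jumped"))
```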
CSV/TSV/etc.
When mlr is invoked with the --csv or --csvlite option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See Record-heterogeneity for how Miller handles changes of field names within a single data stream.
Miller has record separator RS and field separator FS, just as awk does. For TSV, use --fs tab; to convert TSV to CSV, use --ifs tab --ofs comma, etc. (See also Reference.) The following are synonymous pairs:
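To make the header-line model and the TSV-to-CSV conversion concrete, here is a minimal Python sketch using the standard `csv` module. It is an analogy for what `--ifs tab --ofs comma` does, not Miller's code; `read_records` and `write_records` are hypothetical names, and the writer assumes a non-empty list of records that all share the first record's keys.

```python
import csv
import io


def read_records(text, delimiter=","):
    """Read delimited text whose first line holds the key names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def write_records(records, delimiter=","):
    """Write records with a header line.

    Assumption: records is non-empty and every record has the same keys
    as the first one (no record heterogeneity).
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(records[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


tsv_text = "apple\tbat\tcog\n1\t2\t3\n4\t5\t6\n"
records = read_records(tsv_text, delimiter="\t")  # input field separator: tab
csv_text = write_records(records, delimiter=",")  # output field separator: comma
print(records)
print(csv_text, end="")
```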
Tabular JSON
JSON is a format which supports arbitrarily deep nesting of “objects” (hashmaps) and “arrays” (lists), while Miller is a tool for handling
Single-level JSON objects
An
$ mlr --json head -n 2 data/json-example-1.json
{ "color": "yellow", "shape": "triangle", "flag": 1, "i": 11, "u": 0.6321695890307647, "v": 0.9887207810889004, "w": 0.4364983936735774, "x": 5.7981881667050565 }
{ "color": "red", "shape": "square", "flag": 1, "i": 15, "u": 0.21966833570651523, "v": 0.001257332190235938, "w": 0.7927778364718627, "x": 2.944117399716207 }
$ mlr --json --jvstack head -n 2 then cut -f color,u,v data/json-example-1.json
{
"color": "yellow",
"u": 0.6321695890307647,
"v": 0.9887207810889004
}
{
"color": "red",
"u": 0.21966833570651523,
"v": 0.001257332190235938
}
$ mlr --ijson --opprint stats1 -a mean,stddev,count -f u -g shape data/json-example-1.json
shape    u_mean   u_stddev u_count
triangle 0.583995 0.131184 3
square   0.409355 0.365428 4
circle   0.366013 0.209094 3
Nested JSON objects
Additionally, Miller can
$ mlr --json --jvstack head -n 2 data/json-example-2.json
{
"flag": 1,
"i": 11,
"attributes": {
"color": "yellow",
"shape": "triangle"
},
"values": {
"u": 0.632170,
"v": 0.988721,
"w": 0.436498,
"x": 5.798188
}
}
{
"flag": 1,
"i": 15,
"attributes": {
"color": "red",
"shape": "square"
},
"values": {
"u": 0.219668,
"v": 0.001257,
"w": 0.792778,
"x": 2.944117
}
}
$ mlr --ijson --opprint head -n 4 data/json-example-2.json
flag i  attributes:color attributes:shape values:u values:v values:w values:x
1    11 yellow           triangle         0.632170 0.988721 0.436498 5.798188
1    15 red              square           0.219668 0.001257 0.792778 2.944117
1    16 red              circle           0.209017 0.290052 0.138103 5.065034
0    48 red              square           0.956274 0.746720 0.775542 7.117831
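The colon-joined column names in the output above (`attributes:color`, `values:u`, and so on) suggest how nested objects are flattened into a single-level record. A minimal Python sketch of that flattening, assuming a ":" joiner and nested objects only (no arrays), might look like this; `flatten` is a hypothetical helper, not part of Miller.

```python
def flatten(obj, prefix="", sep=":"):
    """Flatten nested dicts into one level, joining key paths with sep.

    Illustrative sketch only; arrays are not handled.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested object, carrying the key path along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


record = {
    "flag": 1,
    "i": 11,
    "attributes": {"color": "yellow", "shape": "triangle"},
    "values": {"u": 0.632170, "v": 0.988721, "w": 0.436498, "x": 5.798188},
}
print(list(flatten(record).keys()))
```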
$ mlr --json --jvstack head -n 1 then put '${values:uv} = ${values:u} * ${values:v}' data/json-example-2.json
{
"flag": 1,
"i": 11,
"attributes": {
"color": "yellow",
"shape": "triangle"
},
"values": {
"u": 0.632170,
"v": 0.988721,
"w": 0.436498,
"x": 5.798188,
"uv": 0.625040
}
}
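The `put` example above assigns to `${values:uv}` and the new field appears inside the nested `values` object on output. One way to picture this is as an assignment on the flattened record followed by re-nesting on the ":" separator. The Python sketch below works under that assumption; `unflatten` is a hypothetical helper, and Miller's actual internals may differ.

```python
def unflatten(flat, sep=":"):
    """Rebuild nested dicts from keys whose path parts are joined with sep.

    Illustrative sketch only.
    """
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = nested
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "flag": 1,
    "i": 11,
    "attributes:color": "yellow",
    "attributes:shape": "triangle",
    "values:u": 0.632170,
    "values:v": 0.988721,
    "values:w": 0.436498,
    "values:x": 5.798188,
}
# Counterpart of: put '${values:uv} = ${values:u} * ${values:v}'
flat["values:uv"] = flat["values:u"] * flat["values:v"]
nested = unflatten(flat)
print(nested["values"])
```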
Formatting JSON options
JSON isn’t a parameterized format, so RS, FS, PS aren’t specifiable. Nonetheless, you can do the following:
JSON non-streaming
The JSON parser does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in tail -f contexts.
PPRINT: Pretty-printed tabular
Miller’s pretty-print format is like CSV, but column-aligned. For example, compare
XTAB: Vertical tabular
This is perhaps most useful for looking at very wide and/or multi-column data which cause line-wraps on the screen (but see also https://github.com/twosigma/ngrid for an entirely different, very powerful option). Namely:
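As a rough picture of the vertical layout, the following Python sketch prints one key-value pair per line, with a blank line between records, as in the XTAB usage example earlier on this page. It is not Miller output: `format_xtab` is a hypothetical helper, and padding keys to the longest key within each record is an assumption made for the illustration. Records are assumed to be non-empty.

```python
def format_xtab(records):
    """Format records vertically: one "key value" line per field.

    Illustrative sketch only. Assumption: keys are left-aligned and padded
    to the longest key in the same record; each record is non-empty.
    """
    blocks = []
    for record in records:
        width = max(len(key) for key in record)
        lines = [f"{key:<{width}} {value}" for key, value in record.items()]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive records.
    return "\n\n".join(blocks)


print(format_xtab([
    {"apple": "1", "bat": "2", "cog": "3"},
    {"dish": "7", "egg": "8"},
]))
```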
Markdown tabular
Markdown format looks like this:
$ mlr --omd cat data/small
| a | b | i | x | y |
| --- | --- | --- | --- | --- |
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
As of Miller 4.3.0, markdown format is supported only for output, not input.
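The output layout shown for --omd (a header row, a `---` separator row, then one row per record) is simple enough to sketch in Python. This is an illustration of the layout only, not Miller's code; `format_markdown` is a hypothetical helper that assumes a non-empty list of records sharing the first record's keys.

```python
def format_markdown(records):
    """Render records as a Markdown table.

    Illustrative sketch only. Assumption: records is non-empty and all
    records share the first record's keys.
    """
    header = list(records[0].keys())
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for record in records:
        lines.append("| " + " | ".join(str(record[key]) for key in header) + " |")
    return "\n".join(lines)


print(format_markdown([
    {"a": "pan", "b": "pan", "i": 1},
    {"a": "eks", "b": "pan", "i": 2},
]))
```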