A buildargv function using Ragel

A fun use of the Ragel State Machine Compiler: building a function that parses a command line into int argc, char *argv[].


It all started when I needed a buildargv function to parse a string so that the result could then be passed to


int main (int argc, char *argv[]) { body } 

Well, I thought, surely something like this can be borrowed from somewhere; let's go and find it... And I did not find it...



Well, not that I found nothing at all. There is, for example, https://github.com/gcc-mirror/gcc/blob/master/libiberty/argv.c (GPLv2 is always good), but I was not ready to take on such obligations right away. There is definitely such a function in bash (GPLv3 is even better). zsh? Go and look for it (I did find it... and I do not want it).


In short, I did not find what I wanted, and I did not like what I found. In the end, I am entitled to write my own: I am doing this for myself anyway, so the process might as well be entertaining.


I had no desire whatsoever to write this in the conventional way; the very idea upset me.


So, meet the Ragel State Machine Compiler.


Tools



The project can be found here: JOYFUL CMDLINE PARSER WRITTEN IN RAGEL


Formulation of the problem


The input is an arbitrary string; the task is to turn it into an array of arguments separated by spaces or tabs, taking single quotes, double quotes, and backslash escapes into account.


All in all, there are not many conditions, and Ragel is well suited to the task.
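
To make the target data shape concrete, here is a minimal plain-C sketch of the simplest version of the problem: splitting on spaces and tabs only, with no quoting or escaping. It is a hand-written baseline rather than the Ragel approach used in this article, and the names split_ws and free_args are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a NULL-terminated array produced by split_ws. */
static void free_args(char **args)
{
    if (args == NULL)
        return;
    for (char **a = args; *a != NULL; a++)
        free(*a);
    free(args);
}

/* Hypothetical helper, not part of the project: split `line` on spaces and
 * tabs only. Quotes and backslashes are treated as ordinary characters.
 * Returns a NULL-terminated array and stores the element count in *count;
 * returns NULL if memory runs out. */
static char **split_ws(const char *line, int *count)
{
    assert(line != NULL && count != NULL);

    char **args = calloc(1, sizeof *args);   /* only the NULL terminator */
    int n = 0;
    if (args == NULL)
        return NULL;

    const char *p = line;
    while (*p != '\0') {
        p += strspn(p, " \t");               /* skip separators */
        size_t len = strcspn(p, " \t");      /* length of the next token */
        if (len == 0)
            break;                           /* only separators were left */

        char **grown = realloc(args, ((size_t)n + 2) * sizeof *args);
        char *tok = malloc(len + 1);
        if (grown != NULL)
            args = grown;                    /* old contents are preserved */
        if (grown == NULL || tok == NULL) {
            free(tok);
            free_args(args);                 /* args[n] is still NULL here */
            return NULL;
        }
        memcpy(tok, p, len);
        tok[len] = '\0';
        args[n++] = tok;
        args[n] = NULL;                      /* keep the array terminated */
        p += len;
    }
    *count = n;
    return args;
}
```

Calling split_ws("  ls \t-l  /tmp ", &n) would yield the three elements ls, -l and /tmp. Everything beyond this (quotes, escapes, error reporting) is what the state machine in the next section takes care of.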


Explained Implementation


We declare a machine named buildargv and ask Ragel to emit its data tables at the beginning of the file (5.8.1 Write Data).


 %%{
     machine buildargv;
     write data;
 }%%

Next, we declare a lineElement machine, which is a union (2.5.1 Union) of two machines, arg and whitespace, and build the main machine as a repetition of lineElement.


 lineElement = arg >start_arg %end_arg | whitespace;
 main := lineElement**;

On entering and on leaving the arg machine, the actions start_arg and end_arg are executed, respectively.


 action start_arg {
     argv_s = p;
 }

 action end_arg {
     nargv = (char**)realloc((*argv), (argc_ + 1)*sizeof(char*));
     (*argv) = nargv;
     (*argv)[argc_] = strndup(argv_s, p - argv_s);
     argc_++;
 }

The job of start_arg is to save the position of the first character of the argument, and the job of end_arg is to append a new element to the argv array once the arg machine has been left successfully.
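
The end_arg action above uses the result of realloc without checking it. As a hedged illustration, the same append step could be written as a standalone C function with error checks. The name argv_push and its return convention are assumptions made for this sketch, and malloc plus memcpy stand in for strndup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring what end_arg does: append a copy of the
 * bytes [start, end) to the array *argv and increment *argc.
 * Returns 0 on success, -1 if memory runs out (the array is left as it was). */
static int argv_push(char ***argv, int *argc, const char *start, const char *end)
{
    assert(argv != NULL && argc != NULL && start != NULL && end >= start);

    size_t len = (size_t)(end - start);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, start, len);
    copy[len] = '\0';

    /* Assign realloc's result to a temporary first, so that the old block
     * is not leaked and *argv stays valid if the call fails. */
    char **grown = realloc(*argv, ((size_t)*argc + 1) * sizeof **argv);
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[*argc] = copy;
    *argv = grown;
    (*argc)++;
    return 0;
}
```

Inside a Ragel action, the call would correspond to argv_push(argv, &argc_, argv_s, p), with a failure turned into whatever error code the surrounding function uses.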


Now let's take a closer look at arg.


 arg = '\'' >{ fcall squote; }
     | '"' >{ fcall dquote; }
     | ( '\\' >{ fcall skip; } | ^[ \t"'\\] )+;

It is a union of three machines: ' , " and ( \ | ^[ \t"'\] )+ . The last one is in turn a repetition of the union of \ and ^[ \t"'\] .


When we meet the character ' we call squote; when we meet " we call dquote; and if the current character is \ we call skip, which skips whatever character follows it. Any character other than 0x20 (space), 0x09 (tab), ' , " or \ is accepted as part of the argument.


It remains to consider a very small part:


 skip := any @{ fret; };

 dquote := ( '\\' >{ fcall skip; } | ^[\\] )+ :> ["] @{ fret; } @err(dquote_err);

 squote := ( '\\' >{ fcall skip; } | ^[\\] )+ :> ['] @{ fret; } @err(squote_err);

We have already dealt with skip, and ^[\\] (any character except a backslash) should not raise questions either. As for :> , it is Entry-Guarded Concatenation (4.2 Guarded Operators that Encapsulate Priorities): the machine ( '\\'>{ fcall skip; } | ^[\\] )+ is terminated as soon as the machine that follows it, ["] , begins to match, that is, as soon as the closing quote is read.
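
For readers who do not know Ragel, the combined effect of skip and :> can be approximated in plain C as a scan for the closing quote that honours backslash escapes. This is a sketch for illustration only and not what Ragel generates; the function name find_closing_quote is hypothetical. One difference is worth noting: the grammar uses + and so appears to expect at least one character inside the quotes, whereas this sketch also accepts an empty body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: `s` points just past an opening quote. Returns a
 * pointer to the matching closing `quote`, or NULL when the string ends
 * first (the situation the @err actions are there to report). */
static const char *find_closing_quote(const char *s, char quote)
{
    assert(s != NULL);

    while (*s != '\0') {
        if (*s == '\\') {
            /* Like skip: the character after a backslash is consumed
             * without being inspected. */
            if (s[1] == '\0')
                return NULL;    /* dangling backslash at end of input */
            s += 2;
        } else if (*s == quote) {
            /* Like :>, the closing quote ends the quoted body at once. */
            return s;
        } else {
            s++;
        }
    }
    return NULL;                /* quote was never closed */
}
```

Because the escaped character is stepped over blindly, a backslash-quote pair inside the body does not end the quoted section, which is the same effect the fcall skip transition has in dquote and squote.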


And finally, if the line ends while a quote is still open, the actions dquote_err and squote_err report the failure and set the corresponding error code.


 action dquote_err {
     ret = -1;
     errsv = BUILDARGV_EDQUOTE;
 }

 action squote_err {
     ret = -1;
     errsv = BUILDARGV_ESQUOTE;
 }
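
The article names the constants BUILDARGV_EDQUOTE and BUILDARGV_ESQUOTE but does not show how they are defined. The following sketch assumes they are small integer codes and adds a hypothetical buildargv_strerror helper that turns them into messages; both the numeric values and the helper are assumptions, not taken from the project.

```c
#include <stddef.h>

/* Assumed values: the constants are named in the article, but their
 * actual definitions are not shown there. */
enum buildargv_error {
    BUILDARGV_EDQUOTE = 1,   /* line ended inside a double-quoted string */
    BUILDARGV_ESQUOTE = 2    /* line ended inside a single-quoted string */
};

/* Hypothetical helper: map an error code to a human-readable message. */
static const char *buildargv_strerror(int errsv)
{
    switch (errsv) {
    case BUILDARGV_EDQUOTE: return "unterminated double quote";
    case BUILDARGV_ESQUOTE: return "unterminated single quote";
    default:                return "unknown error";
    }
}
```

A caller that sees ret == -1 could then print buildargv_strerror(errsv) instead of a bare number.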

The C code is generated with the command:


 ragel -e -L -F0 -o buildargv.c buildargv.rl 

A list of test lines can be found in test_cmdline.c .


Conclusion


The problem is solved.


Was it faster? I doubt it. Is it clearer? Only if you are a Ragel expert.


I do not claim that this is the definitive solution, and I will be grateful for constructive comments on the Ragel code.


References:


[^1]: Adrian Thurston. Ragel State Machine Compiler.



Source: https://habr.com/ru/post/477296/

