REGEX(3X)              UNIX Programmer's Manual              REGEX(3X)

NAME
     regex, regcmp - regular expression compile/execute

SYNOPSIS
     char *regcmp(string1[,string2, ...], 0);
     char *string1, *string2, ...;

     char *regex(re, subject[,ret0, ...]);
     char *re, *subject, *ret0, ...;

HP-UX COMPATIBILITY
     Level:    HP-UX/STANDARD

     Origin:   System III

DESCRIPTION
     Regcmp compiles a regular expression and returns a pointer to
     the compiled form.  Malloc(3C) is used to create space for the
     vector.  It is the user's responsibility to free unneeded space
     so allocated.  A zero return from regcmp indicates an incorrect
     argument.

     Regex executes a compiled pattern against the subject string.
     Additional arguments are passed to receive values back.  Regex
     returns zero on failure or a pointer to the next unmatched
     character on success.  A global character pointer __loc1 points
     to where the match began.

     Regcmp and regex were mostly borrowed from the editor, ed(1).
     However, the syntax and semantics have been changed slightly.
     The following are the valid symbols and their associated
     meanings.

     []*.^     These symbols retain their current meaning (as in
               ed(1)).

     $         Matches the end of the string.  \n matches the
               new-line.

     -         Within brackets, the minus means through.  For example,
               [a-z] is equivalent to [abcd...xyz].  The - is
               interpreted literally only if it is the last or first
               character specified within the brackets.  For example,
               the character class expression []-] matches the
               characters ] and -.

     +         A regular expression followed by + means one or more
               times.  For example, [0-9]+ is equivalent to
               [0-9][0-9]*.

     {m} {m,} {m,u}
               Integer values enclosed in {} indicate the number of
               times the preceding regular expression is to be
               applied.  m is the minimum number and u is a number,
               less than 256, which is the maximum.  If only m is
               present (e.g., {m}), it indicates the exact number of
               times the regular expression is to be applied.  {m,} is
               analogous to {m,infinity}.
               The plus (+) and star (*) operations are equivalent to
               {1,} and {0,} respectively.

     ( ... )$n The value of the enclosed regular expression is to be
               returned.  The value will be stored in the (n+1)th
               argument following the subject argument.  At present,
               at most ten enclosed regular expressions are allowed.
               Regex makes its assignments unconditionally.

     ( ... )   Parentheses are used for grouping.  An operator, e.g.
               *, +, {}, can work on a single character or a regular
               expression enclosed in parentheses.  For example,
               (a*(cb+)*)$0.

     By necessity, all the above defined symbols are special.  They
     must, therefore, be escaped to be interpreted literally.

EXAMPLES
     Example 1:
          char *cursor, *newcursor, *ptr;
          ...
          newcursor = regex((ptr=regcmp("^\n",0)), cursor);
          free(ptr);

     This example matches a leading new-line in the subject string
     pointed at by cursor.

     Example 2:
          char ret0[9];
          char *newcursor, *name;
          ...
          name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", 0);
          newcursor = regex(name, "123Testing321", ret0);

     This example matches through the string ``Testing3'' and returns
     the address of the character after the last matched character
     (the start of the subject string plus 11).  The string
     ``Testing3'' is copied to the character array ret0.

     Example 3:
          #include "file.i"
          char *string, *newcursor;
          ...
          newcursor = regex(name, string);

     This example applies a precompiled regular expression in file.i
     against string.

     This routine is kept in /lib/libPW.a.

SEE ALSO
     ed(1), malloc(3C).

BUGS
     The user program may run out of memory if regcmp is called
     iteratively without freeing the vectors no longer required.  The
     following user-supplied replacement for malloc(3C) reuses the
     same vector, saving time and space:

          /* user's program */
          ...
          char *malloc(n)
          unsigned n;
          {
               static int rebuf[256];   /* the one vector, reused on every call */

               return (char *)rebuf;
          }
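
NOTES
     As a rough illustration of what Example 2 does, the sketch below
     reproduces it with the POSIX regcomp(3) and regexec(3) interface
     instead of regcmp and regex, which may not be available on a
     given system.  It is not the libPW implementation.  The helper
     name match_first_group and its buffer-size parameter are
     hypothetical.  The $0 suffix is dropped from the pattern because
     POSIX reports parenthesized groups through a regmatch_t array
     rather than through extra arguments.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * match_first_group: hypothetical helper, not part of libPW.
 *
 * Compiles `pattern` as a POSIX extended regular expression, searches
 * `subject` for it, and copies the text of the first parenthesized group
 * into `ret0` (truncated to ret0_size - 1 bytes, always NUL-terminated).
 *
 * Returns a pointer to the character after the whole match, or NULL if
 * the pattern does not compile or does not match.  Unlike regex, it
 * leaves `ret0` untouched on failure.
 */
const char *match_first_group(const char *pattern, const char *subject,
                              char *ret0, size_t ret0_size)
{
    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: first group */
    const char *next = NULL;

    assert(pattern != NULL && subject != NULL);
    assert(ret0 != NULL && ret0_size > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;            /* compile error, like a zero from regcmp */

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (len >= ret0_size)   /* never overrun the caller's buffer */
            len = ret0_size - 1;
        memcpy(ret0, subject + m[1].rm_so, len);
        ret0[len] = '\0';

        next = subject + m[0].rm_eo;
    }

    regfree(&re);               /* counterpart of free() on a regcmp vector */
    return next;
}
```

     Called with the pattern "([A-Za-z][A-Za-z0-9_]{0,7})", the
     subject "123Testing321", and a 9-byte buffer, the helper copies
     ``Testing3'' into the buffer and returns the subject address plus
     11, in line with the description of Example 2.  Passing the
     buffer size explicitly is a design choice of this sketch; regex
     itself has no such argument.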