OPTIONS
DESCRIPTION
The themax program searches the index file with regular expressions. The original thema command uses the command-line regular expression tool grep to do the final searching in the index file, while themax searches the index file internally with the Perl-Compatible Regular Expression library.
As an example application, suppose that you want to search the melodies of the vocal parts in Ludwig Erk's Deutscher Liederschatz, volume 1, which consists of 201 songs. First you will have to extract the vocal parts from the full score. The vocal part is always in spine three of the score for these particular songs (the first two spines being the piano accompaniment), so the following commands extract the vocal parts using a bash-shell for-loop and then build the index file:
for i in *.krn
do
extractx -f3 "$i" > "$(basename "$i" .krn).thm"
done
themebuilderx *.thm > liederschatz1.thema
The above command will create the following content in the file liederschatz1.thema:
The first column contains the filename given as input to the themebuilderx command (with the .thm extensions removed manually for this example). The filename can also be replaced with some other unique identifier for the melodic data described on the line. Subsequent tab-separated fields on the line each start with a unique character followed by feature data. Here is an example of the features extracted from the first song in the set:
The themebuilderx program will always place these search features in the given order. This ordering is important if you want to search multiple features in parallel, since a single regular expression is generated to do the search in a single pass.
Search filters
Search in major keys only
If you use the -M option with themax, only music in a major key will be searched. The -M option can also be used without any other search queries; in that case, themax returns the lines in the index file which represent music in a major key. For example, to count the number of songs in a major key in the example Deutscher Liederschatz volume, type the following command, which should give an answer of 196 songs:
themax -M liederschatz1.thema | wc -l
196
Search in minor keys only
Likewise, the -m option is used to selectively search minor key entries in the index file. In the example set of songs, only 5 are in a minor key:
themax -m liederschatz1.thema | wc -l
5
The themax program echoes the entire text line in the index on which a match was found for a search query. The command "wc -l" is used above to count the number of lines returned by the themax command, which is also the number of songs which match the query. If you only want to see which files contained the match, then try this command:
themax -m liederschatz1.thema | awk '{print $1}'
erk018
erk042
erk081
erk090
erk133
Filtering music for a particular tonic
The -t option can be used to only search music with a particular tonic pitch. For example, "-t C" will only search music in either C major or C minor. Like -M and -m, the -t option does not have to be used with a particular music query, and will return all lines which match the requested tonic if no additional search queries are added.
For tonics which contain a flat sign, use a minus sign after the diatonic pitch name (such as "-t A-" for A-flat). For sharp signs use a "#" character; however, you should probably enclose the option in single quotes on the command line so that the sharp sign is not interpreted as a comment marker: "-t 'C#'" for a C-sharp tonic. Alternatively, themax will accept forms such as "-t A-flat" for A-flat and "-t F-sharp" for F-sharp. As an example, here are searches for various keys which can be used to get statistics on the frequency of tonics used in the songs:
themax -t C indexfile | wc -l
30
themax -t D-flat indexfile | wc -l
1
themax -t D indexfile | wc -l
15
themax -t E-flat indexfile | wc -l
20
themax -t E indexfile | wc -l
5
themax -t F indexfile | wc -l
33
themax -t G indexfile | wc -l
47
themax -t A-flat indexfile | wc -l
3
themax -t A indexfile | wc -l
22
themax -t B-flat indexfile | wc -l
24
themax -t B indexfile | wc -l
1
In this case, the most commonly used key has a tonic on G. The -t option does not select the major/minor modality. Use the -M or -m options to choose the mode:
themax -M -t G indexfile | wc -l
46
themax -m -t G indexfile | wc -l
1
In this case, there is one piece in G minor, and 46 in G major. Of course, a more efficient method of doing this particular analysis would be to process the second data field of the index directly without using themax:
awk '{print $2}' indexfile | sort | uniq -c
3 ZA-=
19 ZA=
24 ZB-=
30 ZC=
1 ZD-=
15 ZD=
20 ZE-=
5 ZE=
33 ZF=
46 ZG=
3 zA=
1 zB=
1 zG=
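The same field can also be collapsed into a plain major/minor split. The following is a minimal sketch, not part of themax: it assumes, as the counts above suggest, that the second field starts with an upper-case "Z" for major keys and a lower-case "z" for minor keys. The function name count_modes is made up for this illustration.

```shell
# count_modes: read index lines on standard input and print how many
# entries are in a major key and how many are in a minor key.
# Assumption: field 2 starts with "Z" (major) or "z" (minor), as in
# the tally shown above.
count_modes() {
    awk '
        $2 ~ /^Z/ { major++ }
        $2 ~ /^z/ { minor++ }
        END { printf "major %d\nminor %d\n", major, minor }
    '
}

# Usage: count_modes < indexfile
```

On the example index this should agree with the 196 and 5 reported by the -M and -m searches.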
Filtering music for a particular meter
Music in a particular meter or metrical category can be selected by using the -T option. The metrical description of the music contains the time signature (numerator/denominator) followed by the string "duple", "triple", "quadruple" or "irregular", and finally "simple" or "compound". As with the previously described options, the -T option can be used alone without any musical feature search.
In the example data set, the -T option can be used to identify how many songs are in 4/4:
themax -T 4/4 indexfile | wc -l
63
To search music in any duple meter (2/4 and 6/8, for example), use "duple":
themax -T duple indexfile | wc -l
62
Any meter with more than 4 beats is classified as "irregular" (such as 5/4, but not 9/8 which is considered to be a compound time signature with three beats).
themax -T irregular indexfile | wc -l
1
You can compare the number of songs in a simple meter (such as 2/4, 3/4) to songs in compound meter (such as 6/8, 9/8) with the following commands.
themax -T simple indexfile | wc -l
176
themax -T compound indexfile | wc -l
25
So about 1/8 of the songs in the example set are in a compound meter. Be careful: the labeling of meters as compound or simple is done automatically by the themebuilderx program, and it cannot distinguish a 6/8 which is compound (two beats at the dotted quarter duration) from one which is simple (six beats at the eighth note duration), an inherent ambiguity built into modern time signatures. Also, 3/8 is currently identified as being simple (three beats on the eighth note level) rather than compound (one beat at the dotted quarter note level).
Multiple components of the metrical description can be searched at the same time. The order must be (1) time signature, (2) beat count, (3) simple/compound. Here is an example search for music in triple, simple meters:
themax -T triplesimple indexfile | wc -l
74
Note that -T simpletriple would not return any matches since the ordering of the metric components is incorrect.
Search anchoring
When the -a option is included in a themax search, matches must start at the beginning of the data. Possible matches which start at a position other than the start of the music will be ignored. This option requires an actual feature query (unlike the previously described search filtering options). If -a is used without any other arguments, all lines of the search index will be echoed to standard output.
Here is an example search for the pitch sequence "C D E" in the example index, searching both unanchored (pattern can start anywhere in the data) and anchored (pattern must start at the beginning of the music):
themax -p "C D E" indexfile | wc -l
29
themax -a -p "C D E" indexfile | awk '{print $1}'
erk131
Pitch query options
absolute pitch class
The -p option is used to search the music by pitch class. The pitch class name is composed of a diatonic letter A-G, and can be followed by one or more flat (-) or sharp (#) signs. For example, A-- is A double-flat, and C## is C double-sharp. An "X" can be used to represent a double sharp. Accidentals only apply to the note which they follow, so repeated notes using the same accidental must also repeat the accidental signs. In the classic thema command, a single space between note names is required; however, this is optional in themax, and the diatonic pitch names are case-insensitive.
Here is an example search for the melodic sequence "D F# A" in the example data index:
themax -p "D F# A" indexfile | wc -l
8
Extended pitch-class query syntax
Northern European pitch-class names are understood by themax, where "is" is equivalent to a sharp sign, and "es" is equivalent to a flat sign. You will have to be careful about the B/H diatonic pitch name. If an "H" is present in the query, it will be assumed to be equivalent to an English "B" pitch, while "B" will be assumed to be equivalent to an English "B-". If no "H" is present, then "B" is assumed to be equivalent to an English "B" pitch.
themax -p "d fis a" indexfile | wc -l
8
Southern European and fixed-do solfège are also understood in the pitch-class query string: C = ut or do, D = re, E = mi, F = fa, G = sol or so, A = la, B = si or ti. English or German accidental syntax can be applied to these basic diatonic codes. Spaces between pitch names are optional.
themax -p "re fa# la" indexfile | wc -l
8
themax -p "refaisla" indexfile | wc -l
8
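The syllable table above can be written out as a small lookup. This is only an illustrative sketch of the mapping, not how themax implements it: the function name solfege_to_letter is made up here, and it handles bare lower-case syllables only (no accidentals, no run-together strings such as "refaisla").

```shell
# solfege_to_letter: print the English diatonic letter for one
# fixed-do syllable, following the table in the text above.
# Assumption: a single lower-case syllable with no accidental.
solfege_to_letter() {
    case $1 in
        ut|do)  echo C ;;
        re)     echo D ;;
        mi)     echo E ;;
        fa)     echo F ;;
        sol|so) echo G ;;
        la)     echo A ;;
        si|ti)  echo B ;;
        *)      echo "unknown syllable: $1" >&2; return 1 ;;
    esac
}

# Usage: for s in re fa la; do solfege_to_letter "$s"; done
```

Appending the accidental by hand then gives the English form of a query, so "re fa# la" corresponds to "D F# A".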
Wildcards in pitch-class query strings
Five characters are used to represent special meanings in the pitch class search query:
Diatonic mode for pitch class queries
Using the -D option with a -p search is equivalent to adding a "^" wildcard after each diatonic pitch name. This option allows searching for diatonic pitch sequences with any accidental after the pitch name. This search feature is useful for searching for modal equivalents, and various interpretations of musica ficta.
Count the number of songs which contain the exact pitch sequence "A B C D E":
themax -p "a b c d e" indexfile --count
8
Now count the number of songs which contain the diatonic pitch sequence "A B C D E", allowing for accidentals attached to any of those notes:
themax -D -p "a b c d e" indexfile --count
13
The previous search is equivalent to:
themax -p "a^ b^ c^ d^ e^" indexfile --count
13
scale degree
The -d option is used to search using a scale-degree representation of the music. 1 is assigned to the tonic note, and values from 1-7 represent the 7 scale degrees of the major and minor scales. Accidentals are not considered. For example, in C major a C-sharp is encoded as "1" for this musical feature. Spaces are optional between scale degrees.
An example search by scale degree, searching for "1 3 5":
themax -d 135 indexfile | wc -l
56
themax -d "1 3 5" indexfile | wc -l
56
12-tone pitch class
The -P option is used to search using a twelve-tone pitch class representation of the music. The pitch classes are numbers from 0 (for C) to 11 (for B). For values 10 and higher, the pitch class numbers are mapped to letters of the alphabet: 10 = Y (representing the pitch class B-flat/A-sharp), and 11 = Z (representing the pitch class B). The multi-digit pitch class names can also be used, provided spaces separate the numbers from adjacent values. If you use "Y" or "Z" for pitch classes, then you cannot also use 10 or 11 as the respective alternate names for the pitch classes.
Searching for the 12-tone pitch class sequence "6 8 Y":
themax -P 68Y indexfile --trim
erk035
erk081
themax -P "6 8 10" indexfile --trim
erk035
erk081
themax -P "68 10" indexfile --trim
erk035
erk081
Note that twelve-tone pitch classes are less selective than diatonic+accidental based pitch classes:
themax -p "f# g# a#" indexfile --trim
erk081
themax -p "g- a- b-" indexfile --trim
erk035
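To see why both of the searches above fall under "6 8 Y", the spelled pitch names can be reduced to twelve-tone pitch classes by hand. The following is a rough sketch of that reduction under stated assumptions, not code taken from themax: the function name pitch_to_pc is hypothetical, and it accepts only a letter A-G followed by "#" or "-" signs (the "X" double-sharp and the European names are not handled).

```shell
# pitch_to_pc: convert pitch names such as F# or B- into the compact
# twelve-tone form used by the -P option (10 = Y, 11 = Z).
# Assumption: each argument is a letter A-G plus zero or more "#"
# (sharp) or "-" (flat) signs.
pitch_to_pc() {
    out=""
    for name in "$@"; do
        rest=${name#?}            # accidentals after the letter
        letter=${name%"$rest"}    # the first character
        case $letter in
            C|c) pc=0 ;;  D|d) pc=2 ;;  E|e) pc=4 ;;  F|f) pc=5 ;;
            G|g) pc=7 ;;  A|a) pc=9 ;;  B|b) pc=11 ;;
            *) echo "unknown pitch name: $name" >&2; return 1 ;;
        esac
        while [ -n "$rest" ]; do
            case $rest in
                '#'*) pc=$((pc + 1)) ;;
                -*)   pc=$((pc - 1)) ;;
                *) echo "unknown accidental in: $name" >&2; return 1 ;;
            esac
            rest=${rest#?}
        done
        pc=$(( (pc % 12 + 12) % 12 ))   # wrap into the range 0..11
        case $pc in
            10) out="${out}Y" ;;
            11) out="${out}Z" ;;
            *)  out="${out}${pc}" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage: pitch_to_pc 'F#' 'G#' 'A#'
#        pitch_to_pc G- A- B-
```

Both calls in the usage comment produce the same string, which is why the -P search returns the union of the two -p searches.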
musical interval
The -I option can be used to search by musical interval. A musical interval has three components in this order: (1) interval direction, (2) interval quality, and (3) diatonic interval size.
Example interval descriptions: +P5 = a rising perfect fifth. +3 = a rising third (can be a rising major third or a rising minor third). P4 = a perfect fourth (either rising or falling). A "+" by itself represents a rising interval of any quality and diatonic size.
Songs which start with a rising perfect fifth:
themax -a -I "+P5" indexfile --trim
erk035
Number of songs which contain a rising perfect fifth anywhere in the song:
themax -I +P5 indexfile --count
87
Number of songs which contain a rising or falling octave:
themax -I P8 indexfile --count
61
Songs which contain a long up/down sequence of intervals:
themax -I "+-+-+-+-+-" indexfile --trim
erk009
erk016
erk038
erk066
erk103
Songs which contain 3 rising thirds in a row:
themax -I "+3 +3 +3" indexfile --trim
erk029
erk074
erk099
erk101
Wildcards for musical interval search queries
Not all of these wildcards are active yet.
12-tone interval
The -i option can be used to search the music using twelve-tone intervals. This is equivalent to counting the number of half-steps between notes in the sequence. Rising intervals may be preceded by a "+" (plus) and falling intervals must be preceded by a "-" (minus). A repeated note is represented by a "0" interval. If an interval is preceded by a plus/minus sign, then spaces between the intervals are optional.
Look for songs which contain three rising whole-tones in a row:
themax -i "2 2 2" indexfile --trim
erk052
erk078
erk147
Count the number of songs which have a major sixth interval up, followed by a major third down:
themax -i "+9 -4" indexfile --count
19
Find a song with a long string of repeated notes:
themax -i "0 0 0 0 0 0 0 0 0" liederschatz1.thema --trim
erk130
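If the melody you want to look up is only available as note numbers, the half-step counting described above is simple to do in the shell. This is a minimal sketch, assuming the notes are given as MIDI-style semitone numbers (60 = middle C), which is an assumption of this example rather than anything themax reads; the function name semitone_intervals is likewise made up.

```shell
# semitone_intervals: print the twelve-tone intervals between
# successive notes in the "+9 -4" style accepted by the -i option.
# Assumption: arguments are integer semitone numbers (e.g. MIDI keys).
semitone_intervals() {
    out=""
    prev=""
    for note in "$@"; do
        if [ -n "$prev" ]; then
            diff=$(( note - prev ))
            if [ "$diff" -gt 0 ]; then
                diff="+$diff"          # explicit sign for rising intervals
            fi
            out="${out:+$out }$diff"   # space-separate the intervals
        fi
        prev=$note
    done
    printf '%s\n' "$out"
}

# Usage: themax -i "$(semitone_intervals 60 69 65)" indexfile --count
```

For the notes 60 69 65 (C up to A, then down to F) the function prints "+9 -4", the query used in the major-sixth example above.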
pitch gross contour
The -C option can be used to search the musical data for gross contour, which is a basic intervallic description of the melodic line split into three categories: (1) up (next pitch is higher than the current one), (2) down (next pitch is lower than the current one), (3) same (next pitch is the same as the current one). For up intervals, you can use u, U, or /. For down intervals, you can use d, D, or \. For repeated (same) intervals you can use s, S, or - (dash). Extended regular expression syntax works in pitch gross contour search queries. For example, S+ means one or more repeated intervals (two or more repeated notes in a row).
Find the songs which have 6 up intervals followed by 6 down intervals:
themax -C "uuuuuudddddd" indexfile --trim
erk115
Same search as above, but using extended regular expressions:
themax -C "u{6}d{6}" indexfile --trim
erk115
Count the number of songs which have at least 4 upward intervals, followed by any type of intervals, and eventually followed by 4 or more falling intervals:
themax -C "u{4,}.*d{4,}" indexfile --count
34
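The three-way classification behind gross contour can be tried out on a melody of your own before searching the index. The sketch below is an illustration only: it assumes notes given as integer semitone numbers (for example MIDI key numbers), so enharmonic spellings are not distinguished, and the function name gross_contour is hypothetical.

```shell
# gross_contour: print one letter per melodic interval:
#   u = next note higher, d = next note lower, s = same note.
# Assumption: arguments are integer semitone numbers.
gross_contour() {
    out=""
    prev=""
    for note in "$@"; do
        if [ -n "$prev" ]; then
            if   [ "$note" -gt "$prev" ]; then out="${out}u"
            elif [ "$note" -lt "$prev" ]; then out="${out}d"
            else                               out="${out}s"
            fi
        fi
        prev=$note
    done
    printf '%s\n' "$out"
}

# Usage: gross_contour 60 62 62 59
#        gross_contour 60 62 62 59 | grep -E 'us+d'
```

Because the result is a plain string, the same extended regular expressions used in the -C queries above can be tested against it with grep -E.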
pitch refined contour
The -c option searches using refined interval contour. Refined contour contains five intervallic levels rather than the three of gross contour. In the refined case, up and down intervals are split into two sub-categories: (1) step-wise movement, and (2) leap movement.
Any step-wise movement up or down (a half-step, whole-step or augmented second) is represented by a lower case "u" or "d". Any leap up or down (a third or larger) is represented by an upper case "U" or "D". A repeated note is represented as an "s", "S" or "-" as with gross contour features. Like gross contour, all extended regular expression wildcards are allowed.
Count the number of songs which contain a leap down followed by one or more steps or leaps upwards, followed by a leap down:
themax -c "D(u|U)+D" indexfile --count
109
Find songs which contain 10 or more successive leaps (any mixture of up and down leaps):
themax -c "(U|D){10,}" indexfile --trim
erk025
erk047
erk124
Rhythm query options
Duration gross contour
The -R option is used to search with rhythm gross contour features: "S" if the following note has a shorter duration than the current note; "L" if the following note is longer, and "=" if the following note has the same duration as the current note. The search features are case insensitive, and extended regular expressions can be used in this search option.
Search for songs which contain a long string of equal durations (30 or more repeated durations):
themax -R "={30}" indexfile --trim
erk056
erk063
erk110
Search for songs which contain four notes, each shorter than the previous one:
themax -R 'sss' indexfile --trim
erk052 (starting in second measure of second page)
erk131
erk134
erk142
erk162
Search for songs which have a long sequence of shorter-longer duration pairs:
themax -R '(SL){10}' indexfile --trim
erk131
erk148
Duration refined contour
The -r option is used to search duration refined contour, which is similar to duration gross contour described above, but allows for 5 states, like pitch refined contour:
Find songs with two shorter notes (of the same duration) followed by three longer notes (of the same duration) which are more than twice the duration of the previous two notes:
themax -r '=L==' indexfile --trim
erk009
erk059
erk111 (starting in measure 8)
erk130
erk168
erk169
erk197
Duration
The -u option allows search by duration. Durations are specified by rhythmic values using Humdrum's **recip representation. In other words, "4" is a quarter note, "16" is a 16th note, and "12" is a triplet-eighth note (there are 12 triplet-eighth notes in a whole note). Note that notated eighth notes in a 6/8 meter are not considered triplet eighth notes but rather plain eighth notes ("8").
Count the number of songs which contain any 16th notes:
themax -u 16 indexfile --count
121
Multiple rhythms in the search query must be separated by one or more spaces in order to parse the rhythmic entities properly.
Count the number of songs which contain the rhythmic pattern "4 8 8":
themax -u "4 8 8" indexfile --count
154
A period (".") or "d" can be used to represent dotted rhythmic values. Count the number of songs which have the rhythmic sequence of a dotted eighth note followed by a sixteenth note:
themax -u "8. 16" indexfile --count
100
Count the number of songs which contain dotted rhythms (Note that rests are not examined by themax):
themax -u "d" indexfile --count
174
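The relationship between the reciprocal values used by -u and actual note lengths can be checked with a little arithmetic: a value n lasts 4/n quarter notes, and each dot adds half of the previously added length. The following sketch works under those assumptions; the function name recip_to_quarters is made up for this example, and special **recip cases such as "0" (breve) are not handled.

```shell
# recip_to_quarters: print the length, in quarter notes, of one
# rhythmic value written the way -u queries write it ("4", "16", "8.").
# Assumption: a positive integer followed by zero or more dots, where
# a dot may be written as "." or "d".
recip_to_quarters() {
    awk -v r="$1" 'BEGIN {
        base = r
        dots = gsub(/[.d]/, "", base)   # strip and count the dots
        if (base + 0 <= 0) { exit 1 }   # reject 0 and non-numbers
        dur = 4 / base                  # undotted length in quarters
        add = dur
        for (i = 0; i < dots; i++) { add /= 2; dur += add }
        print dur
    }'
}

# Usage: recip_to_quarters 8.    # dotted eighth
#        recip_to_quarters 16    # sixteenth
```

A dotted eighth plus a sixteenth add up to exactly one quarter note, which is why the "8. 16" pattern above fills a single beat in a quarter-note meter.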
Beat level
The -b option can be used to search using the beat level musical feature. If a note is to be played on a beat, its feature value is "1". If a note is to be played off of the beat, its feature value is "0". All extended regular expression wildcards are allowed in this search field.
Count songs which start with an anacrusis:
themax -a -b 0 indexfile --count
94
Search for a sequence of at least 14 notes which alternate on/off of the beat:
themax -b "10{7}" indexfile --trim
erk009 (starting in measure 25)
erk181 (starting in measure 7)
Metric level
The -L option is used to search metric level features. The metric level is a log2 indication of the metric stress of a note. Beats are assigned the value 0, eighth-note off-beats are assigned -1, sixteenth-notes after beats and eighth-note off-beats are assigned -2, and so on. In 4/4, the first beat of a measure is +2 and the third beat is +1. For non-negative metric levels (i.e., beats), the symbol "B" (or "b") can be used to indicate any beat. Likewise, "S" (or "s") can be used to indicate any sub-beat (metric position which does not fall on a beat). Note that this is similar to the beat level features.
Count the songs which have a downbeat in 4/4 followed by an eighth-note offbeat:
themax -L "+2 -1" indexfile --count
30
Count the number of songs which start with an eighth-note upbeat:
themax -a -L "-1" indexfile --count
46
Find any songs which do not contain sub-beats:
themax -L "S" -v indexfile --trim
erk039
Metric gross contour
The -E option is used to search metric gross contours, where there are 3 possible states:
Metric refined contour
The -e option is used to search metric refined contour, where there are 5 metric levels:
themax -e 'SSSS' indexfile --trim
erk030
Metric position
The -l (lower-case L) option is used to search the "metrical position" feature. This is the beat number on which a note attack occurs. Beat values must be separated by one or more spaces.
In 4/4 meters, count the number of songs which contain a measure with notes only on the four beats, followed by a note on the downbeat of the next measure:
themax -T 4/4 -l "1 2 3 4 1" indexfile --count
34
Off-beats are specified by adding a space and then a fractional offset into the given beat. A dash ("-") can also be used to separate the beat from the fractional offbeat part of the number. For example to search for a dotted eighth followed by a sixteenth note on beat four, you can search using the feature "4 4 3/4" or "4 4-3/4":
themax -l "4 4 3/4" indexfile --count
15
themax -l "4 4-3/4" indexfile --count
15
The wildcard "~" is used after a beat to indicate that the note must fall within a given beat, either on the beat or on an offbeat after that beat position, but before the next beat. Count the number of songs in a 2/4 meter which have three notes occurring during the span of beat 2, with the first note occurring on the beat and two others on subsequent off-beats, before a note on beat 1 of the next measure. In other words, the search query will match an eighth and two sixteenths on beat 2, or three triplet eighth notes on beat 2:
themax -T 2/4 -l "2 2~ 2~ 1" indexfile --count
37
The wildcard "^" is used to indicate any offbeat within the given beat, not including any note attack occurring at the start of the beat. Search for songs which have a note on the first beat of a measure, none on the second beat but two notes on off-beats of beat two, followed by a note on beat three:
themax -l "1 2^ 2^ 3" indexfile --trim
erk009
erk028
erk058
erk078
erk147
erk148
Other options
Returning a count of matches
By default, themax returns the lines in the index file which the search query matched. Instead, the --count option can be used to return the number of matches found in the index file. This is equivalent to piping the default output of themax through the command "wc -l".
themax --count -p "df#a" indexfile
8
themax -p "df#a" indexfile | wc -l
8
8
Negated queries
The -v option allows the search query to be negated, so that only entries which do not match are returned.
Count the number of songs which do not contain the pitch C:
themax -v -p C indexfile --count
39
Search for themes which do not contain a descending minor second:
themax -v -I "-m2" indexfile --trim
erk040
erk042
erk068
erk108
erk159
erk162
erk188
erk191
Display regular expression search query
The --regex option can be used to display the regular expression which will be used to search the index file given the input query options. The actual search will not be done, and the program will exit after the regular expression is printed to standard output. The regular expression uses extended syntax which can be used in the egrep program.
themax -a -M --regex -d 135 indexfile
Z[^=]*=.*%135
themax -a -M -d 135 indexfile --count
12
egrep "$(themax -a -M --regex -d 135)" indexfile | wc -l
12
Prevent cleaning of search queries
By default, the themax command will attempt to automatically clean up the search queries for musical features so that users can input the features in multiple ways. For example, the pitch search using the -p option accepts C, ut and do for the pitch name C. If you are using themax under automated conditions, you can add the --raw option to prevent such pre-processing of the search queries. This will save some negligible time, and will allow the use of extended regular expressions directly on the data. Wildcard characters specific to themax (and not to regular expressions) are not available when using the --raw option.
Display post-processed user query by features
A more verbose version of --regex can be viewed with the --features option. This option will display the cleaned version of the user input queries, which is useful for debugging complex queries.
In the following example, -M is converted into Z[^=]*=, -p "utdoissies" is converted into (?:C) (?:C#) (?:Bb), and the final regular expression which will be used to search the index file is Z[^=]*=.*J(?:C) (?:C#) (?:Bb).
themax2 -a -M -p "utdoissies" indexfile --queries
Tonic: Z[^=]*=
Pitch-class: (?:C) (?:C#) (?:Bb)
Final Regular Expression: Z[^=]*=.*J(?:C) (?:C#) (?:Bb)[ ]
Returning filenames only
The --trim option will remove all data fields in the matching lines except for the first column of data (the filename or identity string). This is equivalent to piping the default output of themax through the command "awk '{print $1}'".
themax --trim -p "df#a" indexfile
erk005
erk024
erk047
erk050
erk094
erk104
erk107
erk166
themax -p "df#a" indexfile | awk '{print $1}'
erk005
erk024
erk047
erk050
erk094
erk104
erk107
erk166
EXAMPLES
SEE ALSO
DOWNLOAD
The source code for the program was last modified on 13 Nov 2008.