Humdrum Extras

themax manpage


COMMAND

    themax -- Search melodic index data created by themebuilderx

OPTIONS

    -a Anchor search to start of search field.
    -m Search only entries in a minor key.
    -M Search only entries in a major key.
    -t Search only entries in a particular key tonic.
    -T Search only entries in particular metrical classes.

    Standard feature query options:

    -c Search by pitch refined contour.
    -C Search by pitch gross contour.
    -d Search by scale degree.
    -i Search by 12-tone interval string.
    -I Search by musical interval.
    -p Search by absolute pitch class.
    -P Search by 12-tone pitch class.

    Extended rhythm query options

    -b Search by beat level.
    -u Search by duration.
    -e Search by metric refined contour.
    -E Search by metric gross contour.
    -r Search by duration refined contour.
    -R Search by duration gross contour.
    -l Search by metric position.
    -L Search by metric level.

    Other options

    --count Return a count of the number of matches rather than echoing lines which match.
    -D Convert a -p search (pitch class) into a diatonic pitch class search, where any accidental on a diatonic pitch letter will match.
    --features Display the component feature regular expression search string and then exit without searching.
    --raw Do not attempt to clean up search queries from the command-line.
    --regex Display the final regular expression search string and then exit without searching.
    --trim Return only the first column of index lines which match query.
    -v Negate the search query and return entries which do not match.

DESCRIPTION

    The themax command is the search-engine core of Themefinder. The program reads an index file created by the themebuilderx program. This search index consists of a series of lines, one for each monophonic musical example being searched. Each line in the index file consists of an ordered sequence of musical features extracted from a Humdrum file. Fields are separated by tab characters, and each musical feature field starts with an identifying character.

    The index file is searched via regular expressions by the themax program. The original thema command utilizes the command-line regular expression tool, grep, to do the final searching in the index file. The themax program uses the Perl-Compatible Regular Expression library to internally search the index file.

    As an example application, suppose that you want to search in the melodies of the vocal parts in Ludwig Erk's Deutscher Liederschatz, volume 1, which consists of 201 songs. First you will have to extract the vocal parts from the full score. The vocal part is always in spine three of the score for these particular songs (the first two spines being the piano accompaniment), so the following commands extract the vocal parts using a bash-shell for-loop and then build the index:

          for i in *.krn
          do
             # spine 3 holds the vocal part
             extractx -f3 "$i" > "$(basename "$i" .krn).thm"
          done
          themebuilderx *.thm > liederschatz1.thema

    The above commands write the search index to the file liederschatz1.thema, with one line for each song.

    The first column contains the filename given as input to the themebuilderx command (the .thm extensions were removed manually for this example). The filename can also be replaced with some other unique identifier for the melodic data described on the line. Subsequent tab-separated fields on the line each start with a unique character followed by feature data. Here is an example of the features extracted from the first song in the set:

    erk053
    The filename (or any arbitrary identification tag). This field will be ignored by themax when searching the index file.
    ZE-=
    The key of the melodic data. The field starts with a capital "Z" for major keys, and a lower-case "z" for minor keys. The field ends with an equals sign (=). In this case the music is in E-flat major. A major would be encoded as "ZA=", and C-sharp minor would be encoded as "zC#=".
    {p1p2p0m1p1p2m2p0m2p0p0p0p0m3p0m3m4p2p2p1p7p0p5m1m2m2m2
     m1m2m2m1m2p0p8p4m2p0p0p2p2p1p2m2m1m2p0m2p0p0m8p1
    The "12-tone interval" search feature which is prefixed by the "{" character. The data field consists of the number of half-steps in a melodic interval, as well as "p" (plus) for rising intervals and "m" (minus) for falling intervals. A repeated note is indicated by "p0".
    #uusduudsdssssDsDDuuuUsUdddddddddsUUdssuuuudddsdssDu
    The "pitch refined contour" search feature which is prefixed by the "#" character. There are five possible characters in this feature: s for a unison (repeated note), d for a step-wise (major/minor/augmented second) interval down, u for a step-wise interval up, D for a leap (larger than a second) interval down, and U for a leap up.
    :UUSDUUDSDSSSSDSDDUUUUSUDDDDDDDDDSUUDSSUUUUDDDSDSSDU
    The "pitch gross contour" search feature which is prefixed by the ":" character. There are three possible characters in this feature: S for a unison, D for a falling interval, and U for a rising interval.
    %3455456554444422756715517654321766465556712176655571
    The "scale degree" search feature which is prefixed by the "%" character. The digits 1 through 7 are allowed in this feature field, with 1 indicating the note is on the tonic of the musical key for the data, 2 is the second scale-degree in the key, and so on.
    }Xm2XM2P1xm2Xm2XM2xM2P1xM2P1P1P1P1xm3P1xm3xM3XM2XM2Xm2XP5P1XP4xm2xM2x
          M2xM2xm2xM2xM2xm2xM2P1Xm6XM3xM2P1P1XM2XM2Xm2XM2xM2xm2xM2P1xM2P1P1xm6Xm2
    The "musical interval" search feature which is prefixed by the "}" character. This is the diatonic interval description of melodic intervals in the data. Typically there are three components to each interval: (1) direction, (2) quality, (3) diatonic value. Lower-case "x" indicates a falling interval, and upper-case "X" indicates a rising interval. For the quality there are five possible characters: "A" for augmented intervals, "M" for major, "m" for minor, "d" for diminished, and "P" for perfect. Doubly augmented/diminished intervals are represented by repeating "A" or "d" respectively. Perfect unisons ("P1") are the only intervals which are not given a direction marker (unlike the 12-tone interval feature, which needs a separator for the interval numbers).
    j78AA9A0AA88888552A023AA320A875320080AAA02353200AAA23
    The "12-tone pitch class" search feature which is prefixed by the "j" character. This is the pitch class in terms of half steps above C. For example, C = 0, D = 2, and so on. The pitch class numbers 10 and 11 are mapped to the characters A and B respectively.
    JG Ab Bb Bb A Bb C Bb Bb Ab Ab Ab Ab Ab F F D Bb C D Eb Bb Bb Eb D C
          Bb Ab G F Eb D C C Ab C Bb Bb Bb C D Eb F Eb D C C Bb Bb Bb D Eb
    The "absolute pitch class" search feature which is prefixed by the "J" character. This is the standard diatonic-based note name for the pitches within an octave. Each pitch name is separated by a space from the previous pitch (unlike most other features). Natural diatonic pitches are just the diatonic letter in upper case (A-G). Sharps are represented after the diatonic pitch name with "#". Flats are represented after the diatonic pitch name with a lower-case "b". Double sharps/flats are indicated by repeating "#" or "b".
    M4/4quadruplesimple
    The "metric description" field, which is prefixed by the character "M" and consists of three components: (1) the time signature as two numbers separated by a slash, (2) "duple", "triple", "quadruple" or "irregular" depending on the number of beats in a measure, and (3) "simple" or "compound" depending on whether the meter is a compound meter (such as 6/8 where the beat is a dotted quarter note), or simple otherwise. No spaces are given between the three descriptive components.
    ~<><======><=====><=><===========><=><><=><===><=><>
    The "duration gross contour" search feature which is prefixed by the "~" character. The three possible characters are: < if the current note is shorter than the previous note; > if the current note is longer than the previous note; = if the current note has the same duration as the previous note.
    ^[][======><=====><=][===========><=][><=][===><=][]
    The "duration refined contour" search feature which is prefixed by the "^" character. The five possible characters include these two additional states: "[" if the current note is much shorter than the previous note (less than half its duration), and "]" if the current note is much longer than the previous note (more than twice its duration).
    !8d 16 4d 8 8 8 8 8 8 8 4 8 8 8 8 8 8 4 8 8 2 8 8 8 8 8 8 8 8 8 8 8 8 4
          8 8 4d 8 4 8 8 2 8 8 8 8 4 8 8 4d 8 2
    The "duration" search feature which is prefixed by the "!" character. The durations are represented as rhythm reciprocals (4 for quarter note, 8 for 8th-note, etc.), with dotted rhythms indicated by appending one or more "d" characters after the basic rhythmic unit.
    &1010101010110101011011010101010101101011011010110101
    The "beat level" search feature which is prefixed by the "&" character. The two possible characters are: 1 if the note is played on the beat, or 0 if the note is played off the beat. (In this case the 3/8 time signature is being treated as "simple" rather than compound, so the beats are being assigned to the eighth-note level.)
    '0 m2 p2 m1 p1 m1 0 m1 p2 m1 0 0 m1 p2 m1 0 m1 p1 0 m1 p2 0 m1 p2 m1 0
          m1 p1 m1 0 m1 p2 m1 0 0 m1 p2 m1 p1 0 m1 p2 p1 m1 0 m1 p2 0 m1 p1 m1 p2
    The "metric level" search feature which is prefixed by the "'" (apostrophe) character. This is a log-2 indication of the metric strength of the position on which a note starts. 0 indicates a note which falls on a beat; "p" (plus) prefixes positive levels and "m" (minus) prefixes negative levels.
    `WHWHWhwHWhSwHWhwHwwHWwHWhwHWhwHWhSwHWHwwHwWhwHWwHWH
    The "metric refined contour" search feature which is prefixed by the "`" (back-quote) character.
    @WHWHWHWHWHSWHWHWHWWHWWHWHWHW
         HWHWHSWHWHWWHWWHWHWWHWH
    The "metric gross contour" search feature which is prefixed by the "@" (at symbol) character.
    =4 4_3/4 1 2_1/2 3 3_1/2 4 4_1/2 1 1_1/2 2 4 4_1/2 1 1_1/2 2 2_1/2 3 4 4_1/2 1 4 4_1/2 1 1_1/2 2 2_1/2 3 3_1/2 4 4_1/2 1 1_1/2 2 4 4_1/2 1 2_1/2 3 4 4_1/2 1 3 3_1/2 4 4_1/2 1 2 2_1/2 3 4_1/2 1
    The "metric position" search feature which is prefixed by the "=" character. This is the beat number within the measure on which a note starts. Off-beats contain an extra fractional value which follows an underscore.

    The themebuilderx program will always place these search features in the given order. This ordering is important if you want to search multiple features in parallel, since a single regular expression is generated to do the search in a single pass.
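
    Because every field after the first begins with its own identifying character, a single feature can be pulled out of an index line with standard tools. The following is a minimal sketch and not part of themax: the function name get_feature and the shortened sample line are made up for illustration.

```shell
# Hypothetical helper (not part of themax): print the data of one feature
# field, selected by its identifying character, for each index line on stdin.
get_feature() {
    awk -F '\t' -v prefix="$1" '{
        for (i = 2; i <= NF; i++) {          # field 1 is the identifier
            if (substr($i, 1, 1) == prefix) {
                print substr($i, 2)          # drop the identifying character
                break
            }
        }
    }'
}

# Shortened, made-up index line in the tab-separated layout described above.
sample=$(printf 'erk053\tZE-=\t{p1p2p0m1\t#uusd\t:UUSD\t%%3455\tM4/4quadruplesimple')

printf '%s\n' "$sample" | get_feature ':'    # pitch gross contour
printf '%s\n' "$sample" | get_feature '%'    # scale degrees
```

    The same function can be fed a whole index file on standard input to list one feature for every song.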

    Search filters

    Search in major keys only

    If you use the -M option with themax, only music in a major key will be searched. The -M option can also be used without any other search queries, and in that case, the themax program will return a list of the lines in the index file which represent music in a major key.

    For example, to count the number of songs in a major key in the example Deutscher Liederschatz volume, type the following command which should give an answer of 196 songs:

        themax -M liederschatz1.thema | wc -l
        196 

    Search in minor keys only

    Likewise, the -m option is used to selectively search minor key entries in the index file. In the example set of songs, only 5 are in a minor key:

        themax -m liederschatz1.thema | wc -l
        5

    The themax program echoes the entire text line in the index on which a match was found for a search query. The command "wc -l" is used above to count the number of lines being returned by the themax command, which is also a count of the number of songs which match the query. If you only want to see which files contained the match, then try this command:

        themax -m liederschatz1.thema | awk '{print $1}'
        erk018
        erk042
        erk081
        erk090
        erk133

    Filtering music for a particular tonic

    The -t option can be used to only search music with a particular tonic pitch. For example "-t C" will only search music in either C major or C minor. Like -M and -m, the -t option does not have to be used with a particular music query, and will return all lines which match the requested tonic key if no additional search queries are added.

    For key tonics which contain a flat sign, use a minus sign after the diatonic pitch name (such as "-t A-" for A-flat). For sharp signs use a "#" character; however, you should probably enclose the option in single quotes on the command-line so that the sharp sign is not interpreted as a comment marker: "-t 'C#'" for a C-sharp tonic. Alternatively, the themax program will accept forms such as "-t A-flat" for A-flat, and "-t F-sharp" for F-sharp.
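
    The spelled-out and symbolic tonic forms name the same key field in the index. As a rough sketch of that equivalence, the long forms can be reduced to the symbols stored in the key field. The function names normalize_tonic and key_field_regex are hypothetical, and themax may handle this differently internally.

```shell
# Hypothetical helpers (not part of themax).

# Reduce "A-flat" / "F-sharp" to the "A-" / "F#" spelling used in the index.
normalize_tonic() {
    printf '%s\n' "$1" | sed -e 's/-flat$/-/' -e 's/-sharp$/#/'
}

# Regular expression matching a whole key field for the tonic in either mode:
# "Z" (major) or "z" (minor), then the tonic, then "=".
key_field_regex() {
    printf '^[Zz]%s=$\n' "$(normalize_tonic "$1")"
}

normalize_tonic A-flat
normalize_tonic 'F#'
# Count the key fields for tonic A in a made-up list of key fields.
printf 'ZA-=\nZA=\nzA=\n' | grep -c "$(key_field_regex A)"
```

    In the last command the A-flat major entry is not counted, because the "=" terminator of the key field keeps "A" from matching "A-".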

    As an example, here are searches for various keys which can be used to get statistics on the frequency of tonics used in the songs:

        themax -t C indexfile | wc -l
        30
        themax -t D-flat indexfile | wc -l
        1
        themax -t D indexfile | wc -l
        15
        themax -t E-flat indexfile | wc -l
        20
        themax -t E indexfile | wc -l
        5
        themax -t F indexfile | wc -l
        33
        themax -t G indexfile | wc -l
        47
        themax -t A-flat indexfile | wc -l
        3
        themax -t A indexfile | wc -l
        22
        themax -t B-flat indexfile | wc -l
        24
        themax -t B indexfile | wc -l
        1 

    In this case, the most commonly used key has a tonic on G. The -t option does not select the major/minor modality. Use the -M or -m options to choose the mode:

        themax -M -t G indexfile | wc -l
        46
        themax -m -t G indexfile | wc -l
        1 

    In this case, there is one piece in G minor, and 46 in G major.

    Of course, a more efficient method of doing this particular analysis would be to process the second data field of the index directly without using themax:

        awk '{print $2}' indexfile | sort | uniq -c
           3 ZA-=
          19 ZA=
          24 ZB-=
          30 ZC=
           1 ZD-=
          15 ZD=
          20 ZE-=
           5 ZE=
          33 ZF=
          46 ZG=
           3 zA=
           1 zB=
           1 zG= 

    Filtering music for a particular meter

    Music in a particular meter or metrical category can be selected by using the -T option. The metrical description of the music contains the time signature (numerator/denominator) followed by the string "duple", "triple", "quadruple" or "irregular", and finally "simple" or "compound". As with the previously described options, the -T option can be used alone without any musical feature search.

    In the example data set, the -T option can be used to identify how many songs are in 4/4:

        themax -T 4/4 indexfile | wc -l
        63

    To search music in any duple meter (2/4 and 6/8, for example), use "duple":

        themax -T duple indexfile | wc -l
        62

    Any meter with more than 4 beats is classified as "irregular" (such as 5/4, but not 9/8 which is considered to be a compound time signature with three beats).

        themax -T irregular indexfile | wc -l
        1

    You can compare the number of songs in a simple meter (such as 2/4, 3/4) to songs in compound meter (such as 6/8, 9/8) with the following commands.

        themax -T simple indexfile | wc -l
        176
        themax -T compound indexfile | wc -l
        25

    So about 1/8 of the songs in the example set are in a compound meter.

    Be careful, because the labeling of meters as compound or simple is done automatically by the themebuilderx program, and it cannot distinguish between a 6/8 which is compound (two beats at the dotted quarter duration) and one which is simple (six beats at the eighth note duration). This is an inherent ambiguity built into modern time signatures. Also, 3/8 is currently identified as being simple (three beats on the eighth note level) rather than compound (one beat at the dotted quarter note level).

    Multiple components of the metrical description can be searched at the same time. The order must be (1) time signature, (2) beat count, (3) simple/compound. Here is an example search for music in triple, simple meters:

        themax -T triplesimple indexfile | wc -l
        74

    Note that -T simpletriple would not return any matches since the ordering of the metric components is incorrect.
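
    The effect of the component ordering can be imitated outside of themax with a plain substring test on the metric description field. This is only a sketch under assumptions: count_meter is a hypothetical helper, the three-line index is made up, and themax itself builds a regular expression rather than using awk.

```shell
# Hypothetical helper (not part of themax): count index lines whose metric
# description field (prefix "M") contains the given text.
count_meter() {
    awk -F '\t' -v query="$1" '{
        for (i = 2; i <= NF; i++) {
            if (substr($i, 1, 1) == "M" && index($i, query) > 0) {
                count++
                break
            }
        }
    } END { print count + 0 }' "$2"
}

# Made-up three-line index holding only an identifier, key and meter.
index_file=$(mktemp)
printf 's1\tZC=\tM3/4triplesimple\n'    >  "$index_file"
printf 's2\tZG=\tM4/4quadruplesimple\n' >> "$index_file"
printf 's3\tZF=\tM6/8duplecompound\n'   >> "$index_file"

count_meter triplesimple "$index_file"   # components in stored order
count_meter simpletriple "$index_file"   # wrong order, nothing matches
count_meter simple "$index_file"
```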

    Search anchoring

    When the -a option is included in a themax search, matches must start at the beginning of the data. Possible matches which start at a position other than the start of the music will be ignored. This option requires an actual feature query (unlike the previously described search filtering options). If -a is used without any other arguments, all lines of the search index will be echoed to standard output.

    Here is an example search for the pitch sequence "C D E" in the example index, searching both unanchored (pattern can start anywhere in the data) and anchored (pattern must start at the beginning of the music).

        themax -p "C D E" indexfile | wc -l
        29
        themax -a -p "C D E" indexfile | awk '{print $1}'
        erk131 
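
    The difference between the two searches can be pictured as two regular expressions applied to the absolute pitch class field. The sketch below uses grep on a few made-up index lines. The helper count_pitch_matches is hypothetical, takes pitch names in the form stored in the index, and is not the expression that themax actually generates.

```shell
# Hypothetical helper (not part of themax): count index lines on stdin whose
# absolute pitch class field (prefix "J") contains the given pitch names.
# $1 is "anchored" or "unanchored"; $2 is the pitch sequence as stored.
count_pitch_matches() {
    tab=$(printf '\t')
    case $1 in
        # The sequence must directly follow the "J" that opens the field.
        anchored)   re="${tab}J${2}( |${tab}|\$)" ;;
        # Any number of earlier notes in the same field may come first.
        unanchored) re="${tab}J([^${tab}]* )?${2}( |${tab}|\$)" ;;
        *) return 2 ;;
    esac
    grep -Ec "$re" || true
}

# Made-up index lines: identifier, key, pitch names, meter.
lines=$(printf 's1\tZC=\tJC D E F\tM4/4quadruplesimple\ns2\tZC=\tJG C D E\tM4/4quadruplesimple\ns3\tZC=\tJG C D Eb\tM4/4quadruplesimple')

printf '%s\n' "$lines" | count_pitch_matches unanchored 'C D E'
printf '%s\n' "$lines" | count_pitch_matches anchored 'C D E'
```

    The trailing group in each expression keeps "C D E" from matching the start of "C D Eb".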

    Pitch query options

    absolute pitch class

    The -p option is used to search the music by pitch class. The pitch class name is composed of a diatonic letter A-G, and can be followed by one or more flat (-) or sharp (#) signs. For example, A-- is A double-flat, and C## is C double-sharp. An "X" can be used to represent a double sharp. Accidentals only apply to the note which they follow, so repeated notes using the same accidental must also repeat the accidental signs. In the classic thema command, a single space between note names is required; however, this is optional in themax, and the diatonic pitch names are case-insensitive.

    Here is an example search for the melodic sequence "D F# A" in the example data index:

        themax -p "D F# A" indexfile | wc -l
        8

    Extended pitch-class query syntax

    Northern European pitch-class names are understood by themax, where "is" is equivalent to a sharp sign, and "es" is equivalent to a flat sign. You will have to be careful about the B/H diatonic pitch name. If an "H" is present in the query, it will be assumed to be equivalent to an English "B" pitch, while "B" will be assumed to be equivalent to an English "B-". If no "H" is present, then "B" is assumed to be equivalent to an English "B" pitch.

        themax -p "d fis a" indexfile | wc -l
        8

    Southern European and fixed-do solfège are also understood in the pitch-class query string: C = ut or do, D = re, E = mi, F = fa, G = sol or so, A = la, B = si or ti. English or German accidental syntax can be applied to these basic diatonic codes. Spaces between pitch names are optional.

        themax -p "re fa# la" indexfile | wc -l
        8
        themax -p "refaisla" indexfile | wc -l
        8

    Wildcards in pitch-class query strings

    Five characters are used to represent special meanings in the pitch class search query:

    .
    To search for an unknown or arbitrary pitch, use the dot to represent any one pitch. Example:
      (G# . B) will match to (G# E  B)
      (G# A# B)
      (G# E- B)
      (G# C-- B)
      etc.
    ?
    To indicate an optional note, append a question mark. Examples:
      (F#? A) will match to (F#  A), or (  A)

      (.? G) will match to (E  G)
      (C    G)
      (F# G)
      etc.
    *
    To indicate any number of optional notes, insert an asterisk. Example:
      (* A) will match to (   A)
      (G  A)
      (B G  A)
      (B F# G  A)
      etc.
    +
    To indicate any number of repeated notes, append a plus sign. Example:
      (G+ E) will match to (G E)
      (G G E)
      (G G G E)
      etc.
    ^
    A caret appended to a note indicates an arbitrary accidental. Examples:
      (E^ G) will match to (E  G)
      (E# G)
      (E- G)
      (Ex G)
      (E-- G)

Diatonic mode for pitch class queries

Using the -D option with a -p search is equivalent to adding a "^" wildcard after each diatonic pitch name. This option allows searching for diatonic pitch sequences with any accidental after the pitch name. This search feature is useful for searching for modal equivalents, and various interpretations of musica ficta.

Count the number of songs which contain the exact pitch sequence "A B C D E":

    themax -p "a b c d e" indexfile  --count
    8

Now count the number of songs which contain the diatonic pitch sequence "A B C D E", allowing for accidentals attached to any of those notes:

    themax -D -p "a b c d e" indexfile  --count
    13

The previous search is equivalent to:

    themax -p "a^ b^ c^ d^ e^" indexfile  --count
    13
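
The equivalence between -D and the "^" wildcard can be sketched as a small query rewrite. The helper below is hypothetical and not part of themax. It assumes English letter names with "#", "-", "x" or "X" accidentals only, so the "is"/"es" and solfège spellings are not handled.

```shell
# Hypothetical helper (not part of themax): rewrite a -p query so that every
# pitch letter carries the "^" (arbitrary accidental) wildcard. Any accidental
# already written after a letter ("#", "-", "x" or "X") is dropped.
add_accidental_wildcards() {
    printf '%s\n' "$1" | sed -E 's/([A-Ga-g])[#xX-]*/\1^/g'
}

add_accidental_wildcards "a b c d e"
add_accidental_wildcards "D F# A"
```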

scale degree

The -d option is used to search using a scale-degree representation of the music. 1 is assigned to the tonic note, and values from 1-7 represent the 7 scale degrees of the major and minor scales. Accidentals are not considered. For example, in C major a C-sharp is encoded as "1" for this musical feature. Spaces are optional between scale degrees.

An example search by scale degree searching for "1 3 5":

    themax -d 135 indexfile | wc -l
    56
    themax -d "1 3 5" indexfile | wc -l
    56

12-tone pitch class

The -P option is used to search using a twelve-tone pitch class representation of the music. The pitch classes are numbers from 0 (for C) to 11 (for B). For values 10 and higher, the pitch class numbers are mapped to letters of the alphabet. So 10 = Y (representing the pitch class B-flat/A-sharp), and 11 = Z (representing the pitch class B). The multi-digit pitch class names can also be used, provided spaces separate the numbers from adjacent values. If you use "Y" or "Z" for pitch classes, then you cannot also use 10 or 11 as the respective alternate names for the pitch classes.

Searching for the 12-tone pitch class sequence "6 8 Y":

    themax -P 68Y indexfile --trim
    erk035
    erk081 
    themax -P "6 8 10" indexfile --trim
    erk035
    erk081 
    themax -P "68 10" indexfile --trim
    erk035
    erk081 

Note that twelve-tone pitch classes are less selective than diatonic+accidental based pitch classes:

    themax -p "f# g# a#" indexfile --trim
    erk081 
    themax -p "g- a- b-" indexfile --trim
    erk035 
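
The two spellings above land on the same twelve-tone query because enharmonic names share a pitch class number. A minimal sketch of that mapping follows. The function pitch_class is hypothetical, not part of themax, and only understands a letter followed by "#" or "-" signs.

```shell
# Hypothetical helper (not part of themax): convert a pitch name written with
# "#" for sharps and "-" for flats into its pitch class number, 0 (C) to 11 (B).
pitch_class() {
    letter=$(printf '%s' "$1" | cut -c1 | tr 'a-g' 'A-G')
    case $letter in
        C) base=0 ;; D) base=2 ;; E) base=4 ;; F) base=5 ;;
        G) base=7 ;; A) base=9 ;; B) base=11 ;;
        *) return 1 ;;
    esac
    accidentals=${1#?}                                   # text after the letter
    without_sharps=$(printf '%s' "$accidentals" | tr -d '#')
    without_flats=$(printf '%s' "$accidentals" | tr -d '-')
    sharps=$(( ${#accidentals} - ${#without_sharps} ))   # number of "#" signs
    flats=$(( ${#accidentals} - ${#without_flats} ))     # number of "-" signs
    echo $(( (base + sharps - flats + 12) % 12 ))
}

# Both spellings give the same three pitch class numbers.
for name in 'F#' 'G#' 'A#' 'G-' 'A-' 'B-'; do
    pitch_class "$name"
done
```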

musical interval

The -I option can be used to search by musical interval. A musical interval has three components in this order: (1) interval direction, (2) interval quality, and (3) diatonic interval size.
  • The direction can be either a + (plus) for a rising interval, or a - (minus) for a falling interval.
  • The interval quality can be: d or D for diminished, m for minor, M for major, P or p for perfect, A or a for augmented.
  • The diatonic interval size is 1 for unisons, 2 for seconds, 3 for thirds, etc., 10 for tenths (octave plus a third), and so on.

Example interval descriptions: +P5 = a rising perfect fifth. +3 = a rising third (can be a rising major third or a rising minor third). P4 = a perfect fourth (either rising or falling). A "+" by itself represents a rising interval of any quality and diatonic size.

Songs which start with a rising perfect fifth:

    themax -a -I "+P5" indexfile --trim
    erk035 

Number of songs which contain a rising perfect fifth anywhere in the song:

    themax -I +P5 indexfile --count
    87

Number of songs which contain a rising or falling octave:

    themax -I P8 indexfile --count
    61

Songs which contain a long up/down sequence of intervals:

    themax -I "+-+-+-+-+-" indexfile --trim
    erk009
    erk016
    erk038
    erk066
    erk103

Songs which contain 3 rising thirds in a row:

    themax -I "+3 +3 +3" indexfile --trim
    erk029
    erk074
    erk099
    erk101

Wildcards for musical interval search queries

Not all of these wildcards are active yet.

    .
    To search for an unknown or arbitrary interval, use the dot to represent any one interval. Example:
      (+M2 . -m3) will match to (+M2 -M2 -m3)
      (+M2 +P4 -m3)
      (+M2 -P5 -m3)
      (+M2  P1 -m3)
      etc.
    ?
    To indicate an optional interval, append a question mark. Example:
      (+M3 +m3? -m3) will match to (+M3 +m3 -m3)
      (+M3     -m3)
      etc.
    *
    To indicate any number of optional intervals, insert an asterisk. Example:
      (+M2 * -m3) will match to (+M2 -M2 P1 +M6 -m3)
      (+M2 +P5 -m3)
      (+M2 -P4 +m6 -m2 -m2 -m3)
      (+M2 +m7 -P8 +M2 -m3)
      etc.
    ^
    A caret appended to an interval indicates an optional inversion of the interval. The inverted interval is in the opposite direction. Examples:
      (+M3 +P4^ -m2) will match to (+M3 +P4 -m2)
      (+M3 -P5 -m2)
      (+P4 +M3^ -P5) will match to (+P4 +M3 -P5)
      (+P4 -m6 -P5)
      (+m2 A4^ +M2) will match to (+m2 +A4 +M2)
      (+m2 -d5 +M2)
      (+m2 -A4 +M2)
      (+m2 +d5 +M2)
      (+m6 -3^ -M2) will match to (+m6 -M3 -M2)
      (+m6 +m6 -M2)
      (+m6 -m3 -M2)
      (+m6 +M6 -M2)
      (+m6 -d3 -M2)
      (+m6 +A6 -M2)
      (+m6 -A3 -M2)
      (+m6 +d6 -M2)
    #
    To search for an interval or its inversion in the same direction, append the number symbol. Examples:
      (+M3 +P4# -m2) will match to (+M3 +P4 -m2)
      (+M3 +P5 -m2)
      (+P4 +M3# -P5) will match to (+P4 +M3 -P5)
      (+P4 +m6 -P5)
      (+m2 A4# +M2) will match to (+m2 +A4 +M2)
      (+m2 +d5 +M2)
      (+m2 -A4 +M2)
      (+m2 -d5 +M2)
      (+m6 -3# -M2) will match to (+m6 -M3 -M2)
      (+m6 -m6 -M2)
      (+m6 -m3 -M2)
      (+m6 -M6 -M2)
      (+m6 -d3 -M2)
      (+m6 -A6 -M2)
      (+m6 -A3 -M2)
      (+m6 -d6 -M2)

12-tone interval

The -i option can be used to search the music using twelve-tone intervals. This is equivalent to counting the number of half-steps between notes in the sequence. Rising intervals may be preceded by a "+" (plus) and falling intervals must be preceded by a "-" (minus). A repeated note is represented by a "0" interval. If an interval is preceded by a plus/minus sign, then spaces between the intervals are optional.

Look for songs which contain three rising whole-tones in a row:

    themax -i "2 2 2" indexfile --trim
    erk052
    erk078
    erk147 

Count the number of songs which have a major sixth interval up, followed by a major third down:

    themax -i "+9 -4" indexfile --count
    19 

Find a song with a long string of repeated notes:

    themax -i "0 0 0 0 0 0 0 0 0" liederschatz1.thema --trim
    erk130 
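
The query notation differs from the "p"/"m" notation stored in the 12-tone interval field of the index. One plausible translation is sketched below. The helper interval_query_to_index is hypothetical and requires spaces between intervals. The expression that themax really builds may differ, and a real matcher also has to keep "p1" from matching the start of "p12".

```shell
# Hypothetical helper (not part of themax): rewrite a space-separated -i query
# in the "p"/"m" notation of the 12-tone interval field of the index.
# An interval without a sign is treated as rising.
interval_query_to_index() {
    result=""
    for interval in $1; do
        case $interval in
            -*) result="${result}m${interval#-}" ;;   # falling
            +*) result="${result}p${interval#+}" ;;   # rising, explicit sign
            *)  result="${result}p${interval}" ;;     # rising or repeated note
        esac
    done
    printf '%s\n' "$result"
}

interval_query_to_index "+9 -4"
interval_query_to_index "2 2 2"
```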

pitch gross contour

The -C option can be used to search the musical data for gross contour, which is a basic intervallic description of the melodic line split into three categories: (1) up (next pitch is higher than the current one), (2) down (next pitch is lower than the current one), (3) same (next pitch is the same as the current one). For up intervals, you can use u, U, or /. For down intervals, you can use d, D, or \. For repeated (same) intervals you can use s, S, or - (dash).

Extended regular expression syntax works in the pitch gross contour search queries. For example, S+ means one or more repeated intervals (two or more repeated notes in a row).

Find the songs which have 6 up intervals followed by 6 down intervals:

    themax -C "uuuuuudddddd" indexfile  --trim
    erk115 

Same search as above, but using extended regular expressions:

    themax -C "u{6}d{6}" indexfile  --trim
    erk115 

Count the number of songs which have at least 4 upward intervals followed by any type of intervals, and eventually followed by 4 or more falling intervals:

    themax -C "u{4,}.*d{4,}" indexfile --count
    34
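
To see how a gross contour string relates to the notes themselves, the contour can be derived from a list of pitches and then tested with the same kind of expression. This is a sketch only: gross_contour is a hypothetical helper, the melody is made up, and pitches are given as semitone numbers rather than read from a Humdrum file.

```shell
# Hypothetical helper (not part of themax): derive a gross contour string from
# a list of pitches given as semitone numbers (for example MIDI note numbers).
gross_contour() {
    printf '%s\n' "$1" | awk '{
        contour = ""
        for (i = 2; i <= NF; i++) {
            if ($i + 0 > $(i - 1) + 0)      contour = contour "U"
            else if ($i + 0 < $(i - 1) + 0) contour = contour "D"
            else                            contour = contour "S"
        }
        print contour
    }'
}

# Made-up melody: up a fifth by step, one repeated note, then back down.
melody="60 62 64 65 67 67 65 64 62 60"
gross_contour "$melody"
# The same extended regular expression syntax can then be tried with grep.
gross_contour "$melody" | grep -Ec 'U{4,}.*D{4,}'
```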

pitch refined contour

The -c option searches using refined interval contour. Refined contour contains five intervallic levels rather than the three of gross contour. In the refined case, up and down intervals are split into two sub-categories (1) step-wise movement, and (2) leap movement.

Any step-wise movement up or down (a half-step, whole-step or augmented second) is represented by a lower case "u" or "d". Any leap up or down (a third or larger) is represented by an upper case "U" or "D". A repeated note is represented as an "s", "S" or "-" as with gross contour features.

Like gross contour, all extended regular expression wildcards are allowed.

Count the number of songs which contain a leap down followed by one or more steps or leaps upwards, followed by a leap down:

    themax -c "D(u|U)+D" indexfile --count
    109

Find songs which contain 10 or more successive leaps (any mixture of up and down leaps):

    themax -c "(U|D){10,}" indexfile --trim
    erk025 
    erk047 
    erk124 

Rhythm query options

Duration gross contour

The -R option is used to search with rhythm gross contour features: "S" if the following note has a shorter duration than the current note; "L" if the following note is longer, and "=" if the following note has the same duration as the current note. The search features are case insensitive, and extended regular expressions can be used in this search option.

Search for songs which contain a long string of equal durations (30 or more repeated durations):

    themax -R "={30}" indexfile --trim
    erk056 
    erk063 
    erk110 

Search for songs which contain four notes, each shorter than the previous one:

    themax -R 'sss' indexfile --trim
    erk052  (starting in second measure of second page)
    erk131 
    erk134 
    erk142 
    erk162  

Search for songs which have a long sequence of shorter-longer duration pairs:

    themax -R '(SL){10}' indexfile  --trim
    erk131 
    erk148 

Duration refined contour

The -r option is used to search duration refined contour, which is similar to the duration gross contour described above, but allows for 5 states, like pitch refined contour:
  • "S" next note is less than half the duration of the current note.
  • "s" next note is shorter than the current note, but at least half its duration.
  • "=" next note is equal in duration to the current note.
  • "l" (lower-case L) next note is longer than the current note, but at most twice its duration.
  • "L" next note is more than twice as long as the current note.

Find songs with two shorter notes (of the same duration) followed by three longer notes (of the same duration) which are more than twice the duration of the previous two notes:

    themax -r '=L==' indexfile  --trim
    erk009
    erk059
    erk111 (starting in measure 8)
    erk130
    erk168
    erk169
    erk197
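
The five states can be worked out from a list of durations by comparing each note with the one before it. The sketch below is not part of themax. The function name duration_refined_contour is hypothetical, and it treats exactly half as "s" and exactly double as "l", which is an interpretation of the state descriptions above.

```shell
# Hypothetical helper (not part of themax): derive the five-state duration
# contour from a space-separated list of **recip-style durations, where dots
# may be written as "." or "d".
duration_refined_contour() {
    printf '%s\n' "$1" | awk '
        # Length of one note as a fraction of a whole note.
        function note_length(token,    number, dots, total, part, k) {
            number = token; gsub(/[^0-9]/, "", number)
            dots = token;   gsub(/[^d.]/, "", dots)
            total = 1 / number
            part = total
            for (k = 1; k <= length(dots); k++) {
                part = part / 2          # each dot adds half of the last part
                total = total + part
            }
            return total
        }
        {
            contour = ""
            for (i = 2; i <= NF; i++) {
                ratio = note_length($i) / note_length($(i - 1))
                if (ratio < 0.5)       contour = contour "S"
                else if (ratio < 1)    contour = contour "s"
                else if (ratio == 1)   contour = contour "="
                else if (ratio <= 2)   contour = contour "l"
                else                   contour = contour "L"
            }
            print contour
        }'
}

duration_refined_contour "8d 16 4d 8 8 4 8"
```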

Duration

The -u option allows search by duration. Durations are specified by rhythmic values using Humdrum's **recip representation. In other words "4" is a quarter note, "16" is a 16th note, "12" is a triplet-eighth note (in other words there are 12 triplet-eighth notes in a whole note). Note that notated eighth notes in a 6/8 meter are not considered triplet eighth notes but rather, plain eighth notes ("8").

Count the number of songs which contain any 16th notes:

    themax -u 16 indexfile --count
    121 

Multiple rhythms in the search query must be separated by one or more spaces in order to parse the rhythmic entities properly.

Count the number of songs which contain the rhythmic pattern "4 8 8":

    themax -u "4 8 8" indexfile --count
    154 

A period (".") or "d" can be used to represent dotted rhythmic values.

Count the number of songs which have the rhythmic sequence of a dotted eighth note followed by a sixteenth note:

    themax -u "8. 16" indexfile --count
    100 

Count the number of songs which contain dotted rhythms (Note that rests are not examined by themax):

    themax -u "d" indexfile --count
    174 

Beat level

The -b option can be used to search using the beat level musical feature. If a note is to be played on a beat, its feature value is "1". If a note is to be played off of the beat, its feature value is "0". All extended regular expression wildcards are allowed in this search field.

Count songs which start with an anacrusis:

    themax -a -b 0 indexfile --count
    94 

Search for a sequence of at least 14 notes which alternate on/off of the beat:

    themax -b "10{7}" indexfile --trim
    erk009  (starting in measure 25)
    erk181  (starting in measure 7) 
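
As a rough illustration of the beat level feature and of anchored searching, the following Python sketch builds a string of "1" and "0" values from note onsets and tests whether it starts off the beat. The function names and the onset representation (positions counted in beats, where whole numbers fall on a beat) are assumptions made for this example and do not describe the real index format.

```python
import re
from fractions import Fraction

def beat_levels(onsets):
    """Return "1" for each onset that falls on a beat, "0" otherwise."""
    return "".join(
        "1" if Fraction(onset).denominator == 1 else "0" for onset in onsets
    )

def starts_with_anacrusis(levels):
    """Anchored test (like -a): the first note must be off the beat."""
    return re.match("0", levels) is not None

# An eighth-note pickup, two notes on beats, then an off-beat note.
levels = beat_levels([Fraction(7, 2), 4, 5, Fraction(11, 2)])
print(levels, starts_with_anacrusis(levels))
```

An unanchored search for "0" would also match a melody whose first note is on the beat, which is why the anacrusis query combines -b with -a.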

Metric level

The -L option is used to search metric level features. The metric level is a log2 indication of the metric stress of a note. Beats are assigned the value 0, eighth-note off-beats are assigned -1, sixteenth-notes after beats and eighth-note off-beats are assigned -2, and so on. In 4/4, the first beat of a measure is +2 and the third beat is +1. For non-negative metric levels (i.e., beats), the symbol "B" (or "b") can be used to indicate any beat. Likewise, "S" (or "s") can be used to indicate any sub-beat (metric position which does not fall on a beat). Note that this is similar to the beat level features.

Count the songs which have a downbeat in 4/4 followed by an eighth-note offbeat:

    themax -L "+2 -1" indexfile --count
    30 

Count the number of songs which start with an eighth-note upbeat:

    themax -a -L "-1" indexfile --count
    46 

Find any songs which do not contain sub-beats:

    themax -L "S" -v indexfile --trim
    erk039  
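
The metric level values described above can be reproduced for 4/4 with a short Python sketch. It is a hypothetical illustration, not code from themax: it covers only 4/4, measures positions in quarter notes from the start of the measure, and rejects positions that are not binary subdivisions (such as triplets), since the description above does not say how those are treated.

```python
from fractions import Fraction

def metric_level_4_4(position):
    """Metric level of a note starting `position` quarter notes into a 4/4 bar."""
    position = Fraction(position)
    if not 0 <= position < 4:
        raise ValueError("position must lie inside the measure")
    denominator = position.denominator
    if denominator & (denominator - 1):
        raise ValueError("only binary subdivisions are handled in this sketch")
    if position == 0:
        return 2                          # first beat of the measure
    level = 1                             # a half measure is 2**1 quarter notes
    while (position / Fraction(2) ** level).denominator != 1:
        level -= 1                        # try the next finer grid
    return level

def beat_or_subbeat(level):
    """Map a metric level to the "B" (beat) or "S" (sub-beat) class."""
    return "B" if level >= 0 else "S"

for position in [0, 2, 1, Fraction(1, 2), Fraction(3, 4)]:
    level = metric_level_4_4(position)
    print(position, level, beat_or_subbeat(level))
```

The sketch finds the coarsest power-of-two grid on which the note falls, which matches the values listed above: +2 for the downbeat, +1 for beat three, 0 for the other beats, -1 for eighth-note off-beats, and -2 for sixteenth-note positions.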

Metric gross contour

The -E option is used to search metric gross contours, where there are 3 possible states:
  • "W" (weaker) The next note is at a weaker metrical strength than the current note.
  • "=" (equal) The next note is at the same metrical strength as the current note.
  • "S" (stronger) The next note is at a stronger metrical strength than the current note.

Metric refined contour

The -e option is used to search metric refined contour, where there are 5 possible states:
  • "W" next note is on a metric level which is two or more levels weaker than the current note (example: a dotted eighth note followed by a sixteenth).
  • "w" next note is one metric level weaker than the current note (example: two sixteenth notes on an eighth-note offbeat).
  • "=" next note is on the same metric level as the current note (example: two whole notes in 4/4).
  • "s" next note is one metric level stronger than the current note (example: second to third note in four 16ths).
  • "S" next note is two or more metric levels stronger than the current note (example: eighth-note pickup followed by a downbeat).

Find songs which contain four consecutive "S" transitions:

    themax -e 'SSSS' indexfile  --trim
    erk030 
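
The five states can be derived from a sequence of metric levels by looking at the difference between adjacent notes. The Python sketch below is an illustration under that assumption, with a hypothetical function name; it is not taken from themax.

```python
def metric_refined_contour(levels):
    """Return one symbol for each pair of adjacent metric levels."""
    symbols = []
    for current, following in zip(levels, levels[1:]):
        change = following - current
        if change >= 2:
            symbols.append("S")      # two or more levels stronger
        elif change == 1:
            symbols.append("s")      # one level stronger
        elif change == 0:
            symbols.append("=")      # same level
        elif change == -1:
            symbols.append("w")      # one level weaker
        else:
            symbols.append("W")      # two or more levels weaker
    return "".join(symbols)

# Beat (0), sixteenth (-2), eighth off-beat (-1), downbeat (+2),
# another downbeat (+2), then beat three of a 4/4 measure (+1).
print(metric_refined_contour([0, -2, -1, 2, 2, 1]))
```

For these levels the sketch prints "WsS=w". Collapsing "W" and "w" into "W", and "S" and "s" into "S", gives the three states of the gross contour.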

Metric position

The -l (lower-case L) option is used to search the "metrical position" feature. This is the beat number on which a note attack occurs. Beat values must be separated by one or more spaces.

In 4/4 meters, count the number of songs which contain a measure with notes only on the four beats, followed by a note on the downbeat of the next measure:

    themax -T 4/4 -l "1 2 3 4 1" indexfile  --count
    34 

Off-beats are specified by adding a space and then a fractional offset into the given beat. A dash ("-") can also be used to separate the beat from the fractional off-beat part of the number. For example, to search for a dotted eighth followed by a sixteenth note on beat four, you can search using the feature "4 4 3/4" or "4 4-3/4":

    themax -l "4 4 3/4" indexfile --count
    15
    themax -l "4 4-3/4" indexfile --count
    15 

The wildcard "~" is used after a beat to indicate that the note must fall within a given beat, either on the beat or an offbeat after that beat position, but before the next beat.

Count the number of songs in a 2/4 meter which have three notes occurring during the span of beat 2, with the first note occurring on the beat and the other two on subsequent off-beats, followed by a note on beat 1 of the next measure. In other words, the search query will match an eighth and two sixteenths on beat 2, or three triplet eighth notes on beat 2:

    themax -T 2/4 -l "2 2~ 2~ 1" indexfile --count
    37 

The wildcard "^" is used to indicate any offbeat within the given beat, not including any note attack occurring at the start of the beat.

Search for songs which have a note on the first beat of a measure, no note on the second beat but two notes on off-beats of beat two, followed by a note on beat three:

    themax -l "1 2^ 2^ 3" indexfile --trim
    erk009
    erk028
    erk058
    erk078
    erk147
    erk148
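
The query syntax for metric positions (plain beats, fractional offsets, "~" and "^") can be modelled with the following Python sketch. All names are hypothetical and the matching is done on a plain list of beat positions, which is an assumption made for illustration rather than a description of how themax stores or searches this feature.

```python
from fractions import Fraction

def parse_positions(query):
    """Turn a query such as "4 4-3/4" or "2 2~ 2^" into (kind, beat) pairs."""
    specs = []
    for token in query.replace("-", " ").split():
        if "/" in token:
            # A fraction is an offset into the beat given just before it.
            if not specs or specs[-1][0] != "exact":
                raise ValueError("a fraction must follow a plain beat number")
            specs[-1] = ("exact", specs[-1][1] + Fraction(token))
        elif token.endswith("~"):
            specs.append(("within", Fraction(token[:-1])))
        elif token.endswith("^"):
            specs.append(("offbeat", Fraction(token[:-1])))
        else:
            specs.append(("exact", Fraction(token)))
    return specs

def matches(spec, position):
    """Test a single note position against one parsed query element."""
    kind, beat = spec
    if kind == "exact":
        return position == beat
    if kind == "within":                  # "~": on the beat or after it
        return beat <= position < beat + 1
    return beat < position < beat + 1     # "^": strictly after the beat

def contains(positions, query):
    """True if some run of consecutive notes satisfies the whole query."""
    specs = parse_positions(query)
    starts = range(len(positions) - len(specs) + 1)
    return any(
        all(matches(s, p) for s, p in zip(specs, positions[i:]))
        for i in starts
    )

melody = [1, Fraction(5, 2), Fraction(11, 4), 3]
print(contains(melody, "1 2^ 2^ 3"), contains(melody, "1 2~ 2~ 3"))
```

In this model "4 4 3/4" and "4 4-3/4" parse to the same two elements, and a note exactly on beat 2 satisfies "2~" but not "2^".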

Other options

Returning a count of matches

By default, themax returns the lines in the index file which the search query matched. Instead, the --count option can be used to return the number of matches found in the index file. This is equivalent to piping the default output of themax through the command "wc -l".

    themax --count "df#a" indexfile
    8
    themax "df#a" indexfile | wc -l
    8

Negated queries

The -v option allows for the search query to be negated. This causes only entries which do not match to be returned.

Count the number of songs which do not contain the pitch C:

    themax -v -p C indexfile --count
    39 

Search for themes which do not contain a descending minor second:

    themax -v -I "-m2" indexfile  --trim
    erk040
    erk042
    erk068
    erk108
    erk159
    erk162
    erk188
    erk191

Display regular expression search query

The --regex option can be used to display the regular expression which will be used to search the index file given the input query options. The actual search will not be done, and the program will exit after the regular expression is printed to standard output. The regular expression uses extended syntax which can be used in the egrep program.

    themax -a -M --regex -d 135 indexfile
    Z[^=]*=.*%135
    themax -a -M -d 135 indexfile --count
    12
    egrep `themax -a -M --regex -d 135` indexfile | wc -l
    12

Prevent cleaning of search queries

By default, the themax command will attempt to automatically clean up the search queries for musical features so that users can input the features in multiple ways. For example, the pitch search using the -p option accepts C, ut, and do for the pitch name C.

If you are using themax under automated conditions, you can add the --raw option to prevent such pre-processing of the search queries. This saves a small amount of time and allows extended regular expressions to be used directly on the data. Wildcard characters specific to themax (and not to regular expressions) are not available when using the --raw option.

Display post-processed user query by features

A more verbose version of --regex can be viewed with the --features option. This option will display the cleaned version of the user input queries which is useful for debugging complex queries.

In the following example, -M is converted into Z[^=]*=, -p "utdoissies" is converted into (?:C) (?:C#) (?:Bb), and the final regular expression which will be used to search the index file is Z[^=]*=.*J(?:C) (?:C#) (?:Bb).

    themax -a -M  -p "utdoissies" indexfile --features
    Tonic:		Z[^=]*=
    Pitch-class:	(?:C) (?:C#) (?:Bb)
    Final Regular Expression: Z[^=]*=.*J(?:C) (?:C#) (?:Bb)[ 	]

Returning filenames only

The --trim option will remove all data fields in the matching lines except for the first column of data (the filename or identity string). This is equivalent to piping the default output of themax through the command "awk '{print $1}'".

    themax --trim -p "df#a" indexfile
    erk005
    erk024
    erk047
    erk050
    erk094
    erk104
    erk107
    erk166 
    themax -p "df#a" indexfile | awk '{print $1}'
    erk005
    erk024
    erk047
    erk050
    erk094
    erk104
    erk107
    erk166 
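
The --count, -v and --trim options behave like simple post-processing steps on the set of matching index lines. The Python sketch below mimics them on a few made-up, tab-separated lines; the function names and the sample data are hypothetical placeholders, not real index content or themax code.

```python
import re

def search_index(lines, pattern, negate=False):
    """Return lines that match the pattern, or that do not when negate is set (like -v)."""
    regex = re.compile(pattern)
    return [
        line for line in lines
        if (regex.search(line) is not None) != negate
    ]

def trim(lines):
    """Keep only the first tab-separated column of each line (like --trim)."""
    return [line.split("\t", 1)[0] for line in lines]

# Made-up placeholder lines: an identifier, a tab, then one feature field.
INDEX = ["tune1\tJC D E", "tune2\tJD E F", "tune3\tJE F G"]

print(trim(search_index(INDEX, "D")))                # like --trim
print(len(search_index(INDEX, "D")))                 # like --count
print(trim(search_index(INDEX, "D", negate=True)))   # like -v --trim
```

Counting the returned list corresponds to piping the default output through "wc -l", and taking the first column corresponds to the awk command shown above.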

EXAMPLES

SEE ALSO

DOWNLOAD

    The compiled themax program can be downloaded for the following platforms:
    • Linux (i386 processors) statically compiled on 26 Aug 2010.
    • Mac OS X/i386 compiled on 25 May 2010.

    The source code for the program was last modified on 13 Nov 2008. The full source code is available from the source-code download page.