Musigram
An idea not to be taken too seriously. Also, it's probably not the most novel idea either.
The Concept
In a nutshell, the idea here is "musical Morse code": encode data into a melody that is human readable. That is to say, a musician could transcribe the melody without exerting too much effort.
Information is broken up into musical beats. A musical beat can have up to 3 notes in it, arranged in a distinct rhythmic pattern: a combination of Quarter (Q), Eighth (E), Sixteenth (S), and Dotted Eighth (E.) notes.
We use these to obtain 7 rhythmic patterns:
Q, EE, E.S, SE., ESS, SES, SSE
The pattern of 4 sixteenth notes (SSSS) is omitted. It would end up being too dense, and too annoying.
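As a quick sanity check, the 7 patterns are exactly the ways to split one beat (4 sixteenths) into at most 3 notes. Here is a minimal Python sketch of that enumeration. The names `compositions` and `rhythm_patterns` are made up for illustration, and the output order is just whatever the enumeration produces, not the order listed above.

```python
# Durations in sixteenth notes; the symbols follow the text above.
SYMBOLS = {4: "Q", 3: "E.", 2: "E", 1: "S"}
BEAT = 4  # one quarter-note beat = 4 sixteenths


def compositions(total):
    """Yield every ordered way to split `total` sixteenths into notes."""
    if total == 0:
        yield []
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield [first] + rest


def rhythm_patterns(max_notes=3):
    """Return pattern names with at most `max_notes` notes per beat."""
    patterns = [c for c in compositions(BEAT) if len(c) <= max_notes]
    patterns.sort(key=len)  # stable sort: fewest notes first
    return ["".join(SYMBOLS[d] for d in p) for p in patterns]


print(rhythm_patterns())             # 7 patterns
print(rhythm_patterns(max_notes=4))  # same 7, plus SSSS
```

Raising `max_notes` to 4 adds only SSSS, which is the one pattern left out on purpose.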
Each of these notes can be mapped to a pitch in a scale. 1.5 octaves of a pentatonic scale gives 9 notes: do re mi so la, do re mi so.
This scale was chosen because pentatonic sounds pretty.
With the 9 pitches and 7 rhythms, you get a fair number of permutations. There is 1 one-note rhythm, 3 two-note rhythms, and 3 three-note rhythms:
9 + 3*9^2 + 3*9^3 = 2,439
...or log2(2439) bits of information, which is ~11.25 bits.
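The arithmetic above can be rechecked in a few lines of Python. This is only a sketch: `beat_states` is a hypothetical helper, and the dictionary maps "notes in the rhythm" to "how many rhythms have that many notes".

```python
import math


def beat_states(pitches, rhythms_by_note_count):
    """Count distinct beats: a rhythm with n notes allows pitches**n melodies."""
    return sum(count * pitches ** n
               for n, count in rhythms_by_note_count.items())


# 9 pitches; 1 one-note, 3 two-note, and 3 three-note rhythms.
states = beat_states(9, {1: 1, 2: 3, 3: 3})
print(states)                       # 2439
print(round(math.log2(states), 2))  # 11.25
```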
This system therefore allows encoding 11 bits of information a beat, or 5.5 bytes in a bar of 4 beats, or 88 bytes in a 16-bar melody. There are also some unused fractional bits remaining with this approach. If one really wanted to pack the maximum amount of information in 16 bars, using the full ~11.25 bits across all 64 beats gives about 720 bits, so you could squeeze in 90 bytes of information in a 16-bar melody that in theory should be easy for musicians to transcribe.
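To make the 11-bits-per-beat idea concrete, here is one possible way to turn a number into a beat and back. Everything in it is an assumption for illustration: the ordering of rhythms, the pitch names with `'` marking the upper octave, the zero-padding of the last chunk, and the function names. It is not the actual implementation mentioned in the updates.

```python
# Hypothetical orderings; the text does not fix either of them.
PITCHES = ["do", "re", "mi", "so", "la", "do'", "re'", "mi'", "so'"]
RHYTHMS = ["Q", "EE", "E.S", "SE.", "ESS", "SES", "SSE"]
NOTE_COUNTS = [1, 2, 2, 2, 3, 3, 3]  # notes per rhythm, same order
BASE = len(PITCHES)
TOTAL = sum(BASE ** n for n in NOTE_COUNTS)  # 2439 possible beats


def encode_beat(value):
    """Map an integer in [0, TOTAL) to (rhythm, list of pitch names)."""
    if not 0 <= value < TOTAL:
        raise ValueError("value out of range")
    for rhythm, n in zip(RHYTHMS, NOTE_COUNTS):
        block = BASE ** n  # how many values this rhythm can carry
        if value < block:
            notes = []
            for _ in range(n):  # base-9 digits, least significant first
                value, digit = divmod(value, BASE)
                notes.append(PITCHES[digit])
            return rhythm, notes
        value -= block


def decode_beat(rhythm, notes):
    """Inverse of encode_beat."""
    index = RHYTHMS.index(rhythm)
    offset = sum(BASE ** n for n in NOTE_COUNTS[:index])
    value = 0
    for note in reversed(notes):  # most significant digit first
        value = value * BASE + PITCHES.index(note)
    return offset + value


def encode_bytes(data):
    """Cut bytes into 11-bit chunks (zero-padded at the end), one per beat."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 11)
    return [encode_beat(int(bits[i:i + 11], 2))
            for i in range(0, len(bits), 11)]


beats = encode_bytes(bytes(range(88)))
print(len(beats))  # 64 beats, i.e. 16 bars of 4/4
```

Since an 11-bit chunk never exceeds 2047, the values 2048 to 2438 go unused here, which is where the leftover fraction of a bit goes.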
This system could be modified too. More rhythmic patterns such as the SSSS pattern could be introduced. More pitches could be added: a 1.5-octave major scale could have 12 pitches. Timbre was not considered here, but using things like phonemes could add more resolution to the data. It isn't terribly taxing to add 3 or 4 distinct vowel sounds.
Using the more extreme end of this paradigm, how much digital information could you cram in here?
12 pitches x 4 vowels = 48 pitch-vowel combinations
If we introduce the very dense sixteenth note pattern SSSS, we get this many pitch/rhythm permutations:
48 + 3*48^2 + 3*48^3 + 48^4 = 5,647,152 permutations
In one beat of music, that is ~22 bits of digital information. In 16 bars of 4/4 music (64 beats), that's about 1435 bits, or roughly 179 bytes of data, which I think is about the upper limit of what you'd want in a human readable audio format for representing data.
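The numbers for this extended scheme can be rechecked the same way. The sketch below assumes 16 bars of 4 beats and uses the fractional bits per beat rather than rounding down; `beat_states` and `capacity_bits` are hypothetical helper names.

```python
import math


def beat_states(symbols, rhythms_by_note_count):
    """Count distinct beats: a rhythm with n notes allows symbols**n melodies."""
    return sum(count * symbols ** n
               for n, count in rhythms_by_note_count.items())


def capacity_bits(states, bars=16, beats_per_bar=4):
    """Total information, in bits, carried by bars * beats_per_bar beats."""
    return bars * beats_per_bar * math.log2(states)


# 12 pitches x 4 vowels, and SSSS added as the single four-note rhythm.
extended = beat_states(12 * 4, {1: 1, 2: 3, 3: 3, 4: 1})
bits = capacity_bits(extended)

print(extended)                       # 5647152
print(round(math.log2(extended), 2))  # 22.43
print(int(bits), int(bits) // 8)      # 1435 179
```

Running the same helper on the original 9-pitch, 7-rhythm scheme (2439 states) gives about 720 bits, or 90 bytes.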
Updates
2021-11-11 10:26:15: work has actually begun on an initial implementation. the idea is to be able to encode bytes into the musical beat structure. from there, this intermediate format will be converted to (gest) commands.
2021-11-11 10:24:59: yup, I'm still thinking about this. I probably won't forget about it until I get a proof-of-concept implementation.