drowe67 commented Jan 17, 2025

Bandwidth/PAPR

Exploring ideas to improve the 99% power bandwidth (spectral mask) compared to RADE V1. Just prototyping with "mixed rate" training and inference, i.e. no pilots or CP, genie phase.

  • Worked out how to put a BPF in the training loop (conv1d with training disabled); a sketch follows this list
  • Takeaway: phase-only (0 dB PAPR) works quite well
  • clip-BPF x 3 produces reasonable 99% power BW, 0 dB PAPR, good loss
  • Document ML EQ training and inference in README.md when we get to the final V2 version. Just collect notes here in comments until then
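
A minimal sketch of the frozen-BPF idea, assuming a firwin-designed filter with illustrative band edges (train.py's actual taps and passband may differ):

    import torch
    import torch.nn.functional as F
    from scipy.signal import firwin

    # design fixed BPF taps offline (band edges here are illustrative)
    Fs = 8000
    h = torch.tensor(firwin(101, [400, 2600], fs=Fs, pass_zero=False),
                     dtype=torch.float32).view(1, 1, -1)
    h.requires_grad_(False)  # "training disabled": gradients flow through, taps never update

    def tx_bpf(x):
        # x: (batch, 1, time) real tx samples; padding keeps the length unchanged
        return F.conv1d(x, h, padding=h.shape[-1] // 2)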

Training:

python3 train.py --cuda-visible-devices 0 --sequence-length 400 --batch-size 512 --epochs 200 --lr 0.003 --lr-decay-factor 0.0001 ~/Downloads/tts_speech_16k_speexdsp.f32 250117_test --bottleneck 3 --h_file h_nc20_train_mpp.f32 --range_EbNo --plot_loss --auxdata --txbpf
Epoch 200 Loss 0.116

Testing:

./inference.sh 250117_test/checkpoints/checkpoint_epoch_200.pth wav/brian_g8sez.wav - --bottleneck 3 --auxdata --write_tx tx_bpf.f32 --write_latent z.f32 --txbpf
          Eb/No   C/No     SNR3k  Rb'    Eq     PAPR
Target..: 100.00  133.01   98.24  2000
Measured: 102.89          101.12       1243.47  0.00
loss: 0.121 BER: 0.000

octave:154> radae_plots; do_plots('z.f32','tx_bpf.f32')
bandwidth (Hz): 1255.813953 power/total_power: 0.990037

Red lines mark 99% power bandwidth:

[screenshot 2025-01-22: Tx spectrum with red 99% power bandwidth markers]
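
For reference, one way to estimate the 99% power bandwidth from the tx samples (an illustrative Python sketch; the Octave do_plots code may use a different definition):

    import numpy as np

    def power_bandwidth(x, fs, frac=0.99):
        # crude single-FFT periodogram; take the strongest bins that
        # together hold frac of the total power, report their span
        P = np.abs(np.fft.fft(x))**2
        f = np.fft.fftfreq(len(x), 1/fs)
        order = np.argsort(P)[::-1]
        n = np.searchsorted(np.cumsum(P[order]), frac * P.sum()) + 1
        band = f[order[:n]]
        return band.max() - band.min()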

ML EQ

Classical DSP:

python3 ml_eq.py --eq dsp --notrain --EbNodB 4 --phase_offset

MSE loss function:

python3 ml_eq.py --EbNodB 4 --phase_offset --lr 0.001 --epochs 100

Phase loss function:

python3 ml_eq.py --EbNodB 4 --phase_offset --lr 0.001 --epochs 100 --loss_phase
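
For context, illustrative forms of the two loss functions (assumed; ml_eq.py may define them differently), where y_hat are the equalised symbols and y the reference symbols:

    import torch

    def loss_mse(y_hat, y):
        # penalises the full complex error (magnitude and phase)
        return torch.mean(torch.abs(y_hat - y)**2)

    def loss_phase(y_hat, y):
        # penalises phase error only, ignoring magnitude
        phase_err = torch.angle(y_hat * torch.conj(y))
        return torch.mean(1 - torch.cos(phase_err))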

drowe67 commented Feb 3, 2025

Frame 2 EQ examples

  1. Ideal (perfect EQ)

    python3 ml_eq.py --frame 2 --notrain --eq bypass --EbNodB 4
    <snip>
    EbNodB:  4.00 n_bits: 240000 n_errors: 3027 BER: 0.013
    
  2. Classical DSP lin:

    python3 ml_eq.py --eq dsp --notrain --EbNodB 4 --phase_offset --frame 2
    <snip>
    EbNodB:  4.00 n_bits: 240000 n_errors: 3921 BER: 0.016
    
  3. ML EQ (using MSE loss function):

    python3 ml_eq.py --frame 2 --lr 0.1 --epochs 100 --EbNodB 4 --phase_offset --n_syms 1000000 --batch_size 128
    <snip>
    EbNodB:  4.00 n_bits: 24000000 n_errors: 437933 BER: 0.018
    

drowe67 commented Feb 4, 2025

ML waveform training

  1. Generate 10 hour complex h file:
    Fs=8000; Rs=50; Nc=20; multipath_samples('mpp', Fs, Rs, Nc, 10*60*60, 'h_nc20_train_mpp.c64',"",1);
    
  2. Training (channel application sketched after this list):
    python3 train.py --cuda-visible-devices 0 --sequence-length 400 --batch-size 512 --epochs 200 --lr 0.003 --lr-decay-factor 0.0001 ~/Downloads/tts_speech_16k_speexdsp.f32 250204_test --bottleneck 3 --h_file h_nc20_train_mpp.c64 --h_complex --range_EbNo --plot_loss --auxdata
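
For context, a minimal sketch of applying a pre-generated complex h file per carrier during training (illustrative file layout; train.py's actual channel code may differ):

    import numpy as np

    Nc = 20
    # one complex fading gain per carrier per symbol, from multipath_samples()
    h = np.fromfile('h_nc20_train_mpp.c64', dtype=np.complex64).reshape(-1, Nc)

    def apply_channel(tx_syms, start):
        # tx_syms: (n_syms, Nc) complex carrier symbols
        return tx_syms * h[start:start + tx_syms.shape[0], :]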
    

drowe67 commented Nov 11, 2025

Toolchain for JMV's adasmooth timing est with post proc

./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/all.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx_awgn.f32
./jmv_ft_tool.sh 250725_rx_awgn.f32 delta_hat.f32
./rx2.sh 250725/checkpoints/checkpoint_epoch_200.pth 251002_mpp_16k_ft 250725_ml_sync 250725_rx_awgn.f32 /dev/null --latent-dim 56 --w1_dec 128 --noframe_sync --read_delta_hat delta_hat.f32
python3 loss.py features_in.f32 features_out_rx2.f32 --plot --clip_start 25

Note: --read_delta_hat delta_hat.f32 uses the external timing est, so 251002_mpp_16k_ft is not being used.

Testing FT est

Expected answer is Ncp=32; the Octave tool has an off-by-one error, so it reports 33:

./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/all.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx_awgn.f32
./jmv_ft_tool.sh 250725_rx_awgn.f32 delta_hat.f32 --no_bpf
octave:17> delta_hat=load_f32('delta_hat.f32',1)
<snip past initial transient>
33
33
33

Prototyping Signal Det using Rayleigh model

./jmv_ft.sh -5 5
Ry=load_c64('Ry.c64',160); [y,d]=adasmooth(Ry); figure(1); mesh(abs(y)); figure(2); hist(abs(y(:)),100);
T=0.5; e^(-(T^2)/var(y(:)))
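
For reference, the expression above is the Rayleigh tail probability: modelling a noise-only bin y as zero-mean complex Gaussian, |y| is Rayleigh distributed, so the false alarm probability for threshold T is

$$P(|y| > T) = e^{-T^2/\mathrm{E}[|y|^2]}$$

with var(y(:)) estimating E[|y|^2].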

Using sig_det and a single IIR filter:

./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/brian_g8sez.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx_awgn.f32 --prepend_noise 2
python3 autocorr_simple.py 250725_rx_awgn.f32 Ry.c64
Ry=load_c64('Ry.c64',160); [det,sigma_r,Ry_bar,Ts] = sig_det(Ry); figure(1); mesh(abs(Ry_bar)); figure(2); hist(abs(Ry_bar(:)),100); figure(3); plot(max(abs(Ry_bar'))); hold on; plot(Ts); hold off;
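
For context, my reading of autocorr_simple.py is the classic CP correlation: each cyclic prefix is correlated against its copy one symbol (M samples) later, at every candidate timing offset. A minimal sketch with assumed sizes (M=160, Ncp=32 at Fs=8000, Rs=50, matching the 160-column Ry.c64):

    import numpy as np

    M, Ncp = 160, 32  # symbol and CP lengths in samples (assumed)

    def cp_autocorr(rx, n_syms):
        # Ry[s, d]: CP correlation at candidate timing offset d in symbol s;
        # |Ry| peaks when d lands on the cyclic prefix
        Ry = np.zeros((n_syms, M), dtype=np.complex64)
        for s in range(n_syms):
            base = s * (M + Ncp)
            for d in range(M):
                a = rx[base + d : base + d + Ncp]
                b = rx[base + d + M : base + d + M + Ncp]
                Ry[s, d] = np.sum(a * np.conj(b))
        return Ry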

Integrating rx2.py

./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/brian_g8sez.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx_awgn.f32 --prepend_noise 2 --append_noise 2 --freq_offset 5
./rx2.sh 250725/checkpoints/checkpoint_epoch_200.pth 250725_ml_sync 250725_rx_awgn.f32 /dev/null --latent-dim 56 --w1_dec 128 --noframe_sync --write_delta_hat delta_hat.int16 --write_delta_hat_pp delta_hat_pp.int16 --write_sig_det sig_det.int16
octave:152> delta_hat=load_raw('delta_hat.int16'); figure(5); clf; plot(delta_hat); delta_hat_pp = load_raw('delta_hat_pp.int16'); hold on; plot(delta_hat_pp); sig_det=load_raw('sig_det.int16'); plot(sig_det*175); hold off; freq_offset=load_f32('freq_offset.f32',1); figure(6); plot(freq_offset)

WIP streaming

Streams the odd/even frame sync and the output of z_hat; the decoder is still run in one hit.

 ./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/brian_g8sez.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx_awgn.f32 --prepend_noise 1 --append_noise 2
./rx2.sh 250725/checkpoints/checkpoint_epoch_200.pth 250725_ml_sync 250725_rx_awgn.f32 /dev/null --latent-dim 56 --w1_dec 128 --write_delta_hat delta_hat.int16 --write_delta_hat_pp delta_hat_pp.int16 --write_sig_det sig_det.int16 --write_state state.int16 --write_freq_offset_smooth freq_offset_smooth.f32 --write_frame_sync frame_sync.f32 --noframe_sync
python3 loss.py features_in.f32 features_out_rx2.f32 --plot --clip_start 25 --clip_end 30

Note --clip_end is needed, as extra frames from the post-pended noise upset loss.py alignment (I think).

Initial pass of "four point" manual tests

021d1fb

Loss from inference.py (genie timing and freq) compared to rx2.py (timing and freq estimators):

Channel  SNR (dB)  inference.py  rx2.py
AWGN     high      0.083         0.081
AWGN     -4.4      0.407         0.401
MPP      high      0.101         0.107
MPP      -1.4      0.324         0.346

"high" SNR means the default --EbNodB 100, so effectively noise free. The low SNRs are at roughly the minimum possible for speech.

The four spot test points are high/low SNR AWGN and high/low SNR MPP. Command line for the worst case, low SNR MPP at -1.4 dB SNR:

./inference.sh 250725/checkpoints/checkpoint_epoch_200.pth wav/all.wav /dev/null --rate_Fs --latent-dim 56 --peak --cp 0.004 --time_offset -16 --correct_time_offset -16 --auxdata --w1_dec 128 --write_rx 250725_rx.f32 --prepend_noise 1 --append_noise 2 --g_file g_mpp.f32 --EbNodB 5
./rx2.sh 250725/checkpoints/checkpoint_epoch_200.pth 250725_ml_sync 250725_rx.f32 /dev/null --latent-dim 56 --w1_dec 128 --write_delta_hat delta_hat.int16 --write_delta_hat_pp delta_hat_pp.int16 --write_sig_det sig_det.int16 --write_state state.int16 --write_freq_offset_smooth freq_offset_smooth.f32 --write_frame_sync frame_sync.f32 --hangover 100
python3 loss.py features_in.f32 features_out.f32 --features_hat2 features_out_rx2.f32 --clip_start 50 --clip_end 300
Loss between features_in.f32 and features_out.f32
  loss: 0.324 start: 50 acq_time:  0.50 s
Loss between features_in.f32 and features_out_rx2.f32
  loss: 0.346 start: 106 acq_time:  1.06 s

Notes:

  1. We manually extend the state machine hangover (--hangover 100) to 100 symbols (2 seconds) so that we stay in sync. A re-sync causes a break in the sequence of output feature vectors, which breaks the loss.py measurement. In practice a user may not notice anything, as the re-sync would occur during a deep fade. It may be OK to use a large --hangover for command line testing but a smaller hangover in the real world, given the small frame size and the low cost of a re-sync compared to RADE V1.
  2. Likewise the --clip_end 300: this cuts off the garbage features at the end of features_out_rx2.f32 that tend to upset the way loss.py syncs the two feature vector sequences. If you increase --hangover, it's a good idea to also increase --clip_end. Note that there are 4 feature vectors for every two OFDM symbols.
  3. The priority is getting loss measurements from a real world acquisition system/state machine that are similar to those from the bare bones ML decoder with genie timing and freq estimates. We haven't optimised acquisition time, and there is no end-of-over detection (yet), so we will have significant "run on" when transmission stops.
