@ -249,6 +249,22 @@ I think this is good news in terms of making me more confident that this simple
In squiggle.c, the boundary between working with sampler functions and arrays of samples is clear. Not so in the original squiggle, which hides this distinction from the user in the interest of accessibility.
### Parallelism
I provide some functions to draw samples in parallel. For "normal" squiggle.c models, where you define one model and then draw samples from it once at the end, they should be fine.
But for more complicated use cases, my recommendation would be to not use parallelism unless you know what you are doing, because of intricacies around setting seeds. Some gotchas and exercises for the reader:
- If you run the `sampler_parallel` function twice, you will get the same result. Why?
- If you run the `sampler_parallel` function on two different inputs, their outputs will be correlated. E.g., if you run two lognormals, indices which have higher samples in one will tend to have higher samples in the other one. Why?
- For a small amount of samples, if you run the `sampler_parallel` function, you will get better spread out random numbers than if you run things serially. Why?
That said, I found adding parallelism to be an interesting an engaging task. Most recently, I even optimized the code to ensure that two threads weren't accessing the same cache line at the same time, and it was very satisfying to see a 30% improvement as a result.
### Using arbitrary cdfs
The last commit that has code to sample from arbitrary cdfs is `8f6919fa2...`. You can access it with `git checkout 8f6919fa2`. I removed them because I wasn't using them, and they didn't really fit with the overall ethos of the project.
### Other gotchas
- Even though the C standard is ambiguous about this, this code assumes that doubles are 64 bit precision (otherwise the xorshift code should be different).
- I think the core interface is not likely to change much, though I've recently changed the interface for parallelism and for getting confidence intervals.
- I am using this code for a few important consulting projects, and I trust myself to operate it correctly.
## squiggle extra functions
## Functions and their usage
### Division between core functions and squiggle_more expansions
### squiggle.c
This library differentiates between core functions, which are pretty tightly scoped, and expansions and convenience functions, which are more meandering. Expansions are in `squiggle_more.c` and `squiggle_more.h`. To use them, take care to link them:
`squiggle.c` should be pretty tightly scoped. Available functions are:
double result = sample_mixture(samplers, weights, n_dists, seed);
return result;
}
```
// In your C source file
### squiggle_more.h
`squiggle_more.c` has expansions and convenience functions, which are more meandering. Expansions are in `squiggle_more.c` and `squiggle_more.h`. To use them, take care to link them:
`squiggle_more.c` has some helper functions to get confidence intervals. They are in `squiggle_more.c` because I'm still mulling over what their shape should be, and because until recently they were pretty limited and sub-optimal. But recently, I've put a bunch of effort into being able to get the confidence interval of an array of samples in O(number of samples), and into making confidence interval functions nicer and more general. So I might promote them to the main `squiggle.c` file.
```C
#define THOUSAND 1000
#define MILLION 1000000
The relevant functions are
/* Parallel sampling */
void sampler_parallel(double (*sampler)(uint64_t* seed), double* results, int n_threads, int n_samples);
```
/* Stats */
double array_get_median(double xs[], int n);
typedef struct ci_t {
double low;
double high;
} ci;
double array_get_median(double xs[], int n);
ci array_get_ci(ci interval, double* xs, int n);
ci array_get_90_ci(double xs[], int n);
void array_print_stats(double xs[], int n);
void array_print_histogram(double* xs, int n_samples, int n_bins);
void array_print_90_ci_histogram(double* xs, int n, int n_bins);
/* Algebra manipulations */
typedef struct normal_params_t {
double mean;
double std;
} normal_params;
normal_params algebra_sum_normals(normal_params a, normal_params b);
typedef struct lognormal_params_t {
double logmean;
double logstd;
} lognormal_params;
lognormal_params algebra_product_lognormals(lognormal_params a, lognormal_params b);
ci convert_lognormal_params_to_ci(lognormal_params y);
```
#### Extra: parallelism
On parallelism in particular, see the warnings and caveats in the [FOLK_WISDOM.md](./FOLK_WISDOM.md) file. That file also has many other nuggets, warnings, trinkets, caveats, pointers I've collected over time.
Here is an example of using parallelism, and then printing some stats and a histogram:
I provide some functions to draw samples in parallel. For "normal" squiggle.c models, where you define one model and then draw samples from it once at the end, they should be fine.
int main()
{
// set randomness seed
uint64_t* seed = malloc(sizeof(uint64_t));
*seed = 1000; // xorshift can't start with 0
But for more complicated use cases, my recommendation would be to not use parallelism unless you know what you are doing, because of intricacies around setting seeds. Some gotchas and exercises for the reader:
- If you run the `sampler_parallel` function twice, you will get the same result. Why?
- If you run the `sampler_parallel` function on two different inputs, their outputs will be correlated. E.g., if you run two lognormals, indices which have higher samples in one will tend to have higher samples in the other one. Why?
- For a small amount of samples, if you run the `sampler_parallel` function, you will get better spread out random numbers than if you run things serially. Why?
printf("\n# Stats\n");
array_print_stats(xs, n_samples);
printf("\n# Histogram\n");
array_print_histogram(xs, n_samples, 23);
That said, I found adding parallelism to be an interesting an engaging task. Most recently, I even optimized the code to ensure that two threads weren't accessing the same cache line at the same time, and it was very satisfying to see a 30% improvement as a result.
free(seed);
}
```
#### Extra: Algebraic manipulations
This produces the following output:
`squiggle_more.c` has some functions to do some simple algebra manipulations: sums of normals and products of lognormals. You can see some example usage [here](examples/more/07_algebra/example.c) and [here](examples/more/08_algebra_and_conversion/example.c).