Stavros' Stuff Latest Posts Latest posts on Stavros' Stuff. en-us Stavros Korokithakis Making an AI-generated sleep podcast <div class="pull-quote">Falling asleep is more fun with an AI in your ear</div><p>When I was a teenager, I had a CD player in my room, and I used to listen to fairy tales to fall asleep. The narrator&#8217;s voice would relax me and I&#8217;d fall asleep quickly. Fast forward to yesterday, I was playing with Google Text-To-Speech for an unrelated project, and had gotten one of their code samples to generate some speech for me. I had also played around with <a href="">OpenAI&#8217;s GPT-3</a>, which I had found wonderfully surrealist, and it had stuck in my mind, so I thought I should combine the two and create a podcast of nonsensical stories that you could listen to to help you fall asleep more easily.</p> <p>Having already played with Google&#8217;s speech synthesis, I thought it would be pretty quick and easy to create this, as all I&#8217;d have to do is generate some text with <span data-expounder="gpt3">GPT-3</span> and have Google speak it. <span data-expounded="gpt3">GPT-3 is an AI model that can generate very convincing text from a sample. You basically give it a topic and a few sentences, and it continues in the same vein, writing very natural-sounding prose.</span> Half an hour later, I had an AI-generated logo, AI-generated soundscapy background music, an AI-generated fairytale, and an AI-narrated audio file. A day later, I have seven:</p> <h2><a href="">The Deep Dreams podcast.</a></h2> <p>Here&#8217;s how I did it:</p> <!-- break --> <h2>A brief history</h2> <p>I started playing with GPT-3 and some code yesterday afternoon. By today, I had generated a few episodes, and I posted a link to <a href="">Hacker news</a>, where people seemed to like the project and had many good suggestions, some of which I implemented. 
In this writeup, I will detail what I tried, in separate sections, and mention what I tried initially and what I tried later, after I got some feedback, so the writeup will not be in a strictly chronological order.</p> <p>Let&#8217;s dive in!</p> <h2>GPT-3</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="robot-hand.jpg" data-lightbox="gallery"><img src="robot-hand-small.jpg"></a><a class="photocredit" href="" title="Photo from" target="_blank" rel="noopener noreferrer"><i class="fa fa-camera"></i></a></div><span class="caption">Not pictured: The robot hand that is holding the pen.</span></div><p>First, I started with GPT-3. This part was pretty straightforward, GPT-3 includes a &#8220;playground&#8221; where you can type your prompt, select a few parameters, and have the model try to complete a few sentences at a time. Immediately, the results were pretty usable.</p> <p>I started with a prompt that looked similar to this:</p> <p><em>The storyteller started telling the children an old fairytale, to help them fall asleep. The fairytale was very calming, pleasant and soothing, and it was about a princess, her fairy godmother, and her evil stepmother. It went like this:</em></p> <p><em>Once upon a time,</em></p> <p>The model immediately continued the story with what is now the contents of <a href="">episode 1 of the podcast</a>, sentence by sentence. One small issue that I encountered was that the model would often repeat itself a lot. Luckily, there&#8217;s a parameter that you can tweak to penalize repetition, so the model is more likely to come up with more novel stuff rather than just repeat the same sentences over and over.</p> <p>You can also help the model along if it sticks to the same topic too much, by slightly changing some of the text in the middle of generation. 
For example, if the model insists on having the main character go back to the forest and chop wood every paragraph, you can change &#8220;and he went back to the woods&#8221; to &#8220;and he went to the city&#8221;, and the model will take that and run with it. It might still have the character go to the forest near the city, but at least it&#8217;s something.</p> <p>Another issue that I still haven&#8217;t solved is the model&#8217;s tendency to stop early. It seems that, sometimes, it just runs out of things to say, and then it starts adding &#8220;and they lived happily ever after&#8221; to the text, or changing the subject completely, and it&#8217;s hard to get it to write more if it doesn&#8217;t want to. That&#8217;s also the main reason why the episodes are less than ten minutes long.</p> <h3>A caveat</h3> <p>I want to point out a huge caveat of GPT-3 that you need to be aware of, and that I wasn&#8217;t, which made my costs many times larger than they could have been: GPT-3 charges you both for the generated text <strong>and</strong> for the prompt! That means that if you use the playground in the default configuration, as I did, you might end up writing a prompt of 10 words, then getting another 10 generated, then another 10, then another 10. However, in that case, GPT-3 will charge you for 90 words total (20 for the first call, then 30, then 40), even though you only generated 30. 
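</p> <p>To make the arithmetic concrete, here is a minimal sketch of that billing pattern. It assumes a simplified model in which every call is billed for its whole prompt plus its completion, and it counts words where the real API counts tokens; <code>billed_words</code> is a hypothetical helper, not part of any OpenAI library.</p>

```python
def billed_words(prompt_words, words_per_call, calls):
    """Total words billed when each call resends everything written so far.

    Simplified, assumed model: a call is billed for prompt + completion,
    and each completion is appended to the prompt of the next call.
    """
    total = 0
    context = prompt_words
    for _ in range(calls):
        total += context + words_per_call  # prompt + completion for this call
        context += words_per_call          # the completion joins the next prompt
    return total


# Three 10-word continuations of a 10-word prompt: 20 + 30 + 40.
print(billed_words(10, 10, 3))  # 90
# One call that produces the same 30 words: 10 + 30.
print(billed_words(10, 30, 1))  # 40
```

<p>Under this model the three playground calls bill 20, 30 and 40 words, in line with the rough figure above, while a single longer call for the same 30 words bills only 40.</p> <p>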
That&#8217;s because, when you press &#8220;continue&#8221;, it considers the previously-generated sentence a part of the prompt now, and charges you for it again, and again, and again, every time you generate another sentence, so the total you pay grows roughly with N<sup>2</sup>, where N is the number of sentences you generate.</p> <p>A better way to do that would be to have the API generate hundreds of words, and then use some of those as the prompt for the next generation.</p> <h2>Speech synthesis</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="headphones.jpg" data-lightbox="gallery"><img src="headphones-small.jpg"></a><a class="photocredit" href="" title="Photo from" target="_blank" rel="noopener noreferrer"><i class="fa fa-camera"></i></a></div><span class="caption">Tara was somehow comforted by the familiar voice that ordered her to kill all humans.</span></div><p>The next step after generating the text was to get it narrated. Luckily, I had some fresh experience with using Google&#8217;s speech synthesis API, which has <a href="">an array of fairly convincing voices</a>.</p> <p>I wrote some code to generate an MP3 file from the text that GPT-3 wrote, and I had my first narration. Unfortunately, Google&#8217;s API will only allow you to send up to 5000 characters to narrate at a time, so there were some issues with the longer stories, but with the power of Python and some handy libraries, I had the API narrate parts of the story and then stitched them together into a complete whole.</p> <p>This is how episodes one to five were generated, but the voices that Google offered weren&#8217;t a great fit for this use case, and people remarked on how grating it was to hear this particular voice, which is something you definitely don&#8217;t want if you&#8217;re trying to fall asleep. 
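</p> <p>As an aside on the 5000-character limit mentioned above, the splitting step could look roughly like the sketch below. It is not the code used for the podcast: <code>split_for_tts</code> and its sentence-splitting regex are assumptions made for illustration, and it counts characters, as the text above does.</p>

```python
import re


def split_for_tts(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


story = "Once upon a time there was a princess. " * 400
parts = split_for_tts(story)
print(len(parts), max(len(part) for part in parts))
```

<p>Each chunk can then be sent to the speech API on its own, and the resulting audio files joined afterwards.</p> <p>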
After looking at <a href="">Amazon Polly</a> and Microsoft&#8217;s <a href="">Azure text-to-speech</a>, I decided that the latter was much more pleasant-sounding and had a few voices that were much better than Google&#8217;s.</p> <p>A few minutes later, I had my generation code using the new service, and episodes six and seven sound much more pleasing. I&#8217;m not sure whether this API still has the 5000 character limitation, but the splitting works so well that I didn&#8217;t bother to find out whether it can generate the entire episode in one go.</p> <p>The code that generates audio from <span data-expounder="ssml">SSML</span> <span data-expounded="ssml">(SSML is a markup language that adds annotations to text so that the text-to-speech engine knows how to pronounce certain things)</span> is pretty straightforward, and basically comes straight from the Azure docs:</p> <div class="highlight"><pre><span></span><span class="n">speech_config</span> <span class="o">=</span> <span class="n">SpeechConfig</span><span class="p">(</span> <span class="n">subscription</span><span class="o">=</span><span class="n">mykey</span><span class="p">,</span> <span class="n">region</span><span class="o">=</span><span class="s2">&quot;westeurope&quot;</span><span class="p">,</span> <span class="p">)</span> <span class="n">speech_config</span><span class="o">.</span><span class="n">set_speech_synthesis_output_format</span><span class="p">(</span> <span class="n">SpeechSynthesisOutputFormat</span><span class="p">[</span><span class="s2">&quot;Audio48Khz192KBitRateMonoMp3&quot;</span><span class="p">]</span> <span class="p">)</span> <span class="n">synthesizer</span> <span class="o">=</span> <span class="n">SpeechSynthesizer</span><span class="p">(</span><span class="n">speech_config</span><span class="o">=</span><span class="n">speech_config</span><span class="p">,</span> <span class="n">audio_config</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span> 
<span class="n">result</span> <span class="o">=</span> <span class="n">synthesizer</span><span class="o">.</span><span class="n">speak_ssml_async</span><span class="p">(</span><span class="n">ssml_text</span><span class="p">)</span><span class="o">.</span><span class="n">get</span><span class="p">()</span> <span class="n">stream</span> <span class="o">=</span> <span class="n">AudioDataStream</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="n">stream</span><span class="o">.</span><span class="n">save_to_wav_file</span><span class="p">(</span><span class="n">outfile</span><span class="p">)</span> </pre></div> <p>This will take the script and write the narrated audio to an MP3 file.</p> <h2>Background music</h2> <p>If you&#8217;ll notice, however, you&#8217;ll see that there&#8217;s more to an episode than just the narration. Episodes also include strategic pauses, background music, and fades.</p> <p>For the background music, I wanted something that&#8217;s algorithmically-generated, to fit with the general theme. <del>I found a site which generates soundscapy audio, and it has a paid export function, which I used to buy a ten-minute-long MP3 of the sounds.</del> (UPDATE: It turns out that the audio files I bought from that site are copyrighted, so I have to take down all the episodes and recreate them again with the new background music.) I imported the audio into <a href="">Audacity</a>, added the narration and background tracks, added some silences and fades, and the first episode was ready!</p> <p>This way of doing things was fine for one episode, but if I was going to make a second one I didn&#8217;t want to have to manually mix tracks again. I looked around for a Python library that could do it for me, and I found <a href="">pydub</a>.</p> <p>pydub is an audio manipulation library, it allows you to easily manipulate volume, generate silence, add fades, and cut/join tracks, which was everything I needed. 
I wrote a few lines of code to shorten the background track to the duration of the episode, to add pauses before and after the episode ends, and to fade the background track in and out, and another part of the episode generation was done.</p> <p>Here&#8217;s some of the code:</p> <div class="highlight"><pre><span></span><span class="n">aepisode</span> <span class="o">=</span> <span class="n">pydub</span><span class="o">.</span><span class="n">AudioSegment</span><span class="o">.</span><span class="n">from_mp3</span><span class="p">(</span><span class="n">episode</span><span class="p">)</span> <span class="n">abackground</span> <span class="o">=</span> <span class="n">pydub</span><span class="o">.</span><span class="n">AudioSegment</span><span class="o">.</span><span class="n">from_mp3</span><span class="p">(</span><span class="n">background</span><span class="p">)</span> <span class="c1"># Add silence to the start and end.</span> <span class="n">apadded_episode</span> <span class="o">=</span> <span class="p">(</span> <span class="n">pydub</span><span class="o">.</span><span class="n">AudioSegment</span><span class="o">.</span><span class="n">silent</span><span class="p">(</span><span class="n">duration</span><span class="o">=</span><span class="mi">7000</span><span class="p">)</span> <span class="o">+</span> <span class="n">aepisode</span> <span class="o">+</span> <span class="n">pydub</span><span class="o">.</span><span class="n">AudioSegment</span><span class="o">.</span><span class="n">silent</span><span class="p">(</span><span class="n">duration</span><span class="o">=</span><span class="mi">8000</span><span class="p">)</span> <span class="p">)</span> <span class="n">apadded_episode</span><span class="o">.</span><span class="n">export</span><span class="p">(</span><span class="n">tempepisode</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s2">&quot;mp3&quot;</span><span class="p">)</span> <span class="c1"># Cut 
the background track to the length of the narration.</span> <span class="n">acut_bg</span> <span class="o">=</span> <span class="n">abackground</span><span class="p">[:</span> <span class="n">apadded_episode</span><span class="o">.</span><span class="n">duration_seconds</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">]</span><span class="o">.</span><span class="n">fade_out</span><span class="p">(</span><span class="mi">5000</span><span class="p">)</span> <span class="c1"># Lower the background track volume.</span> <span class="n">alower_volume_cut_bg</span> <span class="o">=</span> <span class="n">acut_bg</span> <span class="o">-</span> <span class="mi">20</span> <span class="c1"># Export a temporary background track.</span> <span class="n">alower_volume_cut_bg</span><span class="o">.</span><span class="n">export</span><span class="p">(</span><span class="n">tempbg</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s2">&quot;mp3&quot;</span><span class="p">)</span> </pre></div> <p>After this, another step uses ffmpeg to mix the two tracks.</p> <p>At that point, I could run the script and have it generate the entire audio file, start to finish, from the script, without any other manual work.</p> <h2>Logo</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="logo.jpg" data-lightbox="gallery"><img src="logo-small.jpg"></a></div><span class="caption">The Deep Dreams podcast logo.</span></div><p>Since everything else fit with the theme, I wanted the logo to be AI-generated too. I found a page online that would let you generate images using an AI, and quickly created the logo there. 
Unfortunately, I don&#8217;t remember the name of the site, but it doesn&#8217;t matter much anyway.</p> <p>I&#8217;m quite happy with the logo (seen here on the right), it&#8217;s vague enough to be perfect for the podcast, and it&#8217;s AI-generated, which is very fitting with this whole effort.</p> <h2>Costs</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="panhandler.jpg" data-lightbox="gallery"><img src="panhandler-small.jpg"></a><a class="photocredit" href="" title="Photo from" target="_blank" rel="noopener noreferrer"><i class="fa fa-camera"></i></a></div><span class="caption">A human child donates money to a panhandling robot.</span></div><p>One point I was curious about in this whole process was what the costs would be. The paid services I use are GPT-3 for the text generation, and Azure Text-to-Speech. GPT-3 cost about $2 for all seven episodes, and Azure Text-to-Speech cost little enough that I don&#8217;t think I can estimate yet (Microsoft has a $200 free tier for three months). I think it&#8217;s a few cents per hour, though, so it&#8217;s very cheap for what I&#8217;m doing.</p> <p>All in all, I estimate that, to produce these episodes, I spent around $2 in services and $1500 in time, which is basically par for the course for side-projects. Obviously, you can&#8217;t really price the time I spent like that, because I enjoyed doing this, but this way of thinking about it helps me remember that $2 (or whatever I spend out of pocket on my projects) is nothing compared to the enjoyment I get from building these things.</p> <h2>Epilogue</h2> <p>That&#8217;s more or less the entire process I followed in building this. 
The first episode took half an hour to an hour to make, the second episode took two or three hours (because that&#8217;s when I wrote the autogeneration code), and the rest of the episodes took a few minutes each.</p> <p>If you want to see how the sausage is made, all my code and assorted things are here:</p> <p><a href=""></a></p> <p>Also, if you have any feedback, comments, episode requests, or whatever, feel free to <a href="">Tweet</a> or <a href="">toot</a> at me, or email me directly.</p> Fri, 03 Dec 2021 18:41:44 +0000 How to write a modern Slack bot in Python <div class="pull-quote">It took me SO LONG to find this info</div><p>This post is going to be short, but hopefully will help you avoid the troubles that befell me. I wanted to make a Slack bot using Python. &#8220;How hard can it be?&#8221;, I thought. &#8220;I&#8217;ve done it many times before&#8221;, I thought.</p> <p>Think again.</p> <p>The problem is that Slack has changed the way their APIs work. The old way is now referred to as a &#8220;classic app&#8221; with a &#8220;bot scope&#8221;, and that way is deprecated and you can&#8217;t really create apps like that now, so you have to do a whole other thing.</p> <p>In this post, I will detail the steps necessary to create a simple bot that will listen for messages and reply to them. That&#8217;s all the scaffolding you&#8217;ll need (or that <em>I</em> needed) to create your apps, but I had to search for many hours to discover this information. Hopefully Google will be kinder to you and <!-- break -->point you to this post quickly.</p> <h2>The steps</h2> <p>The first step is to forget all about <code>slack-sdk</code>, a bunch of other libraries and whatnot. What you need is <a href="">slack-bolt</a>, that&#8217;s what will let you make a bot quickly and easily, and is the Official Way™️ to create Slack bots, so <code>pip install slack-bolt</code>. 
The <a href="">getting started</a> guide is a good (if a bit verbose) way to get the gist of what to do.</p> <p>After you&#8217;ve installed bolt, you need to create an app, turn on &#8220;Enable Events&#8221; under &#8220;Event Subscriptions&#8221;, and add the <code>app_mentions:read</code>, <code>chat:write</code>, and <code>im:history</code> scopes, so the bot can view messages that mention it and send messages of its own. You should also click on &#8220;Subscribe to bot events&#8221; and add <code>app_mention</code>, <code></code>, and <code>message.mpim</code>, to let your bot access all the DMs it&#8217;s in and mentions of it. Furthermore, you should enable &#8220;Socket Mode&#8221; under &#8220;Socket Mode&#8221;, which will let you skip the webhook setup and will let your bot connect to Slack from behind a firewall or without having a hostname pointed to it. The &#8220;getting started&#8221; link above does a good job detailing these as well.</p> <p>Then, you need to install the bot to your workspace by clicking &#8220;Install App to Workspace&#8221; in &#8220;OAuth &amp; Permissions&#8221;, and copy the token you get, which you&#8217;ll need to use in the code below.</p> <h2>Sample code</h2> <p>After you&#8217;ve done the above, it&#8217;s time to listen for and reply to messages.</p> <p>Here&#8217;s some code that uses Socket Mode to connect to Slack, for simplicity. 
You need to define the two environment variables shown in the code for it to run:</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">re</span> <span class="kn">from</span> <span class="nn">slack_bolt</span> <span class="kn">import</span> <span class="n">App</span> <span class="kn">from</span> <span class="nn">slack_bolt.adapter.socket_mode</span> <span class="kn">import</span> <span class="n">SocketModeHandler</span> <span class="n">app</span> <span class="o">=</span> <span class="n">App</span><span class="p">(</span><span class="n">token</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;SLACK_BOT_TOKEN&quot;</span><span class="p">])</span> <span class="nd">@app</span><span class="o">.</span><span class="n">message</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">&quot;(hello|hi)&quot;</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">I</span><span class="p">))</span> <span class="k">def</span> <span class="nf">say_hello_regex</span><span class="p">(</span><span class="n">say</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span> <span class="n">greeting</span> <span class="o">=</span> <span class="n">context</span><span class="p">[</span><span class="s2">&quot;matches&quot;</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="n">say</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">greeting</span><span class="si">}</span><span class="s2">, &lt;@</span><span class="si">{</span><span class="n">context</span><span class="p">[</span><span 
class="s1">&#39;user_id&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">&gt;, how are you?&quot;</span><span class="p">)</span> <span class="nd">@app</span><span class="o">.</span><span class="n">message</span><span class="p">(</span><span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">))</span> <span class="k">def</span> <span class="nf">catch_all</span><span class="p">(</span><span class="n">say</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span> <span class="sd">&quot;&quot;&quot;A catch-all message.&quot;&quot;&quot;</span> <span class="n">say</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;I didn&#39;t get that, &lt;@</span><span class="si">{</span><span class="n">context</span><span class="p">[</span><span class="s1">&#39;user_id&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">&gt;.&quot;</span><span class="p">)</span> <span class="nd">@app</span><span class="o">.</span><span class="n">event</span><span class="p">(</span><span class="s2">&quot;app_mention&quot;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">handle_app_mention_events</span><span class="p">(</span><span class="n">body</span><span class="p">,</span> <span class="n">client</span><span class="p">,</span> <span class="n">say</span><span class="p">):</span> <span class="c1"># Reply to mentions in a thread.</span> <span class="n">client</span><span class="o">.</span><span class="n">chat_postMessage</span><span class="p">(</span> <span class="n">channel</span><span class="o">=</span><span class="n">body</span><span class="p">[</span><span class="s2">&quot;event&quot;</span><span class="p">][</span><span class="s2">&quot;channel&quot;</span><span class="p">],</span> <span class="n">thread_ts</span><span class="o">=</span><span class="n">body</span><span 
class="p">[</span><span class="s2">&quot;event&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;thread_ts&quot;</span><span class="p">,</span> <span class="n">body</span><span class="p">[</span><span class="s2">&quot;event&quot;</span><span class="p">][</span><span class="s2">&quot;ts&quot;</span><span class="p">]),</span> <span class="n">text</span><span class="o">=</span><span class="sa">f</span><span class="s2">&quot;Yes &lt;@</span><span class="si">{</span><span class="n">body</span><span class="p">[</span><span class="s1">&#39;event&#39;</span><span class="p">][</span><span class="s1">&#39;user&#39;</span><span class="p">]</span><span class="si">}</span><span class="s2">&gt;.&quot;</span><span class="p">,</span> <span class="p">)</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span> <span class="n">handler</span> <span class="o">=</span> <span class="n">SocketModeHandler</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;SLACK_APP_TOKEN&quot;</span><span class="p">])</span> <span class="n">handler</span><span class="o">.</span><span class="n">start</span><span class="p">()</span> </pre></div> <p>Don&#8217;t forget to change to Webhooks mode for production (that is left as an exercise for the reader, as I didn&#8217;t need to do it so I don&#8217;t really have a good example).</p> <h2>Epilogue</h2> <p>That&#8217;s it! I hope I helped. The biggest time-waster for me was that I didn&#8217;t realize that <code>slack-bolt</code> was the official way to write Slack bots nowadays.</p> <p>Many thanks to my friend <a href="">Alex</a> for pointing me in the right direction with this.</p> <p>If you have any questions or comments, please <a href="">Tweet</a> or <a href="">toot</a> at me, or email me directly.</p> Thu, 25 Nov 2021 00:20:13 +0000 How to ask for help <div class="pull-quote">It's harder than it sounds</div><p>As you may be aware, I very much like building things. 
Almost by definition, this means that I&#8217;m very often in situations where I&#8217;m out of my depth, as I always try to do new things that I don&#8217;t quite know how to do yet. Luckily, I have a whole bunch of knowledgeable friends whom I can ask for help.</p> <p>However, I noticed a pervasive problem when asking for someone&#8217;s help: It takes way too long to describe the issue I&#8217;m having. To make matters worse, conversations are usually synchronous (either chat or phone calls), which means that I&#8217;m wasting a bunch of the limited time they&#8217;re gracious enough to give me on trying to get my thoughts in order and describe the problem well.</p> <p>This is very suboptimal, and I&#8217;d like to propose a better <!-- break --> way here.</p> <h2>Describing the problem</h2> <p>When asking someone for help, I&#8217;ll first ask if they have some time to help me. If they do, I will tell them I will formulate my problem and send it to them in written form. This respects their limited time, as they can get to my issue at their leisure, instead of forcing them to dedicate all their attention to me, as they would have to do with a phone call.</p> <p>In the process of trying to clarify my thoughts, I will very often realize what I have to do to solve it, and solve the issue myself. Even if I don&#8217;t, though, I&#8217;ll have a better grasp of the problem and it&#8217;ll be much easier to explain to someone unfamiliar with it.</p> <p>In general, I tend to follow a simple process, whose goal is to preemptively answer any question the other person may have. 
I will write a few short paragraphs detailing the following:</p> <ol> <li>What I&#8217;m trying to do.</li> <li>Why I&#8217;m trying to do it.</li> <li>What I&#8217;ve tried so far.</li> <li>Why each attempt has failed.</li> </ol> <p>Here&#8217;s how I approach each of these sections:</p> <h3>What I&#8217;m trying to do</h3> <p>For this section, I try to keep the following in mind:</p> <ul> <li>Describe my intent in as much detail as I can, focusing mostly on the high level but delving lower where it&#8217;s necessary for the other person to understand the situation.</li> <li>Try to maximize the amount of useful information per sentence.</li> <li>Assume the other person knows a lot about the general field, but very little about my specific problem. If in doubt, I&#8217;ll err on the side of more detail, as it&#8217;s much easier for them to skip around rather than have to come back to me with a question.</li> <li>Try to describe more the high-level motivation behind the effort, rather than get too technical. For example, I will say &#8220;I am trying to add a new feature to my image host to recognize people in photos so I can blur their faces&#8221; rather than &#8220;I&#8217;m trying to use a Gaussian filter on OpenCV bounding boxes&#8221;. This allows the other person to steer me into a better direction, if the attempt I&#8217;ve made at a solution isn&#8217;t the best fit to my problem, and avoids <a href="">the XY problem</a>.</li> </ul> <h3>Why I&#8217;m trying to do it</h3> <p>Continuing from the previous section, I will go into a bit more detail about the <em>why</em> of the problem.</p> <p>I will say something like &#8220;I had a lot of people asking me for an easy way to blur strangers&#8217; faces in their photos&#8221;, to give them a better idea of the purpose behind what I&#8217;m doing. 
This gives the other person important background information on what I&#8217;m trying to do, and helps them form the context around the overall issue.</p> <p>This section is usually fairly short, only one or two sentences, as the need is usually easy to describe.</p> <h3>What I&#8217;ve tried so far</h3> <p>This section is almost as important as the first one. The most important goals to keep in mind here are:</p> <ul> <li>Describe the things that I&#8217;ve tried, in as much detail as I can.</li> <li>Write up exactly what I did and the steps I followed.</li> <li>Show any relevant code.</li> <li>Explain my rationale behind each attempted solution.</li> </ul> <p>This is important for two reasons:</p> <ol> <li>I&#8217;m showing them a few potential solutions, which means we can skip many things we already know don&#8217;t work.</li> <li>I&#8217;m showing that I&#8217;ve tried things and I&#8217;m asking for their help as a last resort, rather than lazily going to them immediately to get them to solve my problem.</li> </ol> <h3>Why each attempt has failed</h3> <p>Important goals here are:</p> <ul> <li>Describe in detail why each of the solutions I tried doesn&#8217;t solve my problem adequately.</li> <li>For every attempt, I will detail what I expected to happen, and what happened instead.</li> <li>Paste any relevant error messages or debug output.</li> </ul> <p>Detail is very much appreciated here, because I know there&#8217;s nothing more frustrating than someone saying &#8220;it just doesn&#8217;t work&#8221; or &#8220;it doesn&#8217;t do anything&#8221;.</p> <h2>Done</h2> <p>When I&#8217;ve written all this up, I will send it to the person as an email or a wall of text somewhere, and let them get back to me at their leisure. If, at this point, they ask me something, it means I have failed to describe my problem adequately, and I should have spent more time clarifying my problem.</p> <p>Usually, by this point they&#8217;ll have an answer for me. 
It&#8217;ll either be something I missed in my attempts, or some other alternative I can try, or a completely different solution that I hadn&#8217;t even seen.</p> <p>I don&#8217;t want people to feel I&#8217;m running to them without trying things, or that I want to get them to solve my problem for me, and this process helps me a lot in soliciting feedback. It leaves people impressed at my preparation, and communicates to them that I do this work because I respect their time and appreciate that they&#8217;re spending it to help me.</p> <p>As always, if you have any feedback or questions, please <a href="">Tweet</a> or <a href="">toot</a> at me, or email me directly.</p> Tue, 14 Sep 2021 11:05:12 +0000 Keyyyyyyyys! <div class="pull-quote">The keyboard you never wanted</div><p>I have a friend, Josh. Josh is a literal superhero. He&#8217;s a boring, minivan-driving programmer by day, paramedic and firefighter by night. That&#8217;s already a much more plausible superhero premise than Batman (a billionaire who spends his time fighting street-level crime? Really, Bruce? Is that the best use of your time and billions?).</p> <p>Josh showed me some notes he had taken while he was paramedicking his paramedic things. 
I say &#8220;showed&#8221;, it was more &#8220;asked me if I could make out what the hell the notes said&#8221;.</p> <p>I could not.</p> <p>The conversation then went like this:</p> <ul class="dialog"> <li>Why don't you type on a computer?</li> <li>A computer is generally hard to set up in the field, and you need to keep eye contact with the patient, so handwriting is more convenient.</li> <li>Why not have a special keyboard?</li> <li>I don't think that's very con</li> <li>It can be wireless, and one-handed!</li> <li>Yeah but still, how am</li> <li>It can have five keys, one for each finger, and you can chord combinations to type!</li> <li>That sounds slow and</li> <li>JOSH THIS IS HAPPENING STOP FIGHTING IT</li> </ul><p>After his outpour of encouragement, I was motivated to create a solution, no matter how hard. I had a rough idea in my mind, but it was going to be tough oh who am I kidding, it&#8217;s five buttons connected to a microcontroller, it would take two minutes.</p> <p>It took four hours. Close enough.</p> <!-- break --> <h2>The hardware</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="innards.jpg" data-lightbox="gallery"><img src="innards-small.jpg"></a></div><span class="caption">Behold.</span></div><p>As I said, the hardware is simple enough: Just a microcontroller and five keyboard switches wired to five of its input pins. Since this build needed to be Bluetooth-enabled, my microcontroller of choice here was the ESP32. It&#8217;s a bit of a power hog because it implements all the required protocols (both WiFi and Bluetooth) in software, but for such a small project it would be good enough.</p> <p>The ESP32 is going to be on and running continuously, and not sleeping between keystrokes, because I&#8217;m too lazy to figure out how to make the ESP32 keep the Bluetooth connection open while sleeping. 
That means the battery (a 1S 250mAh LiPo) is only good for 2-3 hours, but that&#8217;s 2-3 hours more than anyone will ever use this for, so it&#8217;s more than enough.</p> <p>For the case, I designed a very simple square prism with five holes where the switches should be. In retrospect, I should have measured the distance between my fingers rather than the distance between the switches, because now the case feels a bit small, but that lack of smarts is why I am poor and writing a blog post on a Monday morning.</p> <h2>The software</h2> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="action-shot-haha-shot-get-it.jpg" data-lightbox="gallery"><img src="action-shot-haha-shot-get-it-small.jpg"></a></div><span class="caption">How to get shot in the airport.</span></div><p>Getting the ESP32 to emulate a Bluetooth keyboard took less than a minute, as there is an <a href="">excellent library</a> for that. All I had to do was call an initialization function and then send keystrokes whenever I wanted to, and that was it.</p> <p>Then, the only thing that was left was the meat of the project: How can we use five switches to type the entire alphabet, plus some keys like space, backspace, etc? Obviously, if we only press one switch at a time, we can only type up to five letters, but what if we press <em>multiple</em> switches at once, to produce one character? Then we can type up to 2<sup>5</sup> &#8722; 1 = 31 characters (there are 2<sup>5</sup> = 32 combinations, but pressing nothing is a no-op), which is more than however many letters the English language has nowadays, including the accents and umlauts.</p> <p>With that decided, there are two issues to solve:</p> <ol> <li>How do we know which switches were pressed? 
The microcontroller is fast enough to read the switches every microsecond, and it&#8217;s impossible for a human to activate all switches at exactly the same time.</li> <li>How do we decide which character each switch combination will produce?</li> </ol> <h3>Pressing multiple keys together</h3> <p>To figure out which switches were pressed for a key, we&#8217;ll need to somehow wait until all the switches that the user intends to press have been pressed, and only then generate the character. This is different from normal keyboards, which generate a character as soon as you press the switch, and can keep generating it (on repeat) for as long as the switch is pressed.</p> <p>We can achieve this by starting to keep track of presses when one switch is pressed, and waiting until that switch is released. We can also wait until the last switch is released; in practice there&#8217;s no difference between the two approaches. While we&#8217;re keeping track of key presses, we can store the largest number of switches that were pressed at one time (and which ones they were), and output the character for that combination when they&#8217;re all released. We could also keep track of which switches have been pressed until they are all released, but that would be less &#8220;natural&#8221;, since the switches could be pressed out-of-sequence and still work.</p> <p>As you can see, there are many ways to peel this particular cat, and the particular method chosen doesn&#8217;t matter too much, as they all produce similar (and satisfactory) results. Now that that&#8217;s done, we can move on to the next (and bigger) problem.</p> <h3>The character map</h3> <p>Deciding which key presses would correspond to which characters was the hardest part, but also the most creative. Since this is a brand-new keyboard layout, I would not be beholden to the mistakes of the past. 
No more jamming typewriters for me, no more bigrams, I would have free rein to perform extensive research and decide what the optimal correspondence would be for my layout.</p> <p>I decided that that was too much work, and that I&#8217;d just fall back to the good old <a href="">etaoin shrdlu</a>. I found the character frequencies in the English language, and made sure that each of the five most frequently used characters (the spacebar, &#8220;e&#8221;, &#8220;t&#8221;, etc) had its own key. After that, the next most frequent characters got a double press (the thumb plus one of the four others). Then came double presses between the other keys, then triple presses, etc.</p> <p>The final layout <a href="">can be seen here</a>, though it&#8217;s not very straightforward because the keys are read in a binary pattern. In essence, each key is a single bit, with the full press being a 5-bit number, so the thumb key corresponds to 1, the index key to 2, all keys together to 31, and so on.</p> <p>As you can see, the entire source code is barely 100 lines long.</p> <h2>Epilogue</h2> <p>Shortly after uploading the code to the keyboard, I paired it with my mobile and started typing. It actually felt quite nice to type in, because my hand was in a perfectly natural position with a pleasingly hefty object in it, so I spent quite a bit of time typing.</p> <p>It&#8217;s pretty slow to type in, as each character needs multiple presses, but the slow speed is largely due to my unfamiliarity with the keyboard. 
I&#8217;m sure there&#8217;s lots of room for improvement as I get used to it, which I won&#8217;t, because I have no use for it at all, and it was just a fun experiment for me.</p> <p>By popular demand, here&#8217;s a video of me struggling to type with it:</p> <p><iframe width="560" height="315" src="" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><br/></p> <p>If anyone wants it, you can own an ORIGINAL STAVROS PROJECT if you pay for shipping (I&#8217;ll even sign it with my chicken scrawl) because it&#8217;s gathering dust in a drawer right now. Also, I may be too lazy to even ship it by the time you read this, so don&#8217;t get too excited, but if you can make it easy for me and come by my house I&#8217;ll definitely give it to you. Or you can make your very own keyyyyyyyys keyboard, which is way more rewarding in that you don&#8217;t have to deal with me!</p> <p>You can find the code here:</p> <p><a href=""></a></p> <p>Let me know what you think by <a href="">tweeting</a> or <a href="">tooting</a> at me, or emailing me directly!</p> Mon, 12 Apr 2021 12:37:58 +0000 The "do not be alarmed" clock <div class="pull-quote">An alarm clock for the rest of us</div><p>It&#8217;s a brand new year, which means I should really start writing a new post. I&#8217;ve been wanting to for a while, but we&#8217;ve been in lockdown for two months now and Google Analytics is the only indication that I&#8217;m not alone on the planet, and most of that is bots anyway. I&#8217;ve decided to take a page out of the book of my friend, <a href="">James Stanley</a>, who both does cool things <em>and</em> actually writes about them, so I&#8217;m starting to document all my projects again.</p> <p>Given my non-frenetic, slow-paced lifestyle, I&#8217;ve long had a non-burning need. 
I don&#8217;t use an alarm to wake up, as I start work late, but I still want to know what time it is when I wake up, just to see if it&#8217;s way too early and I can go to sleep again. A few days a week I have tennis and need to get up early, but if it&#8217;s windy or rainy or very cold, the practice gets canceled and I want to know before I&#8217;m awake enough to not be able to go to sleep again.</p> <p>To accommodate this lifestyle, I&#8217;ve traditionally turned to my mobile phone, but that has some disadvantages. Namely, the screen is too bright and wakes me up when I check the time, and I&#8217;m too obsessive to not check all my messages instead of falling asleep when I see the notifications on the screen. I&#8217;ve long thought that a bedside alarm clock would be perfect for me, but I couldn&#8217;t find one that fulfilled all my requirements:</p> <!-- break --> <ul> <li>I needed something that had a screen that would always be lit, so I could check the time with one eye and half a brain awake, in the complete darkness of the bedroom.</li> <li>A screen that wasn&#8217;t too bright and wouldn&#8217;t disturb sleep, but that would also be legible in direct sunlight, so I could see the time during the day. 
This meant adaptive brightness.</li> <li>Octagonal shape so I can lay it down on its side (or at 45°) instead of having to crane my neck up to check the time when lying down (I really did think of everything).</li> <li>Weather forecast for the next hour or two, so I can reliably fail to wake up for tennis when not necessary.</li> <li>Annoying beeper for alarms.</li> <li>Less annoying built-in LEDs that would increase in brightness for a few minutes before an alarm went off, so I can wake up less annoyed.</li> </ul> <p>This is the result:</p> <div class="clearfix"></div><div class="aligncenter"><div class="photo-container"><a href="glamorshot2.jpg" data-lightbox="gallery"><img src="glamorshot2-small.jpg"></a></div><span class="caption">Look at this sexy thing.</span></div><p>Sexy.</p> <h2>Designing this thing</h2> <p>Conceptually, making an alarm clock to fulfill these requirements isn&#8217;t hard. You take a small OLED screen, an ESP8266, connect them together, add some LEDs and a buzzer, and you&#8217;re done! Since the hardware was more-or-less straightforward, I started with designing the octagonal case:</p> <div class="clearfix"></div><div class="aligncenter"><div class="photo-container"><a href="clock-back.png" data-lightbox="gallery"><img src="clock-back-small.png"></a></div><span class="caption">The back of the clock.</span></div><p>The OLED screen has four mounting holes, so I designed the mount around them. For the ESP8266, I use a WeMos D1 Mini board, as it&#8217;s quite small and nice to work with. 
I initially thought I&#8217;d have to connect the WeMos to the four pins of the screen with wires, but I discovered that if you kinda squint, you can get four of the ESP8266&#8217;s pins to match up with the screen&#8217;s pins and in the correct order, which means I could solder the ESP8266 behind the screen module directly, which simplified the wiring by a lot.</p> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="mockup.jpg" data-lightbox="gallery"><img src="mockup-small.jpg"></a></div><span class="caption">Look at that subtle off-white coloring.</span></div><p>While designing the clock, I created a quick mockup of the screen as I wanted it to look. I decided I wanted the time to be displayed in digits as big as possible, since this screen is small, and the other elements I wanted were wind speed, temperature and weather forecast for the next hour.</p> <p>The other components left to add were a photoresistor (for sensing ambient light), a buzzer and an LED strip. They were all straightforward to add: the photoresistor is connected to the ESP8266&#8217;s analog pin, and the buzzer and LED strip to some other free pins.</p> <h2>The software</h2> <p>The software was the slightly trickier part in this build, although that was pretty straightforward too. I needed multiple components:</p> <ul> <li>Getting the time somehow (there&#8217;s no real-time clock on the ESP8266 and I didn&#8217;t want to add one).</li> <li>Dimming the screen in a pleasing manner.</li> <li>Getting the weather forecast and current weather.</li> <li>Connecting to WiFi and autoupdating.</li> <li>Annoying me into wakefulness in a satisfactory manner.</li> </ul> <h3>The clock</h3> <p>There&#8217;s an old Chinese saying:</p> <blockquote><p>To know the time, you must first connect to the internet.</p> <p>&#8211; Sun Inc</p> </blockquote> <p>Luckily, connecting to the Internet is very easy with tzapu&#8217;s <a href="">excellent WiFiManager</a> library. 
The clock creates an access point, you join it with your phone or computer, give it your WiFi network&#8217;s password, and you&#8217;re done.</p> <p>After connecting, an NTP library can fetch the time every so often from the closest server, so the clock is always up to date. It was important to me that this clock is solid enough to trust with my most important wake-up alarms, and it achieves this goal admirably (unless the internet connection is down).</p> <p>That also takes care of autoupdates, since I have a simple, secure autoupdate machine I set up using <a href="">ESPOTA-server, an autoupdate server I wrote for the ESP8266</a>. The clock checks for a new update every day, and automatically downloads and installs it. This way, distributing updates will work reliably, even to millions of devices, for when you realize that collecting all the users&#8217; personal data is slightly monetizable.</p> <h3>The dimming</h3> <p>After a bit (or a lot) of searching, I finally found how to dim the OLED screen, and another important feature was well on its way to being implemented. Because it was rather hard to find, I&#8217;ll post the code here, along with the appropriate SEO keyword for people to be able to find it. 
Here is the code for dimming an SSD1306-compatible, I2C 128x64 OLED screen:</p> <div class="highlight"><pre><span></span><span class="kt">void</span><span class="w"> </span><span class="nf">brightnessTask</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">brightness</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="c1">// Set the brightness of the SSD1306 OLED screen with this hack.</span> <span class="w"> </span><span class="c1">// From https:/</span> <span class="w"> </span><span class="c1">// `brightness` should be a value between 5 (for some reason) and 255.</span> <span class="w"> </span><span class="n">display</span><span class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="n">SSD1306_SETCONTRAST</span><span class="p">);</span><span class="w"> </span><span class="c1">//0x81</span> <span class="w"> </span><span class="n">display</span><span class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="n">brightness</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">display</span><span class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="n">SSD1306_SETPRECHARGE</span><span class="p">);</span><span class="w"> </span><span class="c1">//0xD9</span> <span class="w"> </span><span class="n">display</span><span class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">display</span><span class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="n">SSD1306_SETVCOMDETECT</span><span class="p">);</span><span class="w"> </span><span class="c1">//0xDB</span> <span class="w"> </span><span class="n">display</span><span 
class="p">.</span><span class="n">ssd1306_command</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </pre></div> <p>Once I discovered that, the rest of the dimming was easy: I would just read the ambient light from the photoresistor and set the screen&#8217;s brightness according to the photoresistor&#8217;s value.</p> <p>Another job <del>well</del> done.</p> <h3>The weather</h3> <div class="clearfix"></div><div class="alignright"><div class="photo-container"><a href="glamorshot1.jpg" data-lightbox="gallery"><img src="glamorshot1-small.jpg"></a></div><span class="caption">What a beauty.</span></div><p>Getting the weather was easier than I thought. <a href="">OpenWeather</a> has a simple API, which accepts a pair of coordinates and gives you the current weather and forecast for that location. It only took a few minutes to fetch that data on the clock, and then I spent some time positioning them on the screen and turning the forecast into an appropriate icon so it could be displayed on the small screen.</p> <p>To display the icons, I used <a href="">Adafruit&#8217;s always helpful GFX library</a> and converted the OpenWeather icons to a format more suitable for display on a monochrome screen. This took a bit of trial and error, as I wanted the icons to be easily distinguishable at a glance, even though their size was 12x12 pixels and just one color. In the end, I think they worked out well, as it&#8217;s easy to tell if it&#8217;s raining or not.</p> <h3>The wakefulness</h3> <p>So the initial plan here was to have a buzzer buzz loudly and annoyingly in an attempt to wake me up. However, I am a pretty light sleeper, and I hate loud and annoying sounds (surprisingly), so I wanted this to be a last resort. The solution I came up with here is the following:</p> <p>Say I want to set the alarm for 9:30, just in time to be ten minutes late for tennis. 
The LEDs would start glowing very faintly at 9:20, slowly increasing in intensity over the next ten minutes, at which point they&#8217;d reach maximum brightness. If I still wasn&#8217;t up by then, the buzzer would begin beeping at the top of its lungs, hopefully waking me up reliably. This seemed like a pretty good algorithm, and one that would wake me up both pleasantly and reliably.</p> <p>There was only one problem: The clock doesn&#8217;t have any buttons or controls or any way for me to enter the alarm time. I could create a web server where I could configure alarms, but I figured I&#8217;d rather get a rooster instead, so I did neither and just made an alarm clock that cannot alarm. This symbolizes the futility of human existence and its incessant search for meaning in a cold and unfeeling universe that&#8217;s ultimately profoundly devoid of such, and then you die.</p> <p>Not having to wake up to a buzzer is cool, though.</p> <h2>Epilogue</h2> <p>Overall, I&#8217;m pretty happy with this clock! I can now open an eyelid, look at the time and fall asleep again, which is a massive quality-of-life improvement over the previous &#8220;fuck sleep I&#8217;ve got Instagram likes&#8221; situation. The octagonal shape is a killer feature: I keep the clock angled 45° (as in the photo) and it really is the perfect angle for reading it while lying down.</p> <p>One thing I&#8217;d like to improve in the future is to add a battery, because right now it&#8217;s solely USB powered and will obviously die if there&#8217;s a power outage. 
Adding a small LiPo battery and a charger circuit will be pretty easy and will ensure that the clock can keep running for hours even without power.</p> <p>The source code is here:</p> <p><a href=""></a></p> <p>It&#8217;s not very complete, so use it at your own risk, and I haven&#8217;t really added deployment details, but it should be very easy to build with one PlatformIO command.</p> <p>As always, you can <a href="">tweet</a> or <a href="">toot</a> at me, or email me directly for any feedback/hate mail.</p> Sun, 24 Jan 2021 22:37:32 +0000