Avoid accumulation of stale data in websockets

We’ve received reports of some specific instances slowly accumulating more and more binary data over time, eventually running out of memory (OOM). Globally setting ERL_FULLSWEEP_AFTER=0 has proven to be an effective countermeasure. However, this incurs increased CPU cost everywhere and is thus not suitable to apply out of the box.

Long-lived Phoenix websocket processes are apparently known to often cause exactly this by getting into a state unfavourable for the garbage collector. It therefore seems likely that affected instances are using timeline streaming in just the right way to trigger this.

We can tune the garbage collector just for websocket processes and use a more lenient value of 20 to keep the added performance cost in check. Testing on one affected instance appears to confirm this theory.

Ref.:
  https://www.erlang.org/doc/man/erlang#ghlink-process_flag-2-idp226
  https://blog.guzman.codes/using-phoenix-channels-high-memory-usage-save-money-with-erlfullsweepafter
  https://git.pleroma.social/pleroma/pleroma/-/merge_requests/4060

Tested-by: bjo
2024-09-16 17:19:31 +00:00 · 2024-04-04 17:19:58 +02:00 · 2024-04-04 17:19:58 +02:00 · 13e2a811ec
commit 13e2a811ec
parent b03edb4ff4
1 changed file with 6 additions and 0 deletions
--- a/lib/pleroma/web/mastodon_api/websocket_handler.ex
+++ b/lib/pleroma/web/mastodon_api/websocket_handler.ex
@@ -18,6 +18,8 @@ defmodule Pleroma.Web.MastodonAPI.WebsocketHandler do
  @timeout :timer.seconds(60)
  # Hibernate every X messages
  @hibernate_every 100
+  # Tune garbage collection for long-lived websocket processes
+  @fullsweep_after 20

  def init(%{qs: qs} = req, state) do
    with params <- Enum.into(:cow_qs.parse_qs(qs), %{}),
@@ -59,6 +61,10 @@ defmodule Pleroma.Web.MastodonAPI.WebsocketHandler do
      "#{__MODULE__} accepted websocket connection for user #{(state.user || %{id: "anonymous"}).id}, topic #{state.topic}"
    )

+    # process is long-lived and can sometimes accumulate stale data in such a way it's
+    # not freed by young garbage cycles, thus make full collection sweeps more frequent
+    :erlang.process_flag(:fullsweep_after, @fullsweep_after)
+
    Streamer.add_socket(state.topic, state.oauth_token)
    {:ok, %{state | timer: timer()}}
  end
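
To illustrate the per-process effect of the flag set in the hunk above, here is a minimal standalone sketch. It is not part of this commit or of Pleroma: the module `FullsweepSketch` and its `demo/1` function are hypothetical names chosen for illustration. It assumes only the standard `:erlang.process_flag/2` and `Process.info/2` calls, and shows that the setting applies to the calling process alone rather than to the whole VM, unlike ERL_FULLSWEEP_AFTER.

```elixir
defmodule FullsweepSketch do
  # Hypothetical helper for illustration only. It spawns a process that sets
  # its own fullsweep_after flag, then reports the previous and the new value
  # back to the caller.
  def demo(value \\ 20) do
    parent = self()

    pid =
      spawn(fn ->
        # process_flag/2 returns the value the flag had before this call
        previous = :erlang.process_flag(:fullsweep_after, value)

        # Read the setting back from this process's garbage collection info
        {:garbage_collection, gc} = Process.info(self(), :garbage_collection)
        current = Keyword.fetch!(gc, :fullsweep_after)

        send(parent, {:gc_settings, self(), previous, current})
      end)

    receive do
      {:gc_settings, ^pid, previous, current} -> {previous, current}
    after
      1_000 -> :timeout
    end
  end
end
```

Calling `FullsweepSketch.demo(20)` returns a tuple of the previous value (whatever the VM default or ERL_FULLSWEEP_AFTER was) and the new value read back from the spawned process. The calling process keeps its own setting unchanged, which is why the commit can limit the extra collection cost to websocket handlers.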