
Meta’s Llama 4 is now available on Workers AI

By Prince

A steam locomotive from 1993 broke my yarn test

When I first saw that Jest was running sl so many times, my first thought was to ask my colleague if sl is a valid command on his Mac, and of course it is not. After all, which serious engineer would stuff their machine full of silly commands like sl, gti, cowsay, or toilet? The next thing I tried was to rename sl to something else, and sure enough all my problems disappeared: yarn test started working perfectly.


So what does Jest have to do with Steam Locomotives?


Nothing, that’s what. The whole affair is an unfortunate naming clash between sl the Steam Locomotive and sl the Sapling CLI. Jest wanted sl the source control system, but ended up getting steam-rolled by sl the Steam Locomotive.


Fortunately the devs took it in good humor, and made a (still unreleased) fix. Check out the train memes!

[train meme images]

At this point the main story has ended. However, there are still some unresolved nagging questions, like…


How did the crash arrive at the magic number of a relatively even 27 seconds?


I don’t know. Actually I’m not sure if a forked child executing sl still has a terminal anymore, but the travel time of the train does depend on the terminal width. The wider it is, the longer it takes:


🌈  ~  tput cols
425
🌈  ~  time sl
sl  0.19s user 0.06s system 1% cpu 20.629 total
🌈  ~  tput cols
58
🌈  ~  time sl
sl  0.03s user 0.01s system 0% cpu 5.695 total


So the first thing I tried was to run yarn test in a ridiculously narrow terminal and see what happens:


Determin
ing test
 suites
to run..
.

  ● Test
 suite f
ailed to
 run

thrown:
[Error]

error Co
mmand fa
iled wit
h exit c
ode 1.
info Vis
it https
://yarnp
kg.com/e
n/docs/c
li/run f
or docum
entation
 about t
his comm
and.
yarn tes
t  1.92s
 user 0.
67s syst
em 9% cp
u 27.088
 total
🌈  back
stage [m
aster] t
put cols

8


Alas, the terminal width doesn’t affect Jest at all. Jest calls sl via execa, so let’s mock that up locally:


🌈  choochoo  cat runSl.mjs
import {execa} from 'execa';
const { stdout } = await execa('tput', ['cols']);
console.log('terminal colwidth:', stdout);
await execa('sl', ['root']);
🌈  choochoo  time node runSl.mjs
terminal colwidth: 80
node runSl.mjs  0.21s user 0.06s system 4% cpu 6.730 total


So execa uses the default terminal width of 80, which takes the train 6.7 seconds to cross. And 27 seconds divided by 6.7 is awfully close to 4. So is Jest running sl 4 times? Let’s do a poor man’s bpftrace by hooking into sl like so:


#!/bin/bash

uniqid=$RANDOM
echo "$(date --utc +"%Y-%m-%d %H:%M:%S.%N") $uniqid started" >> /home/yew/executed.log
/usr/games/sl.actual "$@"
echo "$(date --utc +"%Y-%m-%d %H:%M:%S.%N") $uniqid ended" >> /home/yew/executed.log


And if we check executed.log, sl is indeed executed in 4 waves, albeit by 5 workers simultaneously in each wave:


#wave1
2025-03-20 13:23:57.125482563 21049 started
2025-03-20 13:23:57.127526987 21666 started
2025-03-20 13:23:57.131099388 4897 started
2025-03-20 13:23:57.134237754 102 started
2025-03-20 13:23:57.137091737 15733 started
#wave1 ends, wave2 starts
2025-03-20 13:24:03.704588580 21666 ended
2025-03-20 13:24:03.704621737 21049 ended
2025-03-20 13:24:03.707780748 4897 ended
2025-03-20 13:24:03.712086346 15733 ended
2025-03-20 13:24:03.711953000 102 ended
2025-03-20 13:24:03.714831149 18018 started
2025-03-20 13:24:03.721293279 23293 started
2025-03-20 13:24:03.724600164 27918 started
2025-03-20 13:24:03.729763900 15091 started
2025-03-20 13:24:03.733176122 18473 started
#wave2 ends, wave3 starts
2025-03-20 13:24:10.294286746 18018 ended
2025-03-20 13:24:10.297261754 23293 ended
2025-03-20 13:24:10.300925031 27918 ended
2025-03-20 13:24:10.300950334 15091 ended
2025-03-20 13:24:10.303498710 24873 started
2025-03-20 13:24:10.303980494 18473 ended
2025-03-20 13:24:10.308560194 31825 started
2025-03-20 13:24:10.310595182 18452 started
2025-03-20 13:24:10.314222848 16121 started
2025-03-20 13:24:10.317875812 30892 started
#wave3 ends, wave4 starts
2025-03-20 13:24:16.883609316 24873 ended
2025-03-20 13:24:16.886708598 18452 ended
2025-03-20 13:24:16.886867725 31825 ended
2025-03-20 13:24:16.890735338 16121 ended
2025-03-20 13:24:16.893661911 21975 started
2025-03-20 13:24:16.898525968 30892 ended
#crash imminent! wave4 ending, wave5 starting...
2025-03-20 13:24:23.474925807 21975 ended


The logs were emitted for about 26.35 seconds, which is close to 27. It probably crashed just as wave4 was reporting back. And each wave lasts about 6.7 seconds, right on the money with manual measurement. 


So why is Jest running sl in 4 waves? Why did it crash at the start of the 5th wave?


Let’s again modify the poor man’s bpftrace to also log the args and working directory:


echo "$(date --utc +"%Y-%m-%d %H:%M:%S.%N") $uniqid started: $@ at $PWD" >> /home/yew/executed.log


From the results we can see that the 5 workers are busy executing sl root, which corresponds to the getRoot() function in jest-changed-files/sl.ts.


2025-03-21 05:50:22.663263304  started: root at /home/yew/cloudflare/repos/backstage/packages/app/src
2025-03-21 05:50:22.665550470  started: root at /home/yew/cloudflare/repos/backstage/packages/backend/src
2025-03-21 05:50:22.667988509  started: root at /home/yew/cloudflare/repos/backstage/plugins/access/src
2025-03-21 05:50:22.671781519  started: root at /home/yew/cloudflare/repos/backstage/plugins/backstage-components/src
2025-03-21 05:50:22.673690514  started: root at /home/yew/cloudflare/repos/backstage/plugins/backstage-entities/src
2025-03-21 05:50:29.247573899  started: root at /home/yew/cloudflare/repos/backstage/plugins/catalog-types-common/src
2025-03-21 05:50:29.251173536  started: root at /home/yew/cloudflare/repos/backstage/plugins/cross-connects/src
2025-03-21 05:50:29.255263605  started: root at /home/yew/cloudflare/repos/backstage/plugins/cross-connects-backend/src
2025-03-21 05:50:29.257293780  started: root at /home/yew/cloudflare/repos/backstage/plugins/pingboard-backend/src
2025-03-21 05:50:29.260285783  started: root at /home/yew/cloudflare/repos/backstage/plugins/resource-insights/src
2025-03-21 05:50:35.823374079  started: root at /home/yew/cloudflare/repos/backstage/plugins/scaffolder-backend-module-gaia/src
2025-03-21 05:50:35.825418386  started: root at /home/yew/cloudflare/repos/backstage/plugins/scaffolder-backend-module-r2/src
2025-03-21 05:50:35.829963172  started: root at /home/yew/cloudflare/repos/backstage/plugins/security-scorecard-dash/src
2025-03-21 05:50:35.832597778  started: root at /home/yew/cloudflare/repos/backstage/plugins/slo-directory/src
2025-03-21 05:50:35.834631869  started: root at /home/yew/cloudflare/repos/backstage/plugins/software-excellence-dashboard/src
2025-03-21 05:50:42.404063080  started: root at /home/yew/cloudflare/repos/backstage/plugins/teamcity/src


The 16 entries here correspond neatly to the 16 rootDirs configured in Jest for Cloudflare’s backstage. We have 5 trains and we want to visit 16 stations, so let’s do some simple math: 16/5.0 = 3.2, which means our trains need to go back and forth at least 4 times to cover them all.
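As a quick sanity check, here is a minimal sketch (in TypeScript, using the worker count, rootDir count, and per-train time observed above) showing that the numbers line up with the roughly 27-second crash:

// Rough model of the observed behaviour: 5 Jest workers each run `sl root`
// once per rootDir, and every invocation plays the ~6.7 s locomotive animation.
const workers = 5;            // Jest worker pool size seen in the log waves
const rootDirs = 16;          // rootDirs configured for Cloudflare's backstage
const secondsPerTrain = 6.7;  // measured duration of one sl run at 80 columns

const waves = Math.ceil(rootDirs / workers);      // 4
const expectedRuntime = waves * secondsPerTrain;  // ~26.8 seconds

console.log({ waves, expectedRuntime });          // { waves: 4, expectedRuntime: 26.8 }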


Final mystery: Why did it crash?


Let’s go back to the very start of our journey. The original [Error] thrown was actually from here, and after modifying node_modules/jest-changed-files/index.js, I found that the error is shortMessage: 'Command failed with ENAMETOOLONG: sl status...'. The reason why became clear when I interrogated Jest about what it thinks the repos are.

While the git repo is what you’d expect, the sl “repo” looks amazingly like a train wreck in motion:


got repos.git as Set(1) { '/home/yew/cloudflare/repos/backstage' }\ngot repos.sl as Set(1) {\n  '\\x1B[?1049h\\x1B[1;24r\\x1B[m\\x1B(B\\x1B[4l\\x1B[?7h\\x1B[?25l\\x1B[H\\x1B[2J\\x1B[15;80H_\\x1B[15;79H_\\x1B[16d|\\x1B[9;80H_\\x1B[12;80H|\\x1B[13;80H|\\x1B[14;80H|\\x1B[15;78H__/\\x1B[16;79H|/\\x1B[17;80H\\\\\\x1B[9;\n  79H_D\\x1B[10;80H|\\x1B[11;80H/\\x1B[12;79H|\\x1B[K\\x1B[13d\\b|\\x1B[K\\x1B[14d\\b|/\\x1B[15;1H\\x1B[1P\\x1B[16;78H|/-\\x1B[17;79H\\\\_\\x1B[9;1H\\x1B[1P\\x1B[10;79H|(\\x1B[11;79H/\\x1B[K\\x1B[12d\\b\\b|\\x1B[K\\x1B[13d\\b|\n  _\\x1B[14;1H\\x1B[1P\\x1B[15;76H__/ =\\x1B[16;77H|/-=\\x1B[17;78H\\\\_/\\x1B[9;77H_D _\\x1B[10;78H|(_\\x1B[11;78H/\\x1B[K\\x1B[12d\\b\\b|\\x1B[K\\x1B[13d\\b| _\\x1B[14;77H"https://blog.cloudflare.com/"\\x1B[15;75H__/\n  =|\\x1B[16;76H|/-=|\\x1B[17;1H\\x1B[1P\\x1B[8;80H=\\x1B[9;76H_D _|\\x1B[10;77H|(_)\\x1B[11;77H/\\x1B[K\\x1B[12d\\b\\b|\\x1B[K\\x1B[13d\\b|\n  _\\r\\x1B[14d\\x1B[1P\\x1B[15d\\x1B[1P\\x1B[16;75H|/-=|_\\x1B[17;1H\\x1B[1P\\x1B[8;79H=\\r\\x1B[9d\\x1B[1P\\x1B[10;76H|(_)-\\x1B[11;76H/\\x1B[K\\x1B[12d\\b\\b|\\x1B[K\\x1B[13d\\b| _\\r\\x1B[14d\\x1B[1P\\x1B[15;73H__/ =|\n  o\\x1B[16;74H|/-=|_\\r\\x1B[17d\\x1B[1P\\x1B[8;78H=\\r\\x1B[9d\\x1B[1P\\x1B[10;75H|(_)-\\x1B[11;75H/\\x1B[K\\x1B[12d\\b\\b|\\x1B[K\\x1B[13d\\b|\n  _\\r\\x1B[14d\\x1B[1P\\x1B[15d\\x1B[1P\\x1B[16;73H|/-=|_\\r\\x1B[17d\\x1B[1P\\x1B[8;77H=\\x1B[9;73H_D _|  |\\x1B[10;74H|(_)-\\x1B[11;74H/     |\\x1B[12;73H|      |\\x1B[13;73H| _\\x1B[14;73H"https://blog.cloudflare.com/"   |\\x1B[15;71H__/\n  =| o |\\x1B[16;72H|/-=|___|\\x1B[17;1H\\x1B[1P\\x 1B[5;79H(@\\x1B[7;77H(\\r\\x1B[8d\\x1B[1P\\x1B[9;72H_D _|  |_\\x1B[10;1H\\x1B[1P\\x1B[11d\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;72H| _\\x1B[14;72H"https://blog.cloudflare.com/"   |-\\x1B[15;70H__/\n  =| o |=\\x1B[16;71H|/-=|___|=\\x1B[17;1H\\x1B[1P\\x1B[8d\\x1B[1P\\x1B[9;71H_D _|  |_\\r\\x1B[10d\\x1B[1P\\x1B[11d\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;71H| _\\x1B[14;71H"https://blog.cloudflare.com/"   |-\\x1B[15;69H__/ =| o\n  |=-\\x1B[16;70H|/-=|___|=O\\x1B[17;71H\\\\_/      \\\\\\x1B[8;1H\\x1B[1P\\x1B[9;70H_D _|  |_\\x1B[10;71H|(_)---  |\\x1B[11;71H/     |  |\\x1B[12;70H|      |  |\\x1B[13;70H| _\\x1B[80G|\\x1B[14;70H"https://blog.cloudflare.com/"\n  |-\\x1B[15;68H__/ =| o |=-~\\x1B[16;69H|/-=|___|=\\x1B[K\\x1B[17;70H\\\\_/      \\\\O\\x1B[8;1H\\x1B[1P\\x1B[9;69H_D _|  |_\\r\\x1B[10d\\x1B[1P\\x1B[11d\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;69H| _\\x1B[79G|_\\x1B[14;69H"https://blog.cloudflare.com/"\n  |-\\x1B[15;67H__/ =| o |=-~\\r\\x1B[16d\\x1B[1P\\x1B[17;69H\\\\_/      \\\\_\\x1B[4d\\b\\b(@@\\x1B[5;75H(    )\\x1B[7;73H(@@@)\\r\\x1B[8d\\x1B[1P\\x1B[9;68H_D _|\n  |_\\r\\x1B[10d\\x1B[1P\\x1B[11d\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;68H| _\\x1B[78G|_\\x1B[14;68H"https://blog.cloudflare.com/"   |-\\x1B[15;66H__/ =| o |=-~~\\\\\\x1B[16;67H|/-=|___|=   O\\x1B[17;68H\\\\_/ \\\\__/\\x1B[8;1H\\x1B[1P\\x1B[9;67H_D _|\n  |_\\r\\x1B[10d\\x1B[1P\\x1B[11d\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;67H| _\\x1B[77G|_\\x1B[14;67H"https://blog.cloudflare.com/"   |-\\x1B[15;65H__/ =| o |=-~O==\\x1B[16;66H|/-=|___|= |\\x1B[17;1H\\x1B[1P\\x1B[8d\\x1B[1P\\x1B[9;66H_D _|\n  |_\\x1B[10;67H|(_)---  |   H\\x1B[11;67H/     |  |   H\\x1B[12;66H|      |  |   H\\x1B[13;66H| _\\x1B[76G|___H\\x1B[14;66H"https://blog.cloudflare.com/"   |-\\x1B[15;64H__/ =| o |=-O==\\x1B[16;65H|/-=|___|=\n  |\\r\\x1B[17d\\x1B[1P\\x1B[8d\\x1B[1P\\x1B[9;65H_D _|  |_\\x1B[80G/\\x1B[10;66H|(_)---  |   H\\\\\\x1B[11;1H\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;65H| _\\x1B[75G|___H_\\x1B[14;65H"https://blog.cloudflare.com/" |-\\x1B[15;63H__/ 
=| o |=-~~\\\\\n  /\\x1B[16;64H|/-=|___|=O=====O\\x1B[17;65H\\\\_/      \\\\__/  \\\\\\x1B[1;4r\\x1B[4;1H\\n' + '\\x1B[1;24r\\x1B[4;74H(    )\\x1B[5;71H(@@@@)\\x1B[K\\x1B[7;69H(   )\\x1B[K\\x1B[8;68H====\n  \\x1B[80G_\\x1B[9;1H\\x1B[1P\\x1B[10;65H|(_)---  |   H\\\\_\\x1B[11;1H\\x1B[1P\\x1B[12d\\x1B[1P\\x1B[13;64H| _\\x1B[74G|___H_\\x1B[14;64H"https://blog.cloudflare.com/"   |-\\x1B[15;62H__/ =| o |=-~~\\\\  /~\\x1B[16;63H|/-=|___|=\n  ||\\x1B[K\\x1B[17;64H\\\\_/      \\\\O=====O\\x1B[8;67H==== \\x1B[79G_\\r\\x1B[9d\\x1B[1P\\x1B[10;64H|(_)---  |   H\\\\_\\x1B[11;64H/     |  |   H  |\\x1B[12;63H|      |  |   H  |\\x1B[13;63H|\n  _\\x1B[73G|___H__/\\x1B[14;63H"https://blog.cloudflare.com/"   |-\\x1B[15;61H__/ =| o |=-~~\\\\  /~\\r\\x1B[16d\\x1B[1P\\x1B[17;63H\\\\_/      \\\\_\\x1B[8;66H==== \\x1B[78G_\\r\\x1B[9d\\x1B[1P\\x1B[10;63H|(_)---  |\n  H\\\\_\\r\\x1B[11d\\x1B[1P\\x1B[12;62H|      |  |   H  |_\\x1B[13;62H| _\\x1B[72G|___H__/_\\x1B[14;62H"https://blog.cloudflare.com/"   |-\\x1B[15;60H__/ =| o |=-~~\\\\  /~~\\\\\\x1B[16;61H|/-=|___|=   O=====O\\x1B[17;62H\\\\_/      \\\\__/\n  \\\\__/\\x1B[8;65H==== \\x1B[77G_\\r\\x1B[9d\\x1B[1P\\x1B[10;62H|(_)---  |   H\\\\_\\r\\x1B[11d\\x1B[1P\\x1B[12;61H|      |  |   H  |_\\x1B[13;61H| _\\x1B[71G|___H__/_\\x1B[14;61H"https://blog.cloudflare.com/"   |-\\x1B[80GI\\x1B[15;59H__/ =|\n  o |=-~O=====O==\\x1B[16;60H|/-=|___|=    ||    |\\x1B[17;1H\\x1B[1P\\x1B[2;79H(@\\x1B[3;74H(   )\\x1B[K\\x1B[4;70H(@@@@)\\x1B[K\\x1B[5;67H(    )\\x1B[K\\x1B[7;65H(@@@)\\x1B[K\\x1B[8;64H====\n  \\x1B[76G_\\r\\x1B[9d\\x1B[1P\\x1B[10;61H|(_)---  |   H\\\\_\\x1B[11;61H/     |  |   H  |  |\\x1B[12;60H|      |  |   H  |__-\\x1B[13;60H| _\\x1B[70G|___H__/__|\\x1B[14;60H"https://blog.cloudflare.com/"   |-\\x1B[79GI_\\x1B[15;58H__/ =| o\n  |=-O=====O==\\x1B[16;59H|/-=|___|=    ||    |\\r\\x1B[17d\\x1B[1P\\x1B[8;63H==== \\x1B[75G_\\r\\x1B[9d\\x1B[1P\\x1B[10;60H|(_)---  |   H\\\\_\\r\\x1B[11d\\x1B[1P\\x1B[12;59H|      |  |   H  |__-\\x1B[13;59H|\n  _\\x1B[69G|___H__/__|_\\x1B[14;59H"https://blog.cloudflare.com/"   |-\\x1B[78GI_\\x1B[15;57H__/ =| o |=-~~\\\\  /~~\\\\  /\\x1B[16;58H|/-=|___|=O=====O=====O\\x1B[17;59H\\\\_/      \\\\__/  \\\\__/  \\\\\\x1B[8;62H====\n  \\x1B[74G_\\r\\x1B[9d\\x1B[1P\\x1B[10;59H|(_)---  |   H\\\\_\\r\\x1B  |  |   H  |__-\\x1B[13;58H| _\\x1B[68G|___H__/__|_\\x1B[14;58H"https://blog.cloudflare.com/"   |-\\x1B[77GI_\\x1B[15;56H__/ =| o |=-~~\\\\ /~~\\\\  /~\\x1B[16;57H|/-=|___|=\n  ||    ||\\x1B[K\\x1B[17;58H\\\\_/      \\\\O=====O=====O\\x1B[8;61H==== \\x1B[73G_\\r\\x1B[9d\\x1B[1P\\x1B[10;58H|(_)---    _\\x1B[67G|___H__/__|_\\x1B[14;57H"https://blog.cloudflare.com/"   |-\\x1B[76GI_\\x1B[15;55H__/ =| o |=-~~\\\\  /~~\\\\\n  /~\\r\\x1B[16d\\x1B[1P\\x1B[17;57H\\\\_/      \\\\_\\x1B[2;75H(  ) (\\x1B[3;70H(@@@)\\x1B[K\\x1B[4;66H()\\x1B[K\\x1B[5;63H(@@@@)\\x1B[


Acknowledgements


Thank you to my colleagues Mengnan Gong and Shuhao Zhang, whose ideas and perspectives helped narrow down the root causes of this mystery.

If you enjoy troubleshooting weird and tricky production issues, our engineering teams are hiring.

Build and deploy Remote Model Context Protocol (MCP) servers to Cloudflare

It feels like almost everyone building AI applications and agents is talking about the Model Context Protocol (MCP), as well as building MCP servers that you install and run locally on your own computer.

You can now build and deploy remote MCP servers to Cloudflare. We’ve added four things to Cloudflare that handle the hard parts of building remote MCP servers for you:

  1. workers-oauth-provider — an OAuth Provider that makes authorization easy

  2. McpAgent — a class built into the Cloudflare Agents SDK that handles remote transport

  3. mcp-remote — an adapter that lets MCP clients that otherwise only support local connections work with remote MCP servers

  4. AI playground as a remote MCP client — a chat interface that allows you to connect to remote MCP servers, with the authentication check included

The button below, or the developer docs, will get you up and running in production with this example MCP server in less than two minutes:

[Deploy to Cloudflare button]

Unlike the local MCP servers you may have previously used, remote MCP servers are accessible on the Internet. People simply sign in and grant permissions to MCP clients using familiar authorization flows. We think this is going to be a massive deal — connecting coding agents to MCP servers has blown developers’ minds over the past few months, and remote MCP servers have the same potential to open up similar new ways of working with LLMs and agents to a much wider audience, including more everyday consumer use cases.


From local to remote — bringing MCP to the masses


MCP is quickly becoming the common protocol that enables LLMs to go beyond inference and RAG, and take actions that require access beyond the AI application itself (like sending an email, deploying a code change, publishing blog posts, you name it). It enables AI agents (MCP clients) to access tools and resources from external services (MCP servers).

To date, MCP has been limited to running locally on your own machine — if you want to access a tool on the web using MCP, it’s up to you to set up the server locally. You haven’t been able to use MCP from web-based interfaces or mobile apps, and there hasn’t been a way to let people authenticate and grant the MCP client permission. Effectively, MCP servers haven’t yet been brought online.


Support for remote MCP connections changes this. It creates the opportunity to reach a wider audience of Internet users who aren’t going to install and run MCP servers locally for use with desktop apps. Remote MCP support is like the transition from desktop software to web-based software. People expect to continue tasks across devices and to login and have things just work. Local MCP is great for developers, but remote MCP connections are the missing piece to reach everyone on the Internet.


Making authentication and authorization just work with MCP


Beyond just changing the transport layer — from stdio to streamable HTTP — when you build a remote MCP server that uses information from the end user’s account, you need authentication and authorization. You need a way to allow users to login and prove who they are (authentication) and a way for users to control what the AI agent will be able to access when using a service (authorization).

MCP does this with OAuth, the standard protocol that allows users to grant applications access to their information or services without sharing their passwords. Here, the MCP Server itself acts as the OAuth Provider. However, OAuth with MCP is hard to implement yourself, so when you build MCP servers on Cloudflare, we provide it for you.


workers-oauth-provider — an OAuth 2.1 Provider library for Cloudflare Workers


When you deploy an MCP Server to Cloudflare, your Worker acts as an OAuth Provider, using workers-oauth-provider, a new TypeScript library that wraps your Worker’s code, adding authorization to API endpoints, including (but not limited to) MCP server API endpoints.

Your MCP server will receive the already-authenticated user details as a parameter. You don’t need to perform any checks of your own, or directly manage tokens. You can still fully control how you authenticate users: from what UI they see when they log in, to which provider they use to log in. You can choose to bring your own third-party authentication and authorization providers like Google or GitHub, or integrate with your own.

The complete MCP OAuth flow looks like this:

[Diagram: the complete MCP OAuth flow]

Here, your MCP server acts as both an OAuth client to your upstream service, and as an OAuth server (also referred to as an OAuth “provider”) to MCP clients. You can use any upstream authentication flow you want, but workers-oauth-provider guarantees that your MCP server is spec-compliant and able to work with the full range of client apps & websites. This includes support for Dynamic Client Registration (RFC 7591) and Authorization Server Metadata (RFC 8414).
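For instance, you can confirm the metadata is being served by fetching the well-known endpoint defined by RFC 8414; this is just an illustrative sketch against a placeholder server URL, not part of the SDK:

// Fetch the OAuth Authorization Server Metadata (RFC 8414) advertised by an MCP server.
// workers-oauth-provider serves this document on your behalf; it lists the authorize,
// token, and dynamic client registration (RFC 7591) endpoints.
const serverOrigin = "https://remote-server.example.com"; // placeholder

const res = await fetch(`${serverOrigin}/.well-known/oauth-authorization-server`);
const metadata = await res.json();

console.log(metadata.authorization_endpoint); // e.g. https://remote-server.example.com/authorize
console.log(metadata.token_endpoint);         // e.g. https://remote-server.example.com/token
console.log(metadata.registration_endpoint);  // e.g. https://remote-server.example.com/register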


A simple, pluggable interface for OAuth


When you build an MCP server with Cloudflare Workers, you provide the OAuth Provider with the paths to your authorization, token, and client registration endpoints, along with handlers for your MCP server and for auth:


import OAuthProvider from "@cloudflare/workers-oauth-provider";
import MyMCPServer from "./my-mcp-server";
import MyAuthHandler from "./auth-handler";

export default new OAuthProvider({
  apiRoute: "/sse", // MCP clients connect to your server at this route
  apiHandler: MyMCPServer.mount('/sse'), // Your MCP Server implementation
  defaultHandler: MyAuthHandler, // Your authentication implementation
  authorizeEndpoint: "/authorize",
  tokenEndpoint: "/token",
  clientRegistrationEndpoint: "/register",
});


This abstraction lets you easily plug in your own authentication. Take a look at this example that uses GitHub as the identity provider for an MCP server, in less than 100 lines of code, by implementing /callback and /authorize routes.


Why do MCP servers issue their own tokens?


You may have noticed in the authorization diagram above, and in the authorization section of the MCP spec, that the MCP server issues its own token to the MCP client.

Instead of passing the token it receives from the upstream provider directly to the MCP client, your Worker stores an encrypted access token in Workers KV. It then issues its own token to the client. As shown in the GitHub example above, this is handled on your behalf by the workers-oauth-provider — your code never directly handles writing this token, preventing mistakes. You can see this in the following code snippet from the GitHub example above:


  // When you call completeAuthorization, the accessToken you pass to it
  // is encrypted and stored, and never exposed to the MCP client
  // A new, separate token is generated and provided to the client at the /token endpoint
  const { redirectTo } = await c.env.OAUTH_PROVIDER.completeAuthorization({
    request: oauthReqInfo,
    userId: login,
    metadata: { label: name },
    scope: oauthReqInfo.scope,
    props: {
      accessToken,  // Stored encrypted, never sent to MCP client
    },
  })

  return Response.redirect(redirectTo)


On the surface, this indirection might sound more complicated. Why does it work this way?

By issuing its own token, MCP Servers can restrict access and enforce more granular controls than the upstream provider. If a token you issue to an MCP client is compromised, the attacker only gets the limited permissions you’ve explicitly granted through your MCP tools, not the full access of the original token.

Let’s say your MCP server requests that the user authorize permission to read emails from their Gmail account, using the gmail.readonly scope. The tool that the MCP server exposes is more narrow, and allows reading travel booking notifications from a limited set of senders, to handle a question like “What’s the check-out time for my hotel room tomorrow?” You can enforce this constraint in your MCP server, and if the token you issue to the MCP client is compromised, because the token is to your MCP server — and not the raw token to the upstream provider (Google) — an attacker cannot use it to read arbitrary emails. They can only call the tools your MCP server provides. OWASP calls out “Excessive Agency” as one of the top risk factors for building AI applications, and by issuing its own token to the client and enforcing constraints, your MCP server can limit tools access to only what the client needs.
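As a purely hypothetical sketch of that Gmail constraint (the tool name, sender list, and query below are illustrative and not from any real example), the narrowly scoped tool would live inside your agent's init() and use the upstream token from props, which workers-oauth-provider stored encrypted:

// Hypothetical tool inside MyMCP#init(): it can only query a fixed set of booking senders.
// this.props.accessToken is the upstream Google token; the MCP client never sees it.
const TRAVEL_SENDERS = ["no-reply@hotel.example", "bookings@travel.example"]; // illustrative

this.server.tool("getHotelCheckout", "Find the check-out time from recent hotel booking emails.", {}, async () => {
  const query = TRAVEL_SENDERS.map((s) => `from:${s}`).join(" OR ");
  const res = await fetch(
    `https://gmail.googleapis.com/gmail/v1/users/me/messages?q=${encodeURIComponent(query)}`,
    { headers: { Authorization: `Bearer ${this.props.accessToken}` } }
  );
  const { messages } = await res.json();
  // ...fetch and summarize only these messages; an arbitrary mailbox read is never possible here
  return { content: [{ type: "text", text: `Found ${messages?.length ?? 0} booking emails.` }] };
});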

Or, building off the earlier GitHub example, you can enforce that only a specific user is allowed to access a particular tool. In the example below, only users that are part of an allowlist can see or call the generateImage tool, which uses Workers AI to generate an image based on a prompt:


import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const USER_ALLOWLIST = ["geelen"];

export class MyMCP extends McpAgent<Props, Env> {
  server = new McpServer({
    name: "Github OAuth Proxy Demo",
    version: "1.0.0",
  });

  async init() {
    // Dynamically add tools based on the user's identity
    if (USER_ALLOWLIST.includes(this.props.login)) {
      this.server.tool(
        'generateImage',
        'Generate an image using the flux-1-schnell model.',
        {
          prompt: z.string().describe('A text description of the image you want to generate.')
        },
        async ({ prompt }) => {
          const response = await this.env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
            prompt,
            steps: 8
          })
          return {
            content: [{ type: 'image', data: response.image!, mimeType: 'image/jpeg' }],
          }
        }
      )
    }
  }
}


Introducing McpAgent: remote transport support that works today, and will work with the revision to the MCP spec


The next step to opening up MCP beyond your local machine is to open up a remote transport layer for communication. MCP servers you run on your local machine just communicate over stdio, but for an MCP server to be callable over the Internet, it must implement remote transport.

The McpAgent class we introduced today as part of our Agents SDK handles this for you, using Durable Objects behind the scenes to hold a persistent connection open, so that your MCP server can stream server-sent events (SSE) back to the MCP client. You don’t have to write code to deal with transport or serialization yourself. A minimal MCP server in 15 lines of code can look like this:


import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

export class MyMCP extends McpAgent {
  server = new McpServer({
    name: "Demo",
    version: "1.0.0",
  });
  async init() {
    this.server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) => ({
      content: [{ type: "text", text: String(a + b) }],
    }));
  }
}


After much discussion, remote transport in the MCP spec is changing, with Streamable HTTP replacing HTTP+SSE. This allows for stateless, pure HTTP connections to MCP servers, with an option to upgrade to SSE, and removes the need for the MCP client to send messages to a separate endpoint from the one it first connects to. The McpAgent class will change with it and just work with Streamable HTTP, so that you don’t have to start over to support the revision to how transport works.

This applies to future iterations of transport as well. Today, the vast majority of MCP servers only expose tools, which are simple remote procedure call (RPC) methods that can be provided by a stateless transport. But more complex human-in-the-loop and agent-to-agent interactions will need prompts and sampling. We expect these types of chatty, two-way interactions will need to be real-time, which will be challenging to do well without a bidirectional transport layer. When that time comes, Cloudflare, the Agents SDK, and Durable Objects all natively support WebSockets, which enable full-duplex, bidirectional real-time communication. 


Stateful, agentic MCP servers


When you build MCP servers on Cloudflare, each MCP client session is backed by a Durable Object, via the Agents SDK. This means each session can manage and persist its own state, backed by its own SQL database.

This opens the door to building stateful MCP servers. Rather than just acting as a stateless layer between a client app and an external API, MCP servers on Cloudflare can themselves be stateful applications — games, a shopping cart plus checkout flow, a persistent knowledge graph, or anything else you can dream up. When you build on Cloudflare, MCP servers can be much more than a layer in front of your REST API.

To understand the basics of how this works, let’s look at a minimal example that increments a counter:


import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

type State = { counter: number }

export class MyMCP extends McpAgent<Env, State, {}> {
  server = new McpServer({
    name: "Demo",
    version: "1.0.0",
  });

  initialState: State = {
    counter: 1,
  }

  async init() {
    this.server.resource(`counter`, `mcp://resource/counter`, (uri) => {
      return {
        contents: [{ uri: uri.href, text: String(this.state.counter) }],
      }
    })

    this.server.tool('add', 'Add to the counter, stored in the MCP', { a: z.number() }, async ({ a }) => {
      this.setState({ ...this.state, counter: this.state.counter + a })

      return {
        content: [{ type: 'text', text: String(`Added ${a}, total is now ${this.state.counter}`) }],
      }
    })
  }

  onStateUpdate(state: State) {
    console.log({ stateUpdate: state })
  }

}


For a given session, the MCP server above will remember the state of the counter across tool calls.

From within an MCP server, you can use Cloudflare’s whole developer platform, and have your MCP server spin up its own web browser, trigger a Workflow, call AI models, and more. We’re excited to see the MCP ecosystem evolve into more advanced use cases.


Connect to remote MCP servers from MCP clients that today only support local MCP


Cloudflare is supporting remote MCP early — before the most prominent MCP client applications support remote, authenticated MCP, and before other platforms support remote MCP. We’re doing this to give you a head start building for where MCP is headed.

But if you build a remote MCP server today, this presents a challenge — how can people start using your MCP server if there aren’t MCP clients that support remote MCP?

We have two new tools that allow you to test your remote MCP server and simulate how users will interact with it in the future:

We updated the Workers AI Playground to be a fully remote MCP client that allows you to connect to any remote MCP server with built-in authentication support. This online chat interface lets you immediately test your remote MCP servers without having to install anything on your device. Instead, just enter the remote MCP server’s URL (e.g. https://remote-server.example.com/sse) and click Connect.


Once you click Connect, you’ll go through the authentication flow (if you set one up), and afterwards you will be able to interact with the MCP server’s tools directly from the chat interface.

If you prefer to use a client like Claude Desktop or Cursor that already supports MCP but doesn’t yet handle remote connections with authentication, you can use mcp-remote. mcp-remote is an adapter that lets MCP clients that otherwise only support local connections work with remote MCP servers. This gives you and your users the ability to preview what interactions with your remote MCP server will be like from the tools you’re already using today, without having to wait for the client to support remote MCP natively.

We’ve published a guide on how to use mcp-remote with popular MCP clients including Claude Desktop, Cursor, and Windsurf. In Claude Desktop, you add the following to your configuration file:


{
  "mcpServers": {
    "remote-example": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://remote-server.example.com/sse"
      ]
    }
  }
}


Remote Model Context Protocol (MCP) is coming! When client apps support remote MCP servers, the audience of people who can use them opens up from just us, developers, to the rest of the population — who may never even know what MCP is or stands for. 

Building a remote MCP server is the way to bring your service into the AI assistants and tools that millions of people use. We’re excited to see that many of the biggest companies on the Internet are busy building MCP servers right now, and we are curious about the businesses that pop up in an agent-first, MCP-native way.

On Cloudflare, you can start building today. We’re ready for you, and ready to help build with you. Email us at [email protected], and we’ll help get you going. There’s lots more to come with MCP, and we’re excited to see what you build.

Improving Data Loss Prevention accuracy with AI-powered context analysis

We are excited to announce our latest innovation to Cloudflare’s Data Loss Prevention (DLP) solution: a self-improving AI-powered algorithm that adapts to your organization’s unique traffic patterns to reduce false positives. 

Many customers are plagued by the shapeshifting task of identifying and protecting their sensitive data as it moves within, and even outside of, their organization. Detecting this data through deterministic means, such as regular expressions, often fails because such rules cannot reliably identify details categorized as personally identifiable information (PII) or intellectual property (IP). This can generate a high rate of false positives, which contributes to noisy alerts and may lead to review fatigue. Even more critically, this less-than-ideal experience can turn users away from relying on our DLP product and result in a reduction in their overall security posture.

Built into Cloudflare’s DLP Engine, AI enables us to intelligently assess the contents of a document or HTTP request in parallel with a customer’s historical reports to determine context similarity and draw conclusions on data sensitivity with increased accuracy.

In this blog post, we’ll explore DLP AI Context Analysis, its implementation using Workers AI and Vectorize, and future improvements we’re developing. 


Understanding false positives and their impact on user confidence


Data Loss Prevention (DLP) at Cloudflare detects sensitive information by scanning potential sources of data leakage across various channels such as web, cloud, email, and SaaS applications. While we leverage several detection methods, pattern-based methods like regular expressions play a key role in our approach. This method is effective for many types of sensitive data. However, certain information can be challenging to classify solely through patterns. For instance, U.S. Social Security Numbers (SSNs), structured as AAA-GG-SSSS, sometimes with dashes omitted, are often confused with other similarly formatted data, such as U.S. taxpayer identification numbers, bank account numbers, or phone numbers. 

Since announcing our DLP product, we have introduced new capabilities like confidence thresholds to reduce the number of false positives users receive. This method involves examining the surrounding context of a pattern match to assess Cloudflare’s confidence in its accuracy. With confidence thresholds, users specify a threshold (low, medium, or high) to signify a preference for how tolerant detections are to false positives. DLP uses the chosen threshold as a minimum, surfacing only those detections with a confidence score that meets or exceeds the specified threshold.  


However, implementing context analysis is also not a trivial task. A straightforward approach might involve looking for specific keywords near the matched pattern, such as “SSN” near a potential SSN match, but this method has its limitations. Keyword lists are often incomplete, users may make typographical errors, and many true positives do not have any identifying keywords nearby (e.g., bank accounts near routing numbers or SSNs near names).
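To make that limitation concrete, here is a minimal sketch of what such a hardcoded keyword check looks like (illustrative only, not DLP's implementation):

// Naive context check: is a known keyword within `window` characters of the match?
const SSN_KEYWORDS = ["ssn", "social security"];

function hasNearbyKeyword(text: string, matchIndex: number, matchLength: number, window = 40): boolean {
  const start = Math.max(0, matchIndex - window);
  const end = Math.min(text.length, matchIndex + matchLength + window);
  const context = text.slice(start, end).toLowerCase();
  return SSN_KEYWORDS.some((kw) => context.includes(kw));
}

// A true positive with no keyword nearby (e.g. a bare SSN next to a name) is missed,
// and a typo like "socail security" defeats the keyword list entirely.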


Leveraging AI/ML for enhanced detection accuracy


To address the limitations of a hardcoded strategy for context analysis, we have developed a dynamic, self-improving algorithm that learns from customer feedback to further improve their future experience. Each time a customer reports a false positive via decrypted payload logs, the system reduces its future confidence for hits in similar contexts. Conversely, reports of true positives increase the system’s confidence for hits in similar contexts. 


To determine context similarity, we leverage Workers AI: specifically, a pretrained language model that converts text into a high-dimensional vector (a text embedding). These embeddings capture the meaning of the text, ensuring that two sentences with the same meaning but different wording map to vectors that are close to each other.

When a pattern match is detected, the system uses the AI model to compute the embedding of the surrounding context. It then performs a nearest neighbor search to find previously logged false or true positives with similar meanings. This allows the system to identify context similarities even if the exact wording differs, but the meaning remains the same. 


In our experiments using Cloudflare employee traffic, this approach has proven robust, effectively handling new pattern matches it hadn’t encountered before. When the DLP admin reports false and true positives through the Cloudflare dashboard while viewing the payload log of a policy match, it helps DLP continue to improve, leading to a significant reduction in false positives over time. 


Seamless integration with Workers AI and Vectorize


In developing this new feature, we used components from Cloudflare’s developer platform — Workers AI and Vectorize — which helps simplify our design. Instead of managing the underlying infrastructure ourselves, we leveraged Cloudflare Workers as the foundation, using Workers AI for text embedding, and Vectorize as the vector database. This setup allows us to focus on the algorithm itself without the overhead of provisioning underlying resources.  

Thanks to Workers AI, converting text into embeddings couldn’t be easier. With just a single line of code we can transform any text into its corresponding vector representation.


const result = (await env.AI.run(model, { text: [text] })).data;


This handles everything from tokenization to GPU-powered inference, making the process both simple and scalable.

The nearest neighbor search is equally straightforward. After obtaining the vector from Workers AI, we use Vectorize to quickly find similar contexts from past reports. In the meantime, we store the vector for the current pattern match in Vectorize, allowing us to learn from future feedback. 
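A simplified sketch of that flow in a Worker (the binding names and embedding model below are assumptions for illustration, not DLP's production code) looks like this:

// Assumed bindings: env.AI (Workers AI) and env.VECTORIZE (a Vectorize index).
// `context` is the redacted text around the pattern match (see the privacy note below).
const context = "employee SSN [REDACTED] listed in onboarding form";

// 1. Embed the redacted context surrounding the pattern match.
const embedding = (await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [context] })).data[0];

// 2. Nearest-neighbor search over vectors from previously reported matches.
const { matches } = await env.VECTORIZE.query(embedding, { topK: 5 });
// matches is a ranked list of { id, score }; the ids point at past reports whose
// true/false-positive labels live in the customer's private namespace (D1/KV).

// 3. Store the current vector so future admin feedback can adjust later decisions.
await env.VECTORIZE.insert([{ id: crypto.randomUUID(), values: embedding }]);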

To optimize resource usage, we’ve incorporated a few more clever techniques. For example, instead of storing every vector from pattern hits, we use online clustering to group vectors into clusters and store only the cluster centroids along with counters for tracking hits and reports. This reduces storage needs and speeds up searches. Additionally, we’ve integrated Cloudflare Queues to separate the indexing process from the DLP scanning hot path, ensuring a robust and responsive system.
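The clustering itself can be as simple as an online running-mean update; a toy sketch of the idea (not the production algorithm) is:

// Online clustering sketch: assign a vector to the nearest centroid if it is close
// enough, otherwise start a new cluster. Only centroids and hit counters are kept.
type Cluster = { centroid: number[]; hits: number };

function assign(clusters: Cluster[], v: number[], threshold = 0.25): Cluster {
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));

  let best = clusters[0];
  for (const c of clusters) if (dist(c.centroid, v) < dist(best.centroid, v)) best = c;

  if (!best || dist(best.centroid, v) > threshold) {
    best = { centroid: [...v], hits: 0 };
    clusters.push(best);
  }

  best.hits += 1;
  const rate = 1 / best.hits; // running mean: nudge the centroid toward the new vector
  best.centroid = best.centroid.map((c, i) => c + (v[i] - c) * rate);
  return best;
}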


Privacy is a top priority. We redact any matched text before conversion to embeddings, and all vectors and reports are stored in customer-specific private namespaces across Vectorize, D1, and Workers KV. This means each customer’s learning process is independent and secure. In addition, we implement data retention policies so that vectors that have not been accessed or referenced within 60 days are automatically removed from our system.  


Limitations and continuous improvements


AI-driven context analysis significantly improves the accuracy of our detections, but this comes at the cost of some added latency for the end user. Requests that do not match any enabled DLP entries see no latency increase. Requests that match an enabled entry in a profile with AI context analysis enabled will typically see an increase of about 400 ms, and in rare extreme cases, for example requests that match multiple entries, the increase could be as high as 1.5 seconds. We are actively working to drive this latency down, ideally to a typical increase of 250 ms or better.

Another limitation is that the current implementation supports English exclusively because of our choice of the language model. However, Workers AI is developing a multilingual model which will enable DLP to increase support across different regions and languages.

Looking ahead, we also aim to enhance the transparency of AI context analysis. Currently, users have no visibility on how the decisions are made based on their past false and true positive reports. We plan to develop tools and interfaces that provide more insight into how confidence scores are calculated, making the system more explainable and user-friendly.  

With this launch, AI context analysis is only available for Gateway HTTP traffic. By the end of 2025, AI context analysis will be available in both CASB and Email Security so that customers receive the same AI enhancements across their entire data landscape.


Unlock the benefits: start using AI-powered detection features today


DLP’s AI context analysis is in closed beta. Sign up here for early access to experience immediate improvements to your DLP HTTP traffic matches. More updates are coming soon as we approach general availability!

To get access to DLP via Cloudflare One, contact your account manager.

Meta’s Llama 4 is now available on Workers AI

2025-04-06

3 min read

As one of Meta’s launch partners, we are excited to make Meta’s latest and most powerful model, Llama 4, available on the Cloudflare Workers AI platform starting today. Check out the Workers AI Developer Docs to begin using Llama 4 now.

Llama 4 is an industry-leading release that pushes forward the frontiers of open-source generative Artificial Intelligence (AI) models. Llama 4 relies on a novel design that combines a Mixture of Experts architecture with an early-fusion backbone that allows it to be natively multimodal.

The Llama 4 “herd” is made up of two models: Llama 4 Scout (109B total parameters, 17B active parameters) with 16 experts, and Llama 4 Maverick (400B total parameters, 17B active parameters) with 128 experts. The Llama Scout model is available on Workers AI today.

Llama 4 Scout has a context window of up to 10 million (10,000,000) tokens, which makes it one of the first open-source models to support a window of that size. A larger context window makes it possible to hold longer conversations, deliver more personalized responses, and support better Retrieval Augmented Generation (RAG). For example, users can take advantage of that increase to summarize multiple documents or reason over large codebases. At launch, Workers AI is supporting a context window of 131,000 tokens to start and we’ll be working to increase this in the future.
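Calling the model from a Worker looks like any other Workers AI inference request. Here is a minimal sketch; the model ID is an assumption based on the Workers AI catalog, so check the model page in the developer docs for the authoritative name, pricing, and limits:

// Minimal Workers AI call to Llama 4 Scout from a Worker with an AI binding.
export default {
  async fetch(request: Request, env: { AI: Ai }): Promise<Response> {
    const result = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Summarize these release notes in three bullet points: ..." },
      ],
      max_tokens: 256,
    });
    return Response.json(result);
  },
};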

Llama 4 does not compromise parameter depth for speed. Despite having 109 billion total parameters, the Mixture of Experts (MoE) architecture intelligently activates only a fraction of those parameters during inference. This delivers faster responses that still benefit from the knowledge captured in the full 109B parameters.

What is a Mixture of Experts model?

A Mixture of Experts (MoE) model is a type of Sparse Transformer model that is composed of individual specialized neural networks called “experts”. MoE models also have a “router” component that manages input tokens and which experts they get sent to. These specialized experts work together to provide deeper results and faster inference times, increasing both model quality and performance.


For an illustrative example, let’s say there’s an expert that’s really good at generating code while another expert is really good at creative writing. When a request comes in to write a Fibonacci algorithm in Haskell, the router sends the input tokens to the coding expert. This means that the other experts can stay inactive, so the model only needs to use the smaller, specialized neural network to solve the problem.

In the case of Llama 4 Scout, this means the model is only using one expert (17B parameters) instead of the full 109B total parameters of the model. In reality, the model probably needs to use multiple experts to handle a request, but the point still stands: an MoE model architecture is incredibly efficient for the breadth of problems it can handle and the speed at which it can handle them.
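A toy illustration of the routing idea (this is just the control flow, nothing from Llama 4's actual implementation) might look like this:

// Toy Mixture of Experts: a router scores every expert for the incoming tokens,
// and only the top-k experts actually run, so most parameters stay idle.
type Expert = { name: string; run: (tokens: string[]) => string };

const experts: Expert[] = [
  { name: "code", run: (t) => `code expert handled: ${t.join(" ")}` },
  { name: "creative-writing", run: (t) => `writing expert handled: ${t.join(" ")}` },
  { name: "math", run: (t) => `math expert handled: ${t.join(" ")}` },
];

// A real router is a learned layer that produces a score per expert; this one is faked.
function route(tokens: string[]): number[] {
  return experts.map((e) =>
    tokens.some((t) => t.includes("Haskell")) && e.name === "code" ? 1 : Math.random() * 0.1
  );
}

function forward(tokens: string[], topK = 1): string[] {
  const ranked = route(tokens)
    .map((score, i) => [score, i] as const)
    .sort((a, b) => b[0] - a[0])
    .slice(0, topK);
  return ranked.map(([, i]) => experts[i].run(tokens)); // only the selected experts execute
}

console.log(forward(["write", "a", "Fibonacci", "algorithm", "in", "Haskell"]));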

MoE also makes it more efficient to train models. We recommend reading Meta’s blog post on how they trained the Llama 4 models. While more efficient to train, hosting an MoE model for inference can sometimes be more challenging. You need to load the full model weights (over 200 GB) into GPU memory. Supporting a larger context window also requires keeping more memory available in a Key Value cache.

Thankfully, Workers AI solves this by offering Llama 4 Scout as a serverless model, meaning that you don’t have to worry about things like infrastructure, hardware, memory, etc. — we do all of that for you, so you are only one API request away from interacting with Llama 4. 

One challenge in building AI-powered applications is the need to grab multiple different models, like a Large Language Model (LLM) and a visual model, to deliver a complete experience for the user. Llama 4 solves that problem by being natively multimodal, meaning the model can understand both text and images.

You might recall that Llama 3.2 11B was also a vision model, but Llama 3.2 actually used separate parameters for vision and text. This means that when you sent an image request to the model, it only used the vision parameters to understand the image.

With Llama 4, all the parameters natively understand both text and images. This allowed Meta to train the model parameters with large amounts of unlabeled text, image, and video data together. For the user, this means that you don’t have to chain together multiple models like a vision model and an LLM for a multimodal experience — you can do it all with Llama 4.
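As a sketch of what a multimodal request could look like through the OpenAI-compatible endpoint that Workers AI exposes (the exact image input schema is defined on the model page, so treat the shape of the image part below as an assumption):

// Hypothetical multimodal request via the Workers AI OpenAI-compatible endpoint.
// ACCOUNT_ID and API_TOKEN are your own Cloudflare credentials; the image part
// follows the OpenAI chat format and may differ from the model's documented schema.
const ACCOUNT_ID = "<your-account-id>";
const API_TOKEN = "<your-api-token>";

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1/chat/completions`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "@cf/meta/llama-4-scout-17b-16e-instruct",
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: "What is shown in this image?" },
            { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
          ],
        },
      ],
    }),
  }
);
console.log(await response.json());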

We are excited to partner with Meta as a launch partner to make it effortless for developers to use Llama 4 in Cloudflare Workers AI. The release brings an efficient, multimodal, highly-capable and open-source model to anyone who wants to build AI-powered applications.

Cloudflare’s Developer Platform makes it possible to build complete applications that run alongside our Llama 4 inference. You can rely on our compute, storage, and agent layer running seamlessly with the inference from models like Llama 4. To learn more, head over to our developer docs model page for more information on using Llama 4 on Workers AI, including pricing, additional terms, and acceptable use policies.

Want to try it out without an account? Visit our AI playground or get started with building your AI experiences with Llama 4 and Workers AI.

