Scraping
Do you remember those web scrapers from back in the day? You just turned them on, browsed a website, and they would eventually save the whole website on your machine. Well, Drosse provides something similar, but a little less brutal, as you can define precisely which endpoints you would like to scrape.
Warning
Endpoint scraping comes along with the Proxy feature. You won't scrape your own defined mocks, right?
You can scrape your proxied endpoints either statically or dynamically.
Static scraping
The easiest one. Just indicate in your `routes.json` file which endpoints you want to scrape.
"countries": {
"DROSSE": {
"proxy": "https://restcountries.com/v3.1"
},
"name": {
"DROSSE": {
"scraper": {
"static": true
}
}
}
}
In the snippet above, we've told Drosse to scrape any call to the `.../countries/name/...` endpoint. Concretely, it means that Drosse will copy and save the response of any of those calls into a static JSON file in the `scraped` directory of your mocks.
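For instance, a single call through the mock server is enough to persist the proxied response. A minimal sketch, assuming Drosse listens on http://localhost:8000 (the port and the saved file's exact name are assumptions):

```js
// Hitting the proxied endpoint through Drosse (Node 18+, ESM)...
const res = await fetch('http://localhost:8000/countries/name/france')
const countries = await res.json()

// ...returns the live response from https://restcountries.com/v3.1/name/france.
// As a side effect, Drosse writes that response as a static JSON file into
// the `scraped` directory of your mocks.
console.log(countries.length)
```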
Info
As usual, you can redefine this `scraped` directory name in your `.drosserc.js` file (see Configuration).
This can be a convenient way to populate your mocks' contents if the backend API already exists. Just configure your mocks to proxy the existing API and activate the scraper. When you have enough content, remove the proxy and redefine your mocked routes as `static` mocks.
Ideally you would rework your scraped contents and create relevant `static` file mocks out of them, maybe add some templates, etc. But you can also leave them as they are, in the `scraped` directory: Drosse will always fall back on this directory if it doesn't find any match in the `static` directory.
Dynamic scraping
Dynamic scraping lets you rework the scraped content and save it exactly how and where you want.
"countries": {
"DROSSE": {
"proxy": "https://restcountries.com/v3.1"
},
"name": {
"DROSSE": {
"scraper": {
"service": true
}
}
}
}
In contrast with static scraping, you simply replace `static` with `service`, as shown above.
When Drosse encounters that configuration, it will look for a dedicated scraper service in the `scrapers` directory. The file must be named according to the scraped endpoint, following the same logic as the normal services naming: take each route node, remove the path parameters, replace `/` with `.`, and ignore the verb.
If we take the same example as for the services: for a `GET /api/users/:id/superpowers/:name`, the scraper service file will be `api.users.superpowers.js`. No parameters, no verb.
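The convention is mechanical enough to express in a few lines. The helper below only illustrates the rule; it is not part of Drosse's API:

```js
// Derive a scraper service file name from a route:
// drop path parameters (nodes starting with `:`), ignore the verb,
// join the remaining nodes with dots and append `.js`.
const scraperFileName = route =>
  route
    .split('/')
    .filter(node => node && !node.startsWith(':'))
    .join('.') + '.js'

console.log(scraperFileName('/api/users/:id/superpowers/:name'))
// -> api.users.superpowers.js
```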
Info
As always, the scrapers directory can be renamed in the `.drosserc.js` file, with the `scraperServicesPath` property (see Configuration).
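For example, a minimal sketch of such an override, assuming a CommonJS-style `.drosserc.js` (see the Configuration page for the actual module format and full option list):

```js
// .drosserc.js
module.exports = {
  // look for scraper services in `myScrapers` instead of `scrapers`
  scraperServicesPath: 'myScrapers',
}
```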
Your service must export a function which takes 2 parameters. The first one is the response of your scraped endpoint, as a JS object. The second one is the same `api` object as the one you get in a normal Drosse service. This then gives you access to the `db` object, the whole Drosse `config` object, the h3 `event`, etc.
```js
import { defineDrosseScraper } from '@jota-one/drosse'

export default defineDrosseScraper((json, { db, config, event }) => {
  // rework your json
  // save it in the database
  // create a proper JSON file in the `collections` directory to have your
  // scraped content automatically reloaded in your DB even if you delete
  // your db file.
})
```
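To make this more concrete, here is a hypothetical scraper service for the `/countries/name/...` endpoint from the snippets above (file `countries.name.js` in your scrapers directory). The field selection, the `async` handler and the target file path are illustrative assumptions, not prescribed by Drosse:

```js
import { writeFile } from 'node:fs/promises'
import { defineDrosseScraper } from '@jota-one/drosse'

export default defineDrosseScraper(async json => {
  // restcountries.com/v3.1 returns an array of country objects;
  // keep only the fields the mocks actually need.
  const countries = json.map(country => ({
    name: country.name.common,
    code: country.cca2,
  }))

  // Save the reworked content as a collection file so it gets reloaded
  // into the DB even if the db file is deleted.
  await writeFile(
    'collections/countries.json',
    JSON.stringify(countries, null, 2)
  )
})
```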