Allows you to connect to the Rosette API for advanced text-mining capabilities.

Parameters:

Rosette applies cloud NLP from Rosette Text Analytics to each incoming row. It supports four operations: sentiment analysis, parts-of-speech (POS) tagging, entity extraction, and categorization.
Limits (as shown in the box): max payload 600 KB and max 50,000 characters per row. Sentiment and Categorization are English-only.
- A valid Rosette API key (keep it secret; mask it in screenshots).
- Outbound Internet access from the worker to Rosette’s API.
- An input table with the columns listed below, where each row’s text fits both limits: ≤600 KB and ≤50,000 characters.
- Optional: choose the Source text language or keep AUTO (recommended for mixed data; note that Sentiment and Categorization still require English).

| Column | Type | Required | Notes |
|---|---|---|---|
| Document ID | string | Yes | Unique per row; used to correlate outputs. |
| Text to process | string | Yes | UTF-8 text; HTML should be pre-cleaned if you don’t want tags analyzed. |
| (implicit) Lang | string | No | Use Source text language parameter or AUTO. |
General file assumptions: UTF-8, header row present.
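
The input requirements above (pre-cleaned HTML, and text within both size limits) can be checked upstream before rows reach the box. The following is a minimal sketch in Python, not part of the box itself: the function names are hypothetical, the rows are illustrative, and "600 KB" is read conservatively as 600,000 bytes, which is an assumption.

```python
from html.parser import HTMLParser

MAX_BYTES = 600_000  # assumption: "600 KB" read conservatively as 600,000 bytes
MAX_CHARS = 50_000   # per-row character limit stated above


class _TextOnly(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Remove tags and collapse runs of whitespace (hypothetical helper)."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fits_limits(text: str) -> bool:
    """True if the text respects both the character and the UTF-8 byte limit."""
    return len(text) <= MAX_CHARS and len(text.encode("utf-8")) <= MAX_BYTES


# Illustrative rows using the id/text column names from this page.
rows = [
    {"id": "doc-1", "text": "<p>Great <b>service</b>!</p>"},
    {"id": "doc-2", "text": "x" * 60_000},
]
cleaned = [{"id": r["id"], "text": strip_html(r["text"])} for r in rows]
oversized = [r["id"] for r in cleaned if not fits_limits(r["text"])]
```

Rows listed in `oversized` would need to be truncated or split before they are sent on.
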
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| Document ID | Column selector | Yes | — | Column containing a unique identifier for the row/document. |
| Text to process | Column selector | Yes | — | Column containing the text to send to Rosette. |
| Operation | Enum | Yes | sentiment analysis | One of: sentiment analysis, parts-of-Speech (POS), entity extraction, categorization. |
| Source text language | Enum | No | AUTO | Language hint for the text. Options include AUTO, Arabic (ara), Chinese (zho), Dutch (nld), English (eng), French (fra), German (deu), Hebrew (heb), Indonesian (ind), Italian (ita), Japanese (jpn), Korean (kor), Pashto (pus), Persian/Dari/Farsi (fas), Portuguese (por), Russian (rus), Spanish (spa), Urdu (urd). (Sentiment & Categorization: English only.) |
| API key | Secret string | Yes | — | Your Rosette API key. Do not commit to source control. |
| Output column name prefix | String | No | (blank) | An optional prefix added to all generated output columns (e.g., ros_). |
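
The English-only constraint on sentiment analysis and categorization can be checked before a run. This is a minimal sketch only: `check_parameters` and the lowercase operation labels are hypothetical, chosen for illustration, and do not reflect how the box validates its settings internally.

```python
ENGLISH_ONLY_OPERATIONS = {"sentiment analysis", "categorization"}
OPERATIONS = ENGLISH_ONLY_OPERATIONS | {"parts-of-speech", "entity extraction"}


def check_parameters(operation: str, language: str = "AUTO") -> list[str]:
    """Return a list of problems; an empty list means the combination looks fine."""
    problems = []
    op = operation.strip().lower()
    lang = language.strip().lower()
    if op not in OPERATIONS:
        problems.append(f"unknown operation: {operation!r}")
    elif op in ENGLISH_ONLY_OPERATIONS and lang not in ("auto", "eng"):
        # AUTO is accepted here, but the text itself must still be English.
        problems.append(f"{operation} is English-only, but language is {language!r}")
    return problems
```

For example, `check_parameters("categorization", "fra")` reports one problem, while `check_parameters("entity extraction", "fra")` reports none.
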
Inter-parameter rules
- Max throughput: 0 leaves it unbounded.

New columns are appended to each row. Names begin with your Output column name prefix (if provided). The exact column set depends on the Operation. Column names follow the box’s internal naming with your chosen prefix (e.g., ros_sentiment, ros_confidence, ros_entities). The exact suffixes may vary by version.
Upstream input
Ensure your table has id and text columns (UTF-8). Remove HTML if not desired.
Drop the Rosette box and connect it after your input/cleaning steps.
Standard tab
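- Document ID: id
- Text to process: text
- Operation: sentiment analysis (pick others as needed)
- Source text language: AUTO (or English (eng) for sentiment/categorization)
- API key: ******** (masked)
- Output column name prefix: ros_
Advanced tab (safe defaults)
- Max throughput: 0 (unlimited; raise to e.g. 60 to self-throttle)
- 1055000
- Process all rows: ON
Run the box.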
Downstream write/inspect
Use a viewer or a writer to confirm new columns with ros_ prefix appear and contain valid values for several sample rows.
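
If the output is exported for inspection, the same check can be scripted. The sketch below assumes the rows are available as Python dictionaries. `check_prefixed_columns` is a hypothetical helper, and the sample values are invented for illustration (only the `ros_sentiment` and `ros_confidence` names come from the examples on this page, and the actual suffixes may differ).

```python
def check_prefixed_columns(rows, prefix="ros_", input_columns=("id", "text")):
    """Return (new_columns, ids_with_nulls) for a list of row dicts."""
    # Every column that carries the prefix and is not one of the inputs.
    new_columns = sorted(
        {name for row in rows for name in row
         if name.startswith(prefix) and name not in input_columns}
    )
    # IDs of rows where any generated column is missing or empty.
    ids_with_nulls = [
        row["id"] for row in rows
        if any(row.get(col) in (None, "") for col in new_columns)
    ]
    return new_columns, ids_with_nulls


# Illustrative sample: one English row with values, one non-English row without.
sample = [
    {"id": "doc-1", "text": "Great service!", "ros_sentiment": "pos", "ros_confidence": 0.93},
    {"id": "doc-2", "text": "Servicio lento.", "ros_sentiment": None, "ros_confidence": None},
]
new_columns, ids_with_nulls = check_prefixed_columns(sample)
```

An empty `new_columns` list corresponds to the "No new columns" symptom, and any IDs in `ids_with_nulls` are rows worth checking for non-English text.
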
Validation checklist
- ros_* columns exist and are non-null for English text.

| Symptom | Likely Cause | Fix |
|---|---|---|
| 401/403 in log | Invalid/missing API key; wrong scope | Re-paste a valid key; confirm it’s not expired; avoid trailing spaces. |
| 413 / “payload too large” | Row text exceeds 600 KB or 50k chars | Truncate or split long documents upstream. |
| Empty sentiment results | Non-English text or undetected language | Ensure English text; set Source text language = English (eng) explicitly. |
| 422 / unsupported language for operation | Using Sentiment/Categorization on non-English | Restrict those ops to English; for other languages, use POS or Entity extraction. |
| Timeouts / connection errors | Network egress blocked; timeout too low | Allow outbound HTTPS; increase TCP/IP timeout (e.g., 15000 ms); set retries. |
| 429 / rate limited | Throughput too high | Set Max throughput (e.g., 60 req/min); add backoff at the pipeline level. |
| No new columns | Output prefix + schema expectation mismatch | Remove the prefix temporarily to inspect names; verify downstream writer schema refresh. |
| Filter returned 0 rows | Process-all OFF with strict filter | Turn Process all rows ON or adjust filter expression/value. |
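
The "add backoff at the pipeline level" fix for 429 responses can be sketched as follows, for cases where a custom script rather than the box sends the requests. This is an illustrative pattern only: `RateLimited`, `call_with_backoff`, and the `send` callable are hypothetical names, and the retry count and delays are assumptions, not values taken from the box.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the service answers 429."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the pipeline
            # Wait 1s, 2s, 4s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the wrapper can be exercised without real waiting. The jitter spreads out retries when several workers hit the limit at the same moment.
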
Notes
- Sentiment & Categorization are English-only (per box notice).
- Respect the 600 KB / 50k-char limit per row to avoid truncation or errors.
- Use Output column name prefix to keep schemas tidy when chaining multiple NLP steps.
