Learn how segmentation optimizes processing on large datasets and allows you to take advantage of time series and other group functions within DSCVR.
What is segmentation?
Segmentation is how DSCVR organizes and stores data. Rows are grouped into segments based on the values in one or more columns, which keeps performance fast and efficient on very large datasets.
Are all tables segmented?
No. Tables with fewer than 8 million rows are stored in a single segment. Tables with 8 million rows or more can be organized into multiple segments. You only need to segment a table if you want to use group functions (g_functions).
When and how are tables segmented?
During the upload process, you can organize data in segments based on columns you choose. Tables may be segmented on one or more columns.
Do all g_functions require segmented tables?
Yes. g_functions require that all data for a given group be stored in a single segment. Any g_function can be used on a single-segment table without restriction. Some restrictions apply when g_functions are used on multi-segmented tables.
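To see why a group must not be split across segments, here is a minimal Python sketch. It is not DSCVR code. It assumes, for illustration only, that each segment is processed independently, and the function name, column names, and sample rows are all hypothetical.

```python
def group_totals_per_segment(segments, group_col, value_col):
    """Sum value_col per group, looking at one segment at a time.

    Assumption for illustration: a segment is processed on its own,
    so a group function only sees the rows stored in that segment.
    """
    results = []
    for segment in segments:
        totals = {}
        for row in segment:
            key = row[group_col]
            totals[key] = totals.get(key, 0) + row[value_col]
        results.append(totals)
    return results


# Hypothetical layout 1: every customer's rows sit in one segment.
good_layout = [
    [{"customer": "A", "sales": 10}, {"customer": "A", "sales": 7}],
    [{"customer": "B", "sales": 5}],
]

# Hypothetical layout 2: customer A's rows are split across two segments.
bad_layout = [
    [{"customer": "A", "sales": 10}, {"customer": "B", "sales": 5}],
    [{"customer": "A", "sales": 7}],
]

good_totals = group_totals_per_segment(good_layout, "customer", "sales")
bad_totals = group_totals_per_segment(bad_layout, "customer", "sales")
```

In the first layout, customer A's total is computed from both rows. In the second, each segment sees only part of customer A's data, so neither partial result is the full group total.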
How do I know if a table is single- or multi-segmented?
In the View dropdown at the top right of the Data Grid, click Metadata. The metadata shows whether the table is single- or multi-segmented.
My table is multi-segmented. What restrictions apply when using g_functions?
Restrictions apply to the G, or group, parameter. Check the metadata to see which columns were used to segment the table. These columns determine what you can use as the G parameter. For example, if the table is segmented on Customer ID, the G parameter must include Customer ID. If the table is segmented on more than one column, the G parameter must include every column used for segmentation. You can add other columns to the G parameter as long as all segmentation columns are included.
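The rule above amounts to a subset check: every segmentation column must appear in the G parameter. The following Python sketch shows that check. It is not part of DSCVR, and the function name and column names are hypothetical.

```python
def missing_segmentation_columns(segmentation_columns, g_columns):
    """Return the segmentation columns that are absent from the G parameter.

    An empty result means the G parameter satisfies the rule: it includes
    every column the table is segmented on. Extra columns in g_columns
    are allowed.
    """
    g_set = set(g_columns)
    return [col for col in segmentation_columns if col not in g_set]


# Hypothetical table segmented on two columns.
segmented_on = ["customer_id", "date"]

ok = missing_segmentation_columns(segmented_on, ["customer_id", "date", "store"])
not_ok = missing_segmentation_columns(segmented_on, ["customer_id", "store"])
```

Here `ok` is empty because both segmentation columns are present and the extra `store` column is allowed. `not_ok` lists `date`, the segmentation column that is missing from the G parameter.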
Do column restrictions apply to parameters other than the G parameter?
No, column restrictions apply only to the G parameter.
I’m segmenting a table. How do I choose which columns to use for segmentation?
The columns you choose depend on the type of analysis you expect to do. If segmenting on a column produces a segment of 8 million rows or more, you must choose a different column or add more columns to segment by. For example, if segmenting on customer produces a segment of 10 million rows, try segmenting on both customer and date to reduce the segment size. For retail data, typical segmentation columns include customer, transaction, date, or a combination of these.
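Before uploading, you can estimate whether a candidate set of columns keeps every group under the 8 million row limit. The Python sketch below counts rows per distinct combination of column values and checks the largest count. It is an illustration under stated assumptions, not a DSCVR feature: the helper names and sample data are hypothetical, and it treats each distinct combination of values as one group that must fit in a segment.

```python
from collections import Counter

SEGMENT_ROW_LIMIT = 8_000_000  # threshold stated in this article


def largest_group_size(rows, columns):
    """Return the row count of the largest group formed by the given columns."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return max(counts.values(), default=0)


def fits_in_segment(rows, columns, limit=SEGMENT_ROW_LIMIT):
    """True if every group formed by the columns has fewer rows than the limit."""
    return largest_group_size(rows, columns) < limit


# Tiny hypothetical sample; a limit of 3 stands in for 8 million.
sample_rows = [
    {"customer": "A", "date": "2024-01-01"},
    {"customer": "A", "date": "2024-01-02"},
    {"customer": "A", "date": "2024-01-02"},
    {"customer": "B", "date": "2024-01-01"},
]

by_customer = fits_in_segment(sample_rows, ["customer"], limit=3)
by_customer_and_date = fits_in_segment(sample_rows, ["customer", "date"], limit=3)
```

With the stand-in limit of 3, segmenting on customer alone fails because customer A has 3 rows. Adding date splits that group into smaller ones, so the combination passes.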
The table segmentation I have doesn’t provide the groups I need for the g_function analyses I create. Can I change how a table is segmented?
Yes, but you must upload the table again and segment it on different columns. For example, analyses of retail transaction data may need to be grouped by date in some cases and by location in others. Consider creating copies of the table with different segmentations to support efficient analyses. You can also change the segmentation when you save a worksheet as a table.