Group

Tool version

2.0.0

Keywords

Group, Common field, Operation

Summary

Groups data by a column and performs operations on other columns.

Description

This tool groups an input dataset by a user-specified column and applies one or more of the following operations to any other column: Mean, Median, Mode, Sum, Maximum, Minimum, Count, Count Distinct, Concatenate, Concatenate Distinct, Randomly pick, and Standard deviation.

General comments (Warning/Tips)

If your data is not tab-delimited, use Text Manipulation→Convert.
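
For readers preparing a file outside the tool, the following Python sketch shows what a delimiter conversion amounts to. It is not the Text Manipulation→Convert tool itself; the function name `to_tab_delimited` and the sample rows are made up for illustration, and a comma-delimited input is assumed.

```python
import csv
import io

def to_tab_delimited(text, delimiter=","):
    """Re-emit delimited text with TAB as the field separator."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Made-up sample rows (chromosome, position), comma-delimited.
sample = "chr1,100\nchr2,250\n"
print(to_tab_delimited(sample), end="")
```

Using the `csv` module rather than a plain string replace keeps quoted fields that contain the delimiter intact.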

Input

  • Select data: select the input file. It must be TAB-delimited.
  • Group by column: select the column to group by.
  • Ignore case while grouping?: you can choose either YES or NO.
  • Operations: click on "add new operation". You can add one or several operations.
    • Type:
      • Mean
      • Median
      • Mode
      • Maximum
      • Minimum
      • Sum
      • Count: counts the elements in the specified column and returns an integer
      • Count Distinct: counts the number of unique entries
      • Concatenate: takes each item in the specified column and builds a comma-delimited list.
      • Concatenate Distinct: does the same as Concatenate but lists each value only once.
      • Randomly pick
      • Standard deviation
  • On column: select the column you want to perform the operation on.
  • Round result to nearest integer?: you can choose either YES or NO.
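
To make the operation types above concrete, here is a minimal Python sketch of how each one could act on the values of a single group. It is an illustration under stated assumptions, not the tool's implementation: the names `NUMERIC_OPS`, `TEXT_OPS` and `apply_operation` are hypothetical, the standard deviation is assumed to be the sample standard deviation, and Concatenate Distinct is assumed to keep values in first-seen order.

```python
import random
import statistics

# Hypothetical mapping from the operation names listed above to Python callables.
NUMERIC_OPS = {
    "Mean": statistics.mean,
    "Median": statistics.median,
    "Mode": statistics.mode,
    "Maximum": max,
    "Minimum": min,
    "Sum": sum,
    # Assumption: sample standard deviation (needs at least two values).
    "Standard deviation": statistics.stdev,
}

TEXT_OPS = {
    "Count": len,
    "Count Distinct": lambda values: len(set(values)),
    "Concatenate": lambda values: ",".join(values),
    # Assumption: duplicates are dropped, first-seen order is kept.
    "Concatenate Distinct": lambda values: ",".join(dict.fromkeys(values)),
    "Randomly pick": random.choice,
}

def apply_operation(op_type, values, round_result=False):
    """Apply one operation to the raw string values of a single group."""
    if op_type in NUMERIC_OPS:
        result = NUMERIC_OPS[op_type]([float(v) for v in values])
        # Note: Python's round() rounds exact halves to the nearest even integer.
        return round(result) if round_result else result
    return TEXT_OPS[op_type](values)

values = ["3", "1.6", "3", "7"]
print(apply_operation("Minimum", values, round_result=True))  # 2
print(apply_operation("Count Distinct", values))              # 3
print(apply_operation("Concatenate Distinct", values))        # 3,1.6,7
```

Numeric operations convert the column values to numbers first, while the counting and concatenation operations work on the raw text, which is why the sketch keeps two separate tables.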

Output

The output is a grouped tab-delimited dataset.

Example

EXAMPLE 1:
Usage example: finding the first (lowest) start position on each chromosome.

Input

Select data: file_group.txt
Group by column: c1
Ignore case while grouping?: YES
Operation 1
Type: Minimum
On column: c2
Round result to nearest integer?: YES

With file_group.txt containing:

Output

EXAMPLE 2:

Input

Select data: file_group.txt
Group by column: c1
Ignore case while grouping?: YES
Operation 1
Type: Concatenate
On column: c2
Round result to nearest integer?: YES

With file_group.txt containing:

Output
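
As a rough illustration of the settings used in Examples 1 and 2, the following Python sketch groups made-up rows by c1, ignoring case, then takes the rounded Minimum and the Concatenate of c2. The rows in `SAMPLE` are hypothetical and are not the actual contents of file_group.txt; the function name `group_column` is also hypothetical, and reporting the group key in lower case is an assumption that may differ from the tool's output.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for file_group.txt (TAB-delimited: c1 = chromosome, c2 = start).
SAMPLE = "chr1\t150.4\nCHR1\t42.7\nchr2\t300\nchr2\t120\n"

def group_column(text, group_col, value_col, ignore_case=True):
    """Collect value_col entries per group_col key (1-based column numbers)."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        key = row[group_col - 1]
        if ignore_case:
            key = key.lower()  # assumption: the lower-cased key is reported
        groups[key].append(row[value_col - 1])
    return groups

groups = group_column(SAMPLE, group_col=1, value_col=2)

# Example 1 settings: Minimum on c2, rounded to the nearest integer.
minimum = {k: round(min(float(v) for v in vals)) for k, vals in groups.items()}

# Example 2 settings: Concatenate on c2 (comma-delimited list).
concatenated = {k: ",".join(vals) for k, vals in groups.items()}

# One TAB-delimited line per group: key, rounded minimum, concatenated values.
for key in groups:
    print(key, minimum[key], concatenated[key], sep="\t")
```

With case ignored, the rows keyed `chr1` and `CHR1` fall into the same group, so the sketch prints one line for chr1 (minimum 43) and one for chr2 (minimum 120).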

Edited on

July 23rd, 2014

doc/group.txt · Last modified: 2014/11/28 17:05 by slegras