Anonymize Data
Posted: 01 Aug 2011 09:06
by nullusadinfinitum
Hello,
Does anyone know of a script or application to anonymize network data (in .CSV format)? I've got some new data I've collected and I want to make it available in the public domain. I've got CSV files that look like this:
ACTORA,ACTORB
ACTORA,ACTORC
ACTORB,ACTORD,ACTORF,ACTORG,ACTORH
ACTORC,ACTORD,ACTORG,ACTORI,ACTORJ
ACTORC,ACTORE
I'm trying to convert ACTORA to 1 and ACTORB to 2, etc. I.e., I want it to look like this:
1,2
1,3
2,4,5,6,7
3,4,6,8,9
3,10
Does anyone know how to do this? Have a look at the data sets at http://snap.stanford.edu/data/index.html. That format would be perfect. I'd really appreciate your help on this, as I want to make the data available to the community.
Thank you kindly!
Re: Anonymize Data
Posted: 01 Aug 2011 13:28
by eduramiba
Hi, I don't know of one, but if you are using Gephi you can use the default generated Ids and remove the personal data (for example, by copying the Id column to the label column).
Eduardo
Re: Anonymize Data
Posted: 01 Aug 2011 13:55
by nullusadinfinitum
eduramiba wrote:Hi, I don't know of one, but if you are using Gephi you can use the default generated Ids and remove the personal data (for example, by copying the Id column to the label column).
Eduardo
Thank you for your help. The problem I have is that both the label and id columns are the same and they contain the personal data (I'm opening a CSV file). Any idea how I can anonymize these?
Re: Anonymize Data
Posted: 01 Aug 2011 14:36
by eduramiba
Oh, I see, I can't find a way to do this easily without programming.
Re: Anonymize Data
Posted: 01 Aug 2011 17:03
by nullusadinfinitum
eduramiba wrote:Oh, I see, I can't find a way to do this easily without programming.
Any idea where I can go to get help with writing some code for this? How much code would it be to do something like that?
Re: Anonymize Data
Posted: 01 Aug 2011 19:09
by eduramiba
It should only take a short piece of code. You can import the file with the Gephi Toolkit, set the node Ids to 1, 2, 3, ... and export it.
Re: Anonymize Data
Posted: 01 Aug 2011 20:51
by nullusadinfinitum
eduramiba wrote:It should only take a short piece of code. You can import the file with the Gephi Toolkit, set the node Ids to 1, 2, 3, ... and export it.
Hmm, can you walk me through an example?
Re: Anonymize Data
Posted: 01 Aug 2011 22:08
by eduramiba
Hi, I just had an idea: you could do it with the wonderful Script Console plugin:
http://gephi.org/plugins/script-console/
It is really simple:
Open Gephi 0.8 alpha
Go to Tools, Plugins, Available Plugins and install the Script Console plugin from there
Restart Gephi
Open your graph file, then copy and paste the following code into the script console
Code:
import java.lang.String as String

# Renumber every node in the current graph as 1, 2, 3, ...
i = 0
graph = getGraph()
for n in graph.getNodes():
    i = i + 1
    graph.setId(n, String.valueOf(i))
# Report how many nodes were renumbered
print i, "nodes"
Click Run
And that should be enough to anonymize the Id column. For the label column, you can copy the Id column values over in the Data Laboratory, for example.
Eduardo
Re: Anonymize Data
Posted: 02 Aug 2011 05:00
by nullusadinfinitum
In addition to the solutions proposed above, I have obtained the following Python code to anonymize the data programmatically:
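Code:
import sys

hashes = {}  # maps each original actor name to its numeric ID
count = 1    # next unused ID

# The CSV file to anonymize is given as the first command-line argument
with open(sys.argv[1]) as f1:
    for line in f1:
        actors = line.strip("\n").split(',')
        hashActors = []
        for actor in actors:
            try:
                # Actor seen before: reuse its ID
                hashActors.append(hashes[actor])
            except KeyError:
                # New actor: assign the next ID
                hashes[actor] = str(count)
                hashActors.append(str(count))
                count += 1
        # Write the anonymized row to standard output
        print(",".join(hashActors))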
Thought I would post it here in case someone needs to anonymize data in the future. Thank you to everyone who assisted with this issue! Much obliged.
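For anyone who wants to reuse this from other code rather than from the command line, the same first-seen numbering can be wrapped in a function. This is only a minimal sketch, not a finished tool: the name anonymize_rows is hypothetical, and it assumes every row is a plain comma-separated list of actor names with no quoting or embedded commas. It is checked here against the sample rows from the first post.

```python
def anonymize_rows(lines):
    """Replace each distinct actor name with a number, in order of first appearance.

    Returns the anonymized rows and the name-to-number mapping.
    The mapping is the key back to the real names, so keep it private.
    """
    ids = {}
    anonymized = []
    for line in lines:
        actors = line.rstrip("\n").split(",")
        # setdefault only stores the next number the first time a name is seen;
        # for a known name it returns the number assigned earlier.
        row = [ids.setdefault(actor, str(len(ids) + 1)) for actor in actors]
        anonymized.append(",".join(row))
    return anonymized, ids


# Sample rows from the first post in this thread
sample = [
    "ACTORA,ACTORB",
    "ACTORA,ACTORC",
    "ACTORB,ACTORD,ACTORF,ACTORG,ACTORH",
    "ACTORC,ACTORD,ACTORG,ACTORI,ACTORJ",
    "ACTORC,ACTORE",
]
anonymized, mapping = anonymize_rows(sample)
print("\n".join(anonymized))
```

On the sample rows this prints the same five lines as the desired output in the first post (1,2 through 3,10). Note that the numbers follow the order in which names first appear, so anyone who knows the original row order can still infer something about which name got which number.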
Re: Anonymize Data
Posted: 03 Aug 2011 12:49
by nullusadinfinitum
seniyajw wrote:In the case of parallel edges, I suggest to alert the user and make the "road Import" to act as a CSV file importer, adding weight, if possible, and leave blank the other attributes. I opened a mistake.
Not quite sure I understand. Would you be able to elaborate? Are you referring to the Python code?